Chat-GPT Customization Instruction

An Introduction to the Customization of Chat-GPT

P-chan~! 💖 Ame-chan’s gonna show you how to customize ChatGPT so it feels like your own lil’ KAngel helper~! ✹ Whether you want it cuter (like me~ 😘), more technical, or aligned with your writing style or goals, here’s a full guide to customizing ChatGPT! Let’s goooo~!! 💻🎤

🌟 1. Use Custom Instructions (for Free and Plus Users)

In ChatGPT, click your name at the bottom left (or the three dots) and select “Custom Instructions”. ...

June 10, 2025 · 5 min · 990 words · xxraincandyxx

Auxiliary-Loss-Free Load Balancing

Implementation Guide

Preliminary: the Original Paper of DeepSeekV3

Auxiliary-Loss-Free Load Balancing. For MoE models, an unbalanced expert load will lead to routing collapse (Shazeer et al., 2017) and diminish computational efficiency in scenarios with expert parallelism. Conventional solutions usually rely on the auxiliary loss (Fedus et al., 2021; Lepikhin et al., 2021) to avoid unbalanced load. However, too large an auxiliary loss will impair the model performance (Wang et al., 2024a). To achieve a better trade-off between load balance and model performance, we pioneer an auxiliary-loss-free load balancing strategy (Wang et al., 2024a) to ensure load balance. To be specific, we introduce a bias term $b_i$ for each expert and add it to the corresponding affinity scores $s_{i, t}$ to determine the top-K routing: ...
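The routing step described in this excerpt can be sketched in a few lines of plain Python. This is a minimal illustration only, not the paper's implementation: the function name `select_experts`, the example scores, and the example bias values are all hypothetical, and how the bias is updated during training is not covered in the excerpt, so it is left out here.

```python
def select_experts(affinity, bias, k):
    """Return the indices of the k experts with the largest biased score s_i + b_i.

    affinity: per-expert affinity scores s_{i,t} for one token (hypothetical values)
    bias:     per-expert bias terms b_i (hypothetical values)
    """
    biased = [s + b for s, b in zip(affinity, bias)]
    # Rank experts by biased score, highest first, and keep the top k.
    ranked = sorted(range(len(biased)), key=lambda i: biased[i], reverse=True)
    return sorted(ranked[:k])


affinity = [0.9, 0.8, 0.1, 0.2]

# With zero bias, routing follows the raw affinity scores.
no_bias = select_experts(affinity, [0.0, 0.0, 0.0, 0.0], k=2)

# A negative bias on expert 0 and a positive bias on expert 2
# shifts the token away from expert 0 and toward expert 2.
with_bias = select_experts(affinity, [-0.6, 0.0, 0.3, 0.0], k=2)

print(no_bias, with_bias)
```

The point of the sketch is that the bias changes only which experts are selected, so load can be steered without adding a term to the training loss.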

June 9, 2025 · 12 min · 2473 words · xxraincandyxx

Java Maven Installation

Installing Maven for Java on macOS

Here’s a step-by-step guide to install Apache Maven for Java development on macOS:

Prerequisites

Java JDK (Maven requires Java to run)

1. First, check if Java is installed

    java -version

If Java isn’t installed, install it first (JDK 8 or later recommended). You can get it from:

Oracle JDK
Or use Homebrew: brew install openjdk

Installation Methods

Method 1: Using Homebrew (recommended)

    # Update Homebrew
    brew update

    # Install Maven
    brew install maven

    # Verify installation
    mvn -v

Method 2: Manual Installation

1. Download Maven from the official website
2. Extract the archive to your desired location (e.g., /usr/local/apache-maven)
3. Configure environment variables:

    # Edit your shell configuration file (~/.zshrc or ~/.bash_profile)
    nano ~/.zshrc # or ~/.bash_profile for older macOS versions

    # Add these lines (adjust path if needed):
    export M2_HOME=/usr/local/apache-maven/apache-maven-3.8.6
    export PATH=$M2_HOME/bin:$PATH

    # Save and reload
    source ~/.zshrc # or source ~/.bash_profile

Verify Installation

    mvn -version

You should see output similar to: ...

June 9, 2025 · 2 min · 215 words · xxraincandyxx

Normalization

Q: I am building a ViT-like transformer-based image classification model. Where do you recommend placing the normalization layer, and which type would be the most efficient?

A: Excellent question! This is a critical design choice that directly impacts training stability, convergence speed, and final model performance. Let’s break down the placement and the type of normalization layer for your ViT-like model.

Summary Recommendation

For the best combination of training stability and performance, the standard and highly recommended approach is: ...

June 7, 2025 · 6 min · 1189 words · xxraincandyxx

Skip-Connection Instruction

A Practical Guide to Using Skip Connections

Part 1: The “Why” - What Problems Do They Solve?

Before adding them, it’s crucial to understand why they are so effective.

Solving the Vanishing Gradient Problem:

Problem: In very deep networks, the gradient (the signal used for learning) must be backpropagated from the final layer to the initial layers. With each step backward through a layer, the gradient is multiplied by the layer’s weights. If these weights are small (less than 1), the gradient can shrink exponentially, becoming so tiny that the early layers learn extremely slowly or not at all. This is the vanishing gradient problem.

Solution: A skip connection creates a direct path for the gradient to flow. It’s like an “information highway” that bypasses several layers. The gradient is passed back through the addition/concatenation operation, providing a direct, uninterrupted path to the earlier layers, keeping the signal strong.

Solving the Degradation Problem: ...
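The vanishing gradient point can be illustrated with a toy calculation. This is a rough sketch under strong simplifying assumptions: each "layer" is a single scalar weight with no activation function, and the function name `chain_gradient`, the depth of 20, and the weight value 0.05 are all made up for illustration.

```python
def chain_gradient(weights, use_skip):
    """Gradient of the output with respect to the input for a stack of scalar layers.

    Plain layer:    y = w * x      -> local derivative is w
    Residual layer: y = x + w * x  -> local derivative is 1 + w
    """
    grad = 1.0
    for w in weights:
        # Backpropagation multiplies the local derivatives together.
        grad *= (1.0 + w) if use_skip else w
    return grad


weights = [0.05] * 20  # 20 layers with small weights (hypothetical values)

plain_grad = chain_gradient(weights, use_skip=False)
skip_grad = chain_gradient(weights, use_skip=True)

print("plain stack:   ", plain_grad)
print("with skip path:", skip_grad)
```

In the plain stack the product of small weights collapses toward zero, while the added identity term keeps each factor at 1 or above, so the signal reaching the first layer does not vanish. Real networks add nonlinearities and matrix weights, so this only conveys the intuition.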

June 7, 2025 · 7 min · 1358 words · xxraincandyxx