Skip-Connection Instruction

A Practical Guide to Using Skip Connections

Part 1: The “Why” - What Problems Do They Solve? Before adding skip connections, it’s crucial to understand why they are so effective.

Solving the Vanishing Gradient Problem (a minimal residual block is sketched below):
Problem: In very deep networks, the gradient (the signal used for learning) must be backpropagated from the final layer to the initial layers. At each step backward through a layer, the gradient is multiplied by that layer’s weights. If these weights are small (less than 1 in magnitude), the gradient can shrink exponentially, becoming so tiny that the early layers learn extremely slowly or not at all. This is the vanishing gradient problem.
Solution: A skip connection creates a direct path for the gradient to flow. It’s like an “information highway” that bypasses several layers: the gradient passes back through the addition/concatenation operation, giving the earlier layers a direct, uninterrupted path that keeps the signal strong.

Solving the Degradation Problem: ...
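A minimal sketch of the additive form in PyTorch (the `ResidualBlock` name and the channel-preserving conv layout are illustrative assumptions, not code from the post):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv layers wrapped by an additive skip connection (illustrative)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                      # the "highway": bypasses both convs
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + identity              # gradient flows straight through this add
        return self.relu(out)
```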

June 7, 2025 · 7 min · 1358 words · xxraincandyxx

Reinforcement Learning: Preliminary

Okay, here’s a general guide for modeling and training Reinforcement Learning (RL) agents using PyTorch. This guide will cover the core components and steps, assuming you have a basic understanding of RL concepts (agent, environment, state, action, reward).

Core RL Components in PyTorch

Environment: Typically, you’ll use a library like gymnasium (the maintained fork of OpenAI Gym); a minimal interaction loop is sketched below.
- Key methods: env.reset(), env.step(action), env.render(), env.close().
- Key attributes: env.observation_space, env.action_space.

Agent: The learning entity. It usually consists of: ...
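A minimal sketch of the interaction loop those methods imply (the `CartPole-v1` environment and the random policy are illustrative assumptions; a trained agent would replace `env.action_space.sample()`):

```python
import gymnasium as gym

# "CartPole-v1" is an assumed example; any gymnasium environment follows the same API.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)

for _ in range(200):
    action = env.action_space.sample()     # random policy standing in for the agent
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:            # episode ended: start a fresh one
        obs, info = env.reset()

env.close()
```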

June 2, 2025 · 8 min · 1684 words · xxraincandyxx

DeepLearning Dataclass Guide

Okay, let’s craft a general and uniform guide for building a dataset class for image processing large models, focusing on PyTorch and PyTorch Lightning. This structure is highly adaptable.

Core Principles for Your Dataset Class (a minimal skeleton follows this list):
- Uniformity: The interface (__init__, __len__, __getitem__) should be consistent.
- Flexibility: Easily accommodate different data sources, label types, and transformations.
- Efficiency: Load data on-the-fly, leverage multi-processing in DataLoader, and handle large datasets without excessive memory usage.
- Clarity: Code should be well-commented and easy to understand.
- Reproducibility: Ensure that given the same settings, the dataset behaves identically (especially important for train/val/test splits).

We’ll structure this around: ...
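A minimal skeleton honoring that uniform interface (the `ImageFolderDataset` name and the flat folder of PNG files are illustrative assumptions):

```python
from pathlib import Path
from typing import Callable, Optional

from PIL import Image
from torch.utils.data import Dataset

class ImageFolderDataset(Dataset):
    """Minimal sketch of the uniform __init__ / __len__ / __getitem__ interface."""

    def __init__(self, root: str, transform: Optional[Callable] = None):
        # Assumes a flat folder of PNG files; adapt the glob to your data source.
        self.paths = sorted(Path(root).glob("*.png"))
        self.transform = transform

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int):
        image = Image.open(self.paths[idx]).convert("RGB")  # loaded on-the-fly
        if self.transform is not None:
            image = self.transform(image)
        return image
```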

May 29, 2025 · 16 min · 3225 words · xxraincandyxx

Diffusions: Denoising Diffusion Probabilistic Model

Pipeline

$$ x = x_0 \to x_1 \to x_2 \to \cdots \to x_{T-1} \to x_T = z $$

where each step follows the update

$$ x_t = \alpha_t x_{t-1} + \beta_t \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \mathbf{I}) $$

with $\alpha_t, \beta_t > 0$ and $\alpha_t^2 + \beta_t^2 = 1$. Applying this recursion repeatedly gives:

$$ \begin{align*} x_t &= \alpha_t x_{t-1} + \beta_t \epsilon_t\\ &= \alpha_t(\alpha_{t-1} x_{t-2} + \beta_{t-1} \epsilon_{t-1}) + \beta_t \epsilon_t\\ &= \cdots\\ &= (\alpha_t \cdots \alpha_1)x_0 + (\alpha_t \cdots \alpha_2)\beta_1\epsilon_1 + (\alpha_t \cdots \alpha_3)\beta_2\epsilon_2 + \cdots + \alpha_t\beta_{t-1}\epsilon_{t-1} + \beta_t\epsilon_t \end{align*} $$
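The excerpt ends mid-derivation; as a sketch of the standard next step (not shown in the visible text), the independent zero-mean Gaussian terms merge into a single Gaussian, since their variances add. Writing $\bar{\alpha}_t = \alpha_t \cdots \alpha_1$ and using $\alpha_i^2 + \beta_i^2 = 1$, the total variance telescopes to $1 - \bar{\alpha}_t^2$:

$$ x_t = \bar{\alpha}_t x_0 + \sqrt{1 - \bar{\alpha}_t^2}\, \bar{\epsilon}_t, \quad \bar{\epsilon}_t \sim \mathcal{N}(0, \mathbf{I}) $$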

May 29, 2025 · 1 min · 106 words · xxraincandyxx