Skip-Connection Instruction

A Practical Guide to Using Skip Connections

Part 1: The “Why” - What Problems Do They Solve? Before adding skip connections, it’s crucial to understand why they are so effective.

Solving the Vanishing Gradient Problem (a minimal residual block is sketched below):
Problem: In very deep networks, the gradient (the signal used for learning) must be backpropagated from the final layer to the initial layers. At each step backward through a layer, the gradient is multiplied by that layer’s weights. If these weights are small (less than 1 in magnitude), the gradient can shrink exponentially, becoming so tiny that the early layers learn extremely slowly or not at all. This is the vanishing gradient problem.
Solution: A skip connection creates a direct path for the gradient to flow. It’s like an “information highway” that bypasses several layers: the gradient passes back through the addition/concatenation operation, giving the earlier layers a direct, uninterrupted path that keeps the signal strong.

Solving the Degradation Problem: ...
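A minimal sketch of the additive form in PyTorch (the `ResidualBlock` name and the channel-preserving conv layout are illustrative assumptions, not code from the post):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Two conv layers wrapped by an additive skip connection (illustrative)."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        identity = x                      # the "highway": bypasses both convs
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        out = out + identity              # gradient flows straight through this add
        return self.relu(out)
```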

June 7, 2025 · 7 min · 1358 words · xxraincandyxx

Reinforcement Learning: Preliminary

Okay, here’s a general guide for modeling and training Reinforcement Learning (RL) agents using PyTorch. This guide will cover the core components and steps, assuming you have a basic understanding of RL concepts (agent, environment, state, action, reward).

Core RL Components in PyTorch

Environment: Typically, you’ll use a library like gymnasium (the maintained fork of OpenAI Gym); a minimal interaction loop is sketched below.
- Key methods: env.reset(), env.step(action), env.render(), env.close().
- Key attributes: env.observation_space, env.action_space.

Agent: The learning entity. It usually consists of: ...
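A minimal sketch of the interaction loop those methods imply (the `CartPole-v1` environment and the random policy are illustrative assumptions; a trained agent would replace `env.action_space.sample()`):

```python
import gymnasium as gym

# "CartPole-v1" is an assumed example; any gymnasium environment follows the same API.
env = gym.make("CartPole-v1")
obs, info = env.reset(seed=42)

for _ in range(200):
    action = env.action_space.sample()     # random policy standing in for the agent
    obs, reward, terminated, truncated, info = env.step(action)
    if terminated or truncated:            # episode ended: start a fresh one
        obs, info = env.reset()

env.close()
```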

June 2, 2025 · 8 min · 1684 words · xxraincandyxx

DeepLearning Dataclass Guide

Okay, let’s craft a general and uniform guide for building a dataset class for image processing large models, focusing on PyTorch and PyTorch Lightning. This structure is highly adaptable.

Core Principles for Your Dataset Class (a minimal skeleton follows this list):
- Uniformity: The interface (__init__, __len__, __getitem__) should be consistent.
- Flexibility: Easily accommodate different data sources, label types, and transformations.
- Efficiency: Load data on-the-fly, leverage multi-processing in DataLoader, and handle large datasets without excessive memory usage.
- Clarity: Code should be well-commented and easy to understand.
- Reproducibility: Ensure that given the same settings, the dataset behaves identically (especially important for train/val/test splits).

We’ll structure this around: ...
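A minimal skeleton honoring that uniform interface (the `ImageFolderDataset` name and the flat folder of PNG files are illustrative assumptions):

```python
from pathlib import Path
from typing import Callable, Optional

from PIL import Image
from torch.utils.data import Dataset

class ImageFolderDataset(Dataset):
    """Minimal sketch of the uniform __init__ / __len__ / __getitem__ interface."""

    def __init__(self, root: str, transform: Optional[Callable] = None):
        # Assumes a flat folder of PNG files; adapt the glob to your data source.
        self.paths = sorted(Path(root).glob("*.png"))
        self.transform = transform

    def __len__(self) -> int:
        return len(self.paths)

    def __getitem__(self, idx: int):
        image = Image.open(self.paths[idx]).convert("RGB")  # loaded on-the-fly
        if self.transform is not None:
            image = self.transform(image)
        return image
```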

May 29, 2025 · 16 min · 3225 words · xxraincandyxx

Diffusions: Denoising Diffusion Probabilistic Model

Pipeline

$$ x = x_0 \to x_1 \to x_2 \to \cdots \to x_{T-1} \to x_T = z $$

where each step follows the update

$$ x_t = \alpha_t x_{t-1} + \beta_t \epsilon_t, \quad \epsilon_t \sim \mathcal{N}(0, \mathbf{I}) $$

with $\alpha_t, \beta_t > 0$ and $\alpha_t^2 + \beta_t^2 = 1$. Applying this recursion repeatedly gives:

$$ \begin{align*} x_t &= \alpha_t x_{t-1} + \beta_t \epsilon_t\\ &= \alpha_t(\alpha_{t-1} x_{t-2} + \beta_{t-1} \epsilon_{t-1}) + \beta_t \epsilon_t\\ &= \cdots\\ &= (\alpha_t \cdots \alpha_1)x_0 + (\alpha_t \cdots \alpha_2)\beta_1\epsilon_1 + (\alpha_t \cdots \alpha_3)\beta_2\epsilon_2 + \cdots + \alpha_t\beta_{t-1}\epsilon_{t-1} + \beta_t\epsilon_t \end{align*} $$
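The excerpt ends mid-derivation; as a sketch of the standard next step (not shown in the visible text), the independent zero-mean Gaussian terms merge into a single Gaussian, since their variances add. Writing $\bar{\alpha}_t = \alpha_t \cdots \alpha_1$ and using $\alpha_i^2 + \beta_i^2 = 1$, the total variance telescopes to $1 - \bar{\alpha}_t^2$:

$$ x_t = \bar{\alpha}_t x_0 + \sqrt{1 - \bar{\alpha}_t^2}\, \bar{\epsilon}_t, \quad \bar{\epsilon}_t \sim \mathcal{N}(0, \mathbf{I}) $$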

May 29, 2025 · 1 min · 106 words · xxraincandyxx