Backward Propagation Theory

A detailed, mathematical walkthrough of backward propagation in deep neural networks.

Setup and notation

Consider an L-layer feedforward neural network (FNN/MLP). For layer $l = 1, \dots, L$:

- $n_l$ = number of units in layer $l$.
- Input: $a^{(0)} = x \in \mathbb{R}^{n_0}$.
- Linear pre-activation: $z^{(l)} = W^{(l)} a^{(l-1)} + b^{(l)}$, where $W^{(l)} \in \mathbb{R}^{n_l \times n_{l-1}}$ and $b^{(l)} \in \mathbb{R}^{n_l}$.
- Activation: $a^{(l)} = \phi^{(l)}(z^{(l)})$, applied elementwise.
- Output: $a^{(L)}$.
- Loss for one example: $\mathcal{L} = \mathcal{L}(a^{(L)}, y)$.

We want gradients: ...
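Before deriving the gradients, a minimal forward-pass sketch in NumPy may help make the notation concrete. The layer sizes, the tanh/identity activations, and the `forward` helper below are illustrative assumptions, not part of the original derivation; the cached $(z^{(l)}, a^{(l)})$ values are exactly what backpropagation will reuse.

```python
import numpy as np

def forward(x, weights, biases, activations):
    """Compute a^(L) and cache (z^(l), a^(l)) for each layer."""
    a = x                      # a^(0) = x
    cache = [(None, a)]
    for W, b, phi in zip(weights, biases, activations):
        z = W @ a + b          # z^(l) = W^(l) a^(l-1) + b^(l)
        a = phi(z)             # a^(l) = phi^(l)(z^(l)), elementwise
        cache.append((z, a))
    return a, cache

# Example: a 2-layer network with (n_0, n_1, n_2) = (3, 4, 2) on one input x.
rng = np.random.default_rng(0)
sizes = [3, 4, 2]
weights = [rng.standard_normal((n_l, n_prev)) for n_prev, n_l in zip(sizes, sizes[1:])]
biases = [np.zeros(n_l) for n_l in sizes[1:]]
activations = [np.tanh, lambda z: z]   # phi^(1) = tanh, phi^(2) = identity

x = rng.standard_normal(sizes[0])
a_L, cache = forward(x, weights, biases, activations)
print(a_L.shape)               # (2,), i.e. (n_L,)
```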

September 23, 2025 · 5 min · 904 words · xxraincandyxx