Neural Networks: From Scratch
Module 11 of 12
11. Deep Learning Cheatsheet
Derivatives
- Power Rule: $\frac{d}{dx} x^n = n x^{n-1}$
- Chain Rule: $\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}$
- ReLU: $1$ if $x > 0$, else $0$
- Sigmoid: $\sigma(x)\,(1 - \sigma(x))$
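A quick way to sanity-check these derivative formulas is to compare them against a central finite difference. A minimal sketch in NumPy (the function names here are illustrative, not from the course code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu_grad(x):
    # Derivative of ReLU: 1 where x > 0, else 0.
    return (x > 0).astype(float)

def sigmoid_grad(x):
    # Derivative of sigmoid: sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

# Central finite difference: (f(x + eps) - f(x - eps)) / (2 * eps)
x = np.array([-2.0, -0.5, 0.5, 2.0])
eps = 1e-5
fd = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(np.allclose(sigmoid_grad(x), fd, atol=1e-8))  # True
```

The same check works for ReLU away from $x = 0$, where the derivative is undefined.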
Shapes
- Input (Batch): $(N, D_{in})$
- Weights: $(D_{in}, D_{out})$
- Bias: $(D_{out},)$
- Output: $(N, D_{out})$
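The shapes above compose in a single matrix multiply plus a broadcasted bias. A small NumPy sketch (dimensions chosen arbitrarily for illustration):

```python
import numpy as np

N, D_in, D_out = 4, 3, 2          # batch size, input dim, output dim
X = np.random.randn(N, D_in)      # input:   (N, D_in)
W = np.random.randn(D_in, D_out)  # weights: (D_in, D_out)
b = np.zeros(D_out)               # bias:    (D_out,)

Y = X @ W + b                     # output:  (N, D_out); b broadcasts over rows
print(Y.shape)  # (4, 2)
```

Checking that the inner dimensions match ($D_{in}$ with $D_{in}$) is the fastest way to catch most shape bugs.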
Update Rule
$$ W \leftarrow W - \alpha \frac{\partial L}{\partial W} $$
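In code, the update rule is one line per parameter. A minimal sketch (the helper name `sgd_step` and the values are illustrative):

```python
import numpy as np

def sgd_step(W, dW, alpha=0.1):
    # W <- W - alpha * dL/dW
    return W - alpha * dW

W = np.array([[1.0, -2.0]])
dW = np.array([[0.5, 0.5]])   # pretend gradient from backprop
W = sgd_step(W, dW)
print(W)  # [[ 0.95 -2.05]]
```

The same step is applied to the bias with its own gradient of shape $(D_{out},)$.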
Autograd Algorithm
- Topological sort of the graph.
- Call `backward()` on each node in reverse topological order, so a node's gradient is complete before it flows to its parents.
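The two steps above can be sketched with a tiny scalar autograd node. This is a hedged illustration in the style of minimal scalar autograd engines, not the course's actual implementation; the class name `Value` and its fields are assumptions:

```python
class Value:
    """Minimal scalar autograd node (illustrative sketch)."""
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # 1. Topological sort of the graph.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        # 2. Propagate gradients in reverse topological order,
        #    accumulating chain-rule contributions into each parent.
        self.grad = 1.0
        for v in reversed(order):
            for p, g in zip(v.parents, v.local_grads):
                p.grad += v.grad * g

a, b = Value(2.0), Value(3.0)
c = a * b + a            # dc/da = b + 1 = 4, dc/db = a = 2
c.backward()
print(a.grad, b.grad)    # 4.0 2.0
```

The `+=` accumulation matters: a node used twice (like `a` here) receives gradient from both paths.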