Neural Networks: From Scratch
Module 11 of 12
11. Deep Learning Cheatsheet
Derivatives
- Power Rule: $\frac{d}{dx} x^n = n x^{n-1}$
- Chain Rule: $\frac{dy}{dx} = \frac{dy}{du} \frac{du}{dx}$
- ReLU: $1$ if $x > 0$, else $0$
- Sigmoid: $\sigma(x)\,(1 - \sigma(x))$
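A quick way to sanity-check these derivative formulas is to compare them against a central finite difference. A minimal sketch in NumPy (the function names here are illustrative, not from the course code):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def relu_grad(x):
    # Derivative of ReLU: 1 where x > 0, else 0.
    return (x > 0).astype(float)

def sigmoid_grad(x):
    # Derivative of sigmoid: sigma(x) * (1 - sigma(x)).
    s = sigmoid(x)
    return s * (1.0 - s)

# Central finite difference: (f(x + eps) - f(x - eps)) / (2 * eps)
x = np.array([-2.0, -0.5, 0.5, 2.0])
eps = 1e-5
fd = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(np.allclose(sigmoid_grad(x), fd, atol=1e-8))  # True
```

The same check works for ReLU away from $x = 0$, where the derivative is undefined.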
Shapes
- Input (Batch): $(N, D_{in})$
- Weights: $(D_{in}, D_{out})$
- Bias: $(D_{out},)$
- Output: $(N, D_{out})$
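The shapes above compose in a single matrix multiply plus a broadcasted bias. A small NumPy sketch (dimensions chosen arbitrarily for illustration):

```python
import numpy as np

N, D_in, D_out = 4, 3, 2          # batch size, input dim, output dim
X = np.random.randn(N, D_in)      # input:   (N, D_in)
W = np.random.randn(D_in, D_out)  # weights: (D_in, D_out)
b = np.zeros(D_out)               # bias:    (D_out,)

Y = X @ W + b                     # output:  (N, D_out); b broadcasts over rows
print(Y.shape)  # (4, 2)
```

Checking that the inner dimensions match ($D_{in}$ with $D_{in}$) is the fastest way to catch most shape bugs.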
Update Rule
$$ W \leftarrow W - \alpha \frac{\partial L}{\partial W} $$
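In code, the update rule is one line per parameter. A minimal sketch (the helper name `sgd_step` and the values are illustrative):

```python
import numpy as np

def sgd_step(W, dW, alpha=0.1):
    # W <- W - alpha * dL/dW
    return W - alpha * dW

W = np.array([[1.0, -2.0]])
dW = np.array([[0.5, 0.5]])   # pretend gradient from backprop
W = sgd_step(W, dW)
print(W)  # [[ 0.95 -2.05]]
```

The same step is applied to the bias with its own gradient of shape $(D_{out},)$.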
Autograd Algorithm
- Topological sort of the graph.
- Call `backward()` on each node in reverse topological order, so a node's gradient is complete before it flows to its parents.
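The two steps above can be sketched with a tiny scalar autograd node. This is a hedged illustration in the style of minimal scalar autograd engines, not the course's actual implementation; the class name `Value` and its fields are assumptions:

```python
class Value:
    """Minimal scalar autograd node (illustrative sketch)."""
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self.parents = parents          # nodes this value was computed from
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self):
        # 1. Topological sort of the graph.
        order, seen = [], set()
        def visit(v):
            if v not in seen:
                seen.add(v)
                for p in v.parents:
                    visit(p)
                order.append(v)
        visit(self)
        # 2. Propagate gradients in reverse topological order,
        #    accumulating chain-rule contributions into each parent.
        self.grad = 1.0
        for v in reversed(order):
            for p, g in zip(v.parents, v.local_grads):
                p.grad += v.grad * g

a, b = Value(2.0), Value(3.0)
c = a * b + a            # dc/da = b + 1 = 4, dc/db = a = 2
c.backward()
print(a.grad, b.grad)    # 4.0 2.0
```

The `+=` accumulation matters: a node used twice (like `a` here) receives gradient from both paths.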