4. The Computational Graph
1. The Chain Rule: Visualized
How does a weight in the first layer affect the loss at the output? The chain rule answers this: we propagate the error backwards through the network, layer by layer.
$$ \frac{dL}{dw} = \frac{dL}{dy} \cdot \frac{dy}{dw} $$
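As a quick sanity check, here is a minimal numeric sketch of this rule for $y = x \cdot w$ with a squared-error loss; the values and variable names are illustrative, not part of the course code:

```python
# Numeric check of the chain rule (illustrative values).
# Model: y = x * w, loss L = (y - t)**2.
x, w, t = 3.0, 2.0, 10.0

y = x * w              # forward pass: y = 6.0
L = (y - t) ** 2       # loss: 16.0

dL_dy = 2 * (y - t)    # local derivative of the loss: -8.0
dy_dw = x              # local derivative of the product: 3.0
dL_dw = dL_dy * dy_dw  # chain rule: -24.0

# Compare with a finite-difference estimate of dL/dw.
eps = 1e-6
L_nudged = (x * (w + eps) - t) ** 2
print(dL_dw, (L_nudged - L) / eps)  # both ~ -24.0
```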
The Graph
Think of the calculation as a tree. We walk from the root (Loss) back down to the leaves (Weights).
      (Loss)
        |
      (Pred)
      /    \
   (w2)    (h)
           /  \
        (w1)  (x)
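The tree does not label its operations, so assume it encodes $h = w_1 \cdot x$ and $\mathrm{Pred} = w_2 \cdot h$. Walking from the root to $w_1$ then chains three local derivatives:

$$ \frac{dL}{dw_1} = \frac{dL}{d\mathrm{Pred}} \cdot \frac{d\mathrm{Pred}}{dh} \cdot \frac{dh}{dw_1} = \frac{dL}{d\mathrm{Pred}} \cdot w_2 \cdot x $$

Each factor is local to one node of the tree, which is what makes the walk mechanical.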
2. Autograd
We will build a graph engine that records every operation during the forward pass and replays them in reverse to compute these derivatives automatically. For a single multiplication node, the rules are (see the sketch after this list):
- Forward: $y = x \cdot w$
- Backward: $x.grad += y.grad \cdot w$ and, symmetrically, $w.grad += y.grad \cdot x$
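To make this concrete, here is a minimal sketch of such an engine, handling multiplication only. The names (`Value`, `grad`, `_backward`) are illustrative choices, not an API fixed by the course:

```python
# A minimal autograd sketch for multiplication only (illustrative names).
class Value:
    def __init__(self, data, inputs=()):
        self.data = data
        self.grad = 0.0
        self._inputs = inputs          # Values this node was computed from
        self._backward = lambda: None  # how to pass our gradient to inputs

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def _backward():
            # y = x * w  =>  x.grad += y.grad * w and w.grad += y.grad * x
            self.grad += out.grad * other.data
            other.grad += out.grad * self.data
        out._backward = _backward
        return out

    def backward(self):
        # Order the graph so each node is processed before its inputs,
        # then walk from the root (this node) back to the leaves.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for inp in node._inputs:
                    visit(inp)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            node._backward()

# Usage: y = x * w, then ask for dy/dx and dy/dw.
x, w = Value(3.0), Value(2.0)
y = x * w
y.backward()
print(x.grad, w.grad)  # 2.0 3.0
```

The `+=` in the backward rules matters: when a value feeds into several operations, each path through the graph contributes to its gradient.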