Neural Networks: From Scratch
Module 6 of 12
6. Layers & Activation
1. The Dense Layer
A fully connected layer: every input feature feeds into every neuron. The forward pass is just a matrix multiplication plus a bias. $$ Y = X \cdot W^T + B $$
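As a minimal sketch, here is that forward pass in NumPy (the `Dense` class name, the weight shapes, and the small random initialization are illustrative assumptions, not part of the lesson):

```python
import numpy as np

# Sketch of a dense (fully connected) layer following Y = X · W^T + B.
# Shapes assumed: X is (batch, n_in), W is (n_out, n_in), B is (n_out,).
class Dense:
    def __init__(self, n_in, n_out):
        # Small random weights for illustration; real initializers (He, Xavier) differ.
        self.W = np.random.randn(n_out, n_in) * 0.01
        self.B = np.zeros(n_out)

    def forward(self, X):
        return X @ self.W.T + self.B

# Usage: a batch of 4 samples with 3 features each -> 2 outputs.
layer = Dense(3, 2)
Y = layer.forward(np.random.randn(4, 3))
print(Y.shape)  # (4, 2)
```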
2. Non-Linearity (Activation)
Without activation functions like ReLU or Sigmoid, a deep network collapses into a single linear (affine) transformation, because a composition of linear layers is itself linear.
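A quick numerical check of that collapse, sketched in NumPy (the layer sizes and random weights are arbitrary examples): two stacked linear layers can be folded into one layer with combined weights.

```python
import numpy as np

# Two linear layers without an activation are equivalent to one linear layer.
X = np.random.randn(5, 3)
W1, b1 = np.random.randn(4, 3), np.random.randn(4)
W2, b2 = np.random.randn(2, 4), np.random.randn(2)

two_layers = (X @ W1.T + b1) @ W2.T + b2   # layer 1 followed by layer 2
W, b = W2 @ W1, W2 @ b1 + b2               # folded into a single layer
one_layer = X @ W.T + b

print(np.allclose(two_layers, one_layer))  # True
```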
ReLU (Rectified Linear Unit)
- Formula: $f(x) = max(0, x)$
- Gradient: 1 if $x > 0$, else 0.
- Why: Mitigates the vanishing-gradient problem, since the gradient does not saturate for positive inputs, and is very cheap to compute (see the sketch below).
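A minimal NumPy sketch of the ReLU forward pass and its gradient, matching the formula above (treating the gradient at exactly $x = 0$ as 0 is an assumed convention):

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x)
    return np.maximum(0.0, x)

def relu_grad(x):
    # Gradient is 1 where x > 0, else 0 (0 chosen at x == 0).
    return (x > 0).astype(x.dtype)

x = np.array([-2.0, -0.5, 0.0, 1.5, 3.0])
print(relu(x))       # [0.  0.  0.  1.5 3. ]
print(relu_grad(x))  # [0. 0. 0. 1. 1.]
```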