Neural Networks: From Scratch
Module 12 of 12
12. Transformers from Scratch
1. Key, Query, Value
We implement self-attention using only NumPy. The core operation, scaled dot-product attention, fits on a single line:
```python
# The heart of modern AI
attention = softmax((Q @ K.T) / sqrt(d_k)) @ V
```
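A minimal sketch of single-head self-attention along these lines, using only NumPy. The projection matrices and the 4-token toy input are illustrative assumptions, not values fixed by the lesson:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Project the input into query, key, and value spaces
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Scaled dot-product attention: how strongly each token attends to every other
    scores = (Q @ K.T) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy example: 4 tokens, model dimension 8 (arbitrary sizes for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients.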
2. Positional Encoding
Transformers have no inherent sense of token order, so we must inject position information by adding sine and cosine waves of different frequencies to the input embeddings.
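A minimal NumPy sketch of the standard sinusoidal scheme, where even embedding dimensions get sines and odd dimensions get cosines; the sequence length and model dimension below are arbitrary example values, and d_model is assumed even:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    positions = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    div_terms = np.power(10000.0, np.arange(0, d_model, 2) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)
    pe[:, 1::2] = np.cos(positions / div_terms)
    return pe

# Add the encoding to the token embeddings (shapes are illustrative)
seq_len, d_model = 16, 32
embeddings = np.random.default_rng(0).normal(size=(seq_len, d_model))
inputs = embeddings + positional_encoding(seq_len, d_model)
```

Because each position maps to a unique pattern of phases across frequencies, the model can recover both absolute and relative position from the summed embeddings.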