Neural Networks: From Scratch
Module 12 of 12
12. Transformers from Scratch
1. Key, Query, Value
We implement self-attention using only NumPy. The core operation, scaled dot-product attention, fits on a single line:
```python
# The heart of modern AI
attention = softmax((Q @ K.T) / sqrt(d_k)) @ V
```
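A minimal sketch of single-head self-attention along these lines, using only NumPy. The projection matrices and the 4-token toy input are illustrative assumptions, not values fixed by the lesson:

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, W_q, W_k, W_v):
    # Project the input into query, key, and value spaces
    Q, K, V = X @ W_q, X @ W_k, X @ W_v
    d_k = K.shape[-1]
    # Scaled dot-product attention: how strongly each token attends to every other
    scores = (Q @ K.T) / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

# Toy example: 4 tokens, model dimension 8 (arbitrary sizes for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
W_q, W_k, W_v = (rng.normal(size=(8, 8)) for _ in range(3))
out = self_attention(X, W_q, W_k, W_v)
print(out.shape)  # (4, 8)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with the key dimension, which would otherwise push the softmax into regions with vanishing gradients.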
2. Positional Encoding
Transformers have no inherent sense of token order, so we must inject position information by adding sine and cosine waves of different frequencies to the input embeddings.
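A minimal NumPy sketch of the standard sinusoidal scheme, where even embedding dimensions get sines and odd dimensions get cosines; the sequence length and model dimension below are arbitrary example values, and d_model is assumed even:

```python
import numpy as np

def positional_encoding(seq_len, d_model):
    # PE[pos, 2i]   = sin(pos / 10000^(2i / d_model))
    # PE[pos, 2i+1] = cos(pos / 10000^(2i / d_model))
    positions = np.arange(seq_len)[:, None]                       # (seq_len, 1)
    div_terms = np.power(10000.0, np.arange(0, d_model, 2) / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(positions / div_terms)
    pe[:, 1::2] = np.cos(positions / div_terms)
    return pe

# Add the encoding to the token embeddings (shapes are illustrative)
seq_len, d_model = 16, 32
embeddings = np.random.default_rng(0).normal(size=(seq_len, d_model))
inputs = embeddings + positional_encoding(seq_len, d_model)
```

Because each position maps to a unique pattern of phases across frequencies, the model can recover both absolute and relative position from the summed embeddings.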