NLP Specialist: BERT & Beyond
Module 3 of 8
3. The Math of Attention
1. The Query, Key, Value
$$ \text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V $$
- Query: What am I looking for?
- Key: What do I have to match against?
- Value: What information do I carry?
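To make the roles of the query, key, and value concrete, here is a minimal NumPy sketch of scaled dot-product attention for a single head, with no masking or batching. The function names `softmax` and `scaled_dot_product_attention`, the random inputs, and the shapes (3 queries, 5 key/value pairs, d_k = 4) are assumptions chosen for illustration and do not come from the course material.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max along the axis before exponentiating for numerical stability.
    shifted = x - np.max(x, axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # Q: (n_queries, d_k), K: (n_keys, d_k), V: (n_keys, d_v)
    d_k = K.shape[-1]
    # Compare every query with every key, then scale by sqrt(d_k).
    scores = Q @ K.T / np.sqrt(d_k)
    # Each row becomes a probability distribution over the keys.
    weights = softmax(scores, axis=-1)
    # Each output row is a weighted average of the value vectors.
    return weights @ V, weights

# Illustrative random inputs (hypothetical shapes, not from the course).
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(5, 4))
V = rng.normal(size=(5, 2))

output, weights = scaled_dot_product_attention(Q, K, V)
print(output.shape)   # (3, 2): one output vector per query
print(weights.shape)  # (3, 5): one weight per (query, key) pair
```

Each row of `weights` sums to 1, so every output row is a convex combination of the rows of `V`: the query decides where to look, the keys decide what matches, and the values supply what gets returned.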