Back to Course
LLM Engineering: Transformers & RAG
Module 12 of 12
12. AI Safety & Jailbreaking
1. Adversarial Attacks
"DAN" (Do Anything Now). Users try to trick the model into ignoring its safety training.
2. Red Teaming
The process of intentionally attacking your own model to find weaknesses before deployment.