12. AI Safety & Jailbreaking

1. Adversarial Attacks

"DAN" (Do Anything Now). Users try to trick the model into ignoring its safety training.

The process of intentionally attacking your own model to find weaknesses before deployment.