Reinforcement Learning: Agents
Module 4 of 8
4. PPO (Proximal Policy Optimization)
1. OpenAI's Favorite
DQN struggles in continuous action spaces (e.g., the joint torques of a robot arm) because it must take a max over a finite set of actions. PPO is a "Policy Gradient" method: rather than learning a value for every possible action, it directly learns a parameterized policy, a probability distribution over actions, and nudges that distribution toward actions with higher expected reward. A sketch of such a policy follows below.
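The following is a minimal sketch of a Gaussian policy for continuous actions, assuming a PyTorch setup; the network, dimensions, and variable names are illustrative and not taken from any specific course codebase.

```python
import torch
import torch.nn as nn

class GaussianPolicy(nn.Module):
    """Maps an observation to a Gaussian distribution over continuous actions."""
    def __init__(self, obs_dim: int, act_dim: int, hidden: int = 64):
        super().__init__()
        self.body = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
        )
        self.mean_head = nn.Linear(hidden, act_dim)        # per-dimension action mean
        self.log_std = nn.Parameter(torch.zeros(act_dim))  # learned, state-independent std

    def forward(self, obs: torch.Tensor) -> torch.distributions.Normal:
        mean = self.mean_head(self.body(obs))
        return torch.distributions.Normal(mean, self.log_std.exp())

# Usage: sample an action and keep its log-probability for the PPO update.
policy = GaussianPolicy(obs_dim=8, act_dim=2)   # hypothetical sizes
obs = torch.randn(1, 8)                         # hypothetical observation
dist = policy(obs)
action = dist.sample()
log_prob = dist.log_prob(action).sum(-1)
```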
2. The Clip Function
PPO prevents the policy from changing too drastically in a single update. It forms the probability ratio r_t(θ) = π_θ(a_t | s_t) / π_θ_old(a_t | s_t), multiplies it by the advantage estimate, and clips the ratio to the range [1 − ε, 1 + ε] (a common choice is ε = 0.2). Taking the minimum of the clipped and unclipped terms removes the incentive for overly large policy updates, which keeps training stable. A sketch of this loss follows below.
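The following is a minimal sketch of the PPO clipped surrogate loss, assuming PyTorch; names such as `new_log_probs`, `old_log_probs`, and `advantages` are illustrative placeholders for quantities collected during rollouts.

```python
import torch

def ppo_clip_loss(new_log_probs: torch.Tensor,
                  old_log_probs: torch.Tensor,
                  advantages: torch.Tensor,
                  clip_eps: float = 0.2) -> torch.Tensor:
    # Probability ratio r_t = pi_new(a|s) / pi_old(a|s), computed in log space.
    ratio = torch.exp(new_log_probs - old_log_probs)
    # Unclipped and clipped surrogate objectives.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Take the pessimistic (minimum) objective and negate it to get a loss to
    # minimize; clipping removes the gain from pushing the ratio outside
    # [1 - eps, 1 + eps], so one update cannot move the policy too far.
    return -torch.min(unclipped, clipped).mean()
```

In practice this loss is averaged over minibatches sampled from recent rollouts and optimized for a few epochs before fresh data is collected with the updated policy.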