TensorLearn
Reinforcement Learning: Agents
Module 4 of 8

4. PPO (Proximal Policy Optimization)

1. OpenAI's Favorite

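DQN does not handle continuous action spaces directly (for example, the joint torques of a robot arm), because choosing an action requires taking the maximum of Q(s, a) over every possible action. PPO is a "Policy Gradient" method: instead of learning action values, it directly learns a policy π(a|s), a probability distribution over actions (for continuous control, typically a Gaussian), and shifts that distribution toward actions that turned out better than expected.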
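A minimal sketch of the policy-gradient idea for a single continuous action, assuming a 1-D Gaussian policy whose mean is the only learned parameter and does not depend on the state (a real agent would output the mean from a neural network). The function names and the advantage value are hypothetical. The point is that the action is sampled directly from the policy, with no max over actions, and the update nudges the mean toward actions with positive advantage.

```python
import math
import random


def gaussian_log_prob(action: float, mean: float, std: float) -> float:
    """Log-density of a 1-D Gaussian policy pi(a) = N(mean, std^2)."""
    return (-0.5 * ((action - mean) / std) ** 2
            - math.log(std)
            - 0.5 * math.log(2.0 * math.pi))


def sample_action(mean: float, std: float, rng: random.Random) -> float:
    # Continuous action sampled directly from the policy;
    # unlike DQN, no max over a discrete set of actions is needed.
    return rng.gauss(mean, std)


def policy_gradient_step(mean: float, std: float, action: float,
                         advantage: float, lr: float) -> float:
    # d/d(mean) log N(a; mean, std^2) = (a - mean) / std^2
    grad_log_prob = (action - mean) / std ** 2
    # Gradient ascent: make actions with positive advantage more likely.
    return mean + lr * advantage * grad_log_prob


rng = random.Random(0)
mean, std = 0.0, 1.0  # hypothetical, state-independent policy parameters
action = sample_action(mean, std, rng)
advantage = 1.0       # hypothetical advantage estimate for this action
new_mean = policy_gradient_step(mean, std, action, advantage, lr=0.1)
```

With a positive advantage, `new_mean` moves toward the sampled action; with a negative advantage it would move away.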

2. The Clip Function

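PPO limits how much the policy can change in a single update. It computes the probability ratio r(θ) = π_θ(a|s) / π_θ_old(a|s) between the new and old policies and clips it to the range [1 - ε, 1 + ε], so the objective gives no extra reward for moving the ratio beyond that range. This keeps updates small and makes training more stable.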
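PPO's clipped surrogate objective is L_CLIP(θ) = E[min(r(θ) · A, clip(r(θ), 1 - ε, 1 + ε) · A)], where A is an advantage estimate. The sketch below evaluates it in plain Python over a small batch of log-probabilities. The function name and the default ε = 0.2 are illustrative assumptions; a real implementation would compute this inside an autodiff framework and maximize it (or minimize its negative) with gradient-based optimization.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_log_probs: Sequence[float],
                       old_log_probs: Sequence[float],
                       advantages: Sequence[float],
                       epsilon: float = 0.2) -> float:
    """Batch average of the clipped surrogate objective (to be maximized)."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        # Probability ratio r = pi_new(a|s) / pi_old(a|s), from log-probs.
        ratio = math.exp(new_lp - old_lp)
        # Clip the ratio into [1 - epsilon, 1 + epsilon].
        clipped = min(max(ratio, 1.0 - epsilon), 1.0 + epsilon)
        # Take the pessimistic (smaller) of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)
```

For example, if the new policy makes an action 1.5 times as likely and its advantage is positive, the term is capped at 1.2 · A, so pushing the ratio further earns nothing. If the ratio drops to 0.5 on a positive-advantage action, the unclipped (smaller) value 0.5 · A is kept, so the objective still reflects the harmful change.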


TensorLearn - AI Engineering for Professionals