Reinforcement Learning: Agents
Module 7 of 8
7. Model-Based RL (AlphaZero)
1. Planning to Win
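Model-free RL acts directly on its learned policy or value estimates. AlphaZero plans first. Given the rules of the game, it runs MCTS (Monte Carlo Tree Search) from the current position, simulating many candidate move sequences before committing to one. A neural network guides the search by suggesting promising moves and evaluating the positions reached, and the agent favors the moves the search visited most.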
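To make the look-ahead concrete, here is a minimal sketch of MCTS on a toy game, not AlphaZero's actual implementation. It assumes a simple Nim variant (take 1 to 3 stones, whoever takes the last stone wins), uniform move priors, and random rollouts standing in for the neural network's policy and value outputs. The names `Node`, `legal_moves`, `rollout`, `select_child`, `mcts`, and the constant `c_puct=1.5` are all illustrative choices.

```python
import math
import random

class Node:
    def __init__(self, stones, prior):
        self.stones = stones    # stones left in the pile at this position
        self.prior = prior      # prior probability of the move leading here
        self.visits = 0         # how many simulations passed through this node
        self.value_sum = 0.0    # total value, from the view of the player who moved INTO this node
        self.children = {}      # move -> Node

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def rollout(stones, rng):
    """Random playout (assumed stand-in for a value network).
    Returns +1 if the player to move at `stones` wins, else -1."""
    sign = 1
    while stones > 0:
        stones -= rng.choice(legal_moves(stones))
        if stones == 0:
            return sign         # whoever just moved took the last stone
        sign = -sign
    return -1                   # no stones left: the player to move has already lost

def select_child(node, c_puct=1.5):
    """PUCT-style rule: exploit high average value, explore rarely visited moves."""
    total = sum(ch.visits for ch in node.children.values())
    def score(ch):
        q = ch.value_sum / ch.visits if ch.visits else 0.0
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visits)
        return q + u
    return max(node.children.values(), key=score)

def mcts(root_stones, simulations=2000, seed=0):
    rng = random.Random(seed)
    root = Node(root_stones, prior=1.0)
    for _ in range(simulations):
        node, path = root, [root]
        # 1. Selection: walk down the tree until reaching an unexpanded node.
        while node.children:
            node = select_child(node)
            path.append(node)
        # 2. Expansion: add one child per legal move (none at a terminal position).
        moves = legal_moves(node.stones)
        for m in moves:
            node.children[m] = Node(node.stones - m, prior=1.0 / len(moves))
        # 3. Evaluation: value for the player to move at this node.
        value = rollout(node.stones, rng)
        # 4. Backup: flip the sign at each level because the players alternate.
        for n in reversed(path):
            n.visits += 1
            n.value_sum += -value
            value = -value
    # Visit counts at the root act as the improved move preferences.
    return {move: child.visits for move, child in root.children.items()}
```

Calling `mcts(10)` returns a dictionary mapping each legal move to its visit count. In this Nim variant the winning strategy is to leave a multiple of 4 stones, so with enough simulations the search should increasingly favor taking 2.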
2. Self-Play
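It starts with only the rules of the game and a randomly initialized network, with no human game data. It plays millions of games against itself. After each game, every position is labeled with the final result and with the move probabilities MCTS produced, and the network is trained to predict both. There is no separate teacher: "Winner" and "Loser" are the same network, so it learns from both sides of every game.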
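The data-collection half of self-play can be sketched as follows. This is an illustration under assumptions, not AlphaZero's code: it reuses a toy Nim variant (take 1 to 3 stones, last stone wins), and the hypothetical `search_policy` returns uniform probabilities where a real agent would return MCTS visit frequencies. The point is the labeling step: once the game ends, every recorded position gets the outcome from the perspective of the player who was to move there.

```python
import random

def legal_moves(stones):
    return [m for m in (1, 2, 3) if m <= stones]

def search_policy(stones):
    """Hypothetical stand-in for MCTS: uniform probabilities over legal moves."""
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}

def self_play_game(start_stones, rng):
    """Play one game against itself and return training examples
    (stones, move_probabilities, outcome) for every position visited."""
    history = []                 # (stones, policy, player to move)
    stones, player = start_stones, 0
    winner = None
    while stones > 0:
        pi = search_policy(stones)
        history.append((stones, pi, player))
        moves = list(pi)
        move = rng.choices(moves, weights=[pi[m] for m in moves])[0]
        stones -= move
        winner = player          # after the loop, this is whoever took the last stone
        player = 1 - player
    # Label each position: +1 if the player to move there went on to win, else -1.
    return [(s, pi, 1 if p == winner else -1) for s, pi, p in history]

examples = self_play_game(10, random.Random(0))
```

Each tuple in `examples` is one training target: the policy head would be trained toward the move probabilities and the value head toward the outcome. Positions from the winning and losing side both end up in the same dataset.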