7. Model-Based RL (AlphaZero)

1. Planning to Win

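Model-free RL reacts: a learned policy picks each move directly, with no lookahead. AlphaZero simulates. The game rules give it an exact model of what every move does. So before each move it runs Monte Carlo Tree Search (MCTS), guided by its neural network, to look ahead through likely continuations.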
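Here is a minimal sketch of the search. It rests on two loud assumptions. First, the game is a toy one-pile Nim: a move takes 1 to 3 stones, and whoever takes the last stone wins. Second, `uniform_net` is a hypothetical stand-in for AlphaZero's network that returns uniform move priors and a value of 0.

Selection uses the PUCT rule, score = Q + c_puct * P * sqrt(N_parent) / (1 + N_child), with a constant `c_puct`. The published AlphaZero instead grows this weight slowly with the visit count and adds Dirichlet noise at the root; both are omitted here. All names are illustrative.

```python
import math
from dataclasses import dataclass, field

# Toy game (assumption): one-pile Nim. A move removes 1-3 stones;
# whoever takes the last stone wins.
def legal_moves(stones: int) -> list[int]:
    return [k for k in (1, 2, 3) if k <= stones]

def uniform_net(stones: int) -> tuple[dict[int, float], float]:
    """Hypothetical stand-in for AlphaZero's network: uniform priors, value 0."""
    moves = legal_moves(stones)
    return {m: 1.0 / len(moves) for m in moves}, 0.0

@dataclass
class Node:
    prior: float            # P(s, a) from the network
    visits: int = 0         # N: simulations that passed through this node
    value_sum: float = 0.0  # summed value for the player who moved INTO this node
    children: dict = field(default_factory=dict)  # move -> Node

    def q(self) -> float:
        return self.value_sum / self.visits if self.visits else 0.0

def puct_score(parent: Node, child: Node, c_puct: float = 1.5) -> float:
    # Exploit (Q) plus explore (U): U is large for high-prior, rarely visited moves.
    u = c_puct * child.prior * math.sqrt(parent.visits) / (1 + child.visits)
    return child.q() + u

def simulate(node: Node, stones: int, net) -> float:
    """One simulation from `node`; returns the value for the player to move there."""
    node.visits += 1
    if stones == 0:
        return -1.0  # opponent took the last stone, so the player to move has lost
    if not node.children:
        # Leaf: expand using the network's priors and return its value estimate.
        priors, value = net(stones)
        for move, p in priors.items():
            node.children[move] = Node(prior=p)
        return value
    move, child = max(node.children.items(), key=lambda mc: puct_score(node, mc[1]))
    # The child's result is from the opponent's point of view, so flip its sign.
    value = -simulate(child, stones - move, net)
    child.value_sum += value
    return value

def search(stones: int, num_simulations: int = 800, net=uniform_net) -> dict[int, int]:
    """Run MCTS from `stones`; return the root's visit count for each move."""
    root = Node(prior=1.0)
    for _ in range(num_simulations):
        simulate(root, stones, net)
    return {move: child.visits for move, child in root.children.items()}
```

From 5 stones, the only winning move is to take 1, which leaves the opponent a multiple of 4. `search(5)` piles most of its visits onto that move, even though the stub network knows nothing. The lookahead alone finds the win. AlphaZero chooses its actual move from these root visit counts rather than from the raw network output.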

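It starts knowing nothing except the rules. It plays millions of games against itself, with one network playing both sides. After each game, that network is trained on every position it saw. It learns to match the move probabilities its search produced, and to predict the final result (win, loss, or draw) from the perspective of the player to move.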
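AlphaZero's training signal comes from self-play itself. Each position is labelled with two targets, and the sketch below (a simplified illustration, not DeepMind's code) computes both.

- The policy target π is the root's MCTS visit-count distribution, reshaped by a temperature T as π(a) ∝ N(a)^(1/T).
- The value target z is the game's final result. Its sign flips on alternate positions because the players alternate; strict alternation is an assumption of this sketch.

The loss is the published AlphaZero form (z - v)^2 - π·log p, with the L2 weight-decay term left out. The function names are hypothetical, and the network outputs `p` and `v` are passed in as plain numbers.

```python
import math

def policy_target(visits: dict[int, int], temperature: float = 1.0) -> dict[int, float]:
    """pi(a) proportional to N(a)^(1/T): MCTS visit counts turned into probabilities.
    Lower T sharpens toward the most-visited move; T must be > 0 here."""
    weights = {a: n ** (1.0 / temperature) for a, n in visits.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}

def value_targets(num_positions: int, first_player_result: float) -> list[float]:
    """z for each position, seen from the player to move there.
    Assumes players strictly alternate; result is +1 win, 0 draw, -1 loss."""
    return [first_player_result if i % 2 == 0 else -first_player_result
            for i in range(num_positions)]

def alphazero_loss(p: dict[int, float], v: float, pi: dict[int, float], z: float) -> float:
    """(z - v)^2 - sum_a pi(a) * log p(a); the L2 weight penalty is omitted."""
    value_loss = (z - v) ** 2
    # Skip zero-probability targets: they contribute nothing to the cross-entropy.
    policy_loss = -sum(pi[a] * math.log(p[a]) for a in pi if pi[a] > 0)
    return value_loss + policy_loss
```

A first-player win labels that player's positions +1 and the opponent's positions -1; a draw labels every position 0. Because both targets come from the network's own games, no outside teacher is needed.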