Reinforcement Learning: Agents
Module 8 of 8
8. Offline RL
1. Learning from History
In robotics, exploration is dangerous (crashing serves no one). Offline RL instead learns from a static dataset of previously logged transitions (essentially a frozen replay buffer), without any further interaction with the environment.
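To make "learning from a static dataset" concrete, here is a minimal sketch of tabular Q-learning that only replays a fixed log and never calls an environment. The three logged transitions, the function name `offline_q_learning`, and the hyperparameters are all made up for illustration; real offline RL typically uses function approximation and far larger datasets.

```python
# Hypothetical logged transitions: (state, action, reward, next_state, done).
# The environment is never queried; this list is everything the learner sees.
LOGGED = [
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
    (0, 0, 0.0, 0, False),
]


def offline_q_learning(dataset, n_states, n_actions, gamma=0.9, lr=0.5, sweeps=200):
    """Tabular Q-learning that repeatedly sweeps over a fixed dataset."""
    q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(sweeps):
        for s, a, r, s_next, done in dataset:
            # Standard Q-learning target, computed only from logged data.
            target = r if done else r + gamma * max(q[s_next])
            q[s][a] += lr * (target - q[s][a])
    return q


q_table = offline_q_learning(LOGGED, n_states=3, n_actions=2)
```

Note that the pair (state 1, action 0) never appears in the log, so its value simply stays at its initial 0.0. In this tabular toy that is harmless, but with a neural network the value of an unseen action is whatever the network happens to extrapolate, which can be badly over-optimistic.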
2. Conservative Q-Learning (CQL)
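Assumption: "If I haven't seen this action in the data, it's probably bad." CQL enforces this by adding a penalty to the usual Q-learning loss that pushes Q-values down for actions outside the dataset and up for the actions that were actually logged, so the learned values err on the pessimistic side instead of overestimating unseen actions.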
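The pessimism behind CQL can be sketched for a single state with discrete actions. This is a simplified illustration of one common form of the CQL objective (a log-sum-exp penalty over all actions minus the Q-value of the logged action), not a full training loop; the function names, the weight `alpha`, and the example numbers are assumptions chosen for illustration.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) over a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def cql_loss(q_values, action, td_target, alpha=1.0):
    """Bellman error on the logged action plus a conservative penalty.

    q_values:  Q(s, a) for every action a in one state
    action:    index of the action that appears in the dataset
    td_target: bootstrapped target for the logged action
    """
    q_data = q_values[action]
    bellman = 0.5 * (q_data - td_target) ** 2
    conservative = logsumexp(q_values) - q_data
    return bellman + alpha * conservative


def cql_grad(q_values, action, td_target, alpha=1.0):
    """Gradient of cql_loss with respect to each Q(s, a)."""
    m = max(q_values)
    exps = [math.exp(v - m) for v in q_values]
    total = sum(exps)
    # d/dQ of logsumexp is the softmax, so every action gets a positive push.
    grad = [alpha * e / total for e in exps]
    # The logged action also gets the Bellman term and the -alpha term.
    grad[action] += (q_values[action] - td_target) - alpha
    return grad


# Example: three actions, only action 0 was logged, target equals current Q.
loss = cql_loss([0.0, 0.0, 0.0], action=0, td_target=0.0)
grad = cql_grad([0.0, 0.0, 0.0], action=0, td_target=0.0)
```

In the example, the gradient is positive for the two unseen actions and negative for the logged one, so a gradient descent step lowers the Q-values of unseen actions and raises the Q-value of the action found in the data.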