8. Offline RL

1. Learning from History

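In robotics, exploration is dangerous: a robot that learns by trial and error can crash or damage itself. Offline RL avoids this by learning a policy from a static dataset of previously logged transitions (in effect, a fixed replay buffer) without any further interaction with the world.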
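To make "learning without interacting" concrete, here is a minimal sketch of tabular Q-learning run only on a fixed log. The function name `fit_q_offline`, the `(state, action, reward, next_state, done)` tuple layout, and the three logged transitions are illustrative assumptions, not part of the lesson.

```python
import numpy as np

def fit_q_offline(transitions, num_states, num_actions,
                  gamma=0.9, lr=0.5, epochs=200):
    """Fit a Q-table by sweeping repeatedly over a fixed list of transitions.

    No environment is ever called: the logged data is the only input.
    """
    q = np.zeros((num_states, num_actions))
    for _ in range(epochs):
        for s, a, r, s_next, done in transitions:
            # Standard Q-learning target, computed from logged data only.
            target = r if done else r + gamma * q[s_next].max()
            q[s, a] += lr * (target - q[s, a])
    return q

# Hypothetical log: (state, action, reward, next_state, done)
logged = [
    (0, 1, 0.0, 1, False),
    (1, 1, 1.0, 2, True),
    (0, 0, 0.0, 0, False),
]
q = fit_q_offline(logged, num_states=3, num_actions=2)
```

Entries for (state, action) pairs that never appear in the log, such as `q[1, 0]` here, are never updated, so their values are whatever the table was initialized with.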

2. Conservative Q-Learning (CQL)

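Assumption: "If I haven't seen this action in the data, it's probably bad." CQL builds this pessimism into training: it pushes Q-values down for actions that do not appear in the dataset and pushes them up for actions that do, so the learned Q-function does not overestimate actions the data cannot support.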
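One common way to express this assumption for discrete actions is a penalty equal to the log-sum-exp of Q over all actions minus the Q-value of the logged action, added to the usual TD loss. The sketch below assumes that form; the names `cql_penalty`, `cql_loss`, and the weight `alpha` are illustrative choices, and the TD targets are taken as given rather than computed.

```python
import numpy as np

def cql_penalty(q_values, data_actions):
    """Conservative penalty for discrete actions.

    q_values:     array of shape (batch, num_actions)
    data_actions: integer array of shape (batch,), the logged actions
    Returns mean over the batch of logsumexp_a Q(s, a) - Q(s, a_data).
    """
    # Numerically stable log-sum-exp over the action axis.
    q_max = q_values.max(axis=1, keepdims=True)
    logsumexp = q_max[:, 0] + np.log(np.exp(q_values - q_max).sum(axis=1))
    q_data = q_values[np.arange(len(data_actions)), data_actions]
    return float((logsumexp - q_data).mean())

def cql_loss(q_values, data_actions, td_targets, alpha=1.0):
    """TD loss on logged actions plus alpha times the conservative penalty."""
    q_data = q_values[np.arange(len(data_actions)), data_actions]
    td_loss = 0.5 * np.mean((q_data - td_targets) ** 2)
    return float(td_loss + alpha * cql_penalty(q_values, data_actions))

# Same Q-values, different logged action: the penalty is large when an
# action other than the logged one has the high Q-value.
q_row = np.array([[0.0, 5.0]])
penalty_when_unseen_is_high = cql_penalty(q_row, np.array([0]))
penalty_when_logged_is_high = cql_penalty(q_row, np.array([1]))
```

The penalty is never negative, and minimizing it lowers Q-values of actions other than the logged one relative to the logged action. A larger `alpha` makes the learned Q-function more pessimistic.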
