
10. RLHF (Training ChatGPT)

RLHF (Reinforcement Learning from Human Feedback) is the recipe used to turn a pretrained language model into an assistant like ChatGPT. It has three stages: supervised fine-tuning (SFT), reward modeling, and reinforcement learning with PPO.

1. SFT (Supervised Fine-Tuning)

Data: pairs of (prompt, good answer), usually written or curated by human annotators. Result: a model that imitates the demonstrations, learning the desired response format and style.
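
A minimal SFT sketch in PyTorch, assuming a small Hugging Face causal LM ("gpt2" here is only a stand-in) and a toy, hypothetical (prompt, answer) list. A real pipeline adds batching, padding, masking of the prompt tokens, and a proper training loop.

```python
# Minimal SFT sketch (assumed setup: a small causal LM and a toy dataset).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # stand-in; any causal LM works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)

# Hypothetical (prompt, good answer) pairs.
sft_data = [
    ("What is RLHF?", "RLHF fine-tunes a language model using human preference signals."),
]

model.train()
for prompt, answer in sft_data:
    # Concatenate prompt and answer; the model learns to continue the prompt with the answer.
    text = prompt + "\n" + answer + tokenizer.eos_token
    batch = tokenizer(text, return_tensors="pt")
    # Standard next-token prediction: passing labels computes the cross-entropy loss.
    outputs = model(**batch, labels=batch["input_ids"])
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```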

2. Reward Model

Data: comparison triples (prompt, answer A, answer B), where a human labels which answer they prefer. Result: a reward model that assigns any (prompt, answer) pair a scalar score estimating human preference.
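
A sketch of the pairwise (Bradley-Terry style) reward-model loss. The score tensors here are hypothetical outputs of a model with a scalar reward head; the key idea is that the chosen answer should score higher than the rejected one.

```python
# Pairwise reward-model loss sketch (assumes a model with a scalar "reward head").
import torch
import torch.nn.functional as F

def reward_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry objective: maximize the margin between the preferred and
    # the rejected answer, i.e. minimize -log sigmoid(r_chosen - r_rejected).
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Toy example: scalar scores the reward model assigned to answer A (chosen) and B (rejected).
score_a = torch.tensor([1.3, 0.2], requires_grad=True)
score_b = torch.tensor([0.4, 0.9], requires_grad=True)
loss = reward_loss(score_a, score_b)
loss.backward()  # gradients push chosen scores up and rejected scores down
print(float(loss))
```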

3. PPO (Proximal Policy Optimization)

Use reinforcement learning (PPO) to optimize the SFT model to maximize the reward model's score, while a KL penalty keeps the policy close to the SFT model so it does not drift into reward hacking.
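
A condensed sketch of the clipped PPO objective with a KL penalty. All inputs (log-probabilities, advantages, and the KL estimate against the SFT model) are assumed to be precomputed toy tensors; real RLHF implementations add rollouts, a value head, and per-token reward shaping.

```python
# Clipped PPO objective with a KL penalty (all tensors assumed precomputed toy values).
import torch

def ppo_loss(logp_new, logp_old, advantages, kl_to_sft, clip_eps=0.2, kl_coef=0.1):
    # Probability ratio between the current policy and the policy that generated the samples.
    ratio = torch.exp(logp_new - logp_old)
    # Clipped surrogate: take the pessimistic (minimum) of the unclipped and clipped terms.
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1 - clip_eps, 1 + clip_eps) * advantages
    policy_loss = -torch.min(unclipped, clipped).mean()
    # KL penalty keeps the policy close to the SFT model, which limits reward hacking.
    return policy_loss + kl_coef * kl_to_sft.mean()

# Toy values: log-probs under the new and old policy, advantages derived from the
# reward-model score minus a value estimate, and an estimated KL to the SFT model.
logp_new = torch.tensor([-1.0, -0.8], requires_grad=True)
logp_old = torch.tensor([-1.1, -0.9])
advantages = torch.tensor([0.5, -0.2])
kl_to_sft = torch.tensor([0.03, 0.05])
print(float(ppo_loss(logp_new, logp_old, advantages, kl_to_sft)))
```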
