NLP Specialist: BERT & Beyond
Module 6 of 8
6. Model Distillation
1. Teacher & Student
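Train a huge model (the Teacher). Then train a tiny model (the Student) to mimic the Teacher's outputs: its full probability distribution over the classes, not just its top prediction. Typical result: roughly 95% of the Teacher's performance at about 10% of the inference cost.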
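The Student's training objective can be sketched in plain Python. This is a minimal sketch that assumes the common soft-target recipe popularized by Hinton et al. Both models' logits are divided by a temperature `T` to soften their distributions. The Student is penalized by the KL divergence from the Teacher's softened distribution, scaled by `T²` so its gradients stay comparable in size. That term is then blended with ordinary cross-entropy on the true label using a weight `alpha`. The function names, the defaults `temperature=2.0` and `alpha=0.5`, and the example logits are illustrative assumptions, not values from this course.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float = 1.0) -> list[float]:
    """Softmax of logits / temperature; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """KL(p || q) for discrete distributions; terms where p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(
    student_logits: Sequence[float],
    teacher_logits: Sequence[float],
    true_label: int,
    temperature: float = 2.0,  # illustrative default (assumption)
    alpha: float = 0.5,        # weight on the soft (Teacher-matching) term (assumption)
) -> float:
    # Soft term: how far the Student's softened distribution is from the Teacher's.
    teacher_soft = softmax(teacher_logits, temperature)
    student_soft = softmax(student_logits, temperature)
    soft_loss = kl_divergence(teacher_soft, student_soft) * temperature ** 2

    # Hard term: ordinary cross-entropy against the true label at temperature 1.
    student_probs = softmax(student_logits)
    hard_loss = -math.log(student_probs[true_label])

    return alpha * soft_loss + (1 - alpha) * hard_loss


if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem; class 0 is the true label.
    teacher_logits = [4.0, 1.0, 0.5]
    student_logits = [2.0, 1.5, 0.2]
    print(distillation_loss(student_logits, teacher_logits, true_label=0))
```

In a real training loop the Teacher is frozen: its logits are computed once per batch, and only the Student's weights are updated from this loss.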