6. Model Distillation

1. Teacher & Student

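Train a large model (the Teacher) first. Then train a much smaller model (the Student) to mimic the Teacher's outputs, typically its full output probabilities (soft targets) rather than only the hard labels. Result: the Student often keeps most of the Teacher's accuracy at a fraction of the inference cost. Figures like "95% of the performance at 10% of the cost" are a rule of thumb; the actual trade-off depends on the task and on how small the Student is.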
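
To make "mimic the Teacher's outputs" concrete, here is a minimal sketch of one common training objective for the Student: a soft-target KL term plus an ordinary hard-label term. The notes above do not specify a loss, so the function names (softmax, distillation_loss) and the parameters (temperature, alpha) are assumptions chosen for illustration, and the example works on a single sample in plain Python rather than in any particular framework.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature gives a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target KL term and a hard-label cross-entropy term.

    temperature and alpha are illustrative defaults (assumptions), not tuned values.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the softened distributions
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    # Cross-entropy against the ground-truth label, at temperature 1
    ce = -math.log(softmax(student_logits)[label])
    # Multiplying by T^2 keeps the soft term's gradient scale comparable
    # across temperatures
    return alpha * temperature ** 2 * kl + (1 - alpha) * ce

# Hypothetical logits for one 3-class example
teacher = [4.0, 1.0, 0.2]
student = [2.5, 1.2, 0.3]
loss = distillation_loss(student, teacher, label=0)
```

In practice the Teacher's logits are computed with its weights frozen, and only the Student is updated to reduce this loss. Setting alpha to 1.0 trains on the Teacher's soft targets alone; setting it to 0.0 reduces to ordinary supervised training.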