TensorLearn
Back to Course
NLP Specialist: BERT & Beyond
Module 7 of 8

7. Speech-to-Text (Whisper)

1. Weak Supervision

OpenAI trained Whisper on 680,000 hours of internet audio. It solves accents, background noise, and even translation implicitly.

2. The Architecture

It is a standard Transformer Encoder-Decoder. Input: Log-Mel Spectrogram. Output: Text tokens.

Mark as Completed

TensorLearn - AI Engineering for Professionals