Deep Learning with PyTorch
Module 10 of 12
10. Audio Processing
1. Images of Sound
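We usually don't feed raw waveforms to CNNs. We feed spectrograms instead. A spectrogram, typically computed with a short-time Fourier transform, turns the time-amplitude signal into a time-frequency representation, which a CNN can process like a single-channel image.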
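To make the waveform-to-image step concrete, here is a minimal sketch using `torch.stft`. The helper name `log_spectrogram`, the 16 kHz sample rate, the 440 Hz test tone, and the window and hop sizes are illustrative assumptions, not values prescribed by the course.

```python
import math
import torch


def log_spectrogram(waveform, n_fft=400, hop_length=160):
    """Hypothetical helper: 1-D waveform -> log-power spectrogram (freq x time)."""
    window = torch.hann_window(n_fft)
    stft = torch.stft(
        waveform,
        n_fft=n_fft,
        hop_length=hop_length,
        window=window,
        return_complex=True,
    )
    power = stft.abs() ** 2          # squared magnitude per (frequency, frame)
    return torch.log(power + 1e-10)  # log compression; epsilon avoids log(0)


# Assumed input: one second of a 440 Hz sine sampled at 16 kHz.
sample_rate = 16000
t = torch.arange(sample_rate) / sample_rate
waveform = torch.sin(2 * math.pi * 440.0 * t)

spec = log_spectrogram(waveform)        # shape: (frequency bins, time frames)
image = spec.unsqueeze(0).unsqueeze(0)  # (batch, channel, freq, time) for a CNN

# The strongest frequency bin should sit at the tone's frequency.
peak_bin = spec.mean(dim=1).argmax().item()
peak_hz = peak_bin * sample_rate / 400  # bin width = sample_rate / n_fft
print(spec.shape, image.shape, peak_hz)
```

The result has one axis for frequency and one for time, so it can be passed to `nn.Conv2d` layers in the same way as a grayscale image. In practice a mel-scaled variant is a common choice, but the plain version keeps the idea visible.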
2. Wav2Vec & HuBERT
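These models apply self-supervised learning to audio: parts of the input are masked, and the model is trained to predict the missing content from the surrounding context. Because the training signal comes from the audio itself, pretraining needs no transcripts.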
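The masking idea can be sketched in a few lines. This is a toy illustration and not the actual Wav2Vec or HuBERT training code: the class `ToyMaskedPredictor`, the helper `span_mask`, the random features, and the random target IDs are all hypothetical stand-ins. It loosely follows the HuBERT-style setup of classifying a discrete unit for each masked frame; wav2vec 2.0 uses a contrastive objective instead.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


def span_mask(batch, frames, mask_prob=0.065, span=10):
    """Pick random start frames and mask `span` consecutive frames from each."""
    starts = torch.rand(batch, frames) < mask_prob
    mask = torch.zeros(batch, frames, dtype=torch.bool)
    for offset in range(span):
        mask[:, offset:] |= starts[:, : frames - offset]
    return mask


class ToyMaskedPredictor(nn.Module):
    """Hypothetical model: swap masked frames for a learned vector, then classify."""

    def __init__(self, dim=32, num_units=50):
        super().__init__()
        self.mask_embedding = nn.Parameter(torch.randn(dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=4, dim_feedforward=64, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)
        self.head = nn.Linear(dim, num_units)

    def forward(self, features, mask):
        # Masked positions see only the shared mask embedding, not the real frame.
        x = torch.where(mask.unsqueeze(-1), self.mask_embedding, features)
        return self.head(self.encoder(x))


# Assumed inputs: random frame features and random placeholder unit IDs.
features = torch.randn(2, 100, 32)         # (batch, frames, feature dim)
targets = torch.randint(0, 50, (2, 100))   # one discrete target per frame
mask = span_mask(2, 100)

model = ToyMaskedPredictor()
logits = model(features, mask)             # (batch, frames, num_units)

# The loss is computed only where the input was hidden from the model.
loss = nn.functional.cross_entropy(logits[mask], targets[mask])
print(mask.sum().item(), loss.item())
```

Restricting the loss to masked frames is the key design choice: the model cannot copy its input there, so it has to infer the content from neighbouring, unmasked audio.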