8. Action Recognition

Slow pathway: Low frame rate, high channel capacity (Spatial details).
Fast pathway: High frame rate, low channel capacity (Motion details).
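
The difference in frame rate can be sketched as two strided samplings of the same clip. This is a minimal illustration, not a definitive implementation: the clip length (64), the Slow stride (16), the speed ratio `ALPHA` (8) and the helper `sample_indices` are all assumed values and names chosen for the example.

```python
def sample_indices(num_frames, stride):
    """Return the indices of every `stride`-th frame of a clip."""
    return list(range(0, num_frames, stride))


NUM_FRAMES = 64    # assumed clip length (hypothetical)
SLOW_STRIDE = 16   # assumed: Slow pathway keeps 1 frame in 16
ALPHA = 8          # assumed: Fast pathway sees ALPHA times more frames
FAST_STRIDE = SLOW_STRIDE // ALPHA

slow_idx = sample_indices(NUM_FRAMES, SLOW_STRIDE)  # sparse in time
fast_idx = sample_indices(NUM_FRAMES, FAST_STRIDE)  # dense in time

print(len(slow_idx), len(fast_idx))  # prints "4 32"
```

Both pathways look at the same clip. The Slow one sees few frames, the Fast one sees many, so only the Fast one can follow quick motion.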

1. Video Classification

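Classifying a single image needs only spatial features. Classifying an action such as "Swimming" requires Time: the label depends on how the frames change, not on any one frame. 3D CNNs (Conv3d) convolve over Height, Width, AND Time, so each filter sees a small stack of consecutive frames at once.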
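
To make the role of the Time axis concrete, here is a naive single-channel 3D convolution in plain Python, a sketch under stated assumptions rather than how a real framework implements it. The helpers `conv3d_valid` and `make_clip`, the 3x3 frames and the hand-written temporal-difference kernel are all hypothetical choices for illustration. A layer such as PyTorch's `nn.Conv3d` applies the same kind of sliding window with learned weights, over batches and multiple channels.

```python
def conv3d_valid(clip, kernel):
    """Naive 3D cross-correlation: no padding, stride 1, one channel.

    clip   is a nested list indexed [T][H][W]
    kernel is a nested list indexed [kT][kH][kW]
    """
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kT, kH, kW = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kT + 1):              # slide over Time
        plane = []
        for y in range(H - kH + 1):          # slide over Height
            row = []
            for x in range(W - kW + 1):      # slide over Width
                acc = 0.0
                for dt in range(kT):
                    for dy in range(kH):
                        for dx in range(kW):
                            acc += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out


def make_clip(xs, size=3):
    """Build a clip with one bright pixel per frame at row 1, column xs[t]."""
    clip = []
    for x in xs:
        frame = [[0.0] * size for _ in range(size)]
        frame[1][x] = 1.0
        clip.append(frame)
    return clip


static_clip = make_clip([0, 0, 0])   # pixel never moves
moving_clip = make_clip([0, 1, 2])   # pixel moves right each frame

# Kernel of shape 2x1x1: next frame minus current frame.
temporal_diff = [[[-1.0]], [[1.0]]]

static_out = conv3d_valid(static_clip, temporal_diff)
moving_out = conv3d_valid(moving_clip, temporal_diff)
```

With this kernel, `static_out` is zero everywhere while `moving_out` is nonzero where the pixel leaves and arrives. A 2D convolution applied to one frame at a time cannot tell the two clips' frames apart in this way, because each individual frame looks equally plausible.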

Two pathways: