Back to Course
Data Intelligence: NumPy & Pandas
Module 13 of 15
14. Scaling: Dask & Ray
1. The RAM Limit
Pandas crashes if your dataset > RAM. Dask breaks the big dataframe into small chunks and processes them on disk.
2. Distributed Computing
Use Ray to parallelize Python code across a cluster of 100 machines.
pythonimport ray ray.init() @ray.remote def heavy_task(x): return x * x futures = [heavy_task.remote(i) for i in range(1000)] results = ray.get(futures)