Data Intelligence: NumPy & Pandas
Module 7 of 15
7. Cleaning Data
1. Entropy Reduction
Real data is dirty. It has holes, typos, and wrong types.
Handling Nulls
    df.dropna()   # Nuclear option: drop every row that contains at least one missing value.
    df.fillna(0)  # Simple option: fill holes with 0 (only sensible when 0 is a meaningful default).
    df.ffill()    # Forward fill: copy the previous value down (good for time series).
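To see how the three calls differ, here is a minimal sketch on a made-up frame. The column names and readings are invented for illustration and are not from any course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical sensor readings with gaps (illustrative data only).
df = pd.DataFrame({
    "temp": [20.5, np.nan, 21.0, np.nan],
    "humidity": [30.0, 32.0, np.nan, 35.0],
})

dropped = df.dropna()        # keeps only rows with no missing values
zero_filled = df.fillna(0)   # replaces every NaN with 0
forward = df.ffill()         # carries the last seen value forward

# None of these calls modify df; each returns a new DataFrame.
print(len(df), "rows before dropna,", len(dropped), "rows after")
print(zero_filled["temp"].tolist())
print(forward["temp"].tolist())
```

Note how much data `dropna()` throws away when holes are scattered across columns, and that `fillna(0)` turns a missing temperature into a reading of 0, which would distort an average.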
Types
Memory optimization involves choosing the right types.
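- Object: The generic dtype commonly used for strings. Every value is a separate Python object, so it uses massive RAM.
- Category: Stores each distinct string once and uses small integer codes under the hood. Savings can reach roughly 100x for heavily repeated strings, but shrink (or vanish) when most values are unique.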
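The difference between the two types can be measured directly. This is a rough sketch on an invented column of three repeated city names. The exact ratio depends on string length, row count, and pandas version, so no specific figure is claimed here.

```python
import pandas as pd

# Hypothetical column: 150,000 rows drawn from only three distinct strings.
as_object = pd.Series(["Amsterdam", "Berlin", "Copenhagen"] * 50_000, dtype=object)

# Convert to category: distinct strings are stored once, rows become integer codes.
as_category = as_object.astype("category")

# deep=True counts the Python string objects, not just the pointers to them.
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print("object:  ", object_bytes, "bytes")
print("category:", category_bytes, "bytes")
print("codes dtype:", as_category.cat.codes.dtype)
```

With only three categories, pandas can store the codes in a very small integer type, which is where most of the saving comes from.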