TensorLearn
Data Intelligence: NumPy & Pandas
Module 7 of 15

7. Cleaning Data

1. Entropy Reduction

Real data is dirty. It has holes, typos, and wrong types.

Handling Nulls

python
df.dropna()   # Nuclear option: return a copy without any row that has a hole.
df.fillna(0)  # Simple option: fill holes with 0 (can skew means and sums).
df.ffill()    # Forward fill: copy the previous value (good for time series).
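To see what each call actually returns, here is a minimal sketch on a tiny frame. The column names and values are made up for illustration. Note that all three methods return a new DataFrame and leave the original untouched unless you reassign the result.

```python
import numpy as np
import pandas as pd

# Hypothetical daily temperature readings with two holes.
df = pd.DataFrame({
    "day": [1, 2, 3, 4],
    "temp": [20.0, np.nan, 22.0, np.nan],
})

# Count the holes per column before deciding how to treat them.
null_counts = df.isna().sum()

dropped = df.dropna()    # keeps only the rows with no missing values
filled = df.fillna(0)    # replaces every hole with 0
forward = df.ffill()     # carries the last seen value downward

print(null_counts)
print(dropped)
print(filled)
print(forward)
```

Which option fits depends on the data: dropping rows loses information, filling with 0 can distort averages, and forward filling only makes sense when the rows are ordered.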

Types

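Memory optimization starts with choosing the right dtype for each column.

  • object: Holds arbitrary Python objects, including strings. Each value is stored as a separate Python object, so it uses a lot of RAM.
  • category: Stores each distinct value once and uses small integer codes under the hood. For heavily repeated strings the RAM savings can reach an order of magnitude or more, but they shrink as the number of distinct values grows.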
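The difference between the two dtypes can be measured directly. The sketch below assumes a made-up column of three city names repeated many times. The exact ratio depends on string length, the number of distinct values, and the pandas version, so treat the printed numbers as indicative only.

```python
import pandas as pd

# Hypothetical column: three distinct strings repeated 100,000 times each.
cities = pd.Series(["Oslo", "Rome", "Lima"] * 100_000)

as_object = cities.astype("object")
as_category = cities.astype("category")

# deep=True counts the memory of the Python string objects themselves,
# not just the pointers to them.
object_bytes = as_object.memory_usage(deep=True)
category_bytes = as_category.memory_usage(deep=True)

print(f"object:   {object_bytes:,} bytes")
print(f"category: {category_bytes:,} bytes")
print(f"ratio:    {object_bytes / category_bytes:.1f}x")

# The integer codes that back the categorical column.
print(as_category.cat.codes.dtype)
```

With only a few distinct values, pandas can use 8-bit integer codes, which is where most of the saving comes from. A column where nearly every value is unique gains little and may even grow.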

