Data Intelligence: NumPy & Pandas
Module 2 of 15
2. NumPy Basics
1. The n-dimensional Array (ndarray)
Why is Python slow? Because the interpreter checks the type of every single element on every iteration of a loop. Why is NumPy fast? Because it cheats. It pushes the loop down to compiled C code that works on elements of one known type.
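As a rough illustration of the idea above, the sketch below computes the same squares twice: once with a Python-level loop and once with a single NumPy expression whose loop runs inside compiled code. The variable names are made up for this example, and no timings are shown because they vary by machine.

```python
import numpy as np

values = list(range(1_000))

# Pure-Python loop: the interpreter dispatches on each element's type
# before every multiplication.
squares_loop = [v * v for v in values]

# NumPy: one expression; the per-element loop runs in compiled C code.
arr = np.array(values)
squares_vec = arr * arr

# Both approaches produce the same numbers.
same = squares_loop == squares_vec.tolist()
print(same)  # True
```

To compare speed on your own machine, you could wrap each version in `timeit` and increase the size of `values`.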
Memory Layout
A Python List is an array of pointers to scattered objects. A NumPy Array is a contiguous block of memory.
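- CPU Cache: When the CPU reads one number, it fetches the whole cache line around it (typically 64 bytes, which holds 8 float64 or 16 float32 values), so the neighboring numbers arrive for free.
- Locality: Because NumPy arrays are contiguous, those neighbors are exactly the values needed next, so the CPU's prefetching pays off. Combined with the compiled loop, this can make NumPy 50x-100x faster than an equivalent pure-Python loop, though the exact speedup depends on the operation and the array size.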
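The difference in layout can be inspected directly. The following is a minimal sketch, assuming CPython (where `sys.getsizeof` is available) and a 64-byte cache line, which is common but hardware-dependent. The names `py_list`, `arr`, and `CACHE_LINE_BYTES` are illustrative only.

```python
import sys
import numpy as np

n = 1_000
py_list = [float(i) for i in range(n)]
arr = np.arange(n, dtype=np.float64)

# The list holds n pointers; each float object lives elsewhere on the heap.
list_pointer_bytes = sys.getsizeof(py_list)
list_object_bytes = sum(sys.getsizeof(x) for x in py_list)

# The array holds n raw 8-byte values stored back to back.
array_data_bytes = arr.nbytes

print(arr.flags['C_CONTIGUOUS'])  # True
print(array_data_bytes)           # 8000

# Assumption: 64-byte cache line. One fetch then brings in 8 float64 values.
CACHE_LINE_BYTES = 64
values_per_line = CACHE_LINE_BYTES // arr.itemsize
print(values_per_line)            # 8
```

Comparing `list_pointer_bytes + list_object_bytes` against `array_data_bytes` shows how much extra memory the pointer-based layout uses for the same numbers.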
2. DataTypes (Dtypes)
In Python, an integer is a variable-size object (28 bytes or more on a 64-bit CPython build). In NumPy, you define the EXACT size.
- int8: 1 byte (-128 to 127).
- int64: 8 bytes (the default integer type on most 64-bit platforms).
- float32: 4 bytes (Single Precision, common on GPUs).
```python
import numpy as np

# Force the array to use small integers to save RAM
a = np.array([1, 2, 3], dtype='int8')
```
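To check the sizes listed above, one option is to inspect `itemsize` and `nbytes` on small arrays and ask `np.iinfo` for the integer range. This is only a sketch; the array names are made up for the example.

```python
import numpy as np

a8 = np.array([1, 2, 3], dtype=np.int8)
a64 = np.array([1, 2, 3], dtype=np.int64)
f32 = np.array([1, 2, 3], dtype=np.float32)

# Bytes used by a single element of each dtype
print(a8.itemsize, a64.itemsize, f32.itemsize)  # 1 8 4

# Total bytes of element data: 3 elements each
print(a8.nbytes, a64.nbytes)  # 3 24

# Representable range of int8
info = np.iinfo(np.int8)
print(info.min, info.max)  # -128 127
```

The same three values cost 3 bytes as int8 and 24 bytes as int64, which is where the RAM saving comes from. The trade-off is the narrower range of values each element can hold.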
3. Shape & Strides
How does a 1D block of memory act like a 2D matrix? Math.
- Shape: Dimensions, e.g., (3, 4) (3 rows, 4 cols).
- Strides: How many bytes to step to get to the next element/row.
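If you change the shape (for example, with reshape on a contiguous array), NumPy usually just changes the metadata and returns a view. It does not copy the data. A copy is only made when the new shape cannot be expressed as strides over the existing buffer, which can happen with some non-contiguous arrays. (Crucial for performance).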
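The "math" behind shape and strides can be sketched as follows, assuming a contiguous 1D array of int64 values reshaped to (3, 4). The byte offset of element `[i, j]` is `i * strides[0] + j * strides[1]`. The names `a`, `m`, and `flat_index` are illustrative only.

```python
import numpy as np

a = np.arange(12, dtype=np.int64)  # 12 values, 8 bytes each
m = a.reshape(3, 4)

print(m.shape)    # (3, 4)
print(m.strides)  # (32, 8): 32 bytes to the next row, 8 to the next column

# Locate m[i, j] by hand using the strides.
i, j = 2, 1
offset_bytes = i * m.strides[0] + j * m.strides[1]
flat_index = offset_bytes // m.itemsize
print(flat_index, a[flat_index] == m[i, j])  # 9 True

# reshape returned a view here: both names share one buffer.
print(np.shares_memory(a, m))  # True
m[0, 0] = 99
print(a[0])  # 99

# Transposing only swaps the metadata as well.
print(m.T.shape, m.T.strides)  # (4, 3) (8, 32)
```

Because only the shape and strides change, reshaping or transposing a large array this way costs about the same as doing it to a tiny one.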