Images as Arrays
To computers, images are just arrays of numbers. Understanding this representation is essential for computer vision.
Definition
A digital image is a 2D or 3D array of pixel values. Grayscale images are 2D (height x width), while color images are 3D (height x width x channels, typically RGB).
Key Concepts
Pixel
The smallest unit of an image. In an 8-bit image, each value is an integer from 0 to 255 representing intensity.
Channels
Color images have 3 channels (RGB). Each channel is a 2D array of pixel values.
Resolution
The dimensions of an image (height x width). Higher resolution captures more detail but requires more memory and computation.
Normalization
Scaling pixel values to [0, 1], or standardizing them (subtracting a mean and dividing by a standard deviation), before neural network training.
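The concepts above can be made concrete with a short NumPy sketch. It is illustrative only: the 1080 x 1920 resolution and the helper name `image_nbytes` are assumptions chosen for the example, not values from the text. It shows how resolution, channel count, and data type determine memory use, and why arithmetic on 8-bit pixels is safer in a wider type.

```python
import numpy as np

def image_nbytes(height, width, channels=1, dtype=np.uint8):
    """Hypothetical helper: bytes needed to store an image of the given shape."""
    return height * width * channels * np.dtype(dtype).itemsize

# Resolution and dtype drive memory use (1080 x 1920 is an assumed example size).
rgb_uint8_bytes = image_nbytes(1080, 1920, 3, np.uint8)
rgb_float32_bytes = image_nbytes(1080, 1920, 3, np.float32)
print("uint8 RGB bytes:", rgb_uint8_bytes)
print("float32 RGB bytes:", rgb_float32_bytes)

# 8-bit pixels wrap around on overflow, so brighten in a wider type and clip.
pixels = np.array([100, 200, 250], dtype=np.uint8)
wrapped = pixels + np.uint8(100)  # modular arithmetic: values wrap past 255
brightened = np.clip(pixels.astype(np.int16) + 100, 0, 255).astype(np.uint8)
print("wrapped:", wrapped)
print("clipped:", brightened)
```

Converting to float32 quadruples the storage of a uint8 image, which is one reason normalization is often deferred until just before the data reaches a model.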
Real-World Applications
Medical Imaging
Healthcare: X-rays, MRIs, and CT scans are processed as arrays for AI-assisted diagnosis.
Satellite Imagery
Remote Sensing: Environmental monitoring and mapping use multispectral image arrays with many channels.
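A multispectral image fits the same array model, just with more than three channels. The sketch below uses a randomly generated 8-band array as a stand-in for real sensor data. The band count, the 16-bit value range, the band indices, and the variable names are all assumptions for illustration and do not correspond to any particular satellite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a multispectral tile: height x width x bands.
# 8 bands and 16-bit storage are assumptions, not a specific sensor's format.
height, width, bands = 64, 64, 8
tile = rng.integers(0, 4096, size=(height, width, bands), dtype=np.uint16)

# Each band is a 2D array, exactly like one channel of an RGB image.
band_3 = tile[:, :, 3]
print("Tile shape:", tile.shape)
print("Single band shape:", band_3.shape)

# Per-band statistics reduce over the spatial axes (0 and 1).
band_means = tile.mean(axis=(0, 1))
print("Per-band means shape:", band_means.shape)

# Picking any three bands yields a 3-channel array that can be viewed like RGB.
false_color = tile[:, :, [4, 2, 1]]
print("False-color composite shape:", false_color.shape)
```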
Code Example
import numpy as np
# Creating a simple grayscale image
grayscale = np.array([
[0, 50, 100, 150, 200, 255],
[0, 50, 100, 150, 200, 255],
[0, 50, 100, 150, 200, 255],
], dtype=np.uint8)
print("Grayscale image shape:", grayscale.shape) # (3, 6)
print("Pixel values:\n", grayscale)
# Creating an RGB image (3 channels)
height, width = 4, 4
rgb_image = np.zeros((height, width, 3), dtype=np.uint8)
# Set some colors
rgb_image[0, 0] = [255, 0, 0] # Red
rgb_image[0, 1] = [0, 255, 0] # Green
rgb_image[0, 2] = [0, 0, 255] # Blue
rgb_image[0, 3] = [255, 255, 0] # Yellow
print("\nRGB image shape:", rgb_image.shape) # (4, 4, 3)
print("Red channel:\n", rgb_image[:, :, 0])
# Normalization for neural networks
def normalize(image):
    """Scale to [0, 1]"""
    return image.astype(np.float32) / 255.0

def standardize(image, mean, std):
    """Standardize with dataset statistics"""
    return (image - mean) / std

normalized = normalize(rgb_image)
print("\nNormalized range:", normalized.min(), "-", normalized.max())
# Common preprocessing
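# ImageNet mean and std (per channel), stored as float32 so the
# standardized image is not silently promoted to float64
imagenet_mean = np.array([0.485, 0.456, 0.406], dtype=np.float32)
imagenet_std = np.array([0.229, 0.224, 0.225], dtype=np.float32)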
def preprocess_for_model(image):
    """Standard preprocessing pipeline"""
    # 1. Convert to float and normalize
    img = image.astype(np.float32) / 255.0
    # 2. Standardize using ImageNet statistics
    img = (img - imagenet_mean) / imagenet_std
    # 3. Transpose to channels-first (C, H, W) for PyTorch
    img = img.transpose(2, 0, 1)
    return img

print("\nPreprocessed shape:", preprocess_for_model(rgb_image).shape) # (3, 4, 4)

Understanding images as arrays demystifies computer vision. Every operation (filtering, transformations, neural network layers) is just array manipulation.
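To illustrate the point that common image operations reduce to array manipulation, here is a minimal sketch of cropping, flipping, and a 3x3 box blur using only NumPy. The function name `box_blur_3x3` and the choice to repeat edge pixels at the border are assumptions made for this example. Libraries such as OpenCV or SciPy provide tuned implementations of the same ideas.

```python
import numpy as np

def box_blur_3x3(gray):
    """Hypothetical helper: average each pixel with its 8 neighbors.

    Borders are handled by repeating edge pixels (an assumed choice).
    """
    padded = np.pad(gray.astype(np.float32), 1, mode="edge")
    h, w = gray.shape
    total = np.zeros((h, w), dtype=np.float32)
    # Sum the nine shifted views of the padded image, then divide by 9.
    for dy in range(3):
        for dx in range(3):
            total += padded[dy:dy + h, dx:dx + w]
    return total / 9.0

# A 5x5 image with a single bright pixel in the center.
gray = np.zeros((5, 5), dtype=np.uint8)
gray[2, 2] = 90

cropped = gray[1:4, 1:4]      # cropping is slicing
flipped = gray[:, ::-1]       # a horizontal flip is a reversed slice
blurred = box_blur_3x3(gray)  # filtering is a weighted sum of neighbors

print("Cropped shape:", cropped.shape)
print("Blurred center 3x3:\n", blurred[1:4, 1:4])
```

The blur spreads the single bright pixel evenly over its 3x3 neighborhood, which is the same computation a convolution layer performs with a fixed, uniform kernel.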
Practice Problems
1. Load a real image and extract its RGB channels
2. Implement image resizing using numpy
3. Create simple filters (blur, sharpen) as convolution kernels
Summary
Images are multi-dimensional arrays that computers can process mathematically. This perspective is key to understanding how CNNs and other vision models work.