
Images as Arrays


To computers, images are just arrays of numbers. Understanding this representation is essential for computer vision.

Definition

A digital image is a 2D or 3D array of pixel values. Grayscale images are 2D (height x width), while color images are 3D (height x width x channels, typically RGB).
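
As a quick illustration, the two layouts differ only in the number of axes. A minimal sketch using placeholder zero arrays (the sizes are arbitrary):

python
import numpy as np

# Placeholder images built from zeros, just to show the two layouts
gray = np.zeros((480, 640), dtype=np.uint8)      # 2D: height x width
color = np.zeros((480, 640, 3), dtype=np.uint8)  # 3D: height x width x channels

print(gray.ndim, gray.shape)    # 2 (480, 640)
print(color.ndim, color.shape)  # 3 (480, 640, 3)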

Key Concepts

Pixel

The smallest unit of an image. Values typically range from 0 to 255 (8-bit) and represent intensity.
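
A caveat that follows from the 8-bit range: uint8 arithmetic in NumPy wraps around silently, so operations such as brightening should widen the dtype first. A minimal sketch:

python
import numpy as np

pixels = np.array([10, 200], dtype=np.uint8)
print(pixels + 100)                                    # wraps around: [110  44], not [110 300]
print(np.clip(pixels.astype(np.int16) + 100, 0, 255))  # widen, then clip: [110 255]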

Channels

Color images have 3 channels (RGB). Each channel is a 2D array of pixel values.
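
Each channel can be pulled out by indexing the last axis. A short sketch on a tiny random stand-in image:

python
import numpy as np

img = np.random.randint(0, 256, size=(2, 2, 3), dtype=np.uint8)  # tiny stand-in RGB image
r, g, b = img[:, :, 0], img[:, :, 1], img[:, :, 2]               # each channel is 2D
print(r.shape, g.shape, b.shape)  # (2, 2) (2, 2) (2, 2)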

Resolution

The dimensions of an image (height x width). Higher resolution means more detail but also more computation.
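
To make the cost concrete, a rough back-of-the-envelope calculation for a few common resolutions (assuming uncompressed uint8 RGB, i.e. 3 bytes per pixel):

python
for h, w in [(224, 224), (1080, 1920), (2160, 3840)]:
    n_pixels = h * w
    megabytes = n_pixels * 3 / 1e6   # 3 bytes per pixel for uint8 RGB
    print(f"{h}x{w}: {n_pixels:,} pixels, ~{megabytes:.1f} MB uncompressed")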

Normalization

Scaling pixel values to [0, 1], or standardizing them to zero mean and unit variance, before neural network training (see normalize and standardize in the code example below).

Real-World Applications

Medical Imaging

Healthcare

X-rays, MRIs, and CT scans are processed as arrays for AI-assisted diagnosis.

Satellite Imagery

Remote Sensing

Environmental monitoring and mapping use multispectral image arrays with many channels.
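
The same array idea extends beyond three channels. A hypothetical multispectral tile might look like the following (the band count and bit depth are illustrative, not tied to a specific sensor):

python
import numpy as np

# Hypothetical 13-band tile; many satellite sensors record more than 8 bits per band
multispectral = np.zeros((256, 256, 13), dtype=np.uint16)
print(multispectral.shape)  # (256, 256, 13)
print(multispectral.dtype)  # uint16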

Code Example

python
import numpy as np

# Creating a simple grayscale image
grayscale = np.array([
    [0, 50, 100, 150, 200, 255],
    [0, 50, 100, 150, 200, 255],
    [0, 50, 100, 150, 200, 255],
], dtype=np.uint8)

print("Grayscale image shape:", grayscale.shape)  # (3, 6)
print("Pixel values:\n", grayscale)

# Creating an RGB image (3 channels)
height, width = 4, 4
rgb_image = np.zeros((height, width, 3), dtype=np.uint8)

# Set some colors
rgb_image[0, 0] = [255, 0, 0]    # Red
rgb_image[0, 1] = [0, 255, 0]    # Green
rgb_image[0, 2] = [0, 0, 255]    # Blue
rgb_image[0, 3] = [255, 255, 0]  # Yellow

print("\nRGB image shape:", rgb_image.shape)  # (4, 4, 3)
print("Red channel:\n", rgb_image[:, :, 0])

# Normalization for neural networks
def normalize(image):
    """Scale to [0, 1]"""
    return image.astype(np.float32) / 255.0

def standardize(image, mean, std):
    """Standardize with dataset statistics"""
    return (image - mean) / std

normalized = normalize(rgb_image)
print("\nNormalized range:", normalized.min(), "-", normalized.max())

# Common preprocessing
# ImageNet mean and std (per channel)
imagenet_mean = np.array([0.485, 0.456, 0.406])
imagenet_std = np.array([0.229, 0.224, 0.225])

def preprocess_for_model(image):
    """Standard preprocessing pipeline"""
    # 1. Convert to float and normalize
    img = image.astype(np.float32) / 255.0
    # 2. Standardize using ImageNet statistics
    img = (img - imagenet_mean) / imagenet_std
    # 3. Transpose to channels-first (C, H, W) for PyTorch
    img = img.transpose(2, 0, 1)
    return img

print("\nPreprocessed shape:", preprocess_for_model(rgb_image).shape)

Understanding images as arrays demystifies computer vision. Every operation (filtering, transformations, neural network layers) is just array manipulation.
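
A few one-liners make the point. Flipping, cropping, and brightening are all ordinary array operations (sketched here on a small random stand-in image):

python
import numpy as np

img = np.random.randint(0, 256, size=(4, 6, 3), dtype=np.uint8)  # stand-in image

flipped = img[:, ::-1]                                                  # horizontal flip
cropped = img[1:3, 2:5]                                                 # crop by slicing
brighter = np.clip(img.astype(np.int16) + 50, 0, 255).astype(np.uint8)  # safe brighten

print(flipped.shape, cropped.shape, brighter.dtype)  # (4, 6, 3) (2, 3, 3) uint8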

Practice Problems

  1. Load a real image and extract its RGB channels
  2. Implement image resizing using NumPy (see the sketch after this list)
  3. Create simple filters (blur, sharpen) as convolution kernels
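
For problem 2, one possible starting point is a nearest-neighbor resize written with plain NumPy indexing. This is a sketch of one approach, not the only one:

python
import numpy as np

def resize_nearest(image, new_h, new_w):
    """Nearest-neighbor resize using integer index arrays."""
    h, w = image.shape[:2]
    rows = np.arange(new_h) * h // new_h   # map each output row to a source row
    cols = np.arange(new_w) * w // new_w   # map each output column to a source column
    return image[rows[:, None], cols]

img = np.random.randint(0, 256, size=(4, 4, 3), dtype=np.uint8)
print(resize_nearest(img, 8, 8).shape)  # (8, 8, 3)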

Summary

Images are multi-dimensional arrays that computers can process mathematically. This perspective is key to understanding how CNNs and other vision models work.