FOR BUSINESS
MODULE: COMPUTER VISION (CV)
TERM 1 LECTURE NOTES: IMAGE PROCESSING, CNNS, AND REAL-WORLD APPLICATIONS
1. THE CHALLENGE OF MACHINE VISION
To a human, an image is a collection of objects and context. To a computer, an
image is merely a matrix of numbers, with one value (or one set of values) per pixel.
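• Grayscale: A 2D matrix (height × width) with one value per pixel. In the common
8-bit format, values range from 0 (black) to 255 (white).
• RGB (Color): A 3D tensor (height × width × 3) with three channels (Red, Green,
Blue), each typically stored as an 8-bit value from 0 to 255.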
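To make the "matrix of numbers" idea concrete, here is a minimal sketch in plain Python (no libraries, for readability; real projects would typically use NumPy arrays). The tiny images, the helper names `shape` and `rgb_to_gray`, and the choice of ITU-R BT.601 luminance weights for the grayscale conversion are illustrative assumptions, not part of the course material.

```python
# A tiny 2x3 grayscale image: one 8-bit intensity per pixel (0 = black, 255 = white)
gray = [
    [0, 128, 255],
    [64, 192, 32],
]

# A tiny 2x3 RGB image: each pixel holds three 8-bit values (Red, Green, Blue)
rgb = [
    [[255, 0, 0], [0, 0, 0], [0, 0, 0]],        # top-left pixel is pure red
    [[0, 0, 0], [0, 0, 0], [255, 255, 255]],    # bottom-right pixel is white
]

def shape(img):
    """Return (height, width) for grayscale or (height, width, channels) for RGB."""
    h, w = len(img), len(img[0])
    first = img[0][0]
    return (h, w, len(first)) if isinstance(first, list) else (h, w)

def rgb_to_gray(img):
    """Collapse the three channels into one using ITU-R BT.601 luminance weights."""
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row] for row in img]

print(shape(gray))        # (2, 3)
print(shape(rgb))         # (2, 3, 3)
print(rgb_to_gray(rgb))   # [[76, 0, 0], [0, 0, 255]]
```

Note how pure red becomes a fairly dark gray (76): the weights reflect that the human eye is more sensitive to green than to red or blue.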
The goal of Computer Vision is to bridge this "Semantic Gap"—turning raw pixel
data into meaningful labels and spatial understanding.
2. CONVOLUTIONAL NEURAL NETWORKS (CNNs)
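As of 2026, the CNN remains one of the most widely used architectures for vision
tasks in industry. Unlike standard fully connected networks, CNNs apply the same
small filters at every position in the image, so they can recognize a pattern
regardless of where it appears in the frame (Spatial Invariance).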
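The sketch below, in plain Python, illustrates why sharing one filter across all positions gives this behavior. The 3x3 vertical-edge filter and the toy images are hypothetical, hand-picked values (a real CNN learns its filters). Shifting the bright bar one pixel to the right shifts the filter's response by exactly one pixel (strictly, convolution is translation equivariant), and taking the maximum over the whole response, as pooling does, gives the same score wherever the bar sits.

```python
def conv2d(image, kernel):
    """'Valid' 2D convolution as used in CNNs (technically cross-correlation: the kernel is not flipped)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]

# Hypothetical 3x3 vertical-edge filter: positive where dark turns bright (left to right)
edge_kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

image = [[0, 0, 9, 0, 0, 0] for _ in range(4)]    # bright vertical bar in column 2
shifted = [[0, 0, 0, 9, 0, 0] for _ in range(4)]  # same bar, one pixel to the right

fmap = conv2d(image, edge_kernel)
fmap_shifted = conv2d(shifted, edge_kernel)

print(fmap[0])           # [27, 0, -27, 0]
print(fmap_shifted[0])   # [0, 27, 0, -27]  (same response, moved one pixel right)

# A max over the whole feature map ignores *where* the pattern was found
print(max(max(r) for r in fmap), max(max(r) for r in fmap_shifted))  # 27 27
```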
2.1 The CNN Architecture Layers
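1. Convolutional Layer: Slides small "Filters" (Kernels) across the image and, at
each position, computes a weighted sum of the pixels under the filter. The output
is a feature map showing where a feature appears. During training, the filters
learn to detect specific features like edges, corners, or textures.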
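2. Activation Function (ReLU): Applies ReLU(x) = max(0, x) to every value, so
negative responses become 0. This introduces non-linearity, allowing stacked
layers to learn complex patterns instead of collapsing into one linear mapping.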
3. Pooling Layer (Max Pooling): Shrinks each feature map by keeping only the
largest value in each small window (e.g., a 2×2 window halves the height and
width). This reduces computation, makes the model more tolerant of small shifts,
and helps reduce overfitting.
4. Fully Connected Layer: The final layers flatten the pooled feature maps into a
single list of numbers and combine them to classify the detected features into
categories (e.g., "Defective Part" vs. "Normal Part").
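The four layers can be chained into a single forward pass. The following is a minimal, untrained sketch in plain Python under clearly hypothetical assumptions: the edge filter, the fully connected weights and biases, the softmax step that turns scores into probabilities, and the toy 6x6 "part" images are all made up for illustration. A real CNN would learn its filters and weights from labelled examples, usually with a framework such as PyTorch or TensorFlow.

```python
import math

def conv2d(image, kernel):
    """Convolutional layer: slide the kernel over the image ('valid', stride 1) and take a
    weighted sum at each position. Like most deep-learning libraries, the kernel is not flipped."""
    kh, kw = len(kernel), len(kernel[0])
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(len(image[0]) - kw + 1)
        ]
        for i in range(len(image) - kh + 1)
    ]

def relu(fmap):
    """ReLU(x) = max(0, x), applied element-wise: negative responses become 0."""
    return [[max(0, v) for v in row] for row in fmap]

def max_pool(fmap, size=2):
    """Non-overlapping size x size max pooling: keep the largest value in each window."""
    return [
        [
            max(fmap[i + a][j + b] for a in range(size) for b in range(size))
            for j in range(0, len(fmap[0]) - size + 1, size)
        ]
        for i in range(0, len(fmap) - size + 1, size)
    ]

def fully_connected(features, weights, biases):
    """One score per class: a weighted sum of all features plus a bias."""
    return [sum(w * x for w, x in zip(ws, features)) + b for ws, b in zip(weights, biases)]

def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# --- Hypothetical, hand-set parameters (a trained CNN would learn these) ---
LABELS = ["Defective Part", "Normal Part"]
EDGE_KERNEL = [[-1, 0, 1] for _ in range(3)]   # responds to vertical edges
FC_WEIGHTS = [[0.1] * 4, [-0.1] * 4]           # strong edges push toward "Defective Part"
FC_BIASES = [-1.0, 1.0]                        # no edges leans toward "Normal Part"

def classify(image):
    fmap = relu(conv2d(image, EDGE_KERNEL))        # 6x6 image -> 4x4 feature map
    pooled = max_pool(fmap)                        # 4x4 -> 2x2
    features = [v for row in pooled for v in row]  # flatten to 4 numbers
    probs = softmax(fully_connected(features, FC_WEIGHTS, FC_BIASES))
    return LABELS[probs.index(max(probs))], probs

scratched = [[0, 0, 0, 9, 0, 0] for _ in range(6)]  # bright vertical "scratch" in column 3
blank = [[0] * 6 for _ in range(6)]                 # uniform, featureless surface

print(classify(scratched)[0])  # Defective Part
print(classify(blank)[0])      # Normal Part
```

Each stage shrinks the data (36 pixels, then 16 feature values, then 4, then 2 class scores), which mirrors the progression from raw pixels to a business-level label described in the list above.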