Deep Learning: Convolutional Neural Networks (CNN)
1 CNN applications
CNN: Designed for image analysis, using convolutional layers to
automatically detect features like edges, textures, and
patterns. It consists of convolution, pooling, and fully
connected layers, which work together to capture spatial
hierarchies and reduce dimensionality, making it highly
effective for tasks like image classification, object detection,
and more.
(Figures 1-3: examples of detection, segmentation, and classification.)
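As a rough illustration of this layer stack, here is a minimal PyTorch sketch; the input size (1x28x28 grayscale), the channel counts, and the 10-class output are assumptions chosen for the example, not taken from the notes.

```python
import torch
import torch.nn as nn

# Minimal CNN sketch: convolution + pooling extract features,
# a fully connected layer classifies them. All sizes are illustrative.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),  # learn 16 feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 28x28 -> 14x14, reduces dimensionality
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(2),                             # 14x14 -> 7x7
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),                   # fully connected classifier head
)

x = torch.randn(1, 1, 28, 28)   # one grayscale image
print(model(x).shape)           # torch.Size([1, 10]) -- one score per class
```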
Classification: Identifies the primary object or category in an image, assigning
it a label (e.g., dog, car), without indicating its location.
Object Detection: Identifies and locates multiple objects within an image, drawing
bounding boxes around them and classifying each object.
Segmentation: Divides an image into regions by labeling each pixel with a
category, creating a detailed mask of objects and their
boundaries.
2 Motivation for the use of a CNN
In deep learning, manual feature extraction is no longer
necessary because the neural network learns the features
itself.
CNN advantages:
You save a lot of time because features are no longer extracted manually,
and it usually leads to higher performance because the network can filter out
the most important features, making it more robust against small changes
and giving better performance with fewer weights.
3 CNN architecture
Convolution:
Applying filters to input data, detecting patterns
and features by sliding over the image, producing
feature maps that summarize the essential information.
Examples: signal processing (sound, images). In advanced
systems, hand-designed filters fall away; instead, the
filters are trained through backpropagation.
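As a quick 1D illustration of sliding a filter over a signal, here is a small NumPy sketch; the sine signal and the moving-average kernel are made-up examples (in a CNN the filter values would be learned rather than hand-designed).

```python
import numpy as np

# Noisy 1D signal and a hand-designed averaging filter.
signal = np.sin(np.linspace(0, 4 * np.pi, 50)) + 0.3 * np.random.randn(50)
kernel = np.ones(5) / 5          # moving-average filter of length 5

# Slide the filter over the signal; "valid" keeps only positions
# where the filter fully overlaps the input, so the output shrinks.
smoothed = np.convolve(signal, kernel, mode="valid")
print(signal.shape, smoothed.shape)   # (50,) (46,)
```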
2D convolution: Sliding a 2D filter (kernel) over an input image. At each
position, the elementwise products are summed to create a single output value,
and these values form a feature map. This helps detect spatial features like
edges and textures, enabling models (like CNNs) to recognize
patterns in images (see the sketch after this table).
Valid mode: Only compute values where the
filter kernel and the input fully overlap.
Same mode: Zero padding such that the output
length equals the input length.
Full mode: Zero padding such that the filter kernel has a minimum
overlap of 1 with the input (here the output can be larger
than the input).
Stride: The number of pixels by which the filter shifts.
Dilated convolution: Expands the receptive field of a
convolutional filter by skipping certain input pixels,
introducing spaces (the dilation rate) between the elements of
the kernel. This allows the network to capture broader,
more global context without increasing the number of
parameters or reducing resolution.
Useful for: image segmentation and object detection, where
understanding both local details and the wider context is crucial.
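A minimal sketch of the sliding-window computation, assuming NumPy and SciPy are available; the 6x6 image and the vertical-edge filter are made up for illustration. It shows valid-mode convolution with stride and dilation, plus the three padding modes via scipy.signal.convolve2d (output shapes only).

```python
import numpy as np
from scipy.signal import convolve2d

def conv2d_valid(image, kernel, stride=1, dilation=1):
    """Valid-mode 2D cross-correlation (what CNN layers compute)."""
    kh, kw = kernel.shape
    # Dilation spreads the kernel elements apart, enlarging the receptive field.
    eff_kh, eff_kw = dilation * (kh - 1) + 1, dilation * (kw - 1) + 1
    H, W = image.shape
    out_h = (H - eff_kh) // stride + 1
    out_w = (W - eff_kw) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Pick the (possibly dilated) receptive field, multiply elementwise, sum.
            patch = image[i * stride: i * stride + eff_kh: dilation,
                          j * stride: j * stride + eff_kw: dilation]
            out[i, j] = np.sum(patch * kernel)
    return out

img = np.arange(36, dtype=float).reshape(6, 6)
edge = np.array([[1., 0., -1.],
                 [1., 0., -1.],
                 [1., 0., -1.]])                 # simple vertical-edge filter

print(conv2d_valid(img, edge).shape)              # (4, 4)  valid mode
print(conv2d_valid(img, edge, stride=2).shape)    # (2, 2)  stride skips positions
print(conv2d_valid(img, edge, dilation=2).shape)  # (2, 2)  wider receptive field

# The three padding modes, shown with scipy.signal.convolve2d:
print(convolve2d(img, edge, mode="valid").shape)  # (4, 4)  kernel fully inside input
print(convolve2d(img, edge, mode="same").shape)   # (6, 6)  output size = input size
print(convolve2d(img, edge, mode="full").shape)   # (8, 8)  minimum overlap of 1
```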
Dimensions of the output matrix (exam material; see the formula sketch below):
W = input width
H = input height
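The notes only list W and H here; assuming the standard symbols F (filter/kernel size), P (amount of zero padding), and S (stride), the usual output-size formula is:

```latex
W_{\text{out}} = \left\lfloor \frac{W - F + 2P}{S} \right\rfloor + 1
\qquad
H_{\text{out}} = \left\lfloor \frac{H - F + 2P}{S} \right\rfloor + 1
```

For example, a 6x6 input with a 3x3 filter, no padding, and stride 1 gives (6 - 3 + 0)/1 + 1 = 4, matching the valid-mode shape in the sketch above.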