Michiel de Folter 2024/2025
Deep Learning
Lecture 1: Introduction & the perceptron
Dataset: Is usually split into 3 different subsets →
Training set: Is mainly used for training your algorithm.
Validation set: Is mainly used for finding the best hyperparameters (or design
parameters). These parameters have to be chosen by the programmer.
Test set: Is used to estimate the generalization error of your algorithm in "real-world"
use. Here we check the final performance of your algorithm.
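A minimal sketch of such a split in Python/NumPy; the 70/15/15 ratio and the toy data are assumptions for illustration, not from the lecture:

import numpy as np

# Toy data: 1000 samples, 10 features (made up for illustration).
X = np.random.rand(1000, 10)
y = np.random.randint(0, 2, size=1000)

# Shuffle once, then cut into 70% train / 15% validation / 15% test.
idx = np.random.permutation(len(X))
n_train, n_val = int(0.70 * len(X)), int(0.15 * len(X))

X_train, y_train = X[idx[:n_train]], y[idx[:n_train]]                    # fit the model here
X_val,   y_val   = X[idx[n_train:n_train + n_val]], y[idx[n_train:n_train + n_val]]  # tune hyperparameters here
X_test,  y_test  = X[idx[n_train + n_val:]], y[idx[n_train + n_val:]]    # estimate generalization error here (use only once)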
What is deep learning?:
AI: Hard-coded knowledge to make decisions. Often it is too complex to hard-code everything.
Machine learning: extracts patterns from raw data; performance depends heavily on the
representation of the features.
Representation learning: machine learning that not only extracts patterns from the data,
but also learns the representation (the features) itself.
Deep learning: introduces representations that are expressed in terms of other, simpler
representations.
Artificial Neural Networks (ANNs): A system modelled after a biological neuron as it is
found in the brain.
Dendrites → Weights and inputs
Neuron → Transfer function
Axon (fire/ don’t fire) → Activation function
Deep Learning (DL): A multi-layered system with an input layer (called a layer but actually
just the input vectors), hidden layers and an output layer. The layered structure is also
inspired by the brain: a visual stimulus is not perceived "as is"; multiple parts of the brain
are responsible for detecting colours, patterns and shapes, and the final image is constructed
in our brain as the combination of all of them.
Deep Learning frameworks:
TensorFlow by Google
Keras runs on top of TensorFlow
PyTorch runs on Torch and Caffe2
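As an illustration of how such a framework is used, a minimal Keras model definition might look like the sketch below; the layer sizes and the 10-feature input are arbitrary assumptions:

import tensorflow as tf  # assumes TensorFlow 2.x, which ships Keras as tf.keras

# A small fully connected network; the layer sizes here are arbitrary choices.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(10,)),                       # 10 input features (assumed)
    tf.keras.layers.Dense(16, activation="relu"),      # hidden layer
    tf.keras.layers.Dense(1, activation="sigmoid"),    # binary output
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()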
The perceptron: A single-layer neural network used for binary classification. Outputs either
1 or 0 (yes/no).
For a perceptron to work, data needs to be linearly separable.
Perceptron function: b + X∗W
Activation function: y' = 1 if b + X∗W ≥ t
                     y' = 0 if b + X∗W < t
Weight update rule: w_i' = w_i + α · x_i · (y − y')
i.e. new weight of i = old weight of i + learning rate × input × error (true label − prediction)
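A minimal NumPy sketch of this learning rule; the learning rate, number of epochs and the AND example are assumptions chosen for illustration:

import numpy as np

def train_perceptron(X, y, lr=1.0, epochs=20, t=0.0):
    """Perceptron learning rule: w_i' = w_i + lr * x_i * (y - y')."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            y_pred = 1 if b + x_i @ w >= t else 0   # activation (threshold t)
            w += lr * x_i * (y_i - y_pred)          # weight update rule
            b += lr * (y_i - y_pred)                # bias updated the same way
    return w, b

# Linearly separable example: the AND function.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 0, 0, 1])
w, b = train_perceptron(X, y)
print([1 if b + x @ w >= 0 else 0 for x in X])      # -> [0, 0, 0, 1]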
Universal function approximator: a neural network can be used to approximate an unknown function y = f(x).
Linear algebra:
Scalar: 1 x 1 0D
Vector: 1 x N 1D
Matrix: N x M 2D
Tensor: N x M x C x … 3D, 4D, …
Transpose: flips the dimensions by 90°.
Vector: 1xN → Nx1
Matrix: NxM → MxN
Matrix multiplication: C_ij = Σ_k A_ik ∗ B_kj
Element-wise product: Hadamard product of 2 matrices,
C = A ∘ B
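A quick NumPy illustration of these operations (the shapes and values are chosen arbitrarily):

import numpy as np

A = np.array([[1, 2], [3, 4]])     # 2 x 2 matrix
B = np.array([[5, 6], [7, 8]])

print(A.T)                          # transpose: (N x M) -> (M x N)
print(A @ B)                        # matrix multiplication: C_ij = sum_k A_ik * B_kj
print(A * B)                        # Hadamard (element-wise) product: C = A ∘ B

T = np.zeros((2, 3, 4, 5))          # a 4-D tensor: N x M x C x ...
print(T.ndim, T.shape)              # -> 4 (2, 3, 4, 5)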
Lecture 2: MLP & the back propagation algorithm
Limitations of the perceptron:
- Decision boundary is linear.
- XOR function does not work.
The XOR function can be made using a multi-layer perceptron, e.g. by combining an OR-gate
and a NAND-gate (negated AND-gate) in the first layer with an AND-gate in the second layer
(see the sketch below).
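A small sketch of this construction with hand-picked perceptron weights; the particular weights and the OR/NAND/AND decomposition are one possible choice, not prescribed by the lecture:

import numpy as np

def gate(w, b):
    """A single perceptron unit with fixed (hand-chosen) weights."""
    return lambda x: 1 if b + np.dot(x, w) >= 0 else 0

OR   = gate(np.array([1, 1]),   b=-1)    # fires if at least one input is 1
NAND = gate(np.array([-1, -1]), b=1.5)   # fires unless both inputs are 1
AND  = gate(np.array([1, 1]),   b=-1.5)  # fires only if both inputs are 1

def xor(x):
    # First layer: OR and NAND; second layer: AND of the two hidden outputs.
    h = np.array([OR(x), NAND(x)])
    return AND(h)

for x in ([0, 0], [0, 1], [1, 0], [1, 1]):
    print(x, xor(np.array(x)))           # -> 0, 1, 1, 0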
Multilayer perceptron: also called a feedforward neural network.
Dense layer: a layer where every neuron in the layer before is connected to every
neuron in the current layer.
Goal of an MLP: to approximate a function f(x) using the mapping y = f(x; θ) and to
learn the values of the parameters θ that result in the best function approximation.
Hidden layer function: h = f1(X; W, c)
Output layer function: y = f2(h; W, b)
Chained function: y = f2(f1(X))
Activation per layer: a_l = f(W_l ∗ a_(l−1) + b_l)
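A minimal sketch of this layer-by-layer computation in NumPy; the layer sizes, random weights and the choice of ReLU/sigmoid activations are assumptions for illustration:

import numpy as np

def relu(z):
    return np.maximum(0, z)

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

# Random parameters just to show the shapes: 3 inputs -> 4 hidden units -> 1 output.
rng = np.random.default_rng(0)
W1, c = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer parameters (W, c)
W2, b = rng.normal(size=(1, 4)), np.zeros(1)   # output layer parameters (W, b)

x = np.array([0.5, -1.0, 2.0])                 # input vector X

h = relu(W1 @ x + c)                           # hidden layer:  h = f1(X; W, c)
y = sigmoid(W2 @ h + b)                        # output layer:  y = f2(h; W, b) = f2(f1(X))
print(h, y)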
Notation: