Deep Learning Lecture Notes

Lecture 1: Linear Algebra Refresher

Linear algebra is essential in the field of deep learning, as it is used to represent and manipulate high-dimensional data, and to optimize the parameters of deep neural networks.

- A scalar is a single value, such as a number or a constant. It can be any real or complex
number.
- A vector is an array of numbers or scalars.
  o Geometrically, a vector can be drawn as an arrow; its magnitude is the length of that arrow, computed as the Euclidean norm of the vector.
- A matrix is a rectangular array of numbers or scalars. It can be used to represent a linear
transformation or a system of linear equations.
  o Matrix multiplication is not commutative, meaning that A∗B is not the same as B∗A in general.
o The determinant of a matrix is a scalar value that represents the scaling factor of the
matrix. It can be used to determine if a matrix is invertible and to find the inverse of a
matrix.
- A tensor is a multi-dimensional array of numbers or scalars. It can be used to represent high-dimensional data, such as images or videos.
  o Tensor contraction and the tensor product are the two most common operations used on tensors (see the NumPy sketch after this list):
    1. Tensor contraction is the process of summing over a set of indices to reduce the number of dimensions in a tensor.
    2. The tensor product is the operation of combining two or more tensors to form a new, higher-dimensional tensor.
- The dot product (inner product) is a way of multiplying two vectors together.
  o It produces a scalar value that can be used to measure the similarity between two vectors or the angle between them.
  o Given vectors $\vec{v} = [a_1, a_2, a_3]$ and $\vec{w} = [b_1, b_2, b_3]$, the dot product is $\vec{v} \cdot \vec{w} = a_1 b_1 + a_2 b_2 + a_3 b_3$.
  o The dot product of two vectors equals the magnitude of one vector multiplied by the magnitude of the other and by the cosine of the angle between them: $\vec{v} \cdot \vec{w} = |\vec{v}| \, |\vec{w}| \cos\theta$ (see the NumPy sketch after this list).
- A matrix-vector product is a way of multiplying a matrix and a vector together. The result is the vector obtained by applying the linear transformation represented by the matrix to the input vector.
  o The resulting vector has as many entries as the matrix has rows (the vector's length must match the number of columns of the matrix).
  o The elements of the resulting vector are obtained by taking the dot product of each row of the matrix with the vector.
  o Example (reproduced in the NumPy sketch after this list):
    1. $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}$ and $\vec{x} = [5, 6]$
    2. $A\vec{x} = \begin{bmatrix} 1 \times 5 + 2 \times 6 \\ 3 \times 5 + 4 \times 6 \end{bmatrix} = \begin{bmatrix} 17 \\ 39 \end{bmatrix} = [17, 39]$
- Matrix-matrix multiplication is a way of multiplying two matrices together. The resulting
matrix represents the composition of the two original matrices as linear transformations.

- A norm is a function that assigns a scalar value to a vector or a matrix. It can be used to measure the size of a vector or matrix, or the distance between two of them.
  o The most common norm used in linear algebra is the Euclidean (L2) norm.
  o Other norms include the L1 norm, which is the sum of the absolute values of the components, and the max norm, which is the largest absolute value of the components. These norms can be used to measure the sparsity or the largest entry of a vector or matrix (see the NumPy sketch after this list).
  o Norms are used in deep learning to measure the size of the parameters of a neural network, and to regularize the model to prevent overfitting.
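The notes include no code, so here is a minimal NumPy sketch of tensor contraction and the tensor product. NumPy, `np.tensordot`, and `np.einsum` are my choice of illustration, not something the notes prescribe; the arrays are arbitrary examples.

```python
import numpy as np

# Tensor contraction: sum over a pair of indices to reduce dimensionality.
# Contracting the last axis of T with the first axis of M generalises the
# matrix product to higher-rank tensors.
T = np.arange(24).reshape(2, 3, 4)        # a rank-3 tensor
M = np.arange(20).reshape(4, 5)           # a matrix (rank-2 tensor)
C = np.tensordot(T, M, axes=([2], [0]))
print(C.shape)                            # (2, 3, 5): the shared size-4 axis is summed away

# The trace is a full contraction of a matrix's two indices.
A = np.array([[1, 2], [3, 4]])
print(np.einsum('ii->', A))               # 1 + 4 = 5

# Tensor (outer) product: combine two tensors into a higher-rank tensor.
v = np.array([1, 2])
w = np.array([3, 4, 5])
P = np.tensordot(v, w, axes=0)            # shape (2, 3); same as np.outer(v, w)
print(P.shape)
```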
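The matrix-vector example above, along with the dot product and matrix-matrix multiplication, can be checked in a few lines. A sketch under the assumption that NumPy is the tool of choice; `A` and `x` come from the worked example, the other arrays are arbitrary.

```python
import numpy as np

# The matrix and vector from the worked example above.
A = np.array([[1, 2],
              [3, 4]])
x = np.array([5, 6])

# Matrix-vector product: the dot product of each row of A with x.
print(A @ x)                # [17 39]

# Dot product of two vectors, and the cosine of the angle between them
# via v . w = |v| |w| cos(theta).
v = np.array([1.0, 0.0])
w = np.array([1.0, 1.0])
print(np.dot(v, w))                                            # 1.0
print(np.dot(v, w) / (np.linalg.norm(v) * np.linalg.norm(w)))  # ~0.7071, a 45-degree angle

# Matrix-matrix multiplication: composition of linear transformations.
B = np.array([[0, 1],
              [1, 0]])
print(A @ B)                # (A @ B) @ x applies B first, then A
```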
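A short sketch of the three norms mentioned above, again assuming NumPy; `np.linalg.norm` computes all of them via its `ord` parameter.

```python
import numpy as np

v = np.array([3.0, -4.0])

print(np.linalg.norm(v))              # Euclidean (L2) norm: sqrt(9 + 16) = 5.0
print(np.linalg.norm(v, ord=1))       # L1 norm: |3| + |-4| = 7.0
print(np.linalg.norm(v, ord=np.inf))  # max norm: largest absolute component = 4.0
```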

Applications

- Linear algebra is used throughout neural networks and deep learning architectures.
- Linear algebra concepts such as matrix-vector products, matrix-matrix multiplication, and
norms are used in the computation of forward and backward propagation in neural networks.
- Tensor operations such as tensor contraction and tensor product are used in convolutional
neural networks and recurrent neural networks to extract features from images and
sequences.
- Linear algebra concepts and operations are also used in optimization algorithms such as
gradient descent and stochastic gradient descent to adjust the parameters of the neural
network.

Lecture 2: Calculus Refresher

Calculus is essential in the field of deep learning, as it is used to optimise the parameters of deep
neural networks and to study the properties of activation functions used in these networks.

- The derivative of a function is a measure of the rate of change of the function at a certain point.
  o $f'(a) = \dfrac{df(a)}{dx} = \lim_{x \to a} \dfrac{f(x) - f(a)}{x - a} = \lim_{h \to 0} \dfrac{f(a + h) - f(a)}{h}$
  o $f'(x)$ is called the prime notation, and $df(x)/dx$ is called the Leibniz notation.
  o There are several rules for computing the derivatives of basic functions and of combinations of them (e.g. the sum, product, quotient, and chain rules); a numerical check of the limit definition is sketched after this list.

  o A partial derivative is the derivative of a multivariable function with respect to one variable, while keeping the other variables constant. It measures the rate of change of the output of the function with respect to one of its inputs, while ignoring the effect of the other inputs.
    1. $f'_{x_i}(x_1, x_2, \ldots, x_n) = \dfrac{\partial f}{\partial x_i}(x_1, x_2, \ldots, x_n)$
  o A gradient is a vector of the partial derivatives of a multivariable function.
    1. It represents the direction of steepest ascent of the function, and can be used in optimisation algorithms like gradient descent to update the parameters of a model and improve its accuracy.
    2. Let $f(x_1, x_2, \ldots, x_n)$; then the gradient of $f$ is $\nabla f = \left[ \dfrac{\partial f}{\partial x_1}, \dfrac{\partial f}{\partial x_2}, \ldots, \dfrac{\partial f}{\partial x_n} \right]$.
    3. Example (checked numerically in the sketch after this list):
       - Let $f(x, y) = x^2 - y^2$.
       - Partial derivatives: $\dfrac{\partial f}{\partial x} = 2x$ and $\dfrac{\partial f}{\partial y} = -2y$.
       - Gradient: $\nabla f = \left[ \dfrac{\partial f}{\partial x}, \dfrac{\partial f}{\partial y} \right] = [2x, -2y]$.
- Chain rule
  o The derivative of the composition of two or more functions is equal to the derivative of the outer function evaluated at the inner function, multiplied by the derivative of the inner function.
  o $\dfrac{d}{dx} f(g(x)) = \dfrac{df(u)}{du} \cdot \dfrac{du}{dx}$, where $u = g(x)$.
  o Example (checked numerically in the sketch after this list):
    1. $f(x) = \sin(x)$, $g(x) = x^2$
    2. $\dfrac{d}{dx} f(g(x)) = \cos(x^2) \cdot 2x$
  o The chain rule is a crucial concept in deep learning because it allows us to compute the gradient of complex functions, which are often represented as the composition of multiple simpler functions.
  o The gradient is used in optimisation algorithms like gradient descent to update the weights of a deep learning model and improve its accuracy.
  o By applying the chain rule, we can find the gradient of the loss function with respect to the parameters of the model, which can be used to update the parameters in a direction that reduces the loss.
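The limit definition of the derivative can be checked numerically with a finite difference. A minimal pure-Python sketch; the helper name `derivative` and the step size `h` are my own choices, not from the notes.

```python
import math

def derivative(f, a, h=1e-6):
    # Finite-difference approximation of the limit definition:
    # f'(a) = lim_{h -> 0} (f(a + h) - f(a)) / h
    return (f(a + h) - f(a)) / h

print(derivative(lambda x: x ** 2, 3.0))  # ~6.0, since d/dx x^2 = 2x
print(derivative(math.sin, 0.0))          # ~1.0, since d/dx sin(x) = cos(x)
```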
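The gradient example above ($f(x, y) = x^2 - y^2$) can be verified the same way, one partial derivative at a time. A sketch assuming NumPy; the helper `gradient` is hypothetical.

```python
import numpy as np

def gradient(f, x, h=1e-6):
    # Approximate each partial derivative by perturbing one coordinate
    # at a time while holding the others constant.
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - f(x)) / h
    return grad

# f(x, y) = x^2 - y^2 from the example; grad f = [2x, -2y].
f = lambda p: p[0] ** 2 - p[1] ** 2
print(gradient(f, np.array([3.0, 2.0])))  # ~[6.0, -4.0]
```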
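Finally, the chain-rule example ($f(x) = \sin(x)$, $g(x) = x^2$) can be confirmed by comparing the analytic derivative $\cos(x^2) \cdot 2x$ with a finite difference; a pure-Python sketch with an arbitrary evaluation point.

```python
import math

def composed(x):
    # f(g(x)) with f = sin and g(x) = x^2
    return math.sin(x ** 2)

x, h = 1.5, 1e-7
numeric = (composed(x + h) - composed(x)) / h  # finite-difference estimate
analytic = math.cos(x ** 2) * 2 * x            # chain rule result
print(numeric, analytic)                       # both ~ -1.8846
```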