Summary Deep Learning (MSc AI)

Based on the lectures. I got a 9 for the exam!

Deep learning has become the leading learning and modeling paradigm in machine learning. During this course, we will present the basic components of deep learning, such as: different layers (e.g., linear layers, convolutional layers, pooling layers, recurrent layers); non-linear activation functions (e.g., sigmoid, ReLU); backpropagation; learning algorithms (e.g., ADAM); and others (e.g., dropout). Further, we will show how to build deep architectures like LeNet and AlexNet. We will explain potential pitfalls and possible solutions, e.g., using residual connections and dense architectures. After discussing discriminative models, we will turn to generative models. We will start with linear latent variable models like probabilistic PCA (pPCA). Then we will discuss a non-linear version of pPCA, namely, Variational Auto-Encoders (VAEs). Both pPCA and VAEs are so-called prescribed models that require formulating the likelihood function. We can avoid this requirement by considering implicit distributions instead; this is the main idea behind Generative Adversarial Networks (GANs). We will also discuss state-of-the-art models like autoregressive models and flow-based models. At the end of the course, we will outline recent developments in deep learning: the attention mechanism, transformer networks, and deep embeddings. Finally, we will discuss Reinforcement Learning and Deep Reinforcement Learning.

Deep Learning: learning the features of data and simplifying data representations for the purpose of finding patterns.

Perceptron
● A perceptron is an artificial neuron.
● It is the simplest possible neural network.
● The initial perceptron design processed multiple binary inputs by multiplying each input with a corresponding weight, then summing the results along with a bias parameter to yield a binary output (0 or 1); a sketch follows this list.
● Perceptrons are linear functions:
○ Too simple and abstract on their own to usefully link the outputs of one perceptron to the inputs of the next in a large network.
○ Composing linear functions together will only ever give you another linear function, so we are not creating models that can learn non-linear functions.
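A minimal sketch of this classic design in Python; the AND-gate weights below are an illustrative assumption, not taken from the lectures:

```python
import numpy as np

def perceptron(x, w, b):
    # Weighted sum of the inputs plus a bias, thresholded to a binary output.
    return 1 if np.dot(w, x) + b > 0 else 0

# Illustrative weights that make this perceptron compute a logical AND.
w = np.array([1.0, 1.0])
b = -1.5
for x in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(x, "->", perceptron(np.array(x), w, b))
```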




Activation function
● An activation function is a scalar function (a function from a number to another
number) which we apply to the output of a perceptron after all the weighted inputs
have been combined.
● Typically, the activation function is nonlinear, which allows the network to learn
complex patterns in data.
● Not using an activation function is also called using a linear activation, which reduces the network to a linear regression model.



Activation function overview (plots from the original table omitted; a code sketch follows the list):

● Linear activation function: Also known as the "no activation" function (multiplied by 1.0), where the activation is proportional to the input. The function doesn't do anything to the weighted sum of the input; it simply spits out the value it was given.

The remaining activations are non-linear:

● Binary step: Depends on a threshold value that decides whether a neuron should be activated or not.
● Sigmoid: Converts input into a probability between 0 and 1; often used in binary classification.
● Softmax: Converts input into a categorical probability distribution by compressing multiple inputs into a range between 0 and 1; used in multi-class classification.
● Tanh: Similar to sigmoid but ranges from -1 to 1; useful in the hidden layers of a neural network.
● ReLU: Sets every negative input to zero and keeps everything else the same. Allows only positive values to pass through; helps with non-linear complex mappings and the vanishing gradient problem.
● Leaky ReLU: Similar to ReLU but allows a small, non-zero gradient when the unit is not active; helps prevent dead neurons.
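These activations are easy to state in code. A NumPy sketch (function names are mine; the stability shift in softmax is a common convention, not from the notes):

```python
import numpy as np

def linear(x):      return x                       # identity: passes the value through
def binary_step(x): return np.where(x > 0, 1, 0)   # fires only above the threshold 0
def sigmoid(x):     return 1 / (1 + np.exp(-x))    # squashes input into (0, 1)
def tanh(x):        return np.tanh(x)              # squashes input into (-1, 1)
def relu(x):        return np.maximum(0, x)        # zeroes out negative inputs
def leaky_relu(x, slope=0.01):
    return np.where(x > 0, x, slope * x)           # small gradient for negative inputs

def softmax(x):
    e = np.exp(x - np.max(x))                      # subtract max for numerical stability
    return e / e.sum()                             # outputs form a probability distribution

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x), sigmoid(x), softmax(x), sep="\n")
```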




Neural network
● Any arrangement of perceptrons and nonlinearities.

Multilayer perceptron (MLP)
● A fully connected feed-forward artificial neural network with at least three layers (input, output, and at least one hidden layer).
● Consists of:
○ A layer of hidden units in the middle, each of which acts as a perceptron with a non-linearity (activation function), connecting to all input nodes.
○ One or more output nodes, connecting to all nodes in the hidden layer.
● Features:
1. There are no cycles; the network "feeds forward" from input to output.
2. Nodes in the same layer are not connected to each other, or to any layer other than the previous one.
3. Each layer is fully connected to the previous layer: every node in one layer connects to every node in the layer before it.
● Note: every connection (drawn as a line in a network diagram) represents one parameter of the model.
● Although a perceptron can only capture linear relationships between its inputs and output, an MLP is a neural network in which the mapping between inputs and output is non-linear.
● It learns its weights through backpropagation. A sketch of the forward pass appears below.
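A minimal sketch of the forward pass of a one-hidden-layer MLP, assuming NumPy and ReLU as the non-linearity; the layer sizes are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)

# 3 inputs -> 4 hidden units -> 1 output; every entry of a weight matrix is
# one parameter of the model (one "line" in a network diagram).
W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer, fully connected
W2, b2 = rng.normal(size=(1, 4)), np.zeros(1)   # output layer, fully connected

def forward(x):
    h = np.maximum(0, W1 @ x + b1)   # weighted sums + ReLU non-linearity
    return W2 @ h + b2               # the output node sees all hidden units

print(forward(np.array([1.0, -2.0, 0.5])))
```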




Finding good weights

Loss function
● Begin by determining a loss function to assess model performance.
● Seek weights (model parameters) that result in minimal loss over the data.
● Lower loss corresponds to a better-performing model.
● Note: It’s nice if the loss is zero when the prediction is perfect, but this isn’t required.
● Can be defined for a single instance or for all instances in the data. Usually, the loss
over the whole data is just the average loss over all instances.

Loss functions by machine learning task, with the probability distribution each corresponds to (code sketches follow the list):

● Regression:
○ squared errors: squared distance between network output y and target t; corresponds to a normal distribution (fixed variance).
○ absolute errors: the absolute magnitude of the difference between network output and target value; corresponds to a Laplace distribution (fixed variance).
● Classification:
○ log loss (binary cross-entropy): corresponds to a Bernoulli distribution.
○ log loss (cross-entropy): corresponds to a categorical distribution.
○ hinge loss: no corresponding distribution.
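Sketches of these losses, assuming NumPy; each averages over instances, matching the note that the loss over the whole data is the average per-instance loss:

```python
import numpy as np

def squared_errors(y, t):
    return np.mean((y - t) ** 2)      # regression; normal-distribution view

def absolute_errors(y, t):
    return np.mean(np.abs(y - t))     # regression; Laplace-distribution view

def binary_cross_entropy(p, t):
    # Classification; Bernoulli view. p holds predicted probabilities in (0, 1).
    return -np.mean(t * np.log(p) + (1 - t) * np.log(1 - p))

y, t = np.array([0.9, 0.2]), np.array([1.0, 0.0])
print(squared_errors(y, t), absolute_errors(y, t), binary_cross_entropy(y, t))
```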




Loss Landscape
● Model space consists of all possible models, forming a plane.
● Loss can be visualized as a surface above this plane.
● Every point on the plane corresponds to a specific set of weights (θ), and the loss function defines the loss for each point.


Optimization Goal
● Objective is to navigate the loss surface and find a set of weights (θ) that minimizes the loss on the chosen dataset:

θ* = argmin_θ loss(θ | data)

● Use calculus to identify the lowest point on the loss surface.
● In one dimension, derivatives indicate the slope of a tangent line, representing how much the function rises.
● For multiple dimensions, compute partial derivatives (derivatives with respect to one of those variables, with the others held constant), forming the gradient.
● The gradient is a vector pointing in the direction of steepest ascent on the loss surface.
● To minimize loss, move in the opposite direction of the gradient.
● The negative gradient points in the direction of steepest descent on the loss surface (see the sketch below).
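A sketch of gradient descent on a toy bowl-shaped loss where the gradient is known in closed form; the surface and learning rate are illustrative assumptions:

```python
import numpy as np

def loss(theta):
    return np.sum(theta ** 2)        # toy bowl-shaped loss surface, minimum at 0

def gradient(theta):
    return 2 * theta                 # points in the direction of steepest ascent

theta = np.array([3.0, -2.0])        # a point on the model-space plane
learning_rate = 0.1
for _ in range(25):
    theta -= learning_rate * gradient(theta)   # step against the gradient
print(theta, loss(theta))            # theta has moved close to the minimum (0, 0)
```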