Deep Learning: session 5 – Recurrent Neural Networks
1 Recurrent Neural Networks
Recurrent Neural Networks = neural networks designed for sequential data, using feedback connections to capture temporal dependencies.
An RNN processes sequential data by using feedback loops to retain temporal context through a hidden state. This enables tasks that depend on time and order, such as language modeling, speech recognition, and time-series forecasting.
Different modes:
Vanilla Neural Network (one-to-one)
o = a fixed-size input is sent through a set of hidden layers & produces a single output
One-to-many
o = the input length is fixed but the output has a varying length
o Applications: text prediction, music and code generation, time series forecasting (e.g., stock markets)
Many-to-one
o = the input has a varying length but the output is fixed
o Ex: sentiment analysis – what is the emotion of a text? (negative, neutral or positive) – see the sketch after this list
Many-to-many
o = both the input & output have a varying length
o Variant 1: the output sequence is produced only after the whole input sequence has been read
Ex: a sequence of words is converted into another sequence of words, e.g. machine translation (DeepL)
o Variant 2: a prediction is made for each element of the input sequence
Ex: video classification at the level of individual video frames
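As an illustration of the many-to-one mode, here is a minimal PyTorch sketch. The module name ManyToOneRNN, the sizes, and the choice of nn.RNN are illustrative assumptions, not taken from the course: a whole sequence goes in, only the last hidden state is kept, and a single class score comes out (e.g. negative / neutral / positive).

import torch
import torch.nn as nn

# Hypothetical many-to-one classifier: a whole sequence in, one label out.
class ManyToOneRNN(nn.Module):
    def __init__(self, input_size=32, hidden_size=64, num_classes=3):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, num_classes)  # e.g. negative / neutral / positive

    def forward(self, x):                    # x: (batch, seq_len, input_size)
        _, h_last = self.rnn(x)              # h_last: (1, batch, hidden_size)
        return self.head(h_last.squeeze(0))  # (batch, num_classes)

model = ManyToOneRNN()
scores = model(torch.randn(8, 20, 32))       # 8 sequences of 20 time steps each
print(scores.shape)                          # torch.Size([8, 3])

The other modes only differ in which outputs are kept: a many-to-many model would apply the head to every time step of the RNN output instead of only to the last hidden state.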
1.1 Architecture of an RNN
1.1.1 RNN cell
An RNN cell processes sequential data by
maintaining a hidden state that captures information about previous inputs. Here's
how it works:
1. Input and Hidden State: At each time step t, the cell takes the current input xt and the previous hidden state ht−1 as inputs.
2. Computation: The cell applies a nonlinear function f (e.g., tanh or ReLU) to a weighted combination of xt and ht−1, producing the current hidden state ht:
ht = f(Wh · ht−1 + Wx · xt + b)
where Wh, Wx, and b are learnable parameters.
3. Output: ht can be used as:
o The hidden state passed to the next time step.
o The output for the current time step, depending on the application.
The feedback loop in the hidden state allows the network to remember and process
temporal dependencies in sequential data.
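Below is a minimal sketch of this recurrence in PyTorch, directly mirroring the formula ht = f(Wh · ht−1 + Wx · xt + b) above. The sizes, the 0.1 weight scaling, and the function name are illustrative assumptions, not part of the course material.

import torch

def rnn_cell_step(x_t, h_prev, W_x, W_h, b):
    # One recurrence step: ht = tanh(Wh · ht-1 + Wx · xt + b)
    return torch.tanh(h_prev @ W_h.T + x_t @ W_x.T + b)

input_size, hidden_size = 10, 16
W_x = torch.randn(hidden_size, input_size) * 0.1   # input-to-hidden weights
W_h = torch.randn(hidden_size, hidden_size) * 0.1  # hidden-to-hidden (feedback) weights
b = torch.zeros(hidden_size)

h = torch.zeros(1, hidden_size)                    # initial hidden state
for x_t in torch.randn(5, 1, input_size):          # unroll over 5 time steps
    h = rnn_cell_step(x_t, h, W_x, W_h, b)         # ht feeds back into the next step
print(h.shape)                                     # torch.Size([1, 16])

Note that the same weights Wx, Wh, b are reused at every time step; only the hidden state changes.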
1.2 Problems with RNNs
Vanishing & Exploding gradients:
= they hinder the ability to remember long sequences (a numeric sketch follows this list).
Vanishing Gradients: Gradients shrink during backpropagation, preventing
the network from learning long-term dependencies.
Exploding Gradients: Gradients grow excessively, destabilizing training.
Overwriting Memory: Hidden states are updated at each step, causing older
information to be forgotten.
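The mechanism behind both gradient problems: backpropagating through T time steps multiplies T per-step factors together, so the gradient scales roughly like factor**T. A tiny numeric sketch of just that product (the per-step factors 0.9 and 1.1 are arbitrary illustrative values, not from the course):

# Backpropagating through T steps multiplies T per-step factors together.
# A factor slightly below 1 collapses towards 0; slightly above 1 blows up.
for factor in (0.9, 1.1):
    print(factor, [factor ** T for T in (10, 50, 100)])
# 0.9 -> roughly 0.35, 0.0052, 0.00003   (vanishing gradients)
# 1.1 -> roughly 2.6, 117, 13781         (exploding gradients)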
Solution: the LSTM cell.
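The LSTM cell named above replaces the plain RNN cell with a gated cell whose gates control what is written to and kept in memory, which counters the overwriting problem and mitigates vanishing gradients. A minimal sketch using PyTorch's built-in nn.LSTM as a drop-in replacement for nn.RNN; the hyperparameters and the gradient-clipping line are illustrative choices of mine, not from the notes.

import torch
import torch.nn as nn

rnn = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)  # gated cell instead of nn.RNN
head = nn.Linear(64, 3)
params = list(rnn.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-3)

x = torch.randn(8, 20, 32)                   # dummy batch: 8 sequences of 20 steps
y = torch.randint(0, 3, (8,))                # dummy class labels
_, (h_last, _) = rnn(x)                      # nn.LSTM returns (output, (h_n, c_n))
loss = nn.functional.cross_entropy(head(h_last.squeeze(0)), y)
loss.backward()
# Gradient clipping: a common extra safeguard against exploding gradients (not from the notes).
torch.nn.utils.clip_grad_norm_(params, max_norm=1.0)
optimizer.step()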