Momentum & RMSprop Study Guide
Topic Overview
This study guide compares three major optimization algorithms—Standard SGD, SGD with
Momentum, and RMSprop—using a "hiking down a ravine" analogy to explain their
mechanics. It focuses on how RMSprop utilizes adaptive learning rates to solve the "ravine
problem" by adjusting step sizes based on terrain volatility.
Core Concepts
● Standard SGD (Stochastic Gradient Descent): A basic optimization method that
takes steps of a fixed size regardless of the terrain. It is analogous to hiking
blindfolded; on steep slopes, you risk tumbling (overshooting), while on flat ground,
progress is painfully slow.
● SGD Momentum: An enhancement to SGD that accumulates velocity across steps to
move faster, similar to running down a hill. While this speeds progress along consistent
directions, the accumulated momentum can carry the algorithm past sharp turns
(overshooting).
● RMSprop: An advanced algorithm that uses Adaptive Learning Rates to adjust step
size independently for each parameter. It acts like "smart shoes" that analyze the
"bumpiness" of recent terrain to automatically adjust your stride.
● Ravine: A specific landscape challenge in machine learning characterized by steep
walls and a flat bottom. Without adaptive methods, algorithms tend to bounce uselessly
against the walls rather than moving down the center.
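The contrast between the first two hikers can be sketched in code. A minimal illustration on a toy "ravine" (steep in one direction, flat in the other); the learning rate, momentum coefficient, and quadratic loss here are illustrative choices, not values from this guide:

```python
import numpy as np

def sgd_step(w, grad, lr=0.1):
    """Standard SGD: a fixed-size step, blind to the terrain."""
    return w - lr * grad

def momentum_step(w, v, grad, lr=0.05, beta=0.9):
    """SGD with momentum: velocity accumulates, like running downhill."""
    v = beta * v + grad          # build up speed along persistent directions
    return w - lr * v, v

# Toy ravine: the loss is steep in w[0] and nearly flat in w[1],
# so its gradient is large in one coordinate and tiny in the other.
def grad_f(w):
    return np.array([10.0 * w[0], 0.1 * w[1]])

w = np.array([1.0, 1.0])
v = np.zeros(2)
for _ in range(50):
    w, v = momentum_step(w, v, grad_f(w))
```

Running this, the momentum hiker oscillates across the steep walls before settling, while plain `sgd_step` with the same fixed learning rate would either crawl along the flat direction or overshoot on the steep one.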
Important: The RMSprop Mechanism
RMSprop works by calculating a "Volatility Meter" and then normalizing the step size.
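The two steps described below can be sketched in code. A minimal illustration, assuming a decay rate of 0.9 and a small epsilon for numerical stability (both are common defaults, not values given in this guide):

```python
import numpy as np

def rmsprop_step(w, s, grad, lr=0.01, decay=0.9, eps=1e-8):
    """One RMSprop update.

    s is the "volatility meter": an exponential moving average of
    squared gradients, tracking how bumpy each parameter's recent
    terrain has been.
    """
    s = decay * s + (1 - decay) * grad ** 2    # Step 1: update the volatility meter
    w = w - lr * grad / (np.sqrt(s) + eps)     # Step 2: normalize the step size
    return w, s

# On a ravine (steep in w[0], flat in w[1]), dividing by the volatility
# lets the flat direction take steps as large as the steep one.
def grad_f(w):
    return np.array([10.0 * w[0], 0.1 * w[1]])

w = np.array([1.0, 1.0])
s = np.zeros(2)
for _ in range(100):
    w, s = rmsprop_step(w, s, grad_f(w))
```

Because each parameter's step is divided by its own volatility estimate, the steep-wall coordinate takes cautious strides while the flat-bottom coordinate keeps moving briskly, rather than bouncing between the walls.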
Step 1: The "Volatility" Meter
This step calculates how shaky or volatile the recent steps have been by keeping a running