What is Machine Learning?
Field of study that gives computers the ability to learn without being explicitly
programmed.
What is a Neural Network?
A neural network is a machine learning model made of layers of connected nodes
(neurons) that learn patterns in data to make predictions or decisions.
The three basic types of layers in a Neural Network
1) Input Layer
-Where data enters the network
2) Hidden Layer
-Where the network learns patterns
-Can be multiple hidden layers
3) Output Layer
-Where the final prediction is made
Name the specialized types of layers (hidden layers) in a neural network covered in
lecture.
Dense (Fully Connected) Layer
- Every neuron connects to all neurons in the previous layer.
Convolutional Layer
- Detects spatial features in data (commonly used in image models).
Dilated Convolution Layer
- A variation of convolution that spreads out the kernel, allowing the model to capture wider context without increasing computation. Often used in image and sequence data.
Deconvolutional Layer
- Upsamples feature maps, often used in image generation or reconstruction.
Pooling Layer
- Downsamples the input, reducing spatial size and computation.
Unpooling Layer
- Reverses pooling to restore spatial dimensions (used in segmentation tasks).
Embedding Layer
- Converts discrete items (e.g., words) into continuous vectors for processing.
Attention Layer
-Computes the relevance or importance of different parts of the input to each other.
Used in models like Transformers to capture long-range dependencies in data.
Self-Attention Layer
-A special type of attention where the model learns relationships between elements
within the same input sequence. Crucial for models like Transformers, enabling parallel
processing of input data.
The layers below were not really covered in lecture, but are important enough to know
Normalization Layer
- E.g., Batch Normalization; stabilizes and speeds up training.
Dropout Layer
- Randomly disables neurons during training to prevent overfitting.
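A minimal sketch, assuming the PyTorch library (torch.nn), of how several of these layer types can appear together in one small network; the class name SmallNet and all of the sizes are illustrative assumptions, not something from lecture:

    import torch
    import torch.nn as nn

    class SmallNet(nn.Module):
        def __init__(self, vocab_size=1000, embed_dim=32, num_classes=10):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, embed_dim)     # Embedding layer
            self.conv = nn.Conv1d(embed_dim, 64, kernel_size=3,
                                  dilation=2, padding=2)          # dilated Convolutional layer
            self.pool = nn.MaxPool1d(kernel_size=2)               # Pooling layer
            self.norm = nn.BatchNorm1d(64)                        # Normalization layer
            self.drop = nn.Dropout(p=0.5)                         # Dropout layer
            self.dense = nn.Linear(64, num_classes)               # Dense (fully connected) output layer

        def forward(self, token_ids):
            x = self.embed(token_ids)        # (batch, seq_len, embed_dim)
            x = x.transpose(1, 2)            # Conv1d expects (batch, channels, seq_len)
            x = self.pool(torch.relu(self.conv(x)))
            x = self.norm(x)
            x = self.drop(x.mean(dim=2))     # average over the sequence dimension
            return self.dense(x)             # raw class scores (logits)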
What is Supervised Learning?
A machine learning model learns from examples that include both the input data and the
correct answer, so it can predict the answer for new data.
In supervised learning, each example is a pair consisting of an input object (typically a
vector) and a desired output value (also called the supervisory signal).
A Classification problem has...?
- A target variable you want to predict (called a 'class' or a 'label')
- A set of historical data where that target label is known
- New data where that target variable is unknown
- A model
What is a model?
A mathematical object that takes data whose label/class is unknown and assigns it a label/class.
What is classification? How does it relate to ML?
The process of deciding between categories. For example: Is this person likely to pay back a loan or not?
This relates to ML because, by training a model on historic data, we can estimate how
likely someone is to pay back a loan.
Linear Classifier for Binary Classification
Formula:
class = sign(w * input + w_0)
w (weight) and input are vectors.
w_0 is the bias term
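A minimal NumPy sketch of this formula; the weight, bias, and input values are made up purely for illustration:

    import numpy as np

    w = np.array([0.4, -1.2, 0.7])    # weight vector (illustrative values)
    w_0 = 0.1                         # bias term
    x = np.array([1.0, 0.5, -2.0])    # one input example

    score = np.dot(w, x) + w_0        # w * input + w_0
    predicted_class = np.sign(score)  # +1 or -1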
Linear Classifier for Multiclass Classification
for each class:
score_i = w_i⋅x + b_i
Whichever class has the highest score is the predicted class
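A sketch of the multiclass version with made-up numbers; each row of W holds the weight vector w_i for one class:

    import numpy as np

    W = np.array([[0.4, -1.2, 0.7],    # w_1
                  [0.2,  0.3, -0.5],   # w_2
                  [-0.8, 0.1,  0.9]])  # w_3
    b = np.array([0.1, -0.2, 0.0])     # b_1 .. b_3
    x = np.array([1.0, 0.5, -2.0])     # one input example

    scores = W @ x + b                   # score_i = w_i . x + b_i
    predicted_class = np.argmax(scores)  # class with the highest score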
What are weights?
Weights determine how much influence an input feature has.
How do weights relate to loss?
Weights are adjusted during training to reduce loss, using optimization techniques such as Gradient Descent.
What is Loss?
It quantifies the error made by the model — the smaller the loss, the better the model is
performing on that data.
What is a hyperparameter?
A setting chosen before training (not learned from the data) that determines how the model behaves.
Classification vs Regression
classification:
-predict class label (Labels are discrete)
Ex) Predicting a movie rating as 1, 2, 3, 4, or 5 stars
regression:
-predict continuous quantity
Ex) Predicting a movie rating as any value from 1 to 5 stars, such as 4.2
What is Gradient Descent?
An optimization algorithm used to minimize a loss function by iteratively adjusting the
model's parameters.
1) Start with a random set of weights
2) Classify some number of points (called our batch size)
3) Compute the gradient of the loss with respect to the weights; it gives the direction and rate of steepest increase of the loss.
4) Adjust the weights in the opposite direction of the gradient, which decreases the loss.
5) Iterate until convergence.
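A small NumPy sketch of these steps for a linear model with squared-error loss; the synthetic data, learning rate, and batch size are assumptions made for illustration:

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 3))         # 100 points, 3 features (synthetic data)
    y = X @ np.array([2.0, -1.0, 0.5])    # targets from a known "true" weight vector

    w = rng.normal(size=3)                # 1) start with a random set of weights
    lr, batch_size = 0.1, 20

    for step in range(200):               # 5) iterate until convergence
        idx = rng.choice(len(X), batch_size, replace=False)
        Xb, yb = X[idx], y[idx]           # 2) classify a batch of points
        preds = Xb @ w
        grad = 2 * Xb.T @ (preds - yb) / batch_size  # 3) gradient of the loss w.r.t. w
        w -= lr * grad                    # 4) step against the gradient to reduce the loss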
Name the 3 different kinds of Loss Functions
1) Cross Entropy Loss
2) Hinge Loss
3) Squared Hinge Loss
What is Cross Entropy Loss? Give the formula, its behavior, what it does, and what it's used for
Formula: -Σ y_i * log(p_i), summed over all classes i
Behavior: Penalizes confident but wrong predictions heavily.
What it does: Measures how well the predicted probability distribution matches the true
labels.
Used for: Classification problems, especially binary and multiclass (often with softmax).
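A NumPy sketch of cross entropy for a single example, using softmax to turn raw scores into probabilities; the scores and true class are made up:

    import numpy as np

    scores = np.array([2.0, 0.5, -1.0])   # raw model outputs (logits) for 3 classes
    true_class = 0                        # index of the correct class

    probs = np.exp(scores) / np.sum(np.exp(scores))  # softmax -> probability distribution
    loss = -np.log(probs[true_class])     # -sum(y_i * log(p_i)) with a one-hot y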
What is Hinge Loss? Give the formula, its behavior, what it does, and what it's used for
Formula: max(0, 1 - y * f(x)), where y is the true label (+1 or -1) and f(x) is the model's score
Behavior: If the prediction is correct and beyond the margin, loss is 0. Otherwise, it
increases linearly.
What it does: Encourages the model to make not just correct predictions, but with a
margin (confidence).
Used for: Binary classification with models like Support Vector Machines (SVMs).
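A sketch of hinge loss for a binary classifier with +1/-1 labels; the labels and scores are illustrative:

    import numpy as np

    y = np.array([1, -1, 1])            # true labels in {-1, +1}
    f_x = np.array([0.8, 0.3, 2.5])     # raw model scores w . x + w_0

    hinge = np.maximum(0.0, 1.0 - y * f_x)  # 0 when correct and beyond the margin
    loss = hinge.mean()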
What is Squared Hinge Loss? Give the formula, its behavior, what it does, and what it's used for
Formula: max(0, 1 - y * f(x))^2
Behavior: If the prediction is correct and beyond the margin, loss is 0. Otherwise,
increases more sharply for predictions that are wrong or too close to the margin.
What it does: Encourages the model to make not just correct predictions, but with a margin (confidence); the hinge term itself is squared.
Used for: Also in SVMs, but with stronger penalty for violations.
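Squared hinge is the same computation with the hinge term squared (same illustrative numbers as above):

    import numpy as np

    y = np.array([1, -1, 1])
    f_x = np.array([0.8, 0.3, 2.5])

    squared_hinge = np.maximum(0.0, 1.0 - y * f_x) ** 2  # sharper penalty near or inside the margin
    loss = squared_hinge.mean()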
The root of supervised learning
Given some function that classifies our data, minimize a given loss function.
Pros and Cons of Linear Classifiers