CSE 575 Statistical Machine Learning - Study Guide [2025]
Table of Contents
1. Introduction to Statistical Machine Learning
2. Probability and Statistics Fundamentals
3. Linear Models for Regression and Classification
4. Bayesian Methods
5. Kernel Methods and Support Vector Machines (SVMs)
6. Probabilistic Graphical Models
7. Dimensionality Reduction
8. Ensemble Methods
9. Unsupervised Learning
10. Model Evaluation and Selection
11. Optimization Techniques in Machine Learning
12. Advanced Topics
13. Applications of Statistical Machine Learning
14. Summary and Further Reading
1. Introduction to Statistical Machine Learning
Overview of Machine Learning
Machine Learning (ML) is a subfield of artificial intelligence that focuses on developing
algorithms and statistical models that enable computer systems to improve their performance
on specific tasks through experience, without being explicitly programmed for every
scenario.
Key Definitions:
Algorithm: A set of rules or instructions for solving a problem
Model: A mathematical representation of a real-world process
Training: The process of teaching an algorithm using data
Prediction: Using a trained model to make estimates about new, unseen data
Statistical Machine Learning specifically emphasizes the probabilistic and statistical
foundations underlying ML algorithms. It treats learning as a statistical inference problem
where we aim to discover patterns and relationships in data while quantifying uncertainty.
Types of Learning: Supervised, Unsupervised, Semi-supervised, Reinforcement
Supervised Learning
In supervised learning, algorithms learn from labeled training data to make predictions or
decisions.
Characteristics:
Input-output pairs (X, y) are provided during training
Goal is to learn a mapping function f: X → y
Performance can be directly measured against known correct answers
Types:
1. Classification: Predicting discrete class labels
o Example: Email spam detection (spam/not spam)
o Output: Categorical variables
2. Regression: Predicting continuous numerical values
o Example: House price prediction
o Output: Real-valued numbers
Mathematical Formulation: Given training data D = {(x₁, y₁), (x₂, y₂), ..., (xₙ, yₙ)}, find
function f such that f(x) ≈ y for new inputs.
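As a minimal illustration of finding f with f(x) ≈ y, the sketch below assumes f is a straight line f(x) = w·x + b and fits it to the training pairs by ordinary least squares. The linear form, the helper names fit_line and f, and the toy data set are illustrative assumptions, not part of the guide; the point is only the pattern "learn parameters from (xᵢ, yᵢ), then predict for a new x".

```python
# Minimal sketch (assumption: f is linear): learn f(x) = w*x + b from labeled pairs by least squares.
def fit_line(data):
    """Return (w, b) minimizing the sum of squared errors over the training pairs."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    # Slope = covariance(x, y) / variance(x); the intercept makes the line pass through the means.
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    var_x = sum((x - mean_x) ** 2 for x, _ in data)
    w = cov_xy / var_x
    b = mean_y - w * mean_x
    return w, b


# Hypothetical training set D, generated from y = 2x + 1.
D = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]
w, b = fit_line(D)


def f(x):
    """The learned mapping, applied to a new input."""
    return w * x + b


print(w, b)     # learned parameters: 2.0 1.0
print(f(4.0))   # prediction for an unseen input: 9.0
```

Because the toy data lie exactly on a line, the fit recovers w = 2 and b = 1; with noisy labels the same procedure returns the line with the smallest squared error.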
Unsupervised Learning
Algorithms find hidden patterns in data without labeled examples.
Characteristics:
Only input data X is provided (no target labels)
Goal is to discover hidden structure or patterns
No direct measure of a "correct" answer
Common Tasks:
1. Clustering: Grouping similar data points
2. Dimensionality Reduction: Finding lower-dimensional representations
3. Density Estimation: Modeling data distribution
4. Anomaly Detection: Identifying unusual patterns
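Of these tasks, clustering is the easiest to show in a few lines. The sketch below uses k-means (Lloyd's algorithm), one common clustering method chosen here only as an example; the 1-D data, the initial centers and the function name kmeans_1d are hypothetical. It alternates between assigning each point to its nearest center and moving each center to the mean of its assigned points, without using any labels.

```python
# Minimal 1-D k-means sketch (Lloyd's algorithm): group unlabeled points into k clusters.
def kmeans_1d(points, centers, iters=10):
    """Alternate assignment and update steps, starting from the given initial centers."""
    centers = list(centers)
    clusters = []
    for _ in range(iters):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its points
        # (keep the old center if a cluster ends up empty).
        centers = [sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)]
    return centers, clusters


# Hypothetical unlabeled data with two visible groups.
X = [1.0, 1.2, 0.8, 5.0, 5.3, 4.7]
centers, clusters = kmeans_1d(X, centers=[0.0, 6.0])
print(centers)    # approximately [1.0, 5.0]
print(clusters)   # [[1.0, 1.2, 0.8], [5.0, 5.3, 4.7]]
```

The result depends on the initial centers; in practice k-means is usually restarted from several initializations.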
Semi-supervised Learning
Combines small amounts of labeled data with large amounts of unlabeled data.
Motivation:
Labeled data is expensive and time-consuming to obtain
Unlabeled data is abundant and cheap
Leverages structure in unlabeled data to improve learning
Assumptions:
Smoothness: Points close to each other are likely to have the same label
Cluster assumption: Data forms discrete clusters, and points in the same cluster are likely to share a label
Manifold assumption: Data lies on or near a low-dimensional manifold
Reinforcement Learning
Learning through interaction with an environment to maximize cumulative reward.
Key Components:
Agent: The learner/decision maker
Environment: The external system the agent interacts with
State: The current situation of the agent
Action: A choice available to the agent
Reward: Feedback signal from environment
Goal: Learn a policy π that maps each state s to an action a = π(s) so as to maximize the expected cumulative reward.
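The components above can be made concrete with a toy example. Everything in the sketch (the five-state corridor, the reward values, the 20-step limit and the function names) is a hypothetical setup chosen for illustration. It does not learn a policy; it only evaluates two fixed policies, which shows what "cumulative reward" measures and why one policy is preferable to another.

```python
# Hypothetical toy environment: a corridor of states 0..4; reaching state 4 ends the episode.
GOAL = 4


def step(state, action):
    """Environment: apply an action (-1 = left, +1 = right) and return (next_state, reward, done)."""
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else -0.1   # small cost per step, bonus at the goal
    return next_state, reward, next_state == GOAL


def run_episode(policy, start=0, max_steps=20):
    """Agent follows the policy a = pi(s); return the cumulative (undiscounted) reward."""
    state, total = start, 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = step(state, action)
        total += reward
        if done:
            break
    return total


def go_right(state):
    """A policy that always moves toward the goal."""
    return 1


def go_left(state):
    """A policy that never reaches the goal."""
    return -1


print(run_episode(go_right))   # 3 step costs + goal bonus, about 0.7
print(run_episode(go_left))    # 20 step costs, about -2.0
```

A reinforcement learning algorithm would search over policies, using only the rewards returned by the environment, to find one like go_right.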
Role of Statistics in Machine Learning
Statistics provides the theoretical foundation for machine learning by offering:
1. Probabilistic Framework: Modeling uncertainty and variability in data
2. Inference Methods: Drawing conclusions about populations from sample data
3. Hypothesis Testing: Validating model assumptions and comparing models
4. Estimation Theory: Methods for parameter estimation and confidence intervals
5. Information Theory: Measuring information content and model complexity
Key Statistical Concepts in ML:
Bias-Variance Tradeoff: Balancing underfitting and overfitting
Maximum Likelihood Estimation: Parameter estimation method
Bayesian Inference: Incorporating prior knowledge and updating beliefs
Cross-Validation: Model selection and performance estimation
Regularization: Preventing overfitting through complexity penalties
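To make Maximum Likelihood Estimation concrete, here is a minimal sketch assuming i.i.d. coin flips modeled as Bernoulli(p). The data and the grid search are illustrative choices: the log-likelihood is maximized numerically over a grid of candidate values of p and compared with the known closed-form MLE, the sample proportion of heads.

```python
import math

# Hypothetical observed coin flips: 1 = heads, 0 = tails (7 heads out of 10).
flips = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]


def log_likelihood(p, data):
    """Bernoulli log-likelihood: log p for each head plus log(1 - p) for each tail."""
    return sum(math.log(p) if x == 1 else math.log(1 - p) for x in data)


# Search a grid of candidate parameters strictly inside (0, 1).
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=lambda p: log_likelihood(p, flips))

print(p_hat)                     # grid maximizer: 0.7
print(sum(flips) / len(flips))   # closed-form MLE (fraction of heads): 0.7
```

Working with the log-likelihood rather than the likelihood itself avoids multiplying many small probabilities and does not change where the maximum lies.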
Summary - Introduction to Statistical Machine Learning: Statistical Machine Learning
combines computational algorithms with statistical theory to extract patterns from data. The
four main learning paradigms (supervised, unsupervised, semi-supervised, reinforcement)
address different types of problems and data availability scenarios. Statistics provides the
mathematical foundation for understanding uncertainty, making inferences, and validating
model performance. This statistical grounding distinguishes statistical ML from purely
algorithmic approaches by emphasizing probabilistic reasoning and principled model
selection.
2. Probability and Statistics Fundamentals
Probability Theory Basics
Probability theory provides the mathematical framework for reasoning under uncertainty,
which is fundamental to statistical machine learning.
Sample Spaces and Events
Sample Space (Ω): Set of all possible outcomes of an experiment
Event (A): Subset of the sample space
Probability (P): Function that assigns real numbers to events
Axioms of Probability
For any events A and B:
1. Non-negativity: P(A) ≥ 0
2. Normalization: P(Ω) = 1
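3. Additivity: If A ∩ B = ∅, then P(A ∪ B) = P(A) + P(B) (more generally, for any countable sequence of pairwise disjoint events A₁, A₂, ..., P(A₁ ∪ A₂ ∪ ...) = P(A₁) + P(A₂) + ...)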
Conditional Probability and Independence
Conditional Probability: P(A|B) = P(A ∩ B) / P(B), provided P(B) > 0
Independence: Events A and B are independent if P(A ∩ B) = P(A) × P(B)
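Both definitions can be checked by brute-force enumeration on a small sample space. The sketch below assumes two fair dice (36 equally likely outcomes) and uses exact fractions; the events and helper names are illustrative. It computes P(first die is 6 | sum is 10) and tests the product rule for two pairs of events, one independent and one not.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 36 equally likely outcomes of rolling two fair dice.
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under the uniform distribution on omega; an event is a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def both(a, b):
    """The intersection A ∩ B as a predicate."""
    return lambda w: a(w) and b(w)


def cond_prob(a, b):
    """P(A|B) = P(A ∩ B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(both(a, b)) / p_b


def first_is_six(w):
    return w[0] == 6


def second_is_six(w):
    return w[1] == 6


def sum_is_ten(w):
    return w[0] + w[1] == 10


print(cond_prob(first_is_six, sum_is_ten))   # 1/3: (6,4) is one of (4,6), (5,5), (6,4)
# Independence check: is P(A ∩ B) equal to P(A) * P(B)?
print(prob(both(first_is_six, second_is_six)) == prob(first_is_six) * prob(second_is_six))  # True
print(prob(both(first_is_six, sum_is_ten)) == prob(first_is_six) * prob(sum_is_ten))        # False
```

The two dice are independent, but "first die is 6" and "sum is 10" are not: learning that the sum is 10 raises the probability of a 6 on the first die from 1/6 to 1/3.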
Bayes' Theorem: P(A|B) = P(B|A) × P(A) / P(B), provided P(B) > 0
This is fundamental to Bayesian machine learning approaches.
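A worked example shows how Bayes' Theorem updates a prior belief after observing evidence. The numbers below (1% prior, 99% detection rate, 5% false-positive rate) are hypothetical. The denominator P(B) is not given directly, so it is computed with the law of total probability, P(B) = P(B|A)P(A) + P(B|not A)P(not A).

```python
from fractions import Fraction


def bayes(likelihood, prior, evidence):
    """P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


# Hypothetical numbers: A = "has the condition", B = "test is positive".
p_a = Fraction(1, 100)              # prior P(A)
p_b_given_a = Fraction(99, 100)     # P(B|A)
p_b_given_not_a = Fraction(5, 100)  # P(B|not A), false-positive rate

# P(B) by the law of total probability.
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

posterior = bayes(p_b_given_a, p_a, p_b)
print(posterior)          # 1/6
print(float(posterior))   # about 0.167
```

Even with an accurate test, the posterior is only 1/6 because the prior is small; this interplay between prior and likelihood is what Bayesian methods build on.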
Random Variables and Distributions
Random Variables
A random variable X is a function that maps outcomes in the sample space to real numbers.
Types:
1. Discrete: Takes countable values (e.g., number of coin flips)
2. Continuous: Takes uncountable values (e.g., height, weight)
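The definition of a random variable as a function on the sample space can be taken literally. In the sketch below (an illustrative example, assuming three fair coin flips), X maps each outcome to its number of heads. X takes only the countable set of values {0, 1, 2, 3}, so it is discrete, and pushing the uniform probability on the 8 outcomes through X gives the probability of each value of X.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: all 8 equally likely outcomes of three fair coin flips.
omega = list(product("HT", repeat=3))


def X(outcome):
    """Random variable: maps an outcome such as ('H', 'T', 'H') to its number of heads."""
    return outcome.count("H")


print(X(("H", "T", "H")))   # 2

# Probability of each value x of X: count the outcomes mapped to x, divide by |omega|.
counts = Counter(X(w) for w in omega)
pmf = {x: Fraction(c, len(omega)) for x, c in sorted(counts.items())}
print(pmf)   # {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}
```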
Probability Distributions
For Discrete Random Variables:
Probability Mass Function (PMF): P(X = x)