
Stanford CS229 Notes - Regression Algorithms

Pages: 12
Uploaded on: 02-01-2025
Written in: 2024/2025

1. Introduction to Linear Regression and Gradient Descent
Purpose: Introduces linear regression as a foundational supervised learning algorithm.
Content Highlights:
Explanation of hypothesis formulation.
Detailed notation and definitions (parameters, input vectors, target variables).
Step-by-step derivation of cost function


Stanford CS229: Machine Learning
Amrit Kandasamy
November 2024


1 Linear Regression and Gradient Descent
Lecture Note Slides


1.1 Notation and Definitions
Linear Regression Hypothesis: $h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i$, where $x_0 = 1$.
\[
\theta = \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_n \end{bmatrix}
\]
is called the parameters of the learning algorithm. The algorithm's job is to
choose $\theta$.
\[
x = \begin{bmatrix} x_0 \\ \vdots \\ x_n \end{bmatrix}
\]
is an input vector (often the inputs are called features).
We let $m$ be the number of training examples (elements in the training set).
$y$ is the output, sometimes called the target variable.
$(x, y)$ is one training example. We will use the notation
\[
(x^{(i)}, y^{(i)})
\]
to denote the $i$th training example.
As used in the vectors and the summation above, $n$ is the number of features.
$:=$ denotes assignment (usually of some variable or function). For example,
$a := a + 1$ increments $a$ by 1.
We write $h_\theta(x)$ as $h(x)$ for convenience.
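
The hypothesis defined above can be sketched in a few lines of Python. This is only an illustration: the function name `hypothesis` and the sample numbers are assumptions, not part of the notes. The sketch prepends $x_0 = 1$ to the features and computes $\sum_{i=0}^{n} \theta_i x_i$.

```python
def hypothesis(theta, features):
    """Return h_theta(x) = sum_i theta_i * x_i, with x_0 = 1 prepended."""
    x = [1.0] + list(features)  # x_0 = 1 is the intercept term
    if len(theta) != len(x):
        raise ValueError("theta needs n + 1 entries for n features")
    return sum(t * xi for t, xi in zip(theta, x))


# Hypothetical parameters theta_0..theta_2 and one input with n = 2 features.
theta = [1.0, 2.0, 3.0]
print(hypothesis(theta, [4.0, 5.0]))  # 1 + 2*4 + 3*5 = 24.0
```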





Figure 1: Visual of Gradient Descent with Two Parameters


1.2 How to Choose Parameters θ
Choose $\theta$ such that $h(x) \approx y$ for the training examples. Generally, we want to
minimize
\[
J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2
\]
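
As a rough sketch, the cost $J(\theta)$ can be evaluated directly from its definition. The names `h` and `cost` and the tiny training set are assumptions made for illustration; each input row is assumed to already include $x_0 = 1$.

```python
def h(theta, x):
    """Linear hypothesis; x is assumed to already contain x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def cost(theta, X, y):
    """J(theta) = 1/2 * sum over the m examples of (h(x_i) - y_i)^2."""
    return 0.5 * sum((h(theta, xi) - yi) ** 2 for xi, yi in zip(X, y))


# Hypothetical training set with m = 3 examples and one feature.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [2.0, 4.0, 6.0]

print(cost([0.0, 2.0], X, y))  # perfect fit: 0.0
print(cost([0.0, 1.0], X, y))  # residuals -1, -2, -3: 0.5 * 14 = 7.0
```

A smaller value of `cost` means the hypothesis is closer to the targets on the training set, which is what choosing $\theta$ aims for.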

In order to minimize $J(\theta)$, we will employ Batch Gradient Descent.
Let's look at an example with 2 parameters. Start with some point $(\theta_0, \theta_1, J(\theta))$,
determined either randomly or by some condition. We look all around
and think,

"What direction should we take a tiny step in to go downward as fast as possible?"

If a different starting point had been used, gradient descent could have ended
at a different local minimum (see the two paths in Figure 1).
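
The dependence on the starting point can be seen in a toy example. The function $f(t) = t^4 - 2t^2$ used here is not from the notes; it is an assumption chosen only because it has two minima, at $t = -1$ and $t = 1$. The step size and iteration count are likewise arbitrary illustrative values.

```python
def descend(start, lr=0.05, steps=200):
    """Plain gradient descent on the toy function f(t) = t^4 - 2 t^2."""
    t = start
    for _ in range(steps):
        grad = 4 * t**3 - 4 * t  # f'(t)
        t = t - lr * grad        # step downhill
    return t


left = descend(-0.5)   # starts left of the hump at t = 0
right = descend(0.5)   # starts right of the hump at t = 0
print(left, right)     # the two runs settle near different minima
```

Starting on opposite sides of the hump at $t = 0$ sends the two runs to different minima, which mirrors the two paths in the figure.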

Now let’s formalize the gradient descent algorithm(s).


1.2.1 Batch Gradient Descent
Let $\alpha$ be the learning rate. Then the algorithm can be written as
\[
\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)
\]
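
A minimal sketch of this update rule for linear regression follows. It assumes the standard gradient of the squared-error cost, $\frac{\partial}{\partial \theta_j} J(\theta) = \sum_{i=1}^{m} (h_\theta(x^{(i)}) - y^{(i)}) x_j^{(i)}$, and the function names, the data, and the values of `alpha` and `iterations` are all illustrative assumptions rather than choices made in the notes.

```python
def h(theta, x):
    """Linear hypothesis; x is assumed to already contain x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def batch_gradient_descent(X, y, alpha=0.05, iterations=2000):
    """Apply theta_j := theta_j - alpha * dJ/dtheta_j using all m examples."""
    theta = [0.0] * len(X[0])
    for _ in range(iterations):
        errors = [h(theta, xi) - yi for xi, yi in zip(X, y)]
        # Assumed gradient: dJ/dtheta_j = sum_i (h(x_i) - y_i) * x_i[j]
        grad = [sum(e * xi[j] for e, xi in zip(errors, X))
                for j in range(len(theta))]
        # Every theta_j is updated from the same old theta (simultaneous update).
        theta = [t - alpha * g for t, g in zip(theta, grad)]
    return theta


# Hypothetical data generated from y = 1 + 2x.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [3.0, 5.0, 7.0]
theta = batch_gradient_descent(X, y)
print(theta)  # close to [1.0, 2.0]
```

"Batch" refers to the fact that each step sums over the whole training set before $\theta$ changes. If `alpha` is set too large for the data, the iterates can diverge instead of settling.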

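Let's derive the partial derivative part. Assume there's only 1 training example
for now. Substituting our definition of $J$, we have
\[
\frac{\partial}{\partial \theta_j} J(\theta)
= \frac{\partial}{\partial \theta_j} \frac{1}{2} \left( h_\theta(x) - y \right)^2
= \left( h_\theta(x) - y \right) \cdot \frac{\partial}{\partial \theta_j} \left( \sum_{i=0}^{n} \theta_i x_i - y \right)
\]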
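
One way to sanity-check a derivation like this is to compare it with a finite-difference estimate. The sketch below assumes the standard closed form for a single example, $(h_\theta(x) - y)\, x_j$, which this excerpt does not reach; the function names and sample values are also assumptions.

```python
def h(theta, x):
    """Linear hypothesis; x is assumed to already contain x_0 = 1."""
    return sum(t * xi for t, xi in zip(theta, x))


def j_single(theta, x, y):
    """Cost for one training example: 1/2 * (h(x) - y)^2."""
    return 0.5 * (h(theta, x) - y) ** 2


def analytic_partial(theta, x, y, j):
    """Assumed closed form for one example: (h(x) - y) * x_j."""
    return (h(theta, x) - y) * x[j]


def numeric_partial(theta, x, y, j, eps=1e-6):
    """Central finite-difference estimate of dJ/dtheta_j."""
    up, down = list(theta), list(theta)
    up[j] += eps
    down[j] -= eps
    return (j_single(up, x, y) - j_single(down, x, y)) / (2 * eps)


# Hypothetical values: h(x) = 0.5 - 3 + 8 = 5.5, so h(x) - y = 3.5.
theta, x, y = [0.5, -1.0, 2.0], [1.0, 3.0, 4.0], 2.0
for j in range(len(theta)):
    print(j, analytic_partial(theta, x, y, j), numeric_partial(theta, x, y, j))
```

The two columns should agree to several decimal places for each $j$, which supports the chain-rule step shown above.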

