Stanford CS229 Notes - Regression Algorithms


1. Introduction to Linear Regression and Gradient Descent
Purpose: Introduces linear regression as a foundational supervised learning algorithm.
Content highlights: explanation of the hypothesis formulation; notation and definitions (parameters, input vectors, target variables); step-by-step derivation of the cost function.


Stanford CS229: Machine Learning
Amrit Kandasamy
November 2024


1 Linear Regression and Gradient Descent
Lecture Note Slides


1.1 Notation and Definitions
Linear Regression Hypothesis: $h_\theta(x) = \sum_{i=0}^{n} \theta_i x_i$, where $x_0 = 1$.

$\theta = \begin{bmatrix} \theta_0 \\ \vdots \\ \theta_n \end{bmatrix}$ is the vector of parameters of the learning algorithm. The algorithm's job is to choose θ.

$x = \begin{bmatrix} x_0 \\ \vdots \\ x_n \end{bmatrix}$ is an input vector (often the inputs are called features).
We let m be the number of training examples (elements in the training set).
y is the output, sometimes called the target variable.
(x, y) is one training example. We will use the notation $(x^{(i)}, y^{(i)})$ to denote the $i$th training example.
As used in the vectors and summation above, n is the number of features.
:= denotes assignment (usually of some variable or function). For example,
a := a + 1 increments a by 1.
We write $h_\theta(x)$ as $h(x)$ for convenience.
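To make the notation concrete, here is a minimal sketch (not part of the notes themselves) of computing $h_\theta(x)$ as a dot product; the parameter and feature values below are made up for illustration.

import numpy as np

# Hypothetical example with n = 2 features plus the intercept term x_0 = 1.
theta = np.array([0.5, 2.0, -1.0])   # [theta_0, theta_1, theta_2]
x = np.array([1.0, 3.0, 4.0])        # [x_0 = 1, x_1, x_2]

# h_theta(x) = sum over i of theta_i * x_i, i.e. a dot product.
h = theta @ x
print(h)  # 0.5 + 2.0*3.0 + (-1.0)*4.0 = 2.5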





Figure 1: Visual of Gradient Descent with Two Parameters


1.2 How to Choose Parameters θ
Choose θ such that h(x) ≈ y for the training examples. Generally, we want to minimize

$$J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
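As an illustration (not from the notes), here is a small sketch of evaluating J(θ) on a made-up training set; the data and parameter values are arbitrary assumptions.

import numpy as np

# Hypothetical training set: m = 3 examples, n = 1 feature,
# with a leading column of ones so that x_0 = 1 for every example.
X = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])           # shape (m, n+1)
y = np.array([1.0, 2.5, 3.5])        # target values

def cost(theta, X, y):
    # J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2
    residuals = X @ theta - y
    return 0.5 * residuals @ residuals

print(cost(np.array([0.0, 1.0]), X, y))  # 0.25 for theta = [0, 1]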

In order to minimize J(θ), we will employ Batch Gradient Descent.
Let's look at an example with 2 parameters. Start with some point (θ0, θ1, J(θ)), determined either randomly or by some condition. We look all around and think,

"What direction should we take a tiny step in to go downward as fast as possible?"

If a different starting point had been used, the resulting minimum could have been different (see the two paths in Figure 1).

Now let's formalize the gradient descent algorithm(s).


1.2.1 Batch Gradient Descent
Let α be the learning rate. Then the algorithm can be written as

$$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$$
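For illustration only (not part of the notes), here is a minimal NumPy sketch of batch gradient descent on a tiny made-up dataset; the learning rate, iteration count, and data are assumptions chosen so the loop converges.

import numpy as np

# Hypothetical data: m = 4 examples, n = 1 feature, with x_0 = 1 prepended.
X = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])    # exactly y = 1 + 2x

theta = np.zeros(2)                    # initial parameters
alpha = 0.1                            # learning rate (assumed)

for _ in range(1000):
    # Gradient of J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2 is X^T (X theta - y).
    grad = X.T @ (X @ theta - y)
    theta = theta - alpha * grad       # update every theta_j simultaneously

print(theta)  # approaches [1.0, 2.0]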

Let's derive the partial derivative part. Assume there's only 1 training example for now. Substituting our definition of J, we have

$$\frac{\partial}{\partial \theta_j} J(\theta) = \frac{\partial}{\partial \theta_j} \frac{1}{2} \left( h_\theta(x) - y \right)^2 = \left( h_\theta(x) - y \right) \cdot \frac{\partial}{\partial \theta_j} \left( \sum_{i=0}^{n} \theta_i x_i - y \right)$$
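Carrying the chain rule one step further (a standard continuation, stated here for reference since the text stops at this point): only the $\theta_j x_j$ term depends on $\theta_j$, so

$$\frac{\partial}{\partial \theta_j} \left( \sum_{i=0}^{n} \theta_i x_i - y \right) = x_j, \qquad \text{hence} \qquad \frac{\partial}{\partial \theta_j} J(\theta) = \left( h_\theta(x) - y \right) x_j.$$

Substituting into the update rule above gives, for a single training example, $\theta_j := \theta_j - \alpha \left( h_\theta(x) - y \right) x_j$.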

