Summary Descriptive Statistics Lecture 4 (H3.3 & 3.4)

Pages: 6
Uploaded on: 12-02-2023
Written in: 2022/2023

This is a summary of the subject matter of lecture 4 of Descriptive Statistics in the pre-master Orthopedagogy at the University of Amsterdam. It covers chapters 3.3 and 3.4 of Agresti & Franklin (Statistics).

Document information

Summarized whole book? No
Chapters summarized: H3.3 & 3.4
Type: Summary

3.3. Predicting the outcome of a variable
Exploring the relationship between 2 quantitative variables graphically → scatterplot

Straight-line pattern? → correlation coefficient describes its strength numerically

Further analysis → finding an equation for the straight line that best describes that pattern

This equation can be used to predict the value of the variable designated as the response variable
from the value of the variable designated as the explanatory variable.

Regression line = a line that predicts the value of the response variable y as a straight-line
function of the value x of the explanatory variable. Let $\hat{y}$ denote the predicted value of y.

- The equation for the regression line has the form: $\hat{y} = a + bx$
- a denotes the y-intercept and b denotes the slope.



y-intercept = the predicted value of y when x = 0

slope = the amount that $\hat{y}$ changes when x increases by one unit.

- For two x values that differ by 1.0, the $\hat{y}$ values differ by b.
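
The roles of the y-intercept and the slope can be checked with a few lines of Python. This is only a minimal sketch: the function name `predict` and the values a = 2.0 and b = 0.5 are hypothetical, chosen for illustration and not taken from the book.

```python
def predict(a, b, x):
    """Predicted value y-hat = a + b*x for intercept a and slope b."""
    return a + b * x

# Hypothetical intercept and slope, chosen only for illustration.
a, b = 2.0, 0.5

y_hat_at_0 = predict(a, b, 0)   # prediction at x = 0 is the y-intercept
y_hat_at_4 = predict(a, b, 4)
y_hat_at_5 = predict(a, b, 5)

print(y_hat_at_0)               # 2.0
print(y_hat_at_5 - y_hat_at_4)  # 0.5, the slope b
```

The two x values 4 and 5 differ by one unit, so the two predictions differ by exactly b.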




When the slope is negative → $\hat{y}$ decreases as x
increases. The straight line then goes downward,
and the association is negative.

When the slope = 0, the regression line is
horizontal (parallel to the x-axis). $\hat{y}$ stays constant
at the y-intercept for any value of x. $\hat{y}$ does not
change as x changes, and the variables don’t exhibit association.

The absolute value of the slope describes the magnitude of the change in $\hat{y}$ for a 1-unit change in x.
The larger the absolute value, the steeper the regression line.

Prediction error / residual = difference between the actual y value and the predicted y value.
Residual = $y - \hat{y}$

Each observation has a residual

A positive residual occurs when the actual y is larger than $\hat{y}$, so that $y - \hat{y} > 0$

A negative residual results when the actual y is smaller than $\hat{y}$, so that $y - \hat{y} < 0$

The smaller the absolute value of the residual, the closer the predicted value is to the actual value,
so the better the prediction.

If the predicted value is the same as the actual value, the residual is zero: $y - \hat{y} = 0$

In a scatterplot, the vertical distance between the point and the regression line is the absolute value
of the residual.
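
The residual definition above can be tried out on a tiny made-up data set. This is a sketch under stated assumptions: the data points and the line (a = 2.0, b = 0.5) are hypothetical, and the line is not claimed to be the least squares line for these points.

```python
def residuals(xs, ys, a, b):
    """Residual y - y_hat for each observation, where y_hat = a + b*x."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]

# Hypothetical observations and a hypothetical line, for illustration only.
xs = [1, 2, 3, 4]
ys = [3.0, 2.5, 4.0, 4.0]
a, b = 2.0, 0.5

res = residuals(xs, ys, a, b)
print(res)  # [0.5, -0.5, 0.5, 0.0]

# Vertical distance from each point to the line = absolute value of the residual.
distances = [abs(r) for r in res]
print(distances)  # [0.5, 0.5, 0.5, 0.0]
```

The first and third points lie above the line (positive residual), the second lies below it (negative residual), and the fourth lies exactly on it (residual zero).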



How is the equation for the regression line found?

The actual summary measure used to evaluate regression lines is called the residual sum of squares:
residual sum of squares = $\sum (\text{residual})^2 = \sum (y - \hat{y})^2$
This formula squares each vertical distance between a point and the line and then adds up these
squared values. The better the line, the smaller the residuals tend to be, and the smaller the residual
sum of squares tends to be.

For each potential line, we have a set of predicted values, a set of residuals and a residual sum of
squares. The line that the software reports is the one having the smallest residual sum of squares.
This is why selecting a line is called the least squares method.
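
The idea of comparing potential lines by their residual sum of squares can be sketched as follows. The data and the three candidate lines are hypothetical, and real software does not try out a handful of lines like this: it computes the least squares line directly from formulas. The sketch only shows that a better line gives a smaller residual sum of squares.

```python
def residual_sum_of_squares(xs, ys, a, b):
    """Sum of squared residuals (y - y_hat)^2 for the line y_hat = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]

# Hypothetical candidate lines as (intercept a, slope b) pairs.
candidates = [(0.0, 1.8), (1.0, 1.5), (0.5, 1.6)]

for a, b in candidates:
    rss = residual_sum_of_squares(xs, ys, a, b)
    print(a, b, round(rss, 4))

# Among these candidates, keep the line with the smallest residual sum of squares.
best = min(candidates, key=lambda line: residual_sum_of_squares(xs, ys, *line))
print(best)  # (0.5, 1.6)
```

For this data set, (0.5, 1.6) happens to be the actual least squares line, so no other straight line would give a smaller residual sum of squares.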

This regression line:

- Makes the errors as small as possible
- Has some positive residuals and some negative residuals, and the sum (and mean) of the
  residuals equals 0
  o Too-high predictions are balanced by too-low predictions
- Passes through the point $(\bar{x}, \bar{y})$
  o The center of the data
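
These properties can be verified numerically. The sketch below uses hypothetical data and computes the least squares slope with the standard sum-of-products formula, an equivalent textbook form that is assumed here rather than quoted from these notes.

```python
from statistics import mean

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]

x_bar, y_bar = mean(xs), mean(ys)

# Least squares slope via the sum-of-products form (assumed equivalent formula).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
s_xx = sum((x - x_bar) ** 2 for x in xs)
b = s_xy / s_xx
a = y_bar - b * x_bar

residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# The residuals sum to 0 (up to floating-point rounding).
print(abs(sum(residuals)) < 1e-9)  # True

# The line passes through the point (x_bar, y_bar).
print(abs((a + b * x_bar) - y_bar) < 1e-9)  # True

# Some residuals are positive and some are negative.
print(any(r > 0 for r in residuals) and any(r < 0 for r in residuals))  # True
```

The comparisons use a small tolerance instead of exact equality because floating-point arithmetic can leave a tiny rounding error.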

Formula for slope is $b = r\left(\frac{s_y}{s_x}\right)$, where $s_x$ and $s_y$ are the standard deviations of x and y

Formula for y-intercept is $a = \bar{y} - b\bar{x}$

The slope b is directly related to the correlation r and the y-intercept depends on the slope.
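
The two formulas can be applied step by step in Python. This is a sketch on hypothetical data. The correlation r is computed from z-scores, the usual textbook definition, which is assumed here because this excerpt does not restate it.

```python
from statistics import mean, stdev

# Hypothetical data, for illustration only.
xs = [1, 2, 3, 4]
ys = [2, 4, 5, 7]
n = len(xs)

x_bar, y_bar = mean(xs), mean(ys)
s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations

# Correlation: sum of products of z-scores, divided by n - 1 (assumed definition).
r = sum(((x - x_bar) / s_x) * ((y - y_bar) / s_y) for x, y in zip(xs, ys)) / (n - 1)

b = r * (s_y / s_x)    # slope is tied directly to the correlation
a = y_bar - b * x_bar  # intercept depends on the slope

print(round(b, 6), round(a, 6))  # 1.6 0.5
```

Because $s_x$ and $s_y$ are always positive, the slope b always has the same sign as r.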



We’ve used correlation to describe the strength of the association.
Seller: sevendeboer, Universiteit van Amsterdam