Summary

Summary: Data Analysis

Rating
-
Sold
4
Pages
26
Uploaded on
29-01-2024
Written in
2023/2024

This is a summary of the data analysis course. It mainly consists of a summary of the subject matter of the relevant chapters in the book. This has been supplemented with information from the lectures and ~ 80% of the SPSS instructions that will be required.


Document information

Summarized whole book?
No
Which chapters are summarized?
Chapters 5 and 19 through 24
Uploaded on
January 29, 2024
Number of pages
26
Written in
2023/2024
Type
Summary

Content preview

Chapter 5 – Pairs of Variables

5.1 : Scatter plot, covariance and correlation

Dependent variable (usually Y) = topic of investigation
Independent variable (usually X) = cause of variation

Scatterplot -> used to get a visual idea of the relationship between two (quantitative) variables by
displaying all the (x,y) pairs
Population/Sample cloud = all the dots resulting from the (x,y) pairs

Different possible relationships:
1. Positively linearly related -> best fitting straight line is increasing
2. Negatively linearly related -> best fitting straight line is decreasing
3. Quadratic relationship -> the points seem to follow a parabola (mountain- or valley-shaped)
4. Logarithmic relationship -> y changes sharply for low values of x but then levels off
5. No relationship

Covariance -> measures the degree of linear relationship between y and x
Formula:
Population covariance: $\sigma_{X,Y} = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)$
Sample covariance: $s_{X,Y} = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$
-> the sample covariance divides by ‘n-1’ instead of just ‘n’ because this gives a better (unbiased) estimate of the
population covariance
-> replacing all the y and Y by x and X will result in the formulas for the population variance and the
sample variance
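
The two definitions above can be checked by hand on a small dataset. The following is a minimal Python sketch, not part of the course material: the function names and the five data points are made up for illustration.

```python
from statistics import mean

def population_covariance(xs, ys):
    """Average product of deviations from the means, dividing by N."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def sample_covariance(xs, ys):
    """Same sum of products, but dividing by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

print(population_covariance(x, y))  # 1.2
print(sample_covariance(x, y))      # 1.5
```

Calling `sample_covariance(x, x)` gives the sample variance of x, which matches the remark that replacing y by x turns the covariance formula into the variance formula.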

Short cut formula:
Population covariance: $\sigma_{X,Y} = \frac{1}{N} \sum_{i=1}^{N} x_i y_i - \mu_x \mu_y$
Sample covariance: $s_{X,Y} = \frac{1}{n-1} \left( \sum_{i=1}^{n} x_i y_i - n \bar{x} \bar{y} \right)$
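
The short cut forms are algebraically equal to the definitions. The sketch below assumes the usual short cut for the sample covariance, (sum of x_i*y_i minus n times the product of the means) divided by n-1. Function names and data are hypothetical.

```python
from statistics import mean

def population_covariance_shortcut(xs, ys):
    """Mean of the products x_i * y_i minus the product of the means."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) / n - mean(xs) * mean(ys)

def sample_covariance_shortcut(xs, ys):
    """(Sum of products minus n * mean_x * mean_y), divided by n - 1."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - n * mean(xs) * mean(ys)) / (n - 1)

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

# Rounded because floating-point subtraction can leave tiny errors.
print(round(population_covariance_shortcut(x, y), 6))  # 1.2
print(round(sample_covariance_shortcut(x, y), 6))      # 1.5
```

For this dataset the definition-based formulas give the same values, 1.2 and 1.5.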

Using the covariance has downsides: a reference point for judging whether the relationship is
strong is missing, and the covariance depends on the units in which the variables are measured.

Correlation -> measures the degree of linear relationship between y and x but without the downsides
mentioned above
Formula:
Population correlation coefficient: $\rho = \rho_{X,Y} = \frac{\sigma_{X,Y}}{\sigma_X \sigma_Y}$
Sample correlation coefficient: $r = r_{X,Y} = \frac{s_{X,Y}}{s_X s_Y}$
-> both coefficients take values in the interval [-1, 1], where +1 indicates a perfect positive linear
relationship, -1 a perfect negative linear relationship and 0 no linear relationship (uncorrelated)
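
A short Python sketch of the sample correlation coefficient, using made-up data, shows that rescaling a variable (for example measuring in centimetres instead of metres) leaves r unchanged. The function name and the numbers are assumptions for illustration.

```python
from math import isclose, sqrt
from statistics import mean

def sample_correlation(xs, ys):
    """r = s_xy / (s_x * s_y); the 1/(n-1) factors cancel out."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r = sample_correlation(x, y)
x_rescaled = [100 * v for v in x]           # same data in other units
r_rescaled = sample_correlation(x_rescaled, y)

print(round(r, 4))                          # 0.7746
print(isclose(r, r_rescaled))               # True
```

The covariance of the rescaled data would be 100 times larger, while r stays the same and always lies between -1 and 1.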

5.2 : Regression line

Regression of Y on X = the study of the dependence of Y on X
Least squares (LS) method :
1. Start with a general line with the equation: y = a + bx
2. Fill in the observed x values and find the values of a and b that minimise the sum of squared differences between the observed y values and the line

Formulas:
Sample regression coefficients: $b = \frac{s_{X,Y}}{s_X^2}$ and $a = \bar{y} - b\bar{x}$
Population regression coefficients: $\beta_1 = \frac{\sigma_{X,Y}}{\sigma_X^2}$ and $\beta_0 = \mu_y - \beta_1 \mu_x$

Sample regression line: $\hat{y} = b_0 + b_1 x$ (also called: prediction line), with $b_0 = a$ and $b_1 = b$
Population regression line: $y = \beta_0 + \beta_1 x$
-> $b_0$/$\beta_0$ = the intercept
-> $b_1$/$\beta_1$ = the slope
-> sample regression line passes through $(\bar{x}, \bar{y})$
-> population regression line passes through $(\mu_x, \mu_y)$
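
As a rough illustration, the sample regression coefficients can be computed directly from the slope formula (sample covariance divided by the sample variance of x). The function name and data below are hypothetical.

```python
from math import isclose
from statistics import mean

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx              # the 1/(n-1) factors cancel
    intercept = my - slope * mx
    return intercept, slope

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
print(round(b0, 4), round(b1, 4))               # 2.2 0.6

# The fitted line passes through the point of means.
print(isclose(b0 + b1 * mean(x), mean(y)))      # True
```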

Interpolation = if a new ‘x’ value is within the range of existing ‘x’ values, predictions can be trusted
Extrapolation = if a new ‘x’ value is outside the range of existing ‘x’ values, predictions can’t be
trusted

Residuals/Errors = the difference between the y-values and the regression line
-> shows the concentration of y-values around the regression line
-> the sum of residuals will always be 0 (otherwise the regression line is not the best fitting line)

Formulas:
Residual/Error: $e_i = y_i - \hat{y}_i$
Sum of squared errors: $SSE = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} e_i^2$
-> the smaller the SSE, the better the predicting performance of the regression line
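
The properties of the residuals can be verified numerically. This sketch uses made-up data and hypothetical helper names: it fits the least squares line, checks that the residuals sum to zero, and shows that shifting the line increases the SSE.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return my - slope * mx, slope

def sse(xs, ys, intercept, slope):
    """Sum of squared differences between y and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

b0, b1 = least_squares_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print(round(abs(sum(residuals)), 10))                 # 0.0
print(round(sse(x, y, b0, b1), 4))                    # 2.4
print(sse(x, y, b0, b1) < sse(x, y, b0 + 0.5, b1))    # True
```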

5.3 : Linear transformations

Transforming a variable ‘X’ can be done using the formula: Y = a + bX
-> this has implications for certain statistics, summarised below:


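            Population dataset                                          Sample dataset
Location    $\mu_Y = a + b\mu_X$                                        $\bar{y} = a + b\bar{x}$
            $\mu_{Y,\text{median}} = a + b\mu_{X,\text{median}}$        $y_{\text{median}} = a + b x_{\text{median}}$
Variation   $\sigma_Y^2 = b^2 \sigma_X^2$                               $s_Y^2 = b^2 s_X^2$
            $\sigma_Y = |b| \sigma_X$                                   $s_Y = |b| s_X$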
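
The sample-side rules for a linear transformation Y = a + bX can be checked with Python's standard `statistics` module. The values of a, b and the data are made up; a negative b is chosen to show why the standard deviation needs |b|.

```python
from math import isclose
from statistics import mean, median, stdev, variance

# Hypothetical sample and transformation constants, for illustration only.
x = [1, 2, 3, 4, 5]
a, b = 10, -2                      # negative b shows why |b| is needed
y = [a + b * v for v in x]         # Y = a + bX applied to every value

print(isclose(mean(y), a + b * mean(x)))            # True
print(isclose(median(y), a + b * median(x)))        # True
print(isclose(variance(y), b ** 2 * variance(x)))   # True
print(isclose(stdev(y), abs(b) * stdev(x)))         # True
```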

Transforming both variables ‘X’ and ‘Y’ can be done using two formulas: V = a + bX and W = c + dY
-> this has implications for certain statistics, summarised below:


                          Population dataset                               Sample dataset
Covariance                $\sigma_{V,W} = bd\,\sigma_{X,Y}$                $s_{V,W} = bd\,s_{X,Y}$
Correlation coefficient   If $bd > 0$: $\rho_{V,W} = \rho_{X,Y}$           $r_{V,W} = r_{X,Y}$
                          If $bd < 0$: $\rho_{V,W} = -\rho_{X,Y}$          $r_{V,W} = -r_{X,Y}$
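
A small numerical check of the rules for V = a + bX and W = c + dY, using made-up data and constants with bd < 0, so the covariance is multiplied by bd and the correlation changes sign. Function names are hypothetical.

```python
from math import isclose, sqrt
from statistics import mean

def sample_covariance(xs, ys):
    """Sum of products of deviations, divided by n - 1."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_correlation(xs, ys):
    """Covariance divided by the product of the standard deviations."""
    return sample_covariance(xs, ys) / sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys)
    )

# Hypothetical data and constants, made up for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = 1, 2
c, d = 0, -3                       # b * d = -6 < 0

v = [a + b * xi for xi in x]
w = [c + d * yi for yi in y]

print(isclose(sample_covariance(v, w), b * d * sample_covariance(x, y)))  # True
print(isclose(sample_correlation(v, w), -sample_correlation(x, y)))       # True
```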

5.4 : Relationship between two qualitative variables

Covariance and the correlation coefficient cannot be used to compare two qualitative variables
-> instead we use contingency/cross-classification tables, which give the joint frequencies of the data
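
A contingency table is a count of how often each combination of categories occurs. The sketch below builds one from a made-up list of observations; the category names and the layout are assumptions for illustration only.

```python
from collections import Counter

# Hypothetical survey answers: (study programme, preferred software).
observations = [
    ("Economics", "SPSS"), ("Economics", "Excel"), ("Economics", "SPSS"),
    ("Business", "Excel"), ("Business", "Excel"), ("Business", "SPSS"),
    ("Economics", "SPSS"),
]

joint = Counter(observations)                      # joint frequencies
rows = sorted({r for r, _ in observations})
cols = sorted({c for _, c in observations})

# Print the table with row and column totals.
print(f"{'':<12}" + "".join(f"{c:>8}" for c in cols) + f"{'Total':>8}")
for r in rows:
    counts = [joint[(r, c)] for c in cols]
    print(f"{r:<12}" + "".join(f"{n:>8}" for n in counts) + f"{sum(counts):>8}")
col_totals = [sum(joint[(r, c)] for r in rows) for c in cols]
print(f"{'Total':<12}" + "".join(f"{n:>8}" for n in col_totals)
      + f"{len(observations):>8}")
```

Each cell holds the number of observations with that row category and that column category; the totals in the margins are the frequencies of each variable on its own.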

Get to know the seller

joesvanderstok Tilburg University