
Summary Discovering Statistics Using IBM SPSS Statistics Ch. 1-11 & 13 & 14 & 17 & 18

Pages: 67
Uploaded on: 05-10-2020
Written in: 2017/2018

Extensive summary of the book Discovering Statistics Using IBM SPSS Statistics by Andy Field. The summary covers chapters 1 to 11, 13, 14, 17, and 18. It also includes several notes taken in class.


Discovering Statistics Using IBM SPSS Statistics
Chapter 1
Levels of measurement
Categorical (entities are divided into distinct categories):
- Nominal variable/categorical
• Binary (only two values possible, e.g. married or not married, pregnant or not pregnant)
• With more than two categories (e.g. whether someone is an omnivore, vegetarian, vegan, or
fruitarian)
- Ordinal variable: The same as a nominal variable but the categories have a logical order from
lower to higher, smaller to larger
-e.g. whether people got a fail, a pass, a merit or a distinction in their exam
-Answers to statements on a 5-point or 7-point scale are typically ordinal
Continuous (entities get a distinct score):
- Interval variable: Equal intervals on the variable represent equal differences in the property
being measured
-e.g. Temperature in degrees Celsius: the difference between 6 and 8 is the same as
the difference between 13 and 15
- Ratio variable: The same as an interval variable, but the ratios of scores on the scale must
also make sense, which requires a true zero point (0 money in your pocket means you have no money
at all, so money is a ratio variable; 0 degrees Celsius does not mean there is no temperature, so
temperature in degrees Celsius is an interval variable)
-e.g. an income of 30000 dollars is twice as much as an income of 15000 dollars
➔ Often taken together as Interval-Ratio or Scale
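A quick way to see why ratios only make sense on a ratio scale is to check whether a ratio survives a change of unit. The minimal sketch below (an illustration, not from the book; the helper names are hypothetical) compares income with temperature: rescaling money keeps "twice as much" true, but moving Celsius onto the Kelvin scale, which has a true zero, changes the ratio. Differences stay the same on both temperature scales, which is what "equal intervals" means.

```python
def celsius_to_kelvin(c: float) -> float:
    """Shift Celsius onto the Kelvin scale, whose zero is absolute zero (a true zero)."""
    return c + 273.15


def dollars_to_cents(d: float) -> float:
    """Rescale money; 0 dollars and 0 cents both mean 'no money' (a true zero)."""
    return d * 100


# Ratio variable: income. The ratio is the same whatever unit is used.
income_ratio_dollars = 30000 / 15000
income_ratio_cents = dollars_to_cents(30000) / dollars_to_cents(15000)

# Interval variable: Celsius. "20 degrees is twice 10 degrees" stops being true
# once the arbitrary zero point is moved, so the ratio carries no meaning.
temp_ratio_celsius = 20 / 10
temp_ratio_kelvin = celsius_to_kelvin(20) / celsius_to_kelvin(10)

# Differences, however, are meaningful on both scales (equal intervals).
diff_celsius = 8 - 6
diff_kelvin = celsius_to_kelvin(8) - celsius_to_kelvin(6)

print(income_ratio_dollars, income_ratio_cents)
print(temp_ratio_celsius, round(temp_ratio_kelvin, 3))
print(diff_celsius, round(diff_kelvin, 10))
```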

Validity
Criterion validity = whether you can establish that an instrument measures what it claims to
measure through comparison to objective criteria
- Concurrent validity = when data are recorded simultaneously using the new instrument and
existing criteria
- Predictive validity = when data from the new instrument are used to predict observations at
a later point in time

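Confounding variables/confounds = extraneous factors (variables other than the ones being studied that may also affect the outcome, and so offer an alternative explanation for an observed relationship)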

Chapter 2
The degree to which a statistical model represents the data collected is known as the fit of the
model. We are interested in finding results that apply to an entire population. This is often not
possible; therefore, we collect data from a small subset of the population → sample
Scientists tend to describe data with linear models → models based upon a straight line, linear =
straight, non-linear = curved

We want to have a good fit! We look at the following:
- Normal distribution
- Homogeneity of variance → the spread (variance) of the data is roughly the same across groups,
so we can compare them
- Linearity → to be able to predict with a formula (a straight line) we need a linear relationship. If
there is no linear relationship, the points in the scatterplot show no straight-line pattern → difficult to predict




Populations and samples
• Population → all the things of interest; all the things we can measure
- The collection of units (be they people, plants, cities, etc.) to which we want to generalize a
set of findings or a statistical model
• Sample
- A smaller (but hopefully representative) collection of units from a population used to
determine truths about that population
• Random sample
- A sample drawn in such a way that each case in the population has the same chance of
being drawn into our sample (by "sample" we always mean a random sample unless stated
otherwise)
- We could use a numbered list of all the cases in the population (a sample frame) and use
random numbers to select some cases
- Most sampling methods that you find discussed in the literature (stratified sampling,
systematic sampling, etc.) are sampling methods that are used when sampling frames are not
available (or too expensive) and that we hope result in more or less random samples
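Drawing a simple random sample from a sampling frame with random numbers can be sketched with Python's standard `random` module. The frame of 500 numbered cases and the function name `simple_random_sample` are hypothetical; `random.Random.sample` draws without replacement, so every case in the frame has the same chance of ending up in the sample.

```python
import random

# Hypothetical sampling frame: a numbered list of every case in the population.
sampling_frame = [f"case_{i:03d}" for i in range(1, 501)]


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct cases; each case in the frame is equally likely to be chosen."""
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    return rng.sample(frame, n)


sample = simple_random_sample(sampling_frame, n=20, seed=42)
print(sample[:5])
```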

$\text{outcome}_i = (\text{model}) + \text{error}_i$
→ the model can be a regression model (simple regression with one predictor / multiple regression with several predictors)

Statistical models are made up of variables (things that are measured and vary across cases) and parameters
(estimated from the data rather than measured, and usually constant, e.g. the mean)
- In statistics we fit models to our data (i.e. we use a statistical model to represent what is
happening in the real world)
- The mean is a hypothetical value (i.e. it doesn’t have to be a value that actually exists in the
data set) (e.g. the mean number of children that women have is 2.12)
- The mean is a simple statistical model
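The idea outcome = model + error can be made concrete with the mean as the model. In this sketch the scores are made up (number of children for ten women); the mean is the single parameter, each error is what the model misses for case i, and adding them back reproduces every outcome. Note that the mean (2.1) is not a value that occurs in the data.

```python
from statistics import mean

# Hypothetical data: number of children for ten women (made-up values).
scores = [2, 1, 3, 0, 2, 4, 2, 1, 3, 3]

# Model: one parameter (the mean) estimated from the data.
model = mean(scores)

# Error for case i is whatever the model fails to capture: outcome_i - model.
errors = [x - model for x in scores]

# Every outcome is recovered as model + error_i.
reconstructed = [model + e for e in errors]

print(model)
print([round(e, 2) for e in errors])
```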

The mean
- The mean is the value from which the (squared) scores deviate least (it has the least error)
Mean: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$

$x_i$: the value for case i
n: the number of cases
$\sum$: sum (add them all up)
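To check the formula and the claim that the mean is the value from which the squared scores deviate least, here is a small Python sketch with made-up scores. It computes the mean and then searches a grid of candidate values for the one with the smallest sum of squared deviations; the grid search is only an illustration, not how the mean is calculated in practice.

```python
def mean_of(xs):
    """Sample mean: the sum of x_i for i = 1..n, divided by n."""
    return sum(xs) / len(xs)


def sum_of_squared_errors(xs, centre):
    """Total squared deviation of the scores from a candidate 'typical value'."""
    return sum((x - centre) ** 2 for x in xs)


scores = [1, 3, 4, 3, 2]  # hypothetical scores
x_bar = mean_of(scores)

# Try candidate values 0.0, 0.1, ..., 5.0; the mean should give the smallest SS.
candidates = [c / 10 for c in range(0, 51)]
best = min(candidates, key=lambda c: sum_of_squared_errors(scores, c))

print(x_bar, best)
```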


The mean as a model


• The mean is a model of what happens in the real world: the typical score
• It is not a perfect representation of the data
• How can we assess how well the mean represents reality?

[Figure: The perfect fit]

Calculating ‘Error’
• A deviation is the difference between the mean and an actual data point.
• Deviations can be calculated by taking each score and subtracting the mean from it:
$\text{deviation}_i = x_i - \bar{X}$

• Total Error
- We could simply take the errors (deviations) between the mean and each data point and add them up.

Sum of Squared Errors
• We could add the deviations to find out the total error.
• Deviations cancel out because some are positive and others negative.
• Therefore, we square each deviation.
• If we add these squared deviations we get the Sum of Squared Errors: $SS = \sum_{i=1}^{n} (x_i - \bar{X})^2$
• Although the SS is a good measure of the accuracy of our model, it depends on the amount
of data collected (the more scores, the larger the SS). To overcome this problem, we divide the SS by
the degrees of freedom, where N is the sample size and df = N − 1:
$$\text{MSE} = \frac{SS}{df} = \frac{SS}{N-1}$$
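The reasons for squaring and for dividing by df can be checked numerically. This sketch uses made-up scores: the raw deviations sum to zero, and the SS of a data set repeated four times is four times as large even though the scores are spread out in exactly the same way, while SS / (N − 1) stays in the same range. The function names are illustrative only.

```python
def deviations(xs):
    """Each score minus the mean."""
    m = sum(xs) / len(xs)
    return [x - m for x in xs]


def ss(xs):
    """Sum of squared errors around the mean."""
    return sum(d ** 2 for d in deviations(xs))


def mse(xs):
    """SS divided by the degrees of freedom, df = N - 1."""
    return ss(xs) / (len(xs) - 1)


small = [1, 3, 4, 3, 2]  # hypothetical scores
large = small * 4        # same spread of scores, four times as many cases

# Raw deviations cancel out: positive and negative ones sum to (numerically) zero.
total_error = sum(deviations(small))

print(round(total_error, 10))
print(round(ss(small), 3), round(ss(large), 3))    # SS grows with the amount of data
print(round(mse(small), 3), round(mse(large), 3))  # SS / df does not
```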




• Sample → $\bar{X} = 10$ (the sample mean is written $\bar{X}$)
• Population → $\mu = 10$ (the population mean is written μ)

The sum of squared errors and the mean squared error are used to assess the fit of a
model. When the model is the mean, the mean squared error is called the variance, and the square
root of the variance is called the standard deviation (p.49). The mean squared error is the sum of
squared errors divided by the number of degrees of freedom; for the variance, this means dividing
by N − 1.

Variance and Standard Deviation
• We call the mean squared error the variance when the model is the mean.
• The square root of the variance is called the standard deviation
$$\text{Variance} = s^2 = \text{MSE} = \frac{SS}{df} = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}$$

$$SD = s = \sqrt{s^2} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n-1}}$$
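The two formulas translate directly into code. This Python sketch (hypothetical scores) computes s² and s by hand and compares them with `statistics.variance` and `statistics.stdev` from the standard library, which also divide by n − 1.

```python
import math
import statistics


def sample_variance(xs):
    """s^2 = SS / (n - 1): the mean squared error when the model is the mean."""
    n = len(xs)
    x_bar = sum(xs) / n
    ss = sum((x - x_bar) ** 2 for x in xs)
    return ss / (n - 1)


def sample_sd(xs):
    """s = square root of the sample variance."""
    return math.sqrt(sample_variance(xs))


scores = [1, 3, 4, 3, 2]  # hypothetical scores

# The standard library uses the same n - 1 denominator for sample data.
print(sample_variance(scores), statistics.variance(scores))
print(sample_sd(scores), statistics.stdev(scores))
```

`statistics.pvariance` and `statistics.pstdev` divide by N instead; they are meant for data that are the entire population rather than a sample.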


The Standard Error
• SD tells us how well the mean represents the sample data. The smaller the SD is, the better
the mean represents the sample data.
• But, if we want to estimate this parameter in the population, then we need to take into
account the SD of the population and the size of the sample that we used to estimate that
parameter: the larger the sample size, the more accurate our estimate.
When we want to compare the means of samples, we tend to compare SEs instead of SDs




To estimate the mean of the population on the left with a certain accuracy, a much larger sample is
required than for the population on the right.

The standard error of a statistic (e.g. the mean) is the standard deviation of the
sampling distribution of that statistic. The standard deviation of the population (σ) measures
how well the population mean fits the individual cases in the population. The standard error of
the mean measures how well the sample mean fits (estimates) the population mean.

Samples vs. populations
• Sample
- Mean and SD describe only the sample from which they were calculated
• Population
- Mean and SD are intended to describe the entire population
• Sample to population:
- Mean and SD are obtained from a sample, but are used to estimate the mean and SD of the
population

Central Limit Theorem
• The CLT tells us something important about how random samples behave.
• Suppose we drew many samples of a certain size (say n=20) from a given population and
calculated the mean of every sample. What would the frequency distribution of all these
sample means look like? We call this distribution the sampling distribution of the sample
means.

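You should get (approximately) a normal distribution, even when the population itself is not normally
distributed. The larger the size of each sample (N), the closer the sampling distribution is to normal;
drawing a larger number of samples only makes the graph a smoother picture of that distribution.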

If a population has standard deviation σ and we draw many samples of size N from it, then the
standard deviation of the sampling distribution of the sample mean (the standard error) is:

$$\sigma_{\bar{X}} = \frac{\sigma}{\sqrt{N}}$$
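The central limit theorem and the σ/√N formula can be checked by simulation. This sketch assumes a clearly non-normal population (an exponential distribution with mean 1 and SD 1, chosen only for illustration), draws 5000 samples of size N = 30 with `random.Random.expovariate`, and compares the SD of the sample means with σ/√N. `statistics.fmean` needs Python 3.8 or later.

```python
import random
import statistics

rng = random.Random(1)  # seeded so the simulation is reproducible

# Hypothetical, skewed population: exponential with rate 1 (mean 1, SD 1).
sigma = 1.0
N = 30               # size of each sample
num_samples = 5000   # how many samples we draw

# The mean of every sample forms the sampling distribution of the sample mean.
sample_means = [
    statistics.fmean([rng.expovariate(1.0) for _ in range(N)])
    for _ in range(num_samples)
]

observed_se = statistics.stdev(sample_means)  # SD of the sample means
predicted_se = sigma / N ** 0.5               # sigma / sqrt(N)

print(round(statistics.fmean(sample_means), 3))  # should be close to the population mean of 1
print(round(observed_se, 3), round(predicted_se, 3))
```

Increasing `N` makes the distribution of `sample_means` narrower and closer to normal; increasing `num_samples` only gives a smoother picture of it.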

Method of least squares → principle of minimizing the sum of squared errors
Sampling variation → samples will vary because they contain different members of the population
Sampling distribution → frequency distribution of sample means from the same population
Standard deviation of sample means → standard error of the mean (SE) / standard error
Central limit theorem → as samples get large (greater than 30), the sampling distribution is approximately
normal, with a mean equal to the population mean
Confidence intervals → calculated boundaries within which we believe the population value (e.g. the
population mean) will fall
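A minimal sketch of a confidence interval, assuming a sample large enough that the sampling distribution of the mean can be treated as normal, so the 95% interval is mean ± 1.96 × SE with SE = s/√N (smaller samples would use a value from the t-distribution instead). The population values, sample size, and function name `ci95_normal` are made up for illustration; the loop checks that roughly 95% of such intervals contain the true population mean, which is what "95% confidence" refers to.

```python
import math
import random
import statistics


def ci95_normal(xs):
    """Approximate 95% CI for the population mean: mean +/- 1.96 * SE.

    Assumes the sampling distribution of the mean is roughly normal
    (large sample); small samples would need a t value instead of 1.96.
    """
    n = len(xs)
    x_bar = statistics.fmean(xs)
    se = statistics.stdev(xs) / math.sqrt(n)  # estimated standard error of the mean
    margin = 1.96 * se
    return x_bar - margin, x_bar + margin


rng = random.Random(7)
true_mean, true_sd, n = 50.0, 10.0, 50  # hypothetical population and sample size

# One sample and its interval.
sample = [rng.gauss(true_mean, true_sd) for _ in range(n)]
lower, upper = ci95_normal(sample)
print(round(lower, 2), round(upper, 2))

# Repeat many times and count how often the interval contains the true mean.
reps = 2000
hits = 0
for _ in range(reps):
    s = [rng.gauss(true_mean, true_sd) for _ in range(n)]
    low, high = ci95_normal(s)
    if low <= true_mean <= high:
        hits += 1
coverage = hits / reps
print(coverage)
```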

Confidence intervals




Uploaded by jettejacobs (Maastricht University)