Summary

4.4C Applied Multivariate Data Analysis: summary of the Field book

Pages
52
Uploaded on
23-01-2022
Written in
2021/2022

Summary of Field's book for the course 4.4C Applied Multivariate Data Analysis. The summary covers chapters 2, 3, 6, 8, 9, 11, 12, 13, 14, 15, 16, and 17.













Document information

Whole book summarized?
No
What part of the book is summarized?
Chapters 2, 3, 6, 8, 9, and 11 through 17
Type
Summary

Preview of the content

Field Book – Ch. 2,3,6,8,9,11, 12, 13, 14, 15, 16, 17

Chapter 2: The Spine of Statistics

2.1 What will this chapter tell me?
How we can use the properties of data to go beyond our observations and draw inferences
about the world at large.

2.2 What is the SPINE of statistics?
Standard error
Parameters
Interval estimates (Confidence intervals)
Null hypothesis significance testing
Estimation

2.3 Statistical models
Scientists build (statistical) models of real-world processes to predict how these processes
operate under certain conditions. The degree to which a statistical model represents the
data collected is known as the fit of the model.

Outcome = Model + Error
This means that the data we observe can be predicted from the model we choose to fit plus
some amount of error.

2.4 Populations and Samples
Scientists are usually interested in finding results that apply to an entire population of
entities. We rarely have access to every member of a population. Therefore, we collect data
from a smaller subset of the population known as a sample.

2.5 P is for Parameters
Statistical models are made up of variables and parameters. Variables are measured
constructs that vary across entities in the sample. In contrast, parameters are not measured
and are (usually) constants believed to represent some fundamental truth about the
relations between variables in the model.
We can predict values of an outcome variable based on a model. The form of the model
changes, but there will always be some error in prediction, and there will always be
parameters that tell us about the shape or form of the model.

2.5.1 The mean as a statistical model
The mean value is a hypothetical value: it is a model created to summarize the data and
there will be error in prediction. A hat on a symbol in an equation means that it is an estimate.

2.5.2 Assessing the fit of a model: sums of squares and variance revisited
The error or deviance for a particular entity is the score predicted by the model for that
entity subtracted from the corresponding observed score.

The sum of squares (SS) can be used to assess the total error in any model: square each
entity's error and add the squared errors together. To estimate the mean squared error (also
known as the variance) in the population, we divide the SS by the degrees of freedom
(df = n − 1), giving SS/df.
We can use the sum of squared errors and the mean squared error to assess the fit of a
model.

Degrees of freedom relate to the number of observations that are free to vary.
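
The calculation above can be sketched in a few lines of Python. This is a minimal illustration, not the book's own code; the scores are hypothetical and the variable names are chosen only for this example.

```python
# Hypothetical scores for five entities (illustrative data only).
scores = [1, 2, 3, 3, 4]

n = len(scores)
mean = sum(scores) / n  # the model: a single parameter, the mean

# Error (deviance) = observed score minus the score predicted by the model.
errors = [x - mean for x in scores]

sum_of_squares = sum(e ** 2 for e in errors)  # total squared error (SS)
df = n - 1                                    # degrees of freedom
variance = sum_of_squares / df                # mean squared error (SS/df)

print(mean, sum_of_squares, variance)
```

A larger SS or variance means the mean is a poorer summary of these scores; a smaller one means the scores sit closer to the model's prediction.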

2.6 E is for estimating parameters
The equation for the mean is designed to estimate that parameter to minimize the error.
That doesn’t necessarily mean that the value is a good fit to the data, but it is a better fit
than any other value you might have chosen.
This section has focused on the principle of minimizing the sum of squared errors, and this is
known as the method of least squares or ordinary least squares OLS. However, there are
other estimation methods as well.
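
To see the least squares principle at work, the sketch below tries a grid of candidate values as "the model" and checks which one gives the smallest sum of squared errors. The data, the grid, and the helper name `sum_of_squared_errors` are assumptions made for this illustration.

```python
def sum_of_squared_errors(data, candidate):
    """Total squared error when `candidate` is used to predict every score."""
    return sum((x - candidate) ** 2 for x in data)

scores = [1, 2, 3, 3, 4]  # hypothetical data
mean = sum(scores) / len(scores)

# Candidate parameter values from 0.0 to 5.0 in steps of 0.1.
candidates = [i / 10 for i in range(0, 51)]

# The candidate with the smallest sum of squared errors.
best = min(candidates, key=lambda c: sum_of_squared_errors(scores, c))

print(best, mean)
```

On this grid the best candidate coincides with the mean, which is what minimizing the sum of squared errors predicts.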

2.7 S is for standard error
To go beyond the data we need to look at how representative our samples are of the
population of interest. The population mean, μ, is the parameter we’re trying to estimate.
But since we don’t have access to the whole population we use a sample, of which we get
the sample mean. If we take multiple samples we get different means; this illustrates
sampling variation: samples vary because they contain different members of the population.

A sampling distribution is the frequency distribution of sample means (or whatever
parameter you’re trying to estimate) from the same population. If we took thousands
of samples (unicorn idea), the average of all the sample means would be the population mean. The
standard deviation would tell us how widely sample means spread around the population
mean, so how representative of the population a sample mean is likely to be. The standard
deviation of sample means is known as the standard error of the mean (SE) or standard
error for short. This would be calculated by taking the difference between each sample
mean and the overall mean, squaring those differences, adding them up, and then dividing
by the number of samples. Finally, the square root of this value would need to be taken.
In practice, we don’t take that many samples. The central limit theorem tells us that as
samples get large (more than about 30; for smaller samples the sampling distribution follows
a t-distribution), the sampling distribution is normal, with a mean equal to the population
mean and a standard deviation of

$$\sigma_{\bar{X}} = \frac{s}{\sqrt{N}}$$
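
The "thousands of samples" idea can be simulated. The sketch below assumes a normally distributed population with made-up parameters (mean 100, standard deviation 15), draws many samples, and compares the standard deviation of the sample means with the shortcut of dividing one sample's standard deviation by the square root of N. All constants and names are assumptions for illustration.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

POP_MEAN, POP_SD = 100, 15  # assumed population parameters (illustrative)
N = 50                      # size of each sample
N_SAMPLES = 2000            # number of samples drawn

def draw_sample(n):
    """Draw n scores from the assumed normal population."""
    return [random.gauss(POP_MEAN, POP_SD) for _ in range(n)]

sample_means = [statistics.mean(draw_sample(N)) for _ in range(N_SAMPLES)]

# The long way: standard deviation of the sample means
# (squared differences from the overall mean, divided by the number of samples).
empirical_se = statistics.pstdev(sample_means)

# The shortcut from a single sample: s / sqrt(N).
one_sample = draw_sample(N)
estimated_se = statistics.stdev(one_sample) / math.sqrt(N)

print(round(statistics.mean(sample_means), 2),
      round(empirical_se, 2),
      round(estimated_se, 2))
```

The two standard error values should be close to each other and to 15/√50 ≈ 2.12, and the average of the sample means should be close to 100.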

2.8 I is for (confidence) interval
We can use the estimated parameters and standard error to calculate boundaries within
which we believe the population value will fall, called confidence intervals.

2.8.1 Calculating confidence intervals
Rather than fixating on a single value from the sample (the point estimate), we could use an
interval estimate instead: we use our sample value as the midpoint but set a lower and
upper limit as well. Typically, we look at 95% confidence intervals: limits constructed
such that, for a certain percentage of samples (here 95%), the true value of the population
parameter falls within the limits. To calculate the confidence interval, we need to know the
limits within which 95% of sample means will fall.

Lower boundary of confidence interval = $\bar{X} - (1.96 \times SE)$
Upper boundary of confidence interval = $\bar{X} + (1.96 \times SE)$
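
As a minimal sketch of these two formulas, the function below computes a 95% confidence interval from a sample mean, a sample standard deviation, and a sample size. The function name and the input values are hypothetical, and the sample is assumed to be large enough for the z-value 1.96 to apply.

```python
import math

def confidence_interval_95(sample_mean, sample_sd, n):
    """Return (lower, upper) limits of a 95% CI, assuming a large sample."""
    se = sample_sd / math.sqrt(n)  # standard error of the mean
    return sample_mean - 1.96 * se, sample_mean + 1.96 * se

# Hypothetical sample: mean 10, standard deviation 6, 36 observations (SE = 1).
lower, upper = confidence_interval_95(sample_mean=10.0, sample_sd=6.0, n=36)
print(lower, upper)
```

With these numbers the interval runs from roughly 8.04 to 11.96, centred on the sample mean.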

2.8.2 Calculating other confidence intervals
Sometimes we want to calculate other types of confidence intervals such as 99% or 90%.
Then you need to find the corresponding z-value. For a 95% interval, the probability in each
tail is $\frac{1 - 0.95}{2} = 0.025$; looking this up in the table gives z = 1.96. For
other confidence levels, replace 1.96 in the formula with the new z-value.
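
Instead of a printed table, the z-value can be looked up with Python's standard library. This sketch uses `statistics.NormalDist` (available from Python 3.8); the helper name `z_for_confidence` is an assumption for this example.

```python
from statistics import NormalDist

def z_for_confidence(level):
    """z-value that leaves (1 - level) / 2 in each tail of the standard normal."""
    tail = (1 - level) / 2
    return NormalDist().inv_cdf(1 - tail)

for level in (0.90, 0.95, 0.99):
    print(level, round(z_for_confidence(level), 2))
```

The 95% level returns approximately 1.96, matching the table value used above; 90% and 99% give approximately 1.64 and 2.58.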

2.8.3 Calculating confidence intervals in small samples
For smaller samples, the sampling distribution follows a t-distribution rather than a normal
distribution. So to construct a confidence interval in a small sample we use the same
principle as before, but instead of the value for z we use the value for t with n − 1
degrees of freedom.
Lower boundary of confidence interval = $\bar{X} - (t_{n-1} \times SE)$
Upper boundary of confidence interval = $\bar{X} + (t_{n-1} \times SE)$
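
A worked illustration with hypothetical numbers: suppose a sample of n = 10 has a mean of 5 and a standard error of 0.5. The two-tailed 95% critical value of t with 9 degrees of freedom is approximately 2.262.

```latex
\begin{align*}
\text{Lower boundary} &= \bar{X} - (t_{n-1} \times SE) = 5 - (2.262 \times 0.5) = 3.869 \\
\text{Upper boundary} &= \bar{X} + (t_{n-1} \times SE) = 5 + (2.262 \times 0.5) = 6.131
\end{align*}
```

Using z = 1.96 instead would give 4.02 to 5.98, so the t-based interval is wider, reflecting the extra uncertainty in a small sample.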

2.8.4 Showing confidence intervals visually
The confidence interval is usually displayed using something called an error bar, which looks
like the letter ‘I’. If the error bars of two means do not overlap, we can infer that these
means come from different populations: they are significantly different.

2.9 N is for null hypothesis significance testing
2.9.1 Fisher’s p-value
Only when there is a 5% chance (0.05 probability) or less of getting the result we have (or
one more extreme) if no effect exists are we confident enough to accept that the effect is genuine.
Fisher’s basic point was that you should calculate the probability of an event and evaluate
this probability within the research context.

2.9.2 Types of hypothesis
In contrast to Fisher, Neyman and Pearson believed that scientific statements should be split
into testable hypotheses. The hypothesis or prediction from your theory would normally be
that an effect will be present, the alternative hypothesis (H1, sometimes called
experimental hypothesis). The null hypothesis (H0) is the opposite of the alternative
hypothesis and usually states that an effect is absent. The null hypothesis is useful because
it gives us a baseline against which to evaluate how plausible our alternative hypothesis is.
We can talk only in terms of the probability of obtaining a particular result or statistic if,
hypothetically speaking, the null hypothesis were true.
Hypotheses can be directional or non-directional. A directional hypothesis states that an
effect will occur, and also states the direction of the effect (for example, less chocolate;
tested one-tailed). A non-directional hypothesis states that an effect will occur, but not its
direction (for example, a different amount of chocolate; tested two-tailed).
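
The practical difference between a directional and a non-directional hypothesis shows up in the p-value. The sketch below assumes a test statistic that follows a standard normal distribution under the null hypothesis and a directional hypothesis that predicts a positive effect; the function name and the value z = 1.8 are hypothetical.

```python
from statistics import NormalDist

def p_values(z):
    """One-tailed (predicted direction: positive) and two-tailed p-values for z."""
    std_normal = NormalDist()
    one_tailed = 1 - std_normal.cdf(z)             # probability of a z this large or larger
    two_tailed = 2 * (1 - std_normal.cdf(abs(z)))  # this extreme in either direction
    return one_tailed, two_tailed

one_tailed, two_tailed = p_values(1.8)
print(round(one_tailed, 3), round(two_tailed, 3))
```

For this z the one-tailed p-value falls below 0.05 while the two-tailed p-value does not, which is why the direction of the hypothesis has to be decided before looking at the data.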

2.9.3 The process of NHST
NHST is a blend of Fisher’s idea of using the probability value p as an index of the weight of
evidence against a null hypothesis, and Neyman and Pearson’s idea of testing a null
hypothesis against an alternative hypothesis.




2.9.4 Test statistics
Systematic variation is variation that can be explained by the model we’ve fitted to the data.
Unsystematic variation is not attributable to the effect we’re investigating and cannot be
explained by the model we’ve fitted. The simplest way to test whether the model fits the
data, or whether our hypothesis is a good explanation of the data we have observed, is to
compare the systematic variation against the unsystematic variation.
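$$\text{Test statistic} = \frac{\text{signal}}{\text{noise}} = \frac{\text{variance explained by the model}}{\text{variance not explained by the model}} = \frac{\text{parameter}}{\text{sampling variation in the parameter}} = \frac{\text{effect}}{\text{error}}$$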
The exact form of the calculation changes depending on which test statistic you’re
calculating.
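
One concrete instance of the effect/error ratio is the one-sample t statistic, sketched below. It is shown only as an example of the general form; the data, the null value, and the function name are assumptions for this illustration.

```python
import math
import statistics

def one_sample_t(data, null_value):
    """Effect (mean minus the null value) divided by error (standard error)."""
    effect = statistics.mean(data) - null_value            # signal
    error = statistics.stdev(data) / math.sqrt(len(data))  # noise: s / sqrt(N)
    return effect / error

scores = [1, 2, 3, 3, 4]  # hypothetical data
t = one_sample_t(scores, null_value=2)
print(round(t, 2))
```

The larger the ratio, the more the systematic variation outweighs the unsystematic variation.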

Seller: KenzaS, Universiteit Utrecht