- Lectures 0 – 9.
- Lecture 10 is the summary lecture; its notes are included in lectures 0 – 9.
- Seminar notes.
Note: the decision tree is the only thing not included in this summary.
Statistics 1 – Lecture notes + literature
Lecture 0 and 1:
Population → The group that you wish to describe (firms or people).
- The entire set of elements.
Sample → The group for which you have data (=limited).
- A subset of elements from the population, taken with the intention of
making inferences (= drawing conclusions) about the population.
Why take a sample? Studying the whole population may be too expensive, impossible,
destructive (example in physical geography) or impractical.
→ Representative = the sample also makes sense for a wider group = the population.
Parameter = numerical property of the population. “We don’t know.”
Statistic = numerical property of a sample. “We know/we collected.”
How close are the two to each other? Is the sample representative?
Sampling error = the difference/uncertainty that arises between the value of a parameter and the statistic
computed to estimate that parameter. We are never 100% sure how close those values are. It is the result of:
- Variability (change).
- Sampling Bias.
- Nonsampling Error: Due to mistakes in the research process. Example: Use of wrong codes.
Reducing sampling error?
➢ Variability → By increasing N (= sample size).
➢ Sampling Bias → By designing a good sampling procedure.
➢ Nonsampling Error → By:
- Ensuring validity, accuracy and precision of variables.
- Preventing coding errors.
- Preventing interpretation errors.
- Good labelling and metadata (use of R).
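A minimal sketch of sampling error in Python, assuming an invented population of incomes (the numbers and the names population, biased_frame and mean_abs_error are made up for illustration and do not come from the lectures). It suggests that a larger N shrinks the error caused by variability, while a biased sampling frame stays off target no matter how large N is.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 10,000 incomes (invented values for illustration).
population = [random.gauss(30000, 8000) for _ in range(10_000)]
parameter = statistics.mean(population)  # in real research this is unknown


def sampling_error(sample):
    """Difference between the sample statistic and the population parameter."""
    return statistics.mean(sample) - parameter


def mean_abs_error(n, frame, repeats=200):
    """Average absolute sampling error over repeated samples of size n."""
    errors = [abs(sampling_error(random.sample(frame, n))) for _ in range(repeats)]
    return statistics.mean(errors)


# Assumed biased sampling frame: only the upper half of the incomes is reachable.
biased_frame = sorted(population)[5000:]

small_n_error = mean_abs_error(25, population)     # variability, small N
large_n_error = mean_abs_error(400, population)    # variability, large N
biased_error = mean_abs_error(400, biased_frame)   # sampling bias, large N

print(f"N=25, full frame:    {small_n_error:.0f}")
print(f"N=400, full frame:   {large_n_error:.0f}")
print(f"N=400, biased frame: {biased_error:.0f}")
```

Comparing the three printed values shows the two different remedies: increasing N helps against variability, but only a better sampling procedure helps against bias.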
Important concepts:
- Variability = repeated sampling from the same population results in different values for the
statistic. Less variability = more reliable = a stronger basis for inference.
- Sampling distribution = describes how the statistic varies when sampling is repeated.
In other words: it describes the (extent of) variability.
= basis for inference (= drawing conclusions)! → How well does the sample say something
about the population?
But: we can’t fully generalize from sampling, so we rely on the Central Limit Theorem.
→ There is a difference between what we want and what we actually do:
- We may assume that…
- Under certain conditions….
- Such as a large number of cases and a fixed standard deviation
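➔ The sampling distribution of the mean is approximately normal with standard error = σ/√n
(σ = standard deviation of the population, n = number of cases in the sample).
Normal distribution: most values lie around the average (the middle of the curve). Values far
from the average (the tails) have less chance of occurring, but are not impossible.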
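The Central Limit Theorem can be checked with a small simulation. This is only a sketch under assumptions: the skewed population, the sample size n = 100 and the 2000 repetitions are invented for illustration. It compares the spread of the simulated sample means with the usual formula for the standard error of the mean, σ/√n.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population (exponential, mean about 10): clearly not normal.
population = [random.expovariate(1 / 10) for _ in range(20_000)]
pop_mean = statistics.mean(population)
sigma = statistics.pstdev(population)

n = 100         # cases per sample
repeats = 2000  # number of repeated samples

# Draw with replacement (independent random sampling) and store each sample mean.
sample_means = [
    statistics.mean(random.choices(population, k=n)) for _ in range(repeats)
]

observed_se = statistics.stdev(sample_means)  # spread of the sampling distribution
theoretical_se = sigma / math.sqrt(n)         # sigma / sqrt(n)

# Share of sample means within 1.96 standard errors of the population mean;
# for an approximately normal sampling distribution this is close to 95%.
within = sum(abs(m - pop_mean) <= 1.96 * theoretical_se for m in sample_means)
share_within = within / repeats

print(f"observed SE:    {observed_se:.3f}")
print(f"theoretical SE: {theoretical_se:.3f}")
print(f"share within 1.96 SE: {share_within:.3f}")
```

Even though the population itself is skewed, the sample means cluster symmetrically around the population mean, which is what makes inference from a single sample possible.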
Sampling Bias = the result of procedures which favour the inclusion, in your sample, of elements from the
population with certain characteristics. It can come from one or a combination of:
- Population.
- Researcher: Personality, design and topic.
- Respondent.
May result in:
- Incomplete coverage: Relevant elements are not in the sampling frame.
- Nonresponse: Refusal or missing data.
Solution to reduce sampling bias = the 5 steps in the sampling process.
1 = Define the population, including time frame and geographical limit.
2 = Construct the sampling frame: an ordered list of the individuals in the population.
- Include all individuals.
- Each individual element should appear only once.
Target population = set of all individuals relevant to the study.
Sampled population = all the individuals listed in the sampling frame.
3 = Choose the procedure used to select individuals from the sampling frame for the sample.
4 = Make a pilot sample/pretest to check the data collection procedures in advance.
5 = Minimize nonsampling error.
Different types of samples:
- Probability samples: You (researcher) have no control over who ends up in the sample; chance decides.
1. Simple random = you need access to everyone; individuals are picked at random and, once
picked, leave the pool. Everyone has an equal probability of being selected.
2. Independent random = simple random with replacement. The one you pick at random
does not leave the pool. Used in small populations.
3. Systematic = there is a rule involved. Example: Every 10th person is in the sample.
4. Stratified = divide the population into groups based on differences before sampling, then
sample within each group. This controls the sampling process, reduces sampling error and
decreases the likelihood of unrepresentative samples. Example: Male-Female.
5. Cluster = divide the population into groups (clusters) based on shared features and randomly
select whole clusters. Works best when the clusters are almost the same as each other.
- Nonprobability samples: You (researcher) choose who is in your sample.
Used in qualitative research, since you need “special” people/perspectives.
1. Judgemental or purposeful = Personal judgement is used to decide who is in the sample.
These are the people who best serve the purpose of the sample according to you.
2. Convenience or accessibility = Only convenient or accessible members of the population.
3. Quota = the researcher fills specific categories/subgroups to obtain a representative sample
of the population. Who is picked within each subgroup is not random, so bias is possible.
4. Volunteer = individuals who self-select from the population. But: volunteers tend to be more
motivated than the rest of the population.
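The probability sampling schemes listed above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the sampling frame of 100 people, the field names "sex" and "block", and the function names are invented and are not part of the lecture material.

```python
import random

random.seed(7)  # fixed seed so the sketch is reproducible

# Hypothetical sampling frame: 100 people, each listed exactly once.
# "sex" serves as a stratum label, "block" as a cluster label (both invented).
frame = [
    {"id": i, "sex": "F" if i % 2 == 0 else "M", "block": i // 10}
    for i in range(100)
]


def simple_random(frame, n):
    """Without replacement: a selected person leaves the pool."""
    return random.sample(frame, n)


def independent_random(frame, n):
    """With replacement: the same person can be drawn more than once."""
    return random.choices(frame, k=n)


def systematic(frame, step):
    """Random starting point, then every step-th person in the ordered frame."""
    start = random.randrange(step)
    return frame[start::step]


def stratified(frame, key, n_per_stratum):
    """Simple random sample of a fixed size inside each stratum."""
    strata = {}
    for person in frame:
        strata.setdefault(person[key], []).append(person)
    sample = []
    for members in strata.values():
        sample.extend(random.sample(members, n_per_stratum))
    return sample


def cluster(frame, key, n_clusters):
    """Randomly pick whole clusters and keep everyone inside them."""
    labels = sorted({person[key] for person in frame})
    chosen = set(random.sample(labels, n_clusters))
    return [person for person in frame if person[key] in chosen]


print([p["id"] for p in simple_random(frame, 10)])
print([p["id"] for p in systematic(frame, 10)])
print([(p["id"], p["sex"]) for p in stratified(frame, "sex", 5)])
print(sorted({p["block"] for p in cluster(frame, "block", 2)}))
```

The nonprobability samples are left out on purpose: there the researcher or the respondent decides who is included, so there is no random selection step to code.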
Geographic sampling: Based on space (spatial).
a. Traverse samples → random lines through the map. Only data on the lines is in the sample.
b. Quadrat samples → random little boxes on the map. Only data in these boxes is in the
sample.
c. Point samples → random points on the map. Only these data points are in the sample.
Each of these categories can be laid out as:
a. Random.
b. Systematic.
c. Stratified systematic within stratum (= layers).
d. Stratified, random orientation.
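As an illustration for point samples only, here is a hedged Python sketch of three layouts on a map scaled to the unit square. The unit-square map, the 4 × 4 grid and the function names are assumptions; in particular, "one random point per grid cell" is just one possible reading of a stratified layout, not necessarily the design meant in the lecture.

```python
import random

random.seed(3)  # fixed seed so the sketch is reproducible

PER_SIDE = 4  # assumed grid of 4 x 4 cells, so 16 sample points per layout


def random_points(n):
    """Random layout: n points anywhere on the map (unit square)."""
    return [(random.random(), random.random()) for _ in range(n)]


def systematic_points(per_side):
    """Systematic layout: one point at the centre of every grid cell."""
    step = 1 / per_side
    return [
        ((i + 0.5) * step, (j + 0.5) * step)
        for i in range(per_side)
        for j in range(per_side)
    ]


def stratified_random_points(per_side):
    """Stratified layout: each grid cell is a stratum with one random point."""
    step = 1 / per_side
    return [
        ((i + random.random()) * step, (j + random.random()) * step)
        for i in range(per_side)
        for j in range(per_side)
    ]


rand_pts = random_points(PER_SIDE * PER_SIDE)
sys_pts = systematic_points(PER_SIDE)
strat_pts = stratified_random_points(PER_SIDE)

for name, pts in [("random", rand_pts), ("systematic", sys_pts), ("stratified", strat_pts)]:
    print(name, [(round(x, 2), round(y, 2)) for x, y in pts[:3]])
```

The random layout can leave parts of the map empty by chance, while the systematic and stratified layouts force the points to spread over the whole map.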