Lecture notes

MAT-15303 Lectures Statistics 1

Pages
30
Uploaded on
09-09-2021
Written in
2020/2021

Lecture summary of the course Statistics 1 (MAT) at Wageningen University (WUR). Slides included as examples to give an extensive overview. Combination of Dutch and English.












Document information

Type
Lecture notes
Lecturer(s)
Boer
Contains
All lectures


Statistics 1
Tutorial 1 – population, sample, variables, frequency tables
Smoking during pregnancy can cause problems for mother and child, including adverse outcomes such as a baby that is underweight.

Research question (RQ): the question that we want to answer.
Example: Do children whose mothers smoked during pregnancy often have a lower birth weight?

Population: every member of a group (persons, objects, etc.) for which we would like to collect information.
Example: all pregnant women in the Netherlands in 2017

Sample: the part of the population that we will study and collect information for. It is often too expensive or time-consuming to study the whole population, so we draw a sample.
Example: selection of pregnant women in the Netherlands in 2017

Random selection procedure: makes the sample representative of the whole population

Units: the elements of a sample from which we collect the information.
Example: pregnant women

Variable: measured property of an element of the sample. Generally things we measure e.g. height, weight, hair
colour.
Examples: weight at birth of baby, education level of pregnant women

Quantitative variable: (continuous/discrete)
- Height, weight at birth, yield (continuous)
- Number of children in household, number of diseased plants in a field, number of cigarettes smoked per day by pregnant women (discrete). These are counts: only whole numbers are possible, no halves.

Qualitative variable: (nominal/ordinal)
- Hair colour, bachelor program, province, place of residence (nominal). Categories that you can record but cannot put in order or rank.
- Grade of eggs, highest level of education completed, annual salary recorded in classes (ordinal). Categories that you can rank.

Exercises:
1.1: Height (continuous), weight (continuous), eye colour (nominal), sex (nominal), hair colour (nominal), number of siblings
(discrete), head circumference (continuous). Babies are the units.

1.2: A: Units: full-grown cows of a certain breed. Quantitative variables: weight
B: Units: newborn twins. Quantitative: height, weight. Qualitative: sex

Drawing a sample from a population
We want to draw conclusions about the population, so the sample should be representative of the population.

Sampling bias: certain parts of the population are overrepresented compared to other parts. Example: polls for the US election in which Obama was running for president. Pollsters called people and asked who they would vote for, and got it wrong: they contacted people through home phones, which 23% did not have. The people with a landline who were reached tended to be older, wealthier and more conservative, so they were more likely to vote for the other party. The landline therefore introduced sampling bias.


Recommended sampling methods:
1. Simple Random Sampling (SRS): units are drawn at random from the population. Every unit in the population has the same probability of ending up in your sample. Example: drawing 4 of 20 business cards from a box, i.e. a lottery system. Sampling bias is avoided by giving every unit an equal chance.

Ground rule: every possible sample of a given size should have the same chance of being the sample you draw.

Exam question: 1) not a simple random sample, as not all crates have the same probability of having apples picked from them. 2) still not a simple random sample. So neither is a simple random sample.

Every sample of a certain size must have the same chance of being drawn, and every object/subject in the population must have the same chance of ending up in the sample. Not the case? Then it is not an SRS.
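
The business-card example can be sketched in Python as follows. This is a minimal illustration, not part of the lecture: the card labels 1..20, the function name `draw_srs` and the seed are assumptions made only for the example.

```python
import math
import random

# Hypothetical population: 20 business cards, labelled 1..20.
cards = list(range(1, 21))

def draw_srs(population, n, seed=None):
    """Draw a simple random sample of size n without replacement."""
    rng = random.Random(seed)
    return rng.sample(population, n)

# Draw 4 of the 20 cards, like a lottery.
sample = draw_srs(cards, 4, seed=1)

# Number of different samples of size 4 that can be drawn from 20 cards.
n_possible_samples = math.comb(20, 4)

# Under SRS every possible sample is equally likely ...
prob_each_sample = 1 / n_possible_samples
# ... and every card has the same chance of ending up in the sample.
prob_card_in_sample = 4 / 20

print(sample, n_possible_samples, prob_card_in_sample)
```

With 20 cards and a sample size of 4 there are 4845 possible samples, each with probability 1/4845, and each individual card has probability 0.2 of being drawn.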

Other things that could go wrong in sampling:
1. Undercoverage (undersampling): certain groups are excluded from the sample, e.g. all women who did not give birth in hospital, because the sample was drawn from a hospital list of women who gave birth there.
2. Non-response: units do not participate or cannot be contacted.
3. Voluntary participation (in a survey): might result in particularly positive/negative answers. Example: with a survey handed out in a restaurant, people who are very satisfied or very dissatisfied are more likely to participate than the average customer.
4. Response bias: e.g. social desirability bias (self-reported personal traits, questions about income, mental health, alcohol).
Observational research
Observational: observe the unit/process without influencing it (looking, feeling, etc.)
Example: research into the consequences of smoking during pregnancy is an observational study. You cannot run an experiment here, because you cannot assign pregnant women to smoke.

Strictly speaking, you cannot draw hard cause-effect conclusions from an observational study. An apparent effect of smoking can also be due to confounding (external factors).

Experimental research
Experimental: apply a treatment to the unit in order to observe a reaction.
Example: randomization of 20 experimental plots: you assign one wheat variety to 10 of the plots and another variety to the other 10. Then you determine what the difference is.

A cause-effect relationship can only be concluded from an experimental study: here you change only what you want to investigate, while the other factors stay the same. This gives you the opportunity to conclude a causal effect.
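
The randomization of the 20 plots could be done along these lines. This is only a sketch: the plot numbers 1..20, the labels "variety A" and "variety B", and the function name `randomize_plots` are hypothetical and not taken from the lecture.

```python
import random

def randomize_plots(n_plots=20, varieties=("variety A", "variety B"), seed=None):
    """Randomly assign half of the plots to each of two varieties."""
    rng = random.Random(seed)
    plots = list(range(1, n_plots + 1))
    rng.shuffle(plots)  # random order decides which plots get which variety
    half = n_plots // 2
    assignment = {plot: varieties[0] for plot in plots[:half]}
    assignment.update({plot: varieties[1] for plot in plots[half:]})
    return dict(sorted(assignment.items()))

assignment = randomize_plots(seed=42)
for plot, variety in assignment.items():
    print(plot, variety)
```

Because chance alone decides which plot gets which variety, other factors (soil, shade, moisture) are expected to be balanced between the two groups.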

Exercises:
1.3: A: households with welfare support in a particular city
B: 400 households from that city with welfare support
C: Welfare support, number of children, living in the city
D: Observational study

Frequencies:
Frequency is how often a value occurs, e.g. the number of women in each category in the pregnancy example.

Frequencies are also applicable to discrete variables with a limited number of outcomes.

For comparisons it is better to use the relative frequency: divide the frequency by the total number of observations. This gives a fraction, e.g. 172/945 = 0.18, so 18% of the women had primary education.
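
The relative frequency calculation can be checked with a few lines of Python. The counts 172 and 945 come from the example above; the notes do not list the other education categories, so the remaining 773 women are lumped into one assumed category "other levels".

```python
def relative_frequencies(counts):
    """Divide each frequency by the total number of observations."""
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# 172 of 945 women: primary education. The rest is grouped (assumption).
counts = {"primary education": 172, "other levels": 945 - 172}

rel = relative_frequencies(counts)
primary_fraction = round(rel["primary education"], 2)   # fraction
primary_percentage = round(100 * rel["primary education"])  # percentage

print(primary_fraction, primary_percentage)
```

The relative frequencies of all categories together always add up to 1 (100%).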

Tutorial 2 – numerical summary of data & probability
PART A: Numerical data
Eating too much salt is not healthy. To avoid too much salt, we need to know how much salt is in our food. Therefore: investigating salt in bread.
Population: all loaves of bread sold in the Netherlands (on one particular day)
Units: bread loaves
Variable: amount of salt (g/100g) in bread
Sampling design: 1) draw a simple random sample from all supermarkets and bakeries. 2) draw one loaf of bread at random from each of the selected supermarkets/bakeries. This is a two-stage cluster sample, not an SRS.

Note: for the exam, know how to tell whether or not a given sample is an SRS. You don't need to remember the specific study designs.

Central tendencies (data): use the mean or median:
Mean: the average: add up all numbers and divide by the number of numbers. A bar over the symbol indicates the mean.
Median: the middle value when you order all observations from small to large. 50% of observations will be smaller, 50% of observations will be larger. Also called the 50th percentile.
With an even number of observations, you take the two middle values and calculate their average, e.g. 5.5.
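
The two definitions can be written out step by step in Python. The data values below are made up for illustration, and the function names `mean` and `median` are chosen for this sketch only.

```python
def mean(values):
    """Add up all numbers and divide by the number of numbers."""
    return sum(values) / len(values)

def median(values):
    """Middle value of the ordered observations."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # odd number of observations: one middle value
        return ordered[mid]
    # even number of observations: average of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_data = [7, 3, 5, 4, 6]       # hypothetical, ordered: 3 4 5 6 7
even_data = [8, 3, 5, 6, 4, 7]   # hypothetical, ordered: 3 4 5 6 7 8

print(mean(odd_data), median(odd_data))    # 5.0 5
print(mean(even_data), median(even_data))  # 5.5 5.5
```

For the even-sized data set the two middle values are 5 and 6, so the median is their average, 5.5.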

Symmetric distribution: the difference between the mean and the median is very small.
Asymmetric distribution: the difference between the mean and the median is large.
This is due to the effect of outliers: the median is not sensitive to outliers, whereas the mean is very sensitive to them.
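
A small made-up example shows this sensitivity. The numbers are hypothetical; the second data set is the same as the first except that the largest value is replaced by an outlier.

```python
import statistics

base = [3, 4, 5, 6, 7, 8]           # roughly symmetric data (hypothetical)
with_outlier = [3, 4, 5, 6, 7, 80]  # same data, largest value is an outlier

mean_base = statistics.mean(base)
median_base = statistics.median(base)
mean_outlier = statistics.mean(with_outlier)
median_outlier = statistics.median(with_outlier)

print(mean_base, median_base)        # 5.5 5.5
print(mean_outlier, median_outlier)  # 17.5 5.5
```

One extreme value moves the mean from 5.5 to 17.5, while the median stays at 5.5.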

Measures of variability:
How are your observations distributed around the mean? The spread of the data is indicated by measures such as the standard deviation and the range.

Standard deviation: the square root (√) of the variance. Study this from the book.

Variance: take the difference between each observation and the mean and square it. Without squaring, the positive and negative differences would cancel each other out. The squared differences are added up and divided by n-1.
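
Written as a formula, the description above corresponds to the usual sample variance. The symbols are assumptions, since the notes do not show them: x_i for the observations, x̄ for their mean, n for the number of observations, s² for the variance and s for the standard deviation.

```latex
s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2,
\qquad
s = \sqrt{s^2}
```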




To get the standard deviation, take the square root of the variance. To get the variance from the standard deviation (SD), square the standard deviation.
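
The calculation can be followed step by step in Python. The data are made up for illustration, and `sample_variance` is a name chosen for this sketch; the result is compared with Python's built-in `statistics.variance` and `statistics.stdev`, which also divide by n-1.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared differences from the mean, divided by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    squared_diffs = [(x - x_bar) ** 2 for x in values]
    return sum(squared_diffs) / (n - 1)

data = [2, 4, 4, 6, 9]  # hypothetical observations, mean = 5

# Without squaring, positive and negative differences cancel out.
raw_diff_sum = sum(x - 5 for x in data)

variance = sample_variance(data)  # (9 + 1 + 1 + 1 + 16) / 4
sd = math.sqrt(variance)          # standard deviation = square root of variance

print(raw_diff_sum, variance, round(sd, 3))
print(statistics.variance(data), round(statistics.stdev(data), 3))
```

Squaring the standard deviation gives back the variance, here 7.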

Interquartile range IQR = Q3 – Q1
Put all observations in order from low to high.
Q1 = 1st quartile = 25th percentile = lower quartile
Q3 = 3rd quartile = 75th percentile = upper quartile
Q2 = 2nd quartile = 50th percentile = the median.
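
A minimal sketch of the quartiles and the IQR in Python, using made-up data. Textbooks and software use slightly different conventions for computing quartiles, so the `method="inclusive"` option of `statistics.quantiles` used here is an assumption and may not match the method taught in the course.

```python
import statistics

# Hypothetical observations, already ordered from low to high.
data = [1, 3, 5, 7, 9, 11, 13, 15, 17]

# n=4 splits the data into four parts and returns the three cut points.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

print(q1, q2, q3, iqr)  # 5.0 9.0 13.0 8.0
```

Here Q2 equals the median of the data (9), and the IQR covers the middle 50% of the observations.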
