Statistical Methods - Slides Summary

Pages: 65
Uploaded: 03-01-2025
Written in: 2020/2021

A summary of all the slides for the course Statistical Methods, BSc AI.












Statistical Methods - Summary

Lecture 1
● Statistics: science of data, the study of collecting, organizing, analyzing, interpreting and
presenting data.
○ Statistics are used to gain information about a group of objects (population)
and/or to make decisions and predictions when randomness is involved.
● Census: collection of data from every member of a population.
○ The population is usually too large for a census to be feasible.
○ Therefore a sample, a selected subcollection (or subset) of the population, is
studied.
■ A different sample yields different data and hence possibly different
conclusions about the population. A sample should be representative
(same characteristics as the population) and unbiased (no systematic
difference from the population).
○ Sample → Data → Analysis → Conclusion about population
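
The point that a different sample results in different data can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: the population is simulated, and the values (mean 170, standard deviation 10, sample size 50) are hypothetical choices that do not come from the slides.

```r
# Hypothetical population: 10,000 simulated body lengths in cm (illustrative only)
set.seed(1)
population <- rnorm(10000, mean = 170, sd = 10)

# Population mean: normally unknown, available here only because we simulated everything
mu <- mean(population)

# Draw five different samples of size 50 and compute the mean of each sample
sample_means <- replicate(5, mean(sample(population, size = 50)))

print(mu)
print(sample_means)  # each sample gives a slightly different estimate of mu
```

Each run of `sample()` selects different members, so the five sample means differ from each other and from the population mean.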

1.2 Statistical and critical thinking
● A statistical study consists of the following steps:
1. Prepare
a. Context
b. Source
c. Sampling method (how to obtain samples?)
2. Analyse
a. Graph data
b. Explore data
c. Apply statistical methods
3. Conclude

1.4 Collecting sample data
● There are different methods to collect sample data
○ Voluntary response sample: subjects decide themselves to be included in the
sample.
○ Random sample: each member of the population has an equal probability of being
selected.
○ Simple random sample: each sample of size n has an equal probability of being
chosen.
○ Systematic sampling: choose a starting point, then select every k-th member.
○ Convenience sampling: use results that are easily available.
○ Stratified sampling: divide the population into subgroups (strata) such that subjects
within a group share the same characteristics, then draw a (simple) random sample
from each group.





○ Cluster sampling: divide the population into sections (clusters), then randomly
select some of these clusters and include all members of the selected clusters.
● Important concepts:
○ Variable: quantity that may vary
● In cause and effect studies:
○ Explanatory (independent) variable: variable which might cause the effect
being studied.
○ Response (dependent) variable: variable that represents the effect being studied.
○ Confounding: occurs when the influences of different explanatory variables on the
response variable mix and cannot be distinguished anymore.
● Different types of study:
○ Observational study: characteristics of subjects are observed, but subjects are
not modified.
■ Retrospective (case-control): data from the past
■ Cross-sectional: data from one point in time
■ Prospective (longitudinal): data to be collected
○ Experiment: some treatment is applied to subjects.
■ Sometimes with a control group and a treatment group; the experiment can be
single-blind or double-blind.
■ Possible distortions: placebo effect, experimenter effect.
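
The random sampling methods listed in this section can be sketched in R as follows. This is an illustration only: the data frame `pop`, its columns `stratum` and `cluster`, and all sizes are hypothetical and not taken from the slides.

```r
set.seed(42)
# Hypothetical population of 100 subjects
pop <- data.frame(
  id      = 1:100,
  stratum = rep(c("A", "B"), each = 50),  # two subgroups of similar subjects
  cluster = rep(1:10, each = 10)          # ten sections of ten subjects each
)

# Simple random sample: every subset of size 10 is equally likely
srs <- pop[sample(nrow(pop), 10), ]

# Systematic sample: random starting point, then every k-th member
k <- 10
start <- sample(k, 1)
systematic <- pop[seq(start, nrow(pop), by = k), ]

# Stratified sample: a simple random sample of 5 from each stratum
stratified <- do.call(rbind, lapply(split(pop, pop$stratum),
                                    function(g) g[sample(nrow(g), 5), ]))

# Cluster sample: randomly pick 2 clusters and keep all their members
chosen <- sample(unique(pop$cluster), 2)
clustered <- pop[pop$cluster %in% chosen, ]
```

Voluntary response and convenience samples are not sketched because they involve no random selection step.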

1.3 Types of data
● Parameter: numerical measurement describing some characteristic of a population.
○ Notation: typically Greek symbols, e.g. μ, σ, ...
● Statistic: numerical measurement describing some characteristic of a sample.
○ Notation: small letters, e.g. x̄, s.
● Data is not only numbers
○ Quantitative (numerical) data: numbers representing counts or measurements
■ E.g., number of students’ siblings: 1, 0, 2, 2, 5...
○ Qualitative (categorical) data: names or labels (“1”, not 1) that do not represent
counts or measurements
■ E.g., quality of a course: good/fair/bad
● Quantitative data:
○ Discrete data: number of possible values is “countable”
■ E.g., word counts, number of coin tosses
○ Continuous data: collection of values is not countable
■ E.g., length, weight, distance
● Level of measurement of data is used to determine which statistical methods might apply
to the data.






○ Qualitative data:
■ Nominal: names, labels, categories (no ordering).
● E.g. gender, eye color. Cannot be used for computations.
■ Ordinal: categories with ordering, but no (meaningful) differences.
● E.g. U.S. grades (A-F), opinions (totally disagree / disagree / . . . /
totally agree)
○ Quantitative data:
■ Interval: ordering possible and differences between numbers are
meaningful, but there is no natural zero starting point.
● E.g. year of birth, temperatures (Celsius/Fahrenheit).
■ Ratio: ordering possible, differences are meaningful and there is a natural
starting point.
● E.g. body length, marathon times
● Determine the level of measurement for the following data:
○ M&M colours = nominal data (qualitative, no ordering)
○ Inauguration years of U.S. presidents = interval data (quantitative, no natural
starting point)
○ Brain volumes (in cm³) = ratio data (quantitative, natural starting point)
○ Level of lead in blood (low/medium/high) = ordinal data (qualitative, ordering)
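
The four levels of measurement map loosely onto R data types, which can help when deciding which operations make sense. The sketch below is one possible mapping, not something prescribed by the slides; the variable names and most values are made up for illustration.

```r
# Nominal: unordered categories (illustrative values)
colour <- factor(c("red", "blue", "green", "blue"))

# Ordinal: ordered categories; comparisons are allowed, differences are not meaningful
lead <- factor(c("low", "high", "medium", "low"),
               levels = c("low", "medium", "high"), ordered = TRUE)
print(lead[1] < lead[2])      # TRUE: "low" is below "high"

# Interval: differences are meaningful, ratios are not (no natural zero)
year <- c(1993, 2001, 2009, 2017)
print(diff(year))             # 8 8 8

# Ratio: natural zero, so ratios are meaningful
volume <- c(1000, 1250)       # hypothetical brain volumes in cm^3
print(volume[2] / volume[1])  # 1.25
```

Comparing two values of the unordered factor `colour` with `<` only produces `NA` with a warning, which matches the idea that nominal data have no ordering.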

Summarizing and graphing data
● From now on, we assume that data are from a representative and unbiased sample.
● Next: summarize data
○ Numerical summary
○ Graphical summary
● Every data set comes with a research question. Use your summary to answer your
research question.
● Typically we are interested in the data distribution: where does the data lie?
● Good summary shows:
○ what the data distribution looks like: location, spread/dispersion, range, extremes,
accumulations, gaps/holes, symmetry, ...
● Depending on context and goal, also whether:
○ data could be sampled from a certain distribution
○ data is rounded
○ different groups are needed for further analysis
○ there are influences of other variables, e.g. time
○ there is dependence between variables.
● Summarise to describe or find structure in data distribution:
○ Graphical: tables, graphs, other figures of data distribution






○ Descriptive
■ Qualitative: describe shape, location and dispersion/variation of data
distribution
■ Quantitative: numerical summaries of location and variation
○ NB: the first step in every data analysis is to make some figures of the data (if
possible) for your own use. This can prevent a wrong choice of statistical methods.
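
Numerical summaries of location and variation can be computed with a few base R functions. This is a minimal sketch: the vector `grades` is made-up data, and the choice of mean, median, standard deviation, range and interquartile range is an assumption about which summaries are meant, since the notes do not name them at this point.

```r
# Hypothetical sample of exam grades (illustrative values only)
grades <- c(5.5, 6.0, 6.5, 7.0, 7.0, 7.5, 8.0, 8.5, 9.0, 3.0)

# Location
m   <- mean(grades)
med <- median(grades)

# Spread / dispersion
s   <- sd(grades)
rng <- range(grades)   # smallest and largest value (extremes)
iqr <- IQR(grades)

print(summary(grades)) # minimum, quartiles, median, mean and maximum in one call
# A quick figure for own use could be hist(grades) or boxplot(grades)
```

Here the single low grade pulls the mean below the median, the kind of feature a quick figure would also reveal.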

Graphical summaries
→ Some of these summaries can only be used for some types of data.
● Frequency distribution (table)
○ Count the occurrences of each category, or the number of values in each interval.
○ R code:
# Assumption: column 2 of the data frame grades2 (from the slides, not shown here) holds the grades
grades=grades2[,2]
n=length(grades)
freq=cbind(table(grades))
freq=cbind(freq[,1],cumsum(freq[,1]),freq[,1]/n,cumsum(freq[,1])/n)
colnames(freq)=c("Frequency","Cumulative","Rel. frequency","Cum. rel. frequency")
options(digits=2)
print(freq)




● Bar chart
○ R code:
# Population sizes in millions per country or region
population=c(322,1372,147,127,65,81,1278,36,407,1111)
names(population)=c("US","Chi","Rus","Jap","GB","Ger","Ind","Can","SAm","Afr")
par(mfrow=c(1,1))  # one plot per figure window
barplot(population,main="Bar chart",ylab="Pop. size (mln)",col="red")




● Pareto bar chart
○ Orders the categories by frequency, from highest to lowest. Only applies to data at the
nominal level of measurement.
○ R code (uses the population vector from the bar chart example):
par(mfrow=c(1,1))
barplot(sort(population,decreasing=TRUE),main="Pareto bar chart",ylab="Pop. size (mln)",col="blue")




● Pie chart
○ The size of each piece of the pie is determined by the relative frequency of the
category. Mainly used for qualitative data.
○ R code (uses the population vector from the bar chart example):
pie(population/sum(population),col=c("green","yellow","brown","blue","red","grey","purple","orange","pink","black"))




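The frequency distribution table from this section depends on a data set that is not included in these notes. The sketch below rebuilds the same four columns on a small made-up vector, so it can be run on its own; the values in `grades` are hypothetical.

```r
# Hypothetical grades (illustrative values only)
grades <- c(6, 7, 7, 8, 6, 9, 7, 8, 6, 7)
n <- length(grades)

counts <- table(grades)  # frequency of each distinct value
freq <- cbind(Frequency             = as.vector(counts),
              Cumulative            = cumsum(as.vector(counts)),
              "Rel. frequency"      = as.vector(counts) / n,
              "Cum. rel. frequency" = cumsum(as.vector(counts)) / n)
rownames(freq) <- names(counts)
print(freq)
```

The last row of the cumulative column equals the sample size and the last cumulative relative frequency equals 1, which is a quick check that no observation was dropped.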
Seller: tararoopram, Vrije Universiteit Amsterdam