Summary Statistics & Methodology (880259-M-6)

Pages: 30
Uploaded on: 21-06-2022
Written in: 2021/2022

Detailed summary of all lectures, with additional notes, explanations and examples, for the course "Statistics and Methodology" at Tilburg University, which is part of the Master Data Science and Society. The course was taught by L.V.D.E. Vogelsmeier during the second semester, block three of the academic year 2021/2022 (January to March 2022).


Tilburg University
Study Program: Master Data Science and Society
Academic Year 2021/2022, Semester 2, Block 3 (January to March 2022)


Course: Statistics and Methodology (880259-M-6)
Lecturers: L.V.D.E. Vogelsmeier

Lecture 1: Statistical Inference, Modeling and Prediction


Introduction to statistical inference


Statistical Reasoning
• consideration of uncertainty
• systematize the way we account for uncertainty when making data-based decisions
→ avoid bias introduced by ourselves: “get the result I wish to find”

Probability Distributions
• Probability distributions quantify how likely it is to observe each possible value of some
probabilistic entity; they can be thought of as “re-scaled frequency distributions”
• they show the proportion of observations that fall in a certain bin, not the absolute number /
frequency of observations
• probability distributions with a higher standard deviation are wider and flatter (lower peak)
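
As a small illustration of the points above, the following Python sketch rescales a frequency distribution into proportions and compares the peak height of two normal densities. The observations and the helper name `to_proportions` are hypothetical and not taken from the lecture.

```python
from collections import Counter
from statistics import NormalDist


def to_proportions(values):
    """Rescale absolute frequencies into proportions that sum to 1."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in sorted(counts.items())}


# Hypothetical observations (for example, number of errors per session).
observations = [0, 1, 1, 2, 2, 2, 3, 3, 4, 2]
proportions = to_proportions(observations)
print(proportions)

# Two normal distributions with the same mean but different standard deviations.
narrow = NormalDist(mu=0, sigma=1)
wide = NormalDist(mu=0, sigma=2)

# The density at the mean is the peak height; the wider distribution peaks lower.
print(narrow.pdf(0), wide.pdf(0))
```

The proportions carry the same shape as the raw counts, only rescaled, which is what makes distributions with different sample sizes comparable.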

Statistical Testing
• When we conduct statistical tests, we weigh the estimated effect by the precision of the
estimate.
• Wald Test (type of t-test): $T = \frac{\text{Estimate} - \text{Null-Hypothesized Value}}{\text{Variability}}$
o if no effect is hypothesized, the null-hypothesized value is 0
o in general, the larger the absolute value of the test statistic, the stronger the evidence against the null hypothesis
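
A minimal sketch of this ratio for a sample mean, assuming the "variability" in the denominator is the standard error of the mean. The function name `wald_statistic` and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def wald_statistic(sample, null_value=0.0):
    """Return (estimate - null-hypothesized value) / standard error."""
    estimate = mean(sample)
    # Assumption: variability is measured by the standard error of the mean.
    standard_error = stdev(sample) / sqrt(len(sample))
    return (estimate - null_value) / standard_error


# Hypothetical measurements; the null hypothesis is "the mean is 0".
sample = [0.8, 1.2, -0.3, 0.5, 1.9, 0.1, 0.7, 1.1]
t_hat = wald_statistic(sample, null_value=0.0)
print(round(t_hat, 3))
```

The same estimate yields a larger statistic when the standard error is smaller, which is the sense in which the effect is weighed by its precision.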

Sampling Distribution of the test statistic
• probability distribution of a statistic
• The sampling distribution quantifies the possible values of the test statistic over infinitely
repeated sampling.
• The area of a region under the curve represents the probability of observing a test statistic
within the corresponding interval.
• To quantify how exceptional our estimated test statistic is, we compare the estimated value
to a sampling distribution of t-statistics assuming no effect (null hypothesis)
o null hypothesis = no effect → “nil-null”
• If our estimated statistic would be very unusual in a population where the null hypothesis is
true, we reject the null and claim a “statistically significant” effect
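
The idea of repeated sampling can be approximated by simulation. The sketch below draws a finite number of samples from a population in which the null hypothesis is true (mean 0) and collects their t statistics. The sample size, number of draws, seed, and observed value `t_hat` are arbitrary choices for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, null_value=0.0):
    """(sample mean - null value) / standard error of the mean."""
    return (mean(sample) - null_value) / (stdev(sample) / sqrt(len(sample)))


def simulate_null_distribution(n, draws, seed=1):
    """Collect t statistics from many samples of a population with true mean 0."""
    rng = random.Random(seed)
    return [t_statistic([rng.gauss(0, 1) for _ in range(n)]) for _ in range(draws)]


null_ts = simulate_null_distribution(n=20, draws=5000)

# The share of simulated statistics inside an interval approximates
# the area under the sampling distribution over that interval.
share_within_2 = sum(-2 <= t <= 2 for t in null_ts) / len(null_ts)

# How exceptional would a hypothetical observed statistic be under the null?
t_hat = 2.5
share_as_extreme = sum(abs(t) >= abs(t_hat) for t in null_ts) / len(null_ts)
print(share_within_2, share_as_extreme)
```

A small `share_as_extreme` means the observed statistic would be unusual if the null hypothesis were true.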

Interpreting P-Values
• With a p-value of, e.g., 0.032, all that we can say is that there is a 0.032 probability of
observing a test statistic at least as large as 𝑡̂ if the null hypothesis is true.
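
To make this interpretation concrete, the following sketch computes p-values for a hypothetical test statistic. It assumes a standard normal null distribution, which is a simplification (a t distribution would usually be used for small samples). The value 2.144 is chosen only because it gives a two-sided p-value close to 0.032.

```python
from statistics import NormalDist


def p_values(t_hat):
    """One- and two-sided p-values under an assumed standard normal null."""
    null = NormalDist(mu=0, sigma=1)
    one_sided = 1 - null.cdf(t_hat)              # P(T >= t_hat | H0)
    two_sided = 2 * (1 - null.cdf(abs(t_hat)))   # P(|T| >= |t_hat| | H0)
    return one_sided, two_sided


# Hypothetical test statistic, for illustration only.
one_sided, two_sided = p_values(2.144)
print(round(one_sided, 3), round(two_sided, 3))
```

Neither number is the probability that the null hypothesis is true; both are probabilities of the data given the null.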



Introduction to statistical modeling
• For simple questions we can use statistical testing to control for uncertainty. In most real-
world cases, we want to employ a modeling perspective to control for confounding variables.
• When modeling, we can make inferences about the model parameters, or we can predict
outcomes for new cases.
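
The two uses of a model can be sketched with a simple linear regression fitted by ordinary least squares. This is a minimal example with a single predictor; controlling for confounders would require adding them as further predictors. The data and function names are hypothetical.

```python
from statistics import mean


def fit_simple_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, new_x):
    """Predicted outcome for a new case."""
    return intercept + slope * new_x


# Hypothetical data: hours studied (x) and exam score (y).
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]

# Inference: interpret the estimated parameters.
intercept, slope = fit_simple_regression(hours, score)
# Prediction: estimate the outcome for a new case.
new_score = predict(intercept, slope, 6)
print(intercept, slope, new_score)
```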

Lecture 2: Research Cycle, Research Design and Exploratory Data Analysis


Discuss research/data science cycle
• CRISP-DM: The Cross-Industry Standard Process for Data Mining was developed to
standardize the process of data mining in industry applications
• The Data Science Cycle combines the classical Research Cycle and the CRISP-DM. The
grey-colored activities are mandatory.



Discuss research design in data science
• In data science, we rarely design experiments/empirical studies
• Research design is still crucial to data science to design an appropriate analysis.
o You must know how to operationalize the question in a statistically rigorous way.
▪ Make sure you understand exactly what is being asked
▪ Convert each aspect of the question into something quantifiable
▪ If possible, code the research question into a set of hypotheses.
o You must be able to choose/build a statistical model, statistical test, or machine
learning algorithm that can answer your well-operationalized research question.
▪ Once you have a well-operationalized research question, you need to
convert that question into some type of model or test.
o You must understand what types of data/data sources you’ll need.



Introduce EDA (Exploratory Data Analysis)
• interactively analyze/explore your data
• More of a mindset than a specific set of techniques or steps: a data-driven approach to explore
something, not to test hypotheses
• diverse selection of tools to use
o Statistical graphics: Histograms, Boxplots, Scatterplots, Traceplots
o Summary statistics: measures of central tendency & dispersion, order statistics
o Data Screening/Cleaning: missing data, outliers, invalid values

Interfacing EDA & CDA (Confirmatory Data Analysis)
• CDA: there is usually a clear hypothesis to test, we have some prior knowledge which we
want to test, e.g., by using hypothesis testing
• unsupervised learning models are usually closer to EDA because we want to find patterns
• Either can stand alone, but they play together better
o When the data are well-understood, we can proceed directly to CDA.
o If we don’t care about testing hypotheses, we can focus on EDA.
• EDA can be used to generate hypotheses for CDA.
• EDA can be used to sanity check (plausibility check) hypotheses

Uploaded by hannahgruber (Tilburg University)