
Summary Statistics & Methodology (880259-M-6)

Pages
30
Uploaded on
21-06-2022
Written in
2021/2022

Detailed summary of all lectures and additional notes, explanations and examples for the course "Statistics and Methodology" at Tilburg University which is part of the Master Data Science and Society. Course was given by L.V.D.E. Vogelsmeier during the second semester, block three of the academic year 2021 / 2022 (January to March 2022).



Tilburg University
Study Program: Master Data Science and Society
Academic Year 2021/2022, Semester 2, Block 3 (January to March 2022)


Course: Statistics and Methodology (880259-M-6)
Lecturers: L.V.D.E. Vogelsmeier

Lecture 1: Statistical Inference, Modeling and Prediction


Introduction to statistical inference


Statistical Reasoning
• consideration of uncertainty
• systematize the way we account for uncertainty when making data-based decisions
→ avoid biasing ourselves: “get the result I wish to find”

Probability Distributions
• Probability distributions quantify how likely it is to observe each possible value of some
probabilistic entity; they can be seen as “re-scaled frequency distributions”
• they show the proportion of observations that fall in a certain bin, not the absolute number /
frequency of observations
• probability distributions with a higher standard deviation are wider and flatter (lower peak)
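
A minimal Python sketch of both points, using made-up observations: absolute counts are re-scaled to proportions that sum to 1, and the peak of a normal density drops as the standard deviation grows. The data and the function names `relative_frequencies` and `normal_pdf` are assumptions for illustration, not part of the course material.

```python
import math
from collections import Counter


def relative_frequencies(values):
    """Re-scale absolute counts to proportions that sum to 1."""
    counts = Counter(values)
    n = len(values)
    return {value: count / n for value, count in sorted(counts.items())}


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and SD at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


# Hypothetical observations (e.g. ten die rolls), made up for illustration.
rolls = [1, 2, 2, 3, 3, 3, 4, 4, 5, 6]
proportions = relative_frequencies(rolls)
print(proportions)

# Same mean, different spread: the density at the center is lower
# for the distribution with the larger standard deviation.
peak_narrow = normal_pdf(0.0, mean=0.0, sd=1.0)
peak_wide = normal_pdf(0.0, mean=0.0, sd=2.0)
print(peak_narrow, peak_wide)
```

Doubling the standard deviation halves the peak height here, because the total area under the curve must stay equal to 1.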

Statistical Testing
• When we conduct statistical tests, we weight the estimated effect by the precision of the
estimate.
• Wald Test (type of t-test): $T = \frac{\text{Estimate} - \text{Null-Hypothesized Value}}{\text{Variability}}$
o if the null hypothesis is “no effect”, the null-hypothesized value is 0
o in general, the larger the absolute value of the test statistic, the stronger the evidence
against the null hypothesis
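
As a rough sketch, the test statistic can be computed for a sample mean as follows. The sample values are hypothetical, and taking the standard error of the mean as the "variability" term is an assumption made here for illustration.

```python
import math
import statistics


def wald_statistic(estimate, null_value, standard_error):
    """(Estimate - null-hypothesized value) / variability."""
    return (estimate - null_value) / standard_error


# Hypothetical sample, made up for illustration.
sample = [2.1, 1.4, 3.0, 2.6, 1.9, 2.8, 2.2, 1.7]

estimate = statistics.mean(sample)
# Assumption: variability is measured by the standard error of the mean.
standard_error = statistics.stdev(sample) / math.sqrt(len(sample))

# Null hypothesis of "no effect", so the null-hypothesized value is 0.
t_hat = wald_statistic(estimate, null_value=0.0, standard_error=standard_error)
print(estimate, standard_error, t_hat)
```

A precise estimate (small standard error) yields a larger statistic for the same estimated effect, which is what "weighting the effect by its precision" means.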

Sampling Distribution of the test statistic
• probability distribution of a statistic
• The sampling distribution quantifies the possible values of the test statistic over infinite
repeated sampling.
• The area of a region under the curve represents the probability of observing a test statistic
within the corresponding interval.
• To quantify how exceptional our estimated test statistic is, we compare the estimated value
to a sampling distribution of t-statistics assuming no effect (null hypothesis)
o null hypothesis = no effect → “nil-null”
• If our estimated statistic would be very unusual in a population where the null hypothesis is
true, we reject the null and claim a “statistically significant” effect
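
The idea of a sampling distribution under the null hypothesis can be approximated by simulation. The sketch below assumes a normally distributed population with mean 0 (no effect) and uses a finite number of replications in place of "infinite repeated sampling". The sample size, the number of replications and the observed value `t_hat` are all hypothetical.

```python
import math
import random
import statistics


def t_statistic(sample, null_value=0.0):
    """t-statistic of the sample mean against a null-hypothesized value."""
    se = statistics.stdev(sample) / math.sqrt(len(sample))
    return (statistics.mean(sample) - null_value) / se


def simulate_null_t(n, replications, rng):
    """Repeatedly sample from a no-effect population and collect t-statistics."""
    return [
        t_statistic([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(replications)
    ]


rng = random.Random(42)  # fixed seed so the run is reproducible
null_t = simulate_null_t(n=20, replications=5000, rng=rng)

# Hypothetical estimated test statistic from "our" data.
t_hat = 2.3

# Share of simulated statistics at least as extreme as t_hat (two-sided).
share_as_extreme = sum(abs(t) >= abs(t_hat) for t in null_t) / len(null_t)
print(share_as_extreme)
```

If only a small share of the simulated statistics is as extreme as the observed one, the observed value would be unusual in a population where the null hypothesis is true.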

Interpreting P-Values
• With a p-value of 0.032, all that we can say is that there is a 0.032 probability of observing a
test statistic at least as large as 𝑡̂ if the null hypothesis is true. It is not the probability
that the null hypothesis itself is true.
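
A p-value is the tail area of the null sampling distribution beyond the estimated statistic. The sketch below assumes a standard normal null distribution (a large-sample approximation) and a hypothetical statistic of 1.85, chosen only because its upper-tail area is close to 0.032. The original example may have used a different statistic and distribution.

```python
from statistics import NormalDist


def p_value_upper(t_hat):
    """P(T >= t_hat) under a standard normal null distribution."""
    return 1.0 - NormalDist().cdf(t_hat)


def p_value_two_sided(t_hat):
    """P(|T| >= |t_hat|) under a standard normal null distribution."""
    return 2.0 * (1.0 - NormalDist().cdf(abs(t_hat)))


# Hypothetical test statistic, chosen for illustration only.
t_hat = 1.85

p_upper = p_value_upper(t_hat)
p_two_sided = p_value_two_sided(t_hat)
print(round(p_upper, 3), round(p_two_sided, 3))
```

The one-sided version matches the "at least as large as" wording. The two-sided version counts extreme values in both tails and is therefore twice as large for a symmetric distribution.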



Introduction to statistical modeling
• For simple questions we can use statistical testing to control for uncertainty. In most real-
world cases, we want to employ a modeling perspective to control for confounding variables.
• When modeling, we can make inferences about the model parameters, or we can predict
outcomes for new cases.
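
The two uses of a model can be sketched with a simple least-squares regression. The data and variable names are hypothetical, and the example has a single predictor, so it does not control for confounding variables. It only shows the difference between reading off a parameter (inference) and applying the fitted model to a new case (prediction).

```python
import statistics


def fit_simple_ols(x, y):
    """Least-squares intercept and slope for y = intercept + slope * x."""
    x_bar = statistics.mean(x)
    y_bar = statistics.mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predicted outcome for a new case."""
    return intercept + slope * x_new


# Hypothetical data, made up for illustration.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [3.1, 4.9, 7.2, 8.8, 11.0]

intercept, slope = fit_simple_ols(hours, score)

# Inference: the slope is the estimated change in score per extra hour.
print(slope)

# Prediction: the expected score for a new case with 6 hours.
new_score = predict(intercept, slope, x_new=6.0)
print(new_score)
```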

Lecture 2: Research Cycle, Research Design and Exploratory Data Analysis


Discuss research/data science cycle
• CRISP-DM: The Cross-Industry Standard Process for Data Mining was developed to standardize the
process of data mining in industry applications
• The Data Science Cycle combines the classical Research Cycle and the CRISP-DM. The grey-colored
activities are mandatory.



Discuss research design in data science
• In data science, we rarely design experiments/empirical studies
• Research design is still crucial to data science to design an appropriate analysis.
o You must know how to operationalize the question in a statistically rigorous way.
▪ Make sure you understand exactly what is being asked
▪ Convert each aspect of the question into something quantifiable
▪ If possible, code the research question into a set of hypotheses.
o You must be able to choose/build a statistical model, statistical test, or machine
learning algorithm that can answer your well-operationalized research question.
▪ Once you have a well-operationalized research question, you need to
convert that question into some type of model or test.
o You must understand what types of data/data sources you’ll need.



Introduce EDA (Exploratory Data Analysis)
• interactively analyze/explore your data
• More of a mindset than a specific set of techniques or steps: a data-driven approach to explore
something, not to test hypotheses
• diverse selection of tools to use
o Statistical graphics: Histograms, Boxplots, Scatterplots, Traceplots
o Summary statistics: measures of central tendency & dispersion, order statistics
o Data Screening/Cleaning: missing data, outliers, invalid values

Interfacing EDA & CDA (Confirmatory Data Analysis)
• CDA: there is usually a clear hypothesis, based on prior knowledge, which we want to test,
e.g., by using hypothesis testing
• unsupervised learning models are usually closer to EDA because we want to find patterns
• Either can stand alone, but they work better together
o When the data are well-understood, we can proceed directly to CDA.
o If we don’t care about testing hypotheses, we can focus on EDA.
• EDA can be used to generate hypotheses for CDA.
• EDA can be used to sanity check (plausibility check) hypotheses