Data Statistics (JBM010)
Semester 1, 2020-2021
Probability Theory
Introduction
Important definitions
Rules for sets / events
Historical definitions of probability theory
Conditional probabilities
Rules for counting
Different sampling methods
Probability distributions
Functions of random variables
Expectation, variance and standard deviation (std)
Rules for expectation and variance
Covariance and correlation
Quantiles
Sampling Theory
Population and sample statistics
Sampling methods
Inferential Statistics
Sample statistics and estimators
Random sampling with replacement
Random sampling without replacement
Pdf of a sample mean
Central Limit Theorem
Consequences of the CLT
Properties of S-squared and S
Standard errors
Statistical procedures
Hypothesis testing for mu when sigma is known
Hypothesis tests and p-values for mu when sigma is known
Inference about mu when sigma is unknown
Common formats
Probability Theory
In short:
• Introduction
• Important definitions
• Rules for sets / events
• Historical definitions of probability theory
• Conditional probabilities
• Rules for counting
• Different sampling setups
• Probability distributions
• Functions of random variables
• Expectation, variance and standard deviation
• Rules for expectation and variance
• Covariance and correlation
• Quantiles
Introduction
In data science, one measures specific sets of objects. The set of objects under investigation is
called the population, and the objects in this population are called elements. Measurements (data) are
made on these elements and reflect a certain characteristic. Because populations are usually large,
only a sample of elements from the population is used for statistical measurements.
This sample represents the larger population. Statistics consists of four subfields:
- probability theory;
- sampling theory (methods of sampling and their properties);
- descriptive statistics (summarizing and presenting the data);
- inferential statistics (methods to draw conclusions about characteristic numbers of the whole
population of interest by considering data from a sample).
Important definitions
Random experiment = an experiment/phenomenon for which the outcome is determined by
chance
Outcome = a possible result of an experiment
Sample space (Ω) = the set of all possible outcomes of an experiment
Event = a subset of the sample space (Ω)
Partition = a collection of subsets of the sample space that are mutually disjoint and
whose union is Ω
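
The definitions above can be made concrete with a small sketch. The example below is an illustration only and is not part of the course material: it assumes a single roll of a fair six-sided die as the random experiment, and the helper names is_event and is_partition are hypothetical.

```python
from itertools import combinations

# Assumed random experiment: one roll of a six-sided die.
# The sample space is the set of all possible outcomes.
omega = frozenset({1, 2, 3, 4, 5, 6})

# Two events, i.e. subsets of the sample space.
even = frozenset({2, 4, 6})
odd = frozenset({1, 3, 5})


def is_event(subset, sample_space):
    """An event is any subset of the sample space."""
    return subset <= sample_space


def is_partition(subsets, sample_space):
    """A collection of subsets is a partition if the subsets are
    mutually disjoint and their union equals the sample space."""
    subsets = list(subsets)
    disjoint = all(a.isdisjoint(b) for a, b in combinations(subsets, 2))
    covers = frozenset().union(*subsets) == sample_space
    return disjoint and covers
```

Under these assumptions, is_partition([even, odd], omega) holds because the two events do not overlap and together cover every outcome, whereas a collection such as [even, frozenset({1, 3})] fails because the outcome 5 is not covered.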