Learning from data:
• Understanding simple relationships/properties of processes
• Understanding complex relationships/properties
• Proving conjectures
• Disproving conjectures (hypotheses!)
• Predicting future events
• Helping to make decisions under uncertainty
• Uncertainty is something we have to deal with very often.
o IMPORTANT: Statistics does not remove uncertainty, but it is a
powerful tool for dealing with it!
To apply statistics we need data. This involves:
- Collecting data
- Summarizing data
We can distinguish two types of variables:
1. Categorical variables: place an individual in one of several groups
2. Quantitative variables: take on numerical values for which typical arithmetic
operations make sense:
• Interval data: Differences can be meaningfully interpreted, but ratios
cannot. If today’s temperature is 10 Celsius, whereas it was 5 Celsius
yesterday, we cannot say that today is twice as warm. It is, however, 5
degrees warmer.
• Ratio data: As interval data, but now ratios can also be interpreted.
For example: 2000 euro is twice as much as 1000 euro.
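The difference between interval and ratio data can be checked with a little arithmetic. The sketch below is only an illustration: the Kelvin conversion is not part of the notes and is brought in to show that a difference in temperature survives a change of unit, whereas the "twice as warm" ratio does not. For money, the ratio stays the same whatever the unit.

```python
# Interval vs. ratio data: a ratio is only meaningful when zero means "nothing".

def celsius_to_kelvin(c):
    # Assumption for illustration: Kelvin is used as a second temperature unit.
    return c + 273.15

today_c, yesterday_c = 10.0, 5.0

# Interval data: the difference is the same in both units.
diff_c = today_c - yesterday_c
diff_k = celsius_to_kelvin(today_c) - celsius_to_kelvin(yesterday_c)

# ...but the ratio depends on the unit, so "twice as warm" is not meaningful.
ratio_c = today_c / yesterday_c
ratio_k = celsius_to_kelvin(today_c) / celsius_to_kelvin(yesterday_c)

# Ratio data: 2000 euro vs. 1000 euro gives the same ratio in euros and in cents.
ratio_euro = 2000 / 1000
ratio_cents = (2000 * 100) / (1000 * 100)

print(f"difference: {diff_c:.2f} (Celsius) vs {diff_k:.2f} (Kelvin)")
print(f"ratio:      {ratio_c:.2f} (Celsius) vs {ratio_k:.2f} (Kelvin)")
print(f"money:      {ratio_euro:.2f} (euro) vs {ratio_cents:.2f} (cents)")
```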
The type of variable determines which chart you need to use:
1. Categorical data: graphing distributions
• Pie charts
• Bar charts
o Pareto charts
2. Quantitative data: graphing distributions
• Histogram
o Horizontal axis: Classes of the quantitative variable
o Vertical axis: (Relative) frequencies of the classes
o Always numerical variables!
o Provides a visual summary of the distribution of values.
o Looking at the plot we focus on:
• Central tendency: what is the “middle” of the observed values?
• Spread: how are the data distributed around the “middle”? In what
range of values do the observations tend to fall? Does there appear to be
high or low variability?
• Shape: are there any striking patterns in the distribution? For example:
are there multiple peaks? Is the distribution symmetric or asymmetric?
• Stemplot
o For small data sets, a stemplot is a fast and slightly more detailed
way to graph the data
• Timeplot
o = displays change over time
o Horizontal axis: Time of observation
o Vertical axis: Variable of interest
o For each observation, we have two variables:
▪ one with the time of observation
▪ the other with the value of the variable of interest
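As a rough illustration of the histogram and stemplot described above, the sketch below computes the relative frequency of each class and groups the same values into stems and leaves. The sample values, the class width of 10, and the function names are all made up for this example and do not come from the notes.

```python
from collections import defaultdict

# Hypothetical sample (not from the notes), e.g. 12 test scores.
data = [12, 15, 21, 23, 23, 27, 31, 34, 35, 38, 42, 47]

def relative_frequencies(values, low, width, n_classes):
    """Relative frequency of each class [low + i*width, low + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - low) // width)
        if 0 <= i < n_classes:  # values outside the classes are not counted
            counts[i] += 1
    return [c / len(values) for c in counts]

def stemplot(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(stems)

# Histogram as text: horizontal axis = classes, bar length = relative frequency.
rel = relative_frequencies(data, low=10, width=10, n_classes=4)
for i, r in enumerate(rel):
    lo = 10 + 10 * i
    print(f"[{lo}, {lo + 10}): {'#' * round(r * len(data))} {r:.2f}")

# Stemplot: keeps the individual values visible, unlike the histogram.
for stem, leaves in stemplot(data).items():
    print(f"{stem} | {''.join(str(leaf) for leaf in leaves)}")
```

The stemplot shows every observation, which is why it is slightly more detailed than the histogram, but it only stays readable for small data sets.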
Numerical measures: central tendency
= very sensitive to observations that stand out!