Knowledge clip 1 (chapter 1):
Statistics
The art and science of collecting, analyzing, presenting, and interpreting data
Providing information to support decision-making
Modern "synonym": Data Science
Some terminology
Database, data set (in SPSS, Excel, ...)
Often as a data matrix
o Columns = variables
o Rows = Observations, elements, cases, subjects
o Each cell = Measurement, data point
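As an illustration of the data matrix layout, here is a minimal sketch in plain Python. The variable names and values are hypothetical and only serve to show what a row, a column and a cell are.

```python
# Hypothetical data matrix: each row is one case, each column one variable.
columns = ["gender", "age", "satisfaction"]
rows = [
    ["female", 34, 4],
    ["male", 51, 2],
    ["female", 27, 5],
]

n_cases = len(rows)         # number of observations (rows)
n_variables = len(columns)  # number of variables (columns)

# One cell = one measurement: the age of the second case.
age_index = columns.index("age")
age_of_case_2 = rows[1][age_index]

# One column = all measurements of one variable.
ages = [row[age_index] for row in rows]
```

In SPSS or Excel the same structure appears as a sheet with one line per respondent and one column per question.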
Types of variables: Level of Measurement
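Nominal data (label) = categories without order, e.g. male or female
Ordinal data = ordered categories, e.g. sports competition: 1st or 4th place
Interval data = equal distances but no true zero, e.g. temperature: the difference between 10 and 11 degrees Celsius is the same as between 20 and 21
Ratio data = equal distances and a true zero, e.g. age of an individual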
Types of data sets:
1. Cross-sectional data
Survey of cases, all measured at one point in time
e.g. survey conducted among customers
2. Time-series data
Variables measured over time
e.g. various stock prices
3. Panel data
Combination of both: multiple cases, same variables measured at multiple time points
e.g. consumer panel reporting purchase behavior
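The relation between the three types of data sets can be sketched as follows. The households, weeks and purchase counts are made up for illustration. The point is that a panel contains both a cross-section (one time point, many cases) and a time series (one case, many time points).

```python
# Hypothetical panel: (household, week, number_of_purchases) records.
panel = [
    ("A", 1, 3), ("A", 2, 5),
    ("B", 1, 2), ("B", 2, 4),
    ("C", 1, 6), ("C", 2, 1),
]

# Cross-sectional slice: all cases at one point in time (week 1).
cross_section = {case: value for case, week, value in panel if week == 1}

# Time-series slice: one case followed over time (household A).
time_series = [value for case, week, value in panel if case == "A"]
```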
“Statistics is a way to get information from data”
Key statistical concepts
Population
A population is the group of all items/cases of interest
One wants to draw conclusions about this group
Sample
A sample is a group of items/cases drawn from the population
One applies statistical analysis to the data from a sample
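The population/sample distinction can be illustrated with a small simulation. This is only a sketch: the population of customer ages is randomly generated and hypothetical, and the fixed seed is there purely to make the run reproducible. In practice the population mean is usually unknown and the sample mean is used to estimate it.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: ages of 10,000 customers.
population = [random.randint(18, 80) for _ in range(10_000)]

# Draw a simple random sample of 200 cases, without replacement.
sample = random.sample(population, k=200)

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # estimate based on the sample
```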
Statistics & The Empirical Cycle
Knowledge clip 2 (chapter 2):
Descriptive statistics: Tables and Figures
Statistical methods:
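Absolute frequency = number of cases in each category (a count)
Relative frequency = share of cases in each category, in % (absolute frequency divided by the total number of cases)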
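A frequency table with both kinds of frequency can be sketched like this, using only the Python standard library. The transport data are hypothetical.

```python
from collections import Counter

# Hypothetical nominal variable: preferred transport of 8 respondents.
transport = ["bike", "car", "bike", "train", "car", "bike", "bike", "train"]

# Absolute frequency: how many cases fall in each category.
absolute = Counter(transport)

# Relative frequency: the same counts as a percentage of all cases.
n = len(transport)
relative = {category: 100 * count / n for category, count in absolute.items()}

for category, count in absolute.items():
    print(f"{category:<6} {count:>2} {relative[category]:>6.1f}%")
```

The relative frequencies always add up to 100%, which makes them easier to compare across samples of different sizes.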
Knowledge clip 3 (chapter 3):
Descriptive statistics: Numerical measures
Properties of distributions I:
Population = the underlying group you want to make statements about -> described by a probability distribution
Collect data from a sample in order to make such a statement -> histogram of the frequency distribution
Key characteristics of a distribution:
Measures of central tendency (location) = where the values are located, i.e. how low or high the observations typically are
o Mode, median and mean
Mode = value of the variable with the highest frequency in the data set -> the only measure of central tendency for nominal variables
Median = middle observation when all cases are ranked from low to high (with an even number of cases: the average of the two middle values) -> can be used for ordinal and higher measurement levels
Mean = average value (sum of all values divided by the number of cases) -> used for interval and ratio variables only.
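The three measures of central tendency can be computed with Python's standard `statistics` module. The data below are hypothetical. The list has an even number of cases, so the median is the average of the two middle values.

```python
import statistics

# Hypothetical ratio-level variable with an even number of cases.
scores = [2, 3, 3, 5, 7, 10]

mode = statistics.mode(scores)      # most frequent value: 3
median = statistics.median(scores)  # average of the two middle values 3 and 5: 4.0
mean = statistics.mean(scores)      # 30 / 6 = 5

# For a nominal variable only the mode is meaningful.
colors = ["red", "blue", "red", "green"]
color_mode = statistics.mode(colors)  # "red"
```

Note that the single high value 10 pulls the mean above the median, which is why the median is often preferred for skewed data.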
Measures of variability = how much the observations differ from one another
o Minimum and maximum, (interquartile) range, standard deviation and variance
Minimum and maximum = used for ordinal and higher measurement level
Range = spread of the values observed in the data (maximum – minimum) -> only for variables with interval or ratio measurement level
Interquartile range = difference between the third and first quartile -> covers the middle 50% of the data, from the 25% to the 75% point of the ranked observations (based on ranking the cases, like the median)
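Standard deviation = typical distance of the observations from the mean; there is a population SD (σ) and a sample SD (s) -> xi = the value of observation i, x̄ (sample) or μ (population) is the mean, n = number of observations in the sample, N = number of cases in the population -> only for variables with interval or ratio measurement level
Population SD: σ = √( Σ (xi − μ)² / N )
Sample SD: s = √( Σ (xi − x̄)² / (n − 1) )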
Variance (square of the SD) = only for variables with interval or ratio measurement level
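The measures of variability can be sketched with the standard `statistics` module. The data are hypothetical. One assumption to flag: textbooks and software packages use slightly different rules for computing quartiles, so the interquartile range from SPSS or Excel may differ a little from the value produced by Python's default ("exclusive") method used here.

```python
import statistics

# Hypothetical interval/ratio data with mean 5.
values = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(values) - min(values)  # 9 - 2 = 7

# Quartiles split the ranked data into four parts; IQR = Q3 - Q1.
q1, q2, q3 = statistics.quantiles(values, n=4)
iqr = q3 - q1                           # 6.5 - 4.0 = 2.5 with the default method

# Population versions divide by N, sample versions divide by n - 1.
pop_variance = statistics.pvariance(values)    # 32 / 8 = 4
pop_sd = statistics.pstdev(values)             # square root of 4 = 2.0
sample_variance = statistics.variance(values)  # 32 / 7
sample_sd = statistics.stdev(values)           # square root of 32 / 7
```

The sample SD is always slightly larger than the population SD computed on the same numbers, because it divides by n − 1 instead of N.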
Coefficient of variation
Relative measure of variability
How much variation there is in the sample, relative to the average of the variable: CV = (standard deviation / mean) × 100%
Comparable across variables that differ in scaling / size (thus in average)
o The higher the percentage the more variability there is
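A minimal sketch of the coefficient of variation, assuming the sample SD is used and the mean is not zero. The helper name `coefficient_of_variation` and both data sets are hypothetical. The example compares two variables measured on very different scales.

```python
import statistics

def coefficient_of_variation(values):
    """Sample SD as a percentage of the mean (the mean must be non-zero)."""
    return 100 * statistics.stdev(values) / statistics.mean(values)

# Two hypothetical variables with very different scaling.
heights_cm = [160, 170, 180]            # mean 170, sample SD 10
incomes_eur = [20_000, 30_000, 70_000]  # mean 40,000

cv_heights = coefficient_of_variation(heights_cm)   # about 5.9%
cv_incomes = coefficient_of_variation(incomes_eur)  # about 66.1%
```

Although the raw standard deviations cannot be compared directly (centimetres versus euros), the percentages show that income varies far more than height relative to its own average.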