Summary
Lecture 1
Statistical definitions:
Σ = sum (add up)
X = individual scores
N = number of scores
< > = less than / greater than
|..| = absolute value, so |-3| = 3
A proportion of 0.50 = 50%; a proportion of 0.05 = 5%
Scales of measurement
4 scales:
Nominal
Only categories, no order
Gender
Ordinal
Categories but with logical order/ranking, no fixed distance between the values
education → University, HBO, MBO (there is an order, but no fixed magnitude between the levels, such as 1)
Interval
Categories with order and fixed distance
intelligence → IQ 105, 100, 95
Ratio
Categories with order, fixed distance and a meaningful 0
recidivism → 0 is better than 2 (0 means no recidivism at all)
Nominal & ordinal are discrete
interval & ratio are continuous
Describing data
Nominal
frequency distribution
categories but no order
Left-right handed example
Frequency: the number of times this category was observed
R: 72
L: 18
Relative frequency: the frequency of a category divided by the total frequency
R: 72 / 92 = 0.78
L: 18 / 92 = 0.20
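The relative frequency calculation can be sketched in a few lines of Python. The function name `relative_frequencies` is made up for illustration. Note that the notes divide by a total of 92 although the two listed counts add up to 90, so the sketch takes the total as an explicit argument instead of guessing where the remaining scores belong.

```python
def relative_frequencies(counts, total=None):
    """Divide each category's frequency by the total frequency.

    If no total is given, the sum of the listed counts is used.
    """
    if total is None:
        total = sum(counts.values())
    return {category: freq / total for category, freq in counts.items()}


# Counts from the handedness example; total of 92 as used in the notes.
handedness = {"R": 72, "L": 18}
rel = relative_frequencies(handedness, total=92)

for category, value in rel.items():
    print(category, round(value, 2))
```

Calling `relative_frequencies(handedness)` without a total would divide by 90 instead, which gives 0.80 and 0.20.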
Central tendency
Mode: category with the highest frequency
Right-handed (72 > 18)
bimodal distribution: when there are two modes
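As a small illustration of the mode, Python's standard library offers `statistics.multimode`, which returns every value that shares the highest frequency. The bimodal data below is made up for the example.

```python
from statistics import multimode

# Handedness example from the notes: 72 right-handed, 18 left-handed.
handed = ["R"] * 72 + ["L"] * 18
handed_modes = multimode(handed)

# Hypothetical scores in which two values are equally frequent (bimodal).
bimodal_scores = [1, 2, 2, 3, 4, 4]
bimodal_modes = multimode(bimodal_scores)

print(handed_modes)   # ['R']
print(bimodal_modes)  # [2, 4]
```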
Variability
- (none for nominal data)
Ordinal
frequency distribution
categories with order
percentile rank: cumulative percentage, the percentage of the data at or below a
category or score → sum the frequencies of that category and of all categories below it, because
those scores also fall at or below that category
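The percentile rank can be sketched as a running (cumulative) sum over the ordered categories. The function name `percentile_ranks` and the education counts are hypothetical and only serve to illustrate the idea.

```python
from itertools import accumulate


def percentile_ranks(ordered_counts):
    """Return the cumulative percentage for each category.

    ordered_counts: list of (category, frequency) pairs, sorted from low to high.
    """
    frequencies = [freq for _, freq in ordered_counts]
    total = sum(frequencies)
    cumulative = accumulate(frequencies)  # running sum: at or below each category
    return {
        category: 100 * cum / total
        for (category, _), cum in zip(ordered_counts, cumulative)
    }


# Hypothetical frequencies, ordered from lowest to highest education level.
education = [("MBO", 20), ("HBO", 50), ("University", 30)]
ranks = percentile_ranks(education)
print(ranks)  # {'MBO': 20.0, 'HBO': 70.0, 'University': 100.0}
```

The highest category always has a percentile rank of 100, because all scores lie at or below it.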
Bar graph
Central tendency
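Median: middle score in a distribution → as many scores below it as above it, the 50th
percentile → sort from low to high first!
example: 5 10 11 12 18 → median 11
5 10 11 12 18 23 → even number of scores, so no single middle score: the median is the mean of the two middle scores, (11 + 12) / 2 = 11.5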
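The two median cases can be written out as a short function. This is a minimal sketch that follows the rule in the notes (sort first, then take the middle score or the mean of the two middle scores); the standard library's `statistics.median` does the same job.

```python
def median(scores):
    """Return the middle score of the sorted scores.

    For an even number of scores, return the mean of the two middle scores.
    """
    ordered = sorted(scores)  # the scores must be sorted from low to high
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([18, 5, 11, 12, 10]))      # 11
print(median([5, 10, 11, 12, 18, 23]))  # 11.5
```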
Variability
range: difference between highest and lowest score
5 10 11 12 18 → 13
Interval & ratio
Frequency distribution
Histogram, difference from a bar graph: the bars are closer together, and values that
nobody answered still appear in the graph, just without a bar. In a bar graph these
would be left out
Frequency polygon → a line instead of bars
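The difference between the two graphs can be illustrated by the counts that each one plots. The sketch below assumes whole-number scores, and both function names and the data are hypothetical.

```python
from collections import Counter


def histogram_counts(scores):
    """Count every value from the lowest to the highest score, including zeros."""
    counts = Counter(scores)
    return {v: counts.get(v, 0) for v in range(min(scores), max(scores) + 1)}


def bar_graph_counts(scores):
    """Count only the values that were actually observed."""
    return dict(sorted(Counter(scores).items()))


scores = [1, 1, 2, 4, 4, 4]  # hypothetical scores; nobody answered 3

print(histogram_counts(scores))  # {1: 2, 2: 1, 3: 0, 4: 3}
print(bar_graph_counts(scores))  # {1: 2, 2: 1, 4: 3}
```

The value 3 keeps its place on the histogram axis with a count of 0, while the bar graph drops it.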
Central tendency
Mean = average → sum all the scores and divide by the number of scores
ΣX/N
2 5 8 10 15 → 40/5 = 8
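The formula ΣX/N translates directly into code. A minimal sketch using the scores from the example:

```python
def mean(scores):
    """Sum of the individual scores (ΣX) divided by the number of scores (N)."""
    return sum(scores) / len(scores)


scores = [2, 5, 8, 10, 15]
print(sum(scores), len(scores), mean(scores))  # 40 5 8.0
```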
In case of extreme scores, we prefer the median for interval & ratio data: it summarizes the data
better because the mean would be influenced too much by the extreme scores
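The effect of an extreme score can be checked with a quick comparison. The second data set is made up: it is the earlier example with the highest score changed from 15 to 150.

```python
from statistics import mean, median

normal_scores = [2, 5, 8, 10, 15]
extreme_scores = [2, 5, 8, 10, 150]  # hypothetical: one extreme score

# The mean shifts strongly, the median stays where it was.
print(mean(normal_scores), median(normal_scores))    # 8 8
print(mean(extreme_scores), median(extreme_scores))  # 35 8
```

One extreme score pulls the mean from 8 to 35, while the median remains 8.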
Positive skew / symmetrical distribution / negative skew (parabolas)
Variability
variance and standard deviation
deviation = difference between a score and its mean
deviation of a score from a population mean: x = X-μ
Variance = average of squared deviations from the mean
σ² = Σ(X − μ)² / N
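The variance formula can be worked through step by step in code. This sketch uses the population formula from the notes (dividing by N) and reuses the scores from the mean example. The standard deviation is taken as the square root of the variance, which is the usual definition; the notes name it but do not spell it out.

```python
from math import sqrt


def population_variance(scores):
    """Average of the squared deviations from the mean: Σ(X − μ)² / N."""
    n = len(scores)
    mu = sum(scores) / n
    squared_deviations = [(x - mu) ** 2 for x in scores]
    return sum(squared_deviations) / n


def population_sd(scores):
    """Standard deviation: the square root of the variance."""
    return sqrt(population_variance(scores))


scores = [2, 5, 8, 10, 15]  # mean is 8, deviations are -6, -3, 0, 2, 7
variance = population_variance(scores)  # (36 + 9 + 0 + 4 + 49) / 5
sd = population_sd(scores)
print(variance, round(sd, 2))
```

The deviations themselves always sum to 0, which is why they are squared before averaging.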
Other ways to summarize the data
Stem-and-leaf display:
Boxplot:
Visualize the variability
Also a measure of variability when the distribution is skewed
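The notes do not say which numbers a boxplot is built from. A common convention, assumed here, is the five-number summary (minimum, first quartile, median, third quartile, maximum), with the interquartile range as the measure of variability. Quartiles can be computed in several ways; the sketch uses `statistics.quantiles` with `method="inclusive"`, and other methods may give slightly different quartiles.

```python
from statistics import quantiles


def five_number_summary(scores):
    """Return min, Q1, median, Q3 and max (an assumed boxplot convention)."""
    ordered = sorted(scores)
    q1, q2, q3 = quantiles(ordered, n=4, method="inclusive")
    return {"min": ordered[0], "q1": q1, "median": q2, "q3": q3, "max": ordered[-1]}


summary = five_number_summary([5, 10, 11, 12, 18, 23])
iqr = summary["q3"] - summary["q1"]  # interquartile range: width of the box
print(summary, iqr)
```

Because the quartiles depend only on the middle half of the ordered scores, the interquartile range is less sensitive to extreme scores than the range or the variance.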