Test Theory – Lecture Notes
Table of contents
Lecture 1 – 22/10/2018
    Introduction
    Basic statistical knowledge
Lecture 2 – 24/10/2018
    Properties of tests and items
Lecture 3 – 29/10/2018
    Transformed scores & norms
Lecture 4 – 31/10/2018
    Reliability: Classical Test Theory
Lecture 5 – 5/11/2018
    Estimating reliability in practice
Lecture 6 – 7/11/2018
    Test-quality assessment (reliability)
Lecture 7 – 12/11/2018
    Test-quality assessment (reliability)
Lecture 9 – 19/11/2018
    Test-quality assessment (validity)
Lecture 10 – 21/11/2018
    Test-quality assessment (validity)
Lecture 11 – 26/11/2018
    Advanced use of tests (IRT)
Lecture 12 – 28/11/2018
    IRT in practice
Lecture 13 – 3/12/2018
    Test bias & fairness
Lecture 1 – 22/10/2018
Introduction
Test yourself: How perfectionistic are you?
A = often → 3 points
B = sometimes → 2 points
C = never → 1 point
1. A
2. A
3. A
4. A
5. C
6. A
7. A
8. A
9. B
= 24 points
RED – Red alert: 22-27
Items as indicators for the construct (perfectionism)
- Answers are assigned scores (item scores)
- Item scores are transformed to test scores (generally sum scores)
- Test scores are interpreted
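The three steps above can be sketched in a few lines of Python for the perfectionism quiz. This is only an illustration: the function name `sum_score` is made up here, and only the 'red alert' band (22-27) is checked because that is the only band given in these notes.

```python
# Answer key from the notes: A = often (3), B = sometimes (2), C = never (1).
POINTS = {"A": 3, "B": 2, "C": 1}


def sum_score(answers):
    """Assign an item score to every answer and add them up to a test score."""
    item_scores = [POINTS[answer] for answer in answers]
    return sum(item_scores)


# The nine answers listed in the notes.
answers = ["A", "A", "A", "A", "C", "A", "A", "A", "B"]
total = sum_score(answers)

# Interpretation step: only the 'red alert' band (22-27) is known from the notes.
in_red_band = 22 <= total <= 27

print(total, in_red_band)  # 24 True
```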
Triangle of positions towards psychological testing
Basic statistical knowledge
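Deviation score
$x = X - \bar{X}$: the test score minus the mean of the test scores.
Variance of the test scores
$s^2(X) = \frac{\sum x^2}{N}$: the sum of the squared deviation scores divided by N.
→ Standard deviation: $s(X) = \sqrt{s^2(X)}$, the square root of the variance.
The mean of the deviation scores is always equal to zero.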
The variance of the deviation scores is the same as the variance of the test scores.
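A small numeric check of these properties, as a sketch only. The five test scores are hypothetical, and the variance is computed by dividing by N (not N - 1), which is an assumption chosen to match the 'divided by N' used for the covariance in these notes.

```python
from math import sqrt

scores = [4, 6, 8, 10, 12]  # hypothetical test scores X
n = len(scores)
mean = sum(scores) / n

# Deviation score: test score minus the mean of the test scores.
deviations = [x - mean for x in scores]
mean_of_deviations = sum(deviations) / n

# Variance: sum of squared deviation scores divided by N.
variance = sum(d ** 2 for d in deviations) / n
sd = sqrt(variance)

# Variance of the deviation scores themselves.
variance_of_deviations = sum((d - mean_of_deviations) ** 2 for d in deviations) / n

print(mean_of_deviations)                # 0.0
print(variance, variance_of_deviations)  # 8.0 8.0
```

Subtracting the mean only shifts every score by the same amount, which is why the spread (and so the variance) stays the same.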
Z-scores (standard scores)
$z = \frac{X - \bar{X}}{s(X)}$: the deviation score divided by the standard deviation.
Z-scores always have a mean of 0 and a variance of 1.
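The claim about the mean and variance of z-scores can be checked numerically. The sketch below uses hypothetical scores and a helper named `z_scores` that is introduced here only for illustration; the standard deviation is again computed with N in the denominator.

```python
from math import sqrt


def z_scores(scores):
    """Standardize scores: subtract the mean, divide by the standard deviation."""
    n = len(scores)
    mean = sum(scores) / n
    sd = sqrt(sum((x - mean) ** 2 for x in scores) / n)
    return [(x - mean) / sd for x in scores]


z = z_scores([4, 6, 8, 10, 12])  # hypothetical test scores
z_mean = sum(z) / len(z)
z_variance = sum((v - z_mean) ** 2 for v in z) / len(z)

# Up to floating-point rounding, the mean is 0 and the variance is 1.
print(round(z_mean, 10), round(z_variance, 10))
```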
Covariance: the extent to which two variables covary. On its own it does not tell you anything about
the strength of the association.
$s(X, Y) = \frac{\sum xy}{N}$: the sum of the products of the deviation scores x and y, divided by N.
Covariance tells us whether the association is negative or positive; correlation tells us how strong
this association is.
Correlation
Correlation is the ‘standardized covariance’
$r_{XY} = \frac{s(X, Y)}{s(X)\, s(Y)}$: the covariance divided by the product of the SD of X and the SD of Y.
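Why the covariance gives the direction but not the strength of an association can be seen by rescaling one variable. The sketch below uses hypothetical data and helper names (`covariance`, `correlation`) chosen for this example: multiplying Y by 10 multiplies the covariance by 10, while the correlation does not change.

```python
from math import sqrt


def covariance(xs, ys):
    """Sum of the products of the deviation scores, divided by N."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n


def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    sd_x = sqrt(covariance(xs, xs))  # the covariance of X with itself is its variance
    sd_y = sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sd_x * sd_y)


x = [1, 2, 3, 4, 5]               # hypothetical scores on test X
y = [2, 4, 5, 4, 5]               # hypothetical scores on test Y
y_rescaled = [10 * v for v in y]  # same scores expressed on a different scale

cov_xy = covariance(x, y)
cov_rescaled = covariance(x, y_rescaled)
r_xy = correlation(x, y)
r_rescaled = correlation(x, y_rescaled)

print(cov_xy, cov_rescaled)               # the covariance grows tenfold
print(round(r_xy, 4), round(r_rescaled, 4))  # the correlation stays the same
```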
Lecture 2 – 24/10/2018
Properties of tests and items
What is a psychological test?
Cronbach (1960): ‘A systematic procedure for comparing the behavior of two or more people’
- Multiple choice aptitude test
- Personality test with open-ended questions
- Systematic behavioral observation
- Rorschach inkblot test
When is something a psychological test?
▪ Aimed at measuring behavior (observable)
▪ Systematic (objective)
▪ Comparison of different people (or of people over time) (comparative)
We’re putting you in a position where you show a certain type of observable behavior.
Types of tests
Tests for maximum performance vs. ‘typical’ performance:
Maximum performance = showing what you are maximally capable of.
Typical performance = a test measuring a psychological construct that is not an ability (so there are
no right or wrong answers).
- Maximum performance tests for measuring skills/aptitude
- Typical performance tests for measuring personality traits, attitudes, disorders
- The two types differ greatly in the approach to test development
- They differ little in the statistical analysis of the test scores
Two types of maximum performance tests
Power and speed tests
- Power tests measure skill without time pressure (most common)
o More skilled people give more correct answers
- Speed tests measure performance under severe time pressure
o Question difficulty is trivial
o More skilled people answer more questions within the time limit
Norm-referenced or criterion-referenced tests
- Norm-referenced tests compare people to the rest of the population
o Good norm data on this population are of great importance
- Criterion-referenced tests compare people with an absolute standard
o Test inferences are not tied to the performance level in the population
o E.g. the Test Theory exam is criterion-referenced
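The difference between the two interpretations can be sketched by reading the same score in both ways. Everything in this example is hypothetical: the norm group, the score, the cutoff of 6, and the helper names `percent_below` and `passes`.

```python
def percent_below(score, norm_scores):
    """Norm-referenced reading: percentage of the norm group scoring lower."""
    below = sum(1 for s in norm_scores if s < score)
    return 100 * below / len(norm_scores)


def passes(score, cutoff):
    """Criterion-referenced reading: compare the score with a fixed standard."""
    return score >= cutoff


norm_group = [3, 4, 5, 5, 6, 6, 7, 7, 8, 9]  # hypothetical norm data
score = 7

pct = percent_below(score, norm_group)  # depends on how the population performs
passed = passes(score, cutoff=6)        # does not depend on the population

print(pct, passed)  # 60.0 True
```

With a stronger or weaker norm group the percentage would change, while the pass/fail decision would stay the same.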
What does a psychological test contain?
- Test material
- Test forms
- Test manual
o Precise test instructions
o Score-processing procedure
o Norm tables
o Discussion of scientific qualities
The step from answer to score is the assessment.
Item scores are determined such that they are indicative of the construct you want to measure →
higher item scores = ‘higher’ on that attribute.
In other words, the scoring is set up so that answers pointing to more of the attribute receive a higher item score.
‘I like to party in the weekend’
Agree / Do not agree
= agreeing is indicative of being extraverted
= scoring high for extraversion
‘I love reading books on a Friday night’
Agree / Do not agree
= agreeing is contraindicative of being extraverted
= scoring low for extraversion
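Scoring indicative and contraindicative items can be sketched as follows. The 0/1 scoring and the function name `item_score` are assumptions made for this illustration; the notes do not specify the point values for these two items.

```python
def item_score(agrees, indicative):
    """Give 1 point when the answer points towards the construct, else 0.

    For an indicative item, agreeing scores 1.
    For a contraindicative item, disagreeing scores 1 (the item is reverse-keyed).
    """
    return 1 if agrees == indicative else 0


# (item text, is agreeing indicative of extraversion?)
items = [
    ("I like to party in the weekend", True),
    ("I love reading books on a Friday night", False),
]

# A respondent who agrees with the first item and disagrees with the second.
responses = [True, False]

scores = [item_score(agrees, indicative)
          for agrees, (_, indicative) in zip(responses, items)]
extraversion = sum(scores)

print(scores, extraversion)  # [1, 1] 2
```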
Properties of the test score
- Test score is generally the sum of the item scores
o It is the most important outcome of the test
- Test manual gives instructions on how to interpret the score
o With norm-referenced tests, the norm table needs to be consulted
o E.g. 30% of boys aged 3 have a score lower than 3 (30th percentile)
Measurement level of the test score
Test score is a number.
What is the level of measurement of this number?
Interpretation of this number depends on the level of measurement of the test score:
▪ Nominal (e.g. personality types) → no order between the people
▪ Ordinal (e.g. short Likert scales) → we’re ordering people, but we’re only able to say for
example ‘John scores higher than Mary’, not yet ‘John scores 8 points higher than Mary’
▪ Interval (e.g. long Likert scales) → we’re able to give meaning to the differences between scores,
for example ‘Pete scores 3 points higher than John and John scores 3 points higher than
Mary’; we can then say ‘Pete is as much more extraverted than John as John is more
extraverted than Mary’
▪ Ratio (e.g. Bourdon dot test) → we’re able to classify, order and interpret differences between
people, and in addition there is a meaningful zero point, for example ‘John has 10 points and
Mary 20 points, therefore Mary has twice as many points as John’