Test theory
Lecture 1: introduction and basic knowledge of statistics
Lecture 2: properties of tests and items
Cohen’s kappa
Lecture 3: Transformed scores and norms
Distribution table
Association between 2 variables
Composite scores
Converted standard scores
Linear transformations
Non-linear transformations
Normalized scores
Stanines: standardized nines
Lecture 4: Reliability
Reliability
Variance
Conceptualizing reliability (4 ways)
Standard deviation
Estimating test reliability
Lecture 5: Test quality assessment (reliability)
Test-retest reliability
Parallel (alternate) forms reliability
Internal consistency reliability
Lecture 6: Factors affecting reliability
Factors affecting reliability
Lecture 7: Effect sizes and item-rest correlation
Confidence intervals and measurement error
Effect sizes and statistical significance
Item discrimination and internal consistency
Lecture 8: Validity
Content validity
Internal validity
Response process validity
Associations with other variables
Consequential validity
Lecture 9: Convergent and discriminant validity
Types of correlations
Factors affecting predictive validity (for admission purposes)
Interpretative approach (estimating practical effects)
Analysis of test sensitivity and specificity
Lecture 10: Item response theory and Rasch models
Item response theory (IRT) = modern test theory
IRT measurement models
Item information and test information
Lecture 11: IRT in practice
Purpose of item response theory
Comparing IRT and CTT
True score in IRT
Lecture 12: Test bias
Types of test bias
Ways of evaluating construct bias
External methods to identify predictive bias
Intercept bias
Slope bias
Intercept and slope bias
Lecture 1: introduction and basic knowledge of statistics
Test theory: the development and quality assurance of psychological tests that measure properties that are not directly observable.
In a test, the items are indicators of the construct (e.g., perfectionism). Responses are assigned item scores, which are combined into test scores that are then interpreted.
A psychological test is a systematic procedure for comparing the behavior of two or more people (Cronbach). It measures behavior (observable), collects behavior samples in a systematic way (objective), and compares the behavior of multiple people, or of the same person over time (comparative).
Maximum performance tests: measure the maximum level of skill or ability a person can show.
1. Speeded tests: time-limited tests of performance; the score is the number of questions answered.
2. Power tests: measure skill without time pressure; the score is the number of questions answered correctly.
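A minimal sketch of the two scoring rules, assuming each response is coded as correct (True), incorrect (False), or not answered (None). The coding scheme and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical responses of one test taker:
# True = correct, False = incorrect, None = not answered (e.g., time ran out)
responses = [True, True, False, True, None, None]

def speeded_score(responses):
    """Speeded test: count the items answered within the time limit."""
    return sum(r is not None for r in responses)

def power_score(responses):
    """Power test: count the items answered correctly."""
    return sum(r is True for r in responses)

print(speeded_score(responses))  # 4
print(power_score(responses))    # 3
```

In a pure speeded test the items are easy enough that almost every answered item is correct, so the two counts differ mainly through how far a person gets before time runs out.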
Typical performance tests: measure how a person usually behaves, e.g., personality traits, attitudes, and disorders.
Criterion-referenced tests: compare a person’s performance with an absolute standard of skill level. A person’s performance is either above or below the performance criterion.
Norm-referenced tests: compare a person’s performance with that of the rest of the population (a norm group).
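The same raw score can be interpreted both ways. As a sketch, assuming a hypothetical cutoff and a hypothetical norm group, a criterion-referenced interpretation compares the score with a fixed standard, while a norm-referenced one expresses it as a percentile rank. Percentile rank is defined here as the percentage of the norm group scoring strictly below the score; other definitions exist (e.g., counting half of the tied scores).

```python
from bisect import bisect_left

def criterion_referenced(score, cutoff):
    """Compare a score with an absolute standard.
    Assumption: a score equal to the cutoff counts as meeting the criterion."""
    return "at or above criterion" if score >= cutoff else "below criterion"

def percentile_rank(score, norm_scores):
    """Percentage of the norm group scoring strictly below `score`."""
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)  # number of norm scores < score
    return 100 * below / len(ordered)

# Hypothetical norm group of 10 people
norm = [10, 12, 15, 15, 18, 20, 22, 25, 27, 30]

print(criterion_referenced(20, cutoff=24))  # below criterion
print(percentile_rank(20, norm))            # 50.0
```

The criterion-referenced result does not change when the norm group changes; the percentile rank does.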
Reflective/effect indicators: a hypothetical construct (intelligence) causes someone’s responses on an intelligence test.
Formative/causal indicators: the indicators of SES (income, education) define SES, rather than being caused by it.
Psychometrics: the science of evaluating the attributes of psychological tests:
1. The type of information (data) the tests produce
2. The reliability of data from psychological tests
3. Issues concerning the validity of data obtained from psychological tests
Measurement challenges
1. Participant reactivity: people change their behavior because they know they are being measured.
2. Data collectors may be biased.
3. Score sensitivity: the ability of a measure to discriminate between meaningful amounts or units of the dimension being measured.
High variance of item scores and high covariance between item scores are desirable, because item scores are intended to reveal differences between people.
Correlation between items is also important: when some people score high on all items and others score low on all items, the items correlate positively, which further increases the variance of the total test score.
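Why covariance matters can be checked numerically: the variance of a sum score equals the sum of the item variances plus twice the sum of all item covariances. A minimal sketch with hypothetical scores of 4 people on 3 items, using population variance (dividing by n; the identity also holds when dividing by n - 1):

```python
from itertools import combinations

def variance(xs):
    """Population variance (divides by n)."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

def covariance(xs, ys):
    """Population covariance (divides by n)."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical item scores: rows = people, columns = items
scores = [
    [1, 2, 1],
    [2, 2, 3],
    [4, 3, 4],
    [5, 5, 4],
]
items = list(zip(*scores))             # one tuple of scores per item
totals = [sum(row) for row in scores]  # sum score per person

item_vars = sum(variance(item) for item in items)
item_covs = sum(covariance(a, b) for a, b in combinations(items, 2))

print(variance(totals))           # 14.5
print(item_vars + 2 * item_covs)  # 14.5 (5.5 from variances + 9.0 from covariances)
```

Here most of the total-score variance comes from the positive covariances between items; if the items were uncorrelated, the total-score variance would shrink to the sum of the item variances (5.5).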