07-02-2019: Lecture 1 – Introduction: Developing maximum and typical performance tests
Psychological and educational tests
- Test Construction
o Development and application
What does the test look like?
Instructions for administration, scoring and interpretation
Actual administrations of tests
What information does it give?
What is the usefulness of this information, and for whom
(individuals, policy)?
- Test theory
o Statistical theory about behavior of item scores and test scores
Examples: classical test theory, item response theory
Important issues: quantitative measures for the quality of items and
tests for target groups of respondents
Both are needed for a sensible use of tests
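As an illustration of such quantitative measures, the sketch below computes two quantities commonly used in classical test theory: item difficulty (the proportion of correct answers per item) and Cronbach's alpha as a reliability estimate for the total score. Neither measure is named in these notes, and the function names and the small data set are made up for illustration.

```python
from statistics import variance

def item_difficulties(scores):
    """Proportion of persons answering each item correctly (0/1 scores)."""
    n_persons = len(scores)
    n_items = len(scores[0])
    return [sum(row[j] for row in scores) / n_persons for j in range(n_items)]

def cronbach_alpha(scores):
    """k/(k-1) * (1 - sum of item variances / variance of total scores)."""
    n_items = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: rows are persons, columns are items (1 = correct, 0 = incorrect).
scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]

print(item_difficulties(scores))
print(round(cronbach_alpha(scores), 2))
```

A low proportion correct marks a hard item for this group of respondents; the same item can be easy for another target group, which is why these measures are always reported for a specific group.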
Use of tests: in practice
1. Human Resource Management
a. Personnel selection and development
2. Education: individual development and performance of students
a. Identify deviating patterns of development (pupil assessment system,
Dutch: leerlingvolgsysteem)
b. Prediction of the most suitable type of high school (the Cito test at the
end of primary school, in group 8)
3. Psychodiagnostics
a. Neuropsychology, clinical psychology, developmental psychology
Judgments on individuals
Use of tests: in research
- Testing of hypotheses and theories: theory building
o E.g., ‘Location and size of brain damage determines type and severity of
behavioural difficulties in the long term’
o Variables:
Indicators of location and size of brain damage
Behavioural difficulties
e.g., Anxiety, Aggression, Childish behaviour, Apathy, Lack of
Insight
Judgments on populations
Definition: ‘A psychological or educational test is an instrument for the measurement of a
person’s maximum or typical performance under standardized conditions, where the
performance is assumed to reflect one or more latent attributes.’
Test types
- Typical performance test
o Typifies a person: there are no correct answers. The responses describe
the person or give some information about the person.
E.g. personality, attitude, mental health
- Maximum performance test
o Measures a person’s achievement; there are correct and incorrect answers.
E.g. intelligence, ability level
Standardization
- Test conditions are fixed
o E.g. test material, instructions, administration procedure, score computation
- Aim: to ensure comparability of test performance between persons and test
occasions
- Difficult to achieve perfect standardization
- Which aspects to standardize depends on, for example, the test or the target population
Latent attribute
- Attribute that can’t be measured directly
o E.g. verbal ability, arithmetic skills,
severity of depression
- Test score (X) should reflect the latent attribute
of interest (T; True score)
o Causal relationship between attribute and test score: the true score
influences the test score. However, there is always measurement error, which
is why a perfect measurement is not possible.
o Thus: if two persons differ on the attribute, their test scores should differ
as well, and the other way around
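In classical test theory this relationship is usually written as a simple decomposition. X and T are the symbols used above; the error symbol E and the expectation operator (written here as a script E) are the usual convention and an assumption as far as these notes are concerned.

```latex
% Observed test score = true score + measurement error
X = T + E, \qquad \mathcal{E}(E) = 0 \;\Rightarrow\; \mathcal{E}(X) = T
```

Because the error averages to zero, two persons with different true scores have different expected test scores, even though their observed scores on a single occasion can coincide by chance.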
Some important terminology:
- Item
o Smallest test unit, on which a person is scored
o Score can be the same as the person’s response
- Subtest (also denoted as subscale, or just scale)
o Independent part of a test
o Indicative of an attribute
o Consists of various items
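A minimal sketch of how item scores combine into subtest scores. It assumes the simplest scoring rule, a sum of item scores per subtest; the item names, the subtest layout and the rule itself are hypothetical and not taken from a specific test.

```python
# Hypothetical test layout: each subtest lists the items that belong to it.
SUBTESTS = {
    "verbal": ["v1", "v2", "v3"],
    "arithmetic": ["a1", "a2"],
}

def subtest_scores(item_scores, subtests=SUBTESTS):
    """Sum the item scores of one person within each subtest."""
    return {
        name: sum(item_scores[item] for item in items)
        for name, items in subtests.items()
    }

# One person's item scores (1 = correct, 0 = incorrect).
person = {"v1": 1, "v2": 0, "v3": 1, "a1": 1, "a2": 1}
print(subtest_scores(person))  # {'verbal': 2, 'arithmetic': 2}
```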
Example of a maximum performance test
- Bayley-III
o Aims to assess the developmental level of young children (1-42 months)
o Individual, standardized assessment
o Normed scores
o Assesses the developmental level through play (observational instrument)
- Aims of use:
o For children with concerns about development
o Diagnosis of developmental delays, in order to plan and/or evaluate
interventions
- Bayley-III consists of 5 (or 7) subscales
o Administered in interaction with the child
Cognition
Language
Reception
Production
Motor
Fine
Gross
o Parent questionnaires
Social-emotional
Adaptive behavior
Steps of Test construction:
- 1. Define the construct of interest
o Constructs are abstract, theoretical
concepts
o Literature research
o Homogeneity (one construct, and the
indicators fit together) and dimensionality (should the test measure one construct,
or several that together describe a broader subject, e.g. personality?)
- 2. Develop the test
o Essential aspects
Measurement mode of the test
Self-performance mode (intelligence test e.g.)
Self-evaluation mode (personality tests e.g.)
Other-evaluation mode (e.g. the Bayley, in which the parent
rates the behavior of the child)
Example: SDQ (slides 32-34) uses multiple measurement modes,
other-evaluation and self-evaluation; which mode is used
depends on the context of assessment.
Objectives of the test
Research vs. practice
Individual or group level
Description vs. diagnosis vs. decision-making
Population and subpopulations of testees
Be as specific as possible
Inclusion and exclusion criteria
Too broad: implications for norm groups and their
representativeness
Conceptual framework of the test
More specific than just definition; it helps to write items
Typical performance: three broad classes of strategies
o Intuitive: rational, prototypical
o Deductive:
Construct method: use of theoretical framework
(e.g. Koster et al)
Facet design method: conceptual analysis of the
construct
o Inductive: constructs to be measured can’t be defined
beforehand, but are identified using association
measures (e.g. correlations)
Internal: associations among items
External: associations between items and
external criterion
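The inductive strategy can be sketched with Pearson correlations. The helper below is a plain implementation of the correlation coefficient; the item responses and the external criterion are invented for illustration, and a real analysis would use far more respondents.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)

# Hypothetical responses of six persons to three items (scores 0-2).
items = {
    "item1": [0, 1, 1, 2, 2, 2],
    "item2": [0, 0, 1, 1, 2, 2],
    "item3": [2, 1, 2, 0, 1, 0],
}
# Hypothetical external criterion, e.g. a rating from another source.
criterion = [1, 2, 2, 3, 4, 4]

# Internal method: associations among items.
names = list(items)
internal = {
    (a, b): pearson(items[a], items[b])
    for i, a in enumerate(names)
    for b in names[i + 1:]
}

# External method: association of each item with the criterion.
external = {name: pearson(scores, criterion) for name, scores in items.items()}

for pair, r in internal.items():
    print(pair, round(r, 2))
for name, r in external.items():
    print(name, round(r, 2))
```

Items that correlate strongly with each other are candidates for the same construct (internal method); items that correlate strongly with the criterion are kept because they predict it (external method).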
Item response mode
There are many; see the book
Frequently-used scales
o Dichotomous = binary
E.g. yes/no, true/false, correct/incorrect
o Ordinal polytomous
E.g. never/sometimes/often
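How responses on these two scale types turn into item scores can be sketched as a lookup. The numeric codes (0/1 and 0/1/2) are a common convention but an assumption here, as are the names.

```python
# Assumed numeric codes for the response categories.
DICHOTOMOUS = {"no": 0, "yes": 1}
ORDINAL = {"never": 0, "sometimes": 1, "often": 2}

def score_response(response, scale):
    """Translate a response category into its item score; reject unknown categories."""
    try:
        return scale[response]
    except KeyError:
        raise ValueError(f"unknown response category: {response!r}") from None

print(score_response("yes", DICHOTOMOUS))    # 1
print(score_response("sometimes", ORDINAL))  # 1
```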
Administration mode
Oral
Paper and pencil
Computerized
Computerized adaptive test administration
Item writing
The book describes different concrete guidelines
o Both for typical and maximum performance test items
In general:
o Each item represents one idea
o Be specific
o Use both positively and negatively formulated items
o Avoid expressions and jargon
o Consider the reading level of the user
o Avoid the use of ‘not’
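When a test mixes positively and negatively formulated items, the negatively formulated ones are usually reverse-scored before item scores are combined, so that a high score always points in the same direction. A minimal sketch, assuming a 0-2 ordinal scale and hypothetical item names:

```python
MIN_SCORE, MAX_SCORE = 0, 2           # assumed range of the response scale
NEGATIVE_ITEMS = {"item2", "item4"}   # hypothetical negatively formulated items

def reverse_score(raw):
    """Mirror a score within the scale range (0 -> 2, 1 -> 1, 2 -> 0)."""
    return MIN_SCORE + MAX_SCORE - raw

def recode(responses):
    """Reverse-score the negatively formulated items; leave the others unchanged."""
    return {
        item: reverse_score(raw) if item in NEGATIVE_ITEMS else raw
        for item, raw in responses.items()
    }

raw = {"item1": 2, "item2": 0, "item3": 1, "item4": 2}
print(recode(raw))  # {'item1': 2, 'item2': 2, 'item3': 1, 'item4': 0}
```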
- 3. Pilot study
o Check whether instructions and items are clear
o Three types of studies
Experts’ pilot
Concept items are reviewed by experts
Test takers’ pilot
Concept items are administered to a small group of test takers from
the target population
E.g. use a read-aloud protocol or a think-aloud protocol
Raters’ pilot
Yields important information to remove items, remove raters
and/or to improve training
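One simple way to look at raters' pilot data is the proportion of test takers on which two raters give the same score, computed per item. This is only a sketch: the notes do not prescribe an agreement measure, and the ratings and names below are invented.

```python
def agreement_per_item(rater_a, rater_b):
    """Proportion of identical scores per item for two raters.

    Both arguments map an item name to the list of scores that rater
    gave to the same test takers, in the same order.
    """
    result = {}
    for item, scores_a in rater_a.items():
        scores_b = rater_b[item]
        same = sum(a == b for a, b in zip(scores_a, scores_b))
        result[item] = same / len(scores_a)
    return result

# Hypothetical ratings of four test takers on two items.
rater_a = {"item1": [0, 1, 2, 1], "item2": [2, 2, 0, 1]}
rater_b = {"item1": [0, 1, 2, 1], "item2": [1, 2, 1, 1]}

print(agreement_per_item(rater_a, rater_b))  # {'item1': 1.0, 'item2': 0.5}
```

Items with low agreement are candidates for revision or removal, or they point to a need for additional rater training.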