Repeated Measures Summary
Introduction
1. Multivariate statistics: why?
Multivariate statistics provide analysis when there are many IVs and/or many DVs, all correlating
with one another to varying degrees.
1.1 Domain of multivariate statistics: number of IVs and DVs
Multivariate statistics are the complete or general case, whereas univariate and bivariate
statistics are special cases of the multivariate model.
Independent variables (IVs) are the differing conditions (treatment vs. placebo) to which you
expose your subjects, or the characteristics that the subjects themselves bring into the research
situation. They predict the DVs – the outcome or response variables.
Univariate statistics refer to analyses with a single DV; there may or may not be more than
one IV. Bivariate statistics typically refers to the analysis of two variables, where neither is an
experimental IV and the aim is simply to study the relationship between the variables.
With multivariate statistics, you simultaneously analyze multiple DVs and IVs.
1.2 Experimental and nonexperimental research
Distinctions between them:
- Experimental research
o Manipulates IV (random assignment)
o Controls other influential factors (by holding constant, counterbalancing, or
randomizing influence)
o Scores on DV are expected to be the same, within random variation, except for
the influence of the IV
o Systematic differences in the DV are then attributed to the IV
- Nonexperimental research
o IV is not manipulated. IV can be defined, but no control over the assignment of
subjects to levels of it
o Distinction between IVs and DVs is usually arbitrary
IV = predictor variable
DV = criterion variable
o Difficult to attribute causality to an IV (relationship can be mentioned, but the
cause is unclear)
1.3 Garbage in, roses out?
The increased flexibility of multivariate research designs comes at a cost: increased ambiguity in
the interpretation of results. Multivariate results can also be sensitive to which analytic strategy is
chosen and do not always provide better protection against statistical errors than their
univariate counterparts.
2. Some useful definitions
2.1 Continuous, discrete, and dichotomous data
The definitions of these three terms are as follows:
- Continuous variables: measured on a scale that changes values smoothly rather than in
steps. Can take on any value within the range of the scale. Precision is limited by the
measuring instrument, not by the scale
o Example: analog clock face, annual income, age, temperature, distance, GPA
- Discrete variables: finite and usually small number of values, no smooth transition from
one value or category to the next.
o Example: digital clock, continents, categories of religious affiliation, communities
o Can be used as if continuous if there are numerous categories and they represent
a quantitative attribute (e.g. 1 stands for 0-4 years, 2 for 5-9 years)
o Dummy coding is also possible (dichotomous)
Rank order (ordinal) scale can also be used: assigns a number to each subject to indicate the
subject’s position vis-à-vis other subjects along some dimension.
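The dummy-coding option mentioned above can be sketched in Python. The category labels here are hypothetical, chosen only for illustration; the point is that each category of a discrete variable becomes its own 0/1 (dichotomous) column:

```python
# Minimal sketch of dummy (one-hot) coding for a discrete variable.
# Categories are hypothetical; pure Python for illustration.
religions = ["Catholic", "Protestant", "None", "Catholic"]
categories = sorted(set(religions))  # fixed column order

def dummy_code(values, categories):
    """Return one 0/1 column per category (one-hot coding)."""
    return {c: [1 if v == c else 0 for v in values] for c in categories}

coded = dummy_code(religions, categories)
print(coded["Catholic"])  # 1 where the subject is Catholic, else 0
```

Each subject then has exactly one 1 across the dummy columns, so the discrete variable can enter analyses that expect numeric input.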
2.2 Samples and populations
Samples are measured to make generalizations about populations, ideally through a random
sampling process. In nonexperimental research, you investigate relationships among variables in
some predefined population (define the population, then sample from it). In experimental
research, you attempt to create different populations by treating subgroups from an originally
homogeneous group differently. The sampling objective is to ensure that all subjects come from
the same population before you treat them differently.
2.3 Descriptive and inferential statistics
Descriptive statistics describe samples of subjects in terms of variables or combinations of
variables. Inferential statistics test hypotheses about differences in populations on the basis of
measurements made on samples of subjects; inference therefore carries more restrictions
(assumptions) than description.
2.4 Orthogonality: standard and sequential analyses
Orthogonality is perfect nonassociation between variables: knowing the value of one
variable gives no clue as to the value of the other (r = 0). Often desirable.
- If all pairs of IVs are orthogonal, each IV adds separately to the prediction of the DV
o Use a Venn diagram!
Total variance of income = one circle
Horizontal stripes represent the part of income predictable from education
Vertical stripes represent the part predictable from occupational prestige
The income circle overlaps education by 35% and occupational prestige
by 45%, together accounting for 80% of the variability in income
- When IVs are correlated, they share overlapping variance
o A major decision for the multivariate analyst is how to handle variance that is
predictable from more than one variable
o In a standard analysis, overlapping variance is disregarded in assessing the
contribution of each variable to the solution; in a sequential analysis, it is
credited to the variable entered earlier
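The overlap idea can be made concrete with a small regression sketch in numpy. The data here are simulated (not the book's income example): two correlated predictors each explain variance in the outcome, but because they overlap, the joint R² is less than the sum of the individual R²s:

```python
# Sketch of overlapping variance among correlated IVs.
# Simulated data for illustration only.
import numpy as np

rng = np.random.default_rng(0)
education = rng.normal(size=200)
prestige = 0.6 * education + rng.normal(size=200)   # correlated with education
income = 0.5 * education + 0.4 * prestige + rng.normal(size=200)

def r_squared(X, y):
    """R^2 from an ordinary least-squares fit (intercept column added)."""
    X = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1 - resid.var() / y.var()

r2_edu = r_squared(education[:, None], income)
r2_pre = r_squared(prestige[:, None], income)
r2_both = r_squared(np.column_stack([education, prestige]), income)
# Because the IVs share variance, r2_both < r2_edu + r2_pre
print(r2_edu, r2_pre, r2_both)
```

The gap between `r2_edu + r2_pre` and `r2_both` is exactly the overlapping region of the Venn diagram.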
3. Linear combinations of variables
A linear combination is one in which each variable is assigned a weight, and then the products of
weights and the variable scores are summed to predict a score on a combined variable. The
combination of variables can be considered a supervariable, not directly measured but worthy
of interpretation. May represent an underlying dimension that predicts something or optimizes
some relationship.
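The weight-and-sum definition above can be sketched directly. The weights here are hypothetical, picked for illustration rather than produced by any analysis:

```python
# Minimal sketch of a linear combination ("supervariable").
# Weights are hypothetical, for illustration only.
weights = [0.5, 0.3, 0.2]

def combine(scores, weights):
    """Weighted sum of a subject's scores on several variables."""
    return sum(w * x for w, x in zip(weights, scores))

subject_scores = [10.0, 4.0, 2.0]
print(combine(subject_scores, weights))  # 0.5*10 + 0.3*4 + 0.2*2 = 6.6
```

Multivariate techniques differ mainly in how they choose the weights, i.e., in what the supervariable is optimized to predict or represent.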
4. Number and nature of variables to include
A general rule is to get the best solution with the fewest variables. A related problem is
overfitting: the solution fits the sample so well that it is unlikely to generalize to the
population. Overfitting occurs when too many variables are included in an analysis relative to
the sample size.
5. Statistical power
Power is the probability that an effect that actually exists will produce statistical significance
in your eventual data analysis. It is best considered at the planning stage:
1. Estimate size of anticipated effect (e.g. an expected mean difference)
2. Variability expected in assessment of effect
3. Desired alpha level (ordinarily .05)
4. Desired power (often .80)
These four estimates are required to determine necessary sample size!
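The four estimates can be turned into a sample size. As a sketch, the standard normal-approximation formula for a two-sided, two-group comparison of means, n = 2((z_alpha/2 + z_beta) * sigma / delta)^2 per group, is assumed here; it is not from the text, and exact methods give slightly different numbers:

```python
# Sketch: sample size per group from the four planning estimates,
# using the normal-approximation formula for a two-sided two-sample
# comparison of means (an assumption of this sketch).
from math import ceil
from statistics import NormalDist

def n_per_group(delta, sigma, alpha=0.05, power=0.80):
    z_a = NormalDist().inv_cdf(1 - alpha / 2)  # step 3: alpha level
    z_b = NormalDist().inv_cdf(power)          # step 4: desired power
    return 2 * ((z_a + z_b) * sigma / delta) ** 2  # steps 1-2: effect, variability

print(ceil(n_per_group(delta=5, sigma=10)))  # 63 subjects per group
```

Note how the required n grows quadratically as the anticipated effect (delta) shrinks, which is why the effect-size estimate in step 1 matters so much.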
6. Data appropriate for multivariate statistics
An appropriate data set for multivariate statistical methods consists of values on a number of
variables for each of several subjects. For continuous variables, the values are scores on the
variables; for discrete variables, the values are number codes for group membership or treatment.
6.1 The data matrix
Organization of scores in which rows (lines) represent subjects and columns represent variables.
6.2 The correlation matrix
R = a square, symmetrical matrix. Each row and column represents a different variable, and the
value at the intersection of each row and column is the correlation between two variables.
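A correlation matrix can be computed directly from a subjects-by-variables data matrix. The data below are made up for illustration:

```python
# Sketch: correlation matrix R from a data matrix whose rows are
# subjects and columns are variables (made-up data).
import numpy as np

data = np.array([
    [4.0, 2.0, 8.0],   # subject 1's scores on three variables
    [6.0, 5.0, 3.0],
    [7.0, 4.0, 6.0],
    [5.0, 3.0, 5.0],
])
R = np.corrcoef(data, rowvar=False)  # rowvar=False: columns are variables
# R is square and symmetrical, with 1s on the main diagonal
print(R.shape)
```

Each variable correlates perfectly with itself, which is why the main diagonal is all 1s.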
6.3 The variance-covariance matrix
If scores are measured along a meaningful scale, it is sometimes appropriate to analyze a
variance-covariance matrix (∑). The elements in the main diagonal are the variances of each
variable, and the off-diagonal elements are covariances between pairs of different variables.
- Variance: averaged squared deviations of each score from the mean of the scores on
that variable
- Covariance: averaged cross-products (the product of the deviation of one variable from
its mean and the deviation of a second variable from its mean)
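Using the same subjects-by-variables layout, the variance-covariance matrix looks like this in a sketch (made-up data):

```python
# Sketch: variance-covariance matrix from a subjects-by-variables data
# matrix (made-up data). Diagonal = variances, off-diagonal = covariances.
import numpy as np

data = np.array([
    [4.0, 2.0, 8.0],
    [6.0, 5.0, 3.0],
    [7.0, 4.0, 6.0],
    [5.0, 3.0, 5.0],
])
S = np.cov(data, rowvar=False)   # sample (N - 1) covariance matrix
variances = np.diag(S)           # matches np.var(data, axis=0, ddof=1)
print(variances)
```

Standardizing the variables (z-scores) before computing this matrix would yield the correlation matrix R of the previous section.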
6.4 The sum-of-squares and cross-products matrix
The matrix, S, is a precursor to the variance-covariance matrix in which deviations are not yet
averaged. The size of the entries therefore depends on the number of cases as well as on the
metric in which the elements were measured. The entries in the major diagonal are the sums of
squared deviations of scores from the mean for each variable.
SS(X_j) = \sum_{i=1}^{N} (X_{ij} - \bar{X}_j)^2

Where i = 1, 2, ..., N
N = number of subjects
j = variable identifier
X_ij = score on variable j by subject i
X̄_j = mean of all scores on the jth variable
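The SSCP matrix is the covariance matrix before averaging, so it can be sketched by multiplying the deviation matrix by its own transpose (made-up data):

```python
# Sketch of the sum-of-squares and cross-products (SSCP) matrix:
# like the covariance matrix, but deviations are not yet averaged.
import numpy as np

data = np.array([
    [4.0, 2.0, 8.0],
    [6.0, 5.0, 3.0],
    [7.0, 4.0, 6.0],
    [5.0, 3.0, 5.0],
])
deviations = data - data.mean(axis=0)   # X_ij - mean_j, per column
sscp = deviations.T @ deviations        # diagonal = SS for each variable
# Dividing by N - 1 recovers the sample variance-covariance matrix
print(np.allclose(sscp / (len(data) - 1), np.cov(data, rowvar=False)))
```

This makes the "precursor" relationship explicit: SSCP divided by the degrees of freedom gives the variance-covariance matrix.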