Latent variable: a variable that cannot be measured directly
Factor analysis and principal component analysis are both techniques for
identifying clusters of variables. There are three main uses:
- Understanding the structure of a set of variables
- Constructing a questionnaire to measure an underlying variable
- Reducing a data set to a more manageable size while retaining as much of
the original information as possible
They both aim to reduce a set of variables into a smaller set of dimensions
(called factors in factor analysis and components in PCA).
R-matrix: a table that shows the correlation between each pair of variables. The
diagonal elements of the matrix are all ones because each variable correlates
perfectly with itself. The off-diagonal elements are the correlation coefficients
between pairs of variables
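As a minimal sketch (using hypothetical random data), the R-matrix is just the pairwise correlation matrix of the measured variables:

```python
import numpy as np

# Hypothetical data: 10 cases measured on 6 variables.
rng = np.random.default_rng(0)
data = rng.normal(size=(10, 6))

# R-matrix: correlation between each pair of variables.
# rowvar=False tells numpy that columns (not rows) are the variables.
R = np.corrcoef(data, rowvar=False)

print(np.round(R, 2))
```

The diagonal of `R` is all ones (each variable correlates perfectly with itself) and the matrix is symmetric, since the correlation between variables i and j equals that between j and i.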
Factor analysis attempts to achieve parsimony by explaining the maximum
amount of common variance in a correlation matrix using the smallest number of
explanatory constructs. These explanatory constructs are factors (or latent
variables) and they represent clusters of variables that correlate highly with
each other.
PCA tries to explain the maximum amount of total variance in a correlation
matrix by transforming the original variables into linear components.
They both aim to reduce the R-matrix into a smaller set of dimensions.
In factor analysis these dimensions, or factors, are estimated from the data and
are believed to reflect constructs that can't be measured directly.
PCA transforms the data into a set of linear components; it does not estimate
unmeasured variables, it only transforms measured ones.
Both techniques look for variables that correlate highly with a group of other
variables, but do not correlate with variables outside of that group.
In PCA we predict components from the measured variables. In factor analysis we
predict the measured variables from the underlying factors.
Another difference is that unlike PCA, factor analysis contains an error term.
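A minimal numpy sketch of this direction-of-prediction difference, using hypothetical simulated data (PCA is done here by eigendecomposition of the R-matrix; the loadings and noise level are assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: 200 cases on 4 variables generated from 2 latent factors.
latent = rng.normal(size=(200, 2))
loadings_true = np.array([[0.9, 0.0],
                          [0.8, 0.0],
                          [0.0, 0.9],
                          [0.0, 0.8]])
X = latent @ loadings_true.T + 0.3 * rng.normal(size=(200, 4))
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardise

# PCA: eigendecomposition of the R-matrix. The components are exact
# linear transformations of the measured variables -- no error term.
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
components = X @ eigvecs                   # components predicted FROM variables

# Factor analysis reverses the direction: the model predicts each measured
# variable from the factors plus a unique error term,
#   variable_j = sum_k (loading_jk * factor_k) + e_j
```

Note that the PCA step contains no residual: the components together reproduce the standardised variables exactly, which is why the eigenvalues sum to the number of variables (the total variance).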
Graphical representation: A factor is a classification axis along which the
measures (variables) can be plotted. The greater the loading of variables on a
factor, the more that factor explains relationships between those variables.
Factor loading: the coordinate of a variable along a classification axis is known
as a factor loading
Mathematical representation:
Y = b1X1 + b2X2 + ... + bnXn
Factor_i = b1Variable1_i + b2Variable2_i + ... + bnVariablen_i
There is no intercept because the lines intersect at zero (hence the intercept is
zero)
There is no error term because it is simply about transforming the variables.
• The b values in the equation represent the weights of a variable on a
factor.
• These values are the same as the co-ordinates on a factor plot.
• They are called Factor Loadings.
Factor matrix (component matrix): a matrix in which the columns represent
each factor and the rows represent the loadings of each variable on each factor.
For example, matrix A below:

A = | 0.87  0.01 |
    | 0.96  0.03 |
    | 0.92  0.04 |
    | 0.00  0.82 |
    | 0.10  0.75 |
    | 0.09  0.70 |
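Reading a factor matrix like this amounts to grouping each variable with the factor on which it loads most strongly. A short sketch using the values of matrix A above:

```python
import numpy as np

# Factor (component) matrix A: rows = variables, columns = factors.
A = np.array([[0.87, 0.01],
              [0.96, 0.03],
              [0.92, 0.04],
              [0.00, 0.82],
              [0.10, 0.75],
              [0.09, 0.70]])

# Assign each variable to the factor with the largest absolute loading:
# the first three variables cluster on factor 1, the last three on factor 2.
dominant = np.argmax(np.abs(A), axis=1)
print(dominant)  # -> [0 0 0 1 1 1]
```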
Common factors: factors that explain the correlations between variables
Unique factors: factors specific to a single variable; they explain variance in
that variable but cannot explain correlations between variables
Factor scores: a single score for an individual entity representing their
standing on some latent variable. This score is a weighted average of all the
variables, with larger weights for the variables that load strongly onto that
factor.
This weighted-average method is a poor method because it:
• Depends on the measurement scales of the variables
• Assumes that the measurement scales of all the variables are the same
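The scale-dependence problem can be seen in a small sketch (the loadings and scores below are hypothetical):

```python
import numpy as np

# Hypothetical loadings of 3 variables on one factor.
w = np.array([0.9, 0.8, 0.1])

# One person's scores on the three variables.
x = np.array([4.0, 5.0, 2.0])

# Weighted-average factor score: strongly loading variables contribute more.
score = w @ x / w.sum()

# The flaw: the score depends on the measurement scales.
# Recode the first variable onto a larger scale (e.g. 1-5 rescaled to 0-100)
# and the factor score changes drastically, even though the person's
# standing on the latent variable has not changed.
x_rescaled = x.copy()
x_rescaled[0] *= 20.0
score_rescaled = w @ x_rescaled / w.sum()
```

Because the raw-score weighted average inherits whatever units the variables happen to use, it is only comparable across variables when they all share the same measurement scale.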