Methodology in marketing and strategy research
Application lectures
Topic 1: Factor analysis
1.1 Basis
It is good practice to explore the variables separately first, in order to understand the
complexity you will encounter later on when you analyze the data. Have a close look at the
univariate/bivariate statistics (dispersion, centrality, etc.) in order to understand the quality and
measurement level of the variables and which ones should be analyzed. After that you can start the
factor analysis.
It is important to realize that the context in which you collect the data really matters for the validity
and reliability of the measurement. Who are the respondents? Every measurement is context
sensitive.
1.2 Univariate analysis
Univariate analysis provides some interesting statistics.
Look at the number of missing values.
o Check if the sample size is still big enough after deleting the cases with missing
values.
o If the percentage of missing values is low (<10%), we can assume the remaining sample is still representative.
Also look at the mean, which gives you a rough idea of the results for the variable.
It is also good to check the standard deviation; in the lecture example these are 1.6, 1.7 and 1.8.
That is quite moderate, and the values are fairly equal. A much bigger difference would mean
there is less consensus regarding that particular item, which might mean the question isn’t
clearly formulated. If the standard deviation were 0, or very close to 0, this would mean there
is a very high consensus.
We must check normality. We look at the kurtosis and skewness. Some deviations are not
that problematic, especially with a big sample size. We know that if skewness/kurtosis is within
the range of -3 to 3 (|3|), the deviation will not invalidate the measurement of the item. We can
therefore assume a normal distribution for all the items. A short screening sketch covering these univariate checks follows this list.
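As an illustration (not part of the lecture notes), a minimal Python sketch of these univariate checks, assuming the items sit in a pandas DataFrame read from a hypothetical file survey_items.csv:

import pandas as pd
from scipy import stats

# Hypothetical data: one column per questionnaire item.
df = pd.read_csv("survey_items.csv")

# Missing values: share of missing responses per item, and the complete-case sample size.
print(df.isna().mean())
print("Complete cases:", len(df.dropna()))

# Centrality and dispersion: mean and standard deviation per item.
print(df.mean())
print(df.std())

# Normality check: skewness and kurtosis per item (rule of thumb here: within |3|).
complete = df.dropna()
print(complete.apply(stats.skew))
print(complete.apply(stats.kurtosis))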
1.3 Exploratory factor analysis
After checking the univariate statistics, you check the correlations between the different items by
looking at the correlation matrix, but mainly by looking at the Kaiser-Meyer-Olkin measure (KMO)
and Bartlett’s test. They explore to what extent the correlation matrix of the data set deviates
from the identity matrix. The identity matrix is a matrix with only 1’s on the diagonal and
0’s elsewhere, meaning that variables correlate perfectly with themselves but not at all with
one another.
Bartlett’s test checks whether this difference (the deviation from the identity matrix) is significant.
KMO calculates to what extent there is enough inter-item correlation to be analyzed. A KMO of
> 0.50 is sufficient. A sketch of both checks follows below.
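A minimal sketch of both checks, assuming the factor_analyzer package and the same hypothetical survey_items.csv as above:

import pandas as pd
from factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

items = pd.read_csv("survey_items.csv").dropna()  # hypothetical item data, complete cases only

# Bartlett's test: is the correlation matrix significantly different from the identity matrix?
chi_square, p_value = calculate_bartlett_sphericity(items)
print(f"Bartlett chi-square = {chi_square:.2f}, p = {p_value:.4f}")

# KMO: is there enough inter-item correlation to factor-analyze? Overall KMO > 0.50 is sufficient.
kmo_per_item, kmo_overall = calculate_kmo(items)
print(f"Overall KMO = {kmo_overall:.2f}")
print(kmo_per_item)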
Factor (loading) matrix
Each cell contains a factor loading, indicating to what extent a factor represents a particular item.
If we square the factor loadings of an item and sum them across the different factors, we get its
communality. The communality indicates to what extent all the factors included together represent
a particular item.
If we sum all the squared loadings of a particular factor vertically (across the items), we get its
eigenvalue. The eigenvalue indicates to what extent that particular factor represents the different
items in the analysis.
If we sum up the eigenvalues, we get the total explained variance.
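A small numeric illustration of these definitions (the loading matrix below is made up: four items by two factors):

import numpy as np

# Hypothetical factor loading matrix: 4 items (rows) x 2 factors (columns).
loadings = np.array([
    [0.80, 0.10],
    [0.75, 0.20],
    [0.15, 0.70],
    [0.05, 0.65],
])
squared = loadings ** 2

# Communality of an item: sum of its squared loadings across the factors (row-wise).
communalities = squared.sum(axis=1)   # [0.65, 0.6025, 0.5125, 0.425]

# Eigenvalue (sum of squared loadings) of a factor: column-wise sum of squared loadings.
eigenvalues = squared.sum(axis=0)     # [1.2275, 0.9625]

# Total explained variance: sum of the eigenvalues relative to the number of items.
total_explained = eigenvalues.sum() / loadings.shape[0]   # about 0.55

print(communalities, eigenvalues, total_explained)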
Choose the extraction method. We often prefer principal axis factoring (common factor analysis),
which includes only the item variance shared with the other items. This excludes the unique item
variance, the part of the item not shared with other items in the data set. By excluding this unique
part we possibly also exclude part of the error variance that is not relevant to the analysis.
After choosing an extraction method, we get an initial factor solution.
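A sketch of such an initial extraction, again assuming factor_analyzer and the hypothetical item data; the number of factors (3) is only a placeholder that is revisited with the criteria below:

import pandas as pd
from factor_analyzer import FactorAnalyzer

items = pd.read_csv("survey_items.csv").dropna()  # hypothetical item data, as before

# Principal axis factoring ("principal" in factor_analyzer), no rotation yet.
fa = FactorAnalyzer(n_factors=3, rotation=None, method="principal")
fa.fit(items)

# Eigenvalues of the correlation matrix and of the common-factor solution.
original_eigenvalues, common_eigenvalues = fa.get_eigenvalues()
print(original_eigenvalues)

# Explained variance: sum of squared loadings, proportion and cumulative proportion per factor.
print(fa.get_factor_variance())

# Communalities: to what extent the extracted factors together represent each item.
print(fa.get_communalities())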
Look at the eigenvalues and cumulative explained variance.
Look at the communalities matrix. Look at the extraction values: which ones are below the
threshold? Which items are not well represented by the factors?
o Evaluate on basis of:
Absolute, relatively low communalities
Theoretical importance (elimination affects the validity of the construct)
Methodological relevance (negative side-effect of reversal of items, number
of items)
Eliminate items one by one, check the KMO and Bartlett again, and look
at the matrices again.
After the elimination we start the analysis again. We still find one communality which is on the
edge: it is above the threshold, but only just. Evaluate it based on the criteria mentioned above.
Look at the number of factors to retain (eigenvalues > 1, cumulative explained variance around 60%, scree plot).
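A sketch of the elimination step and the retention criteria; item_7 is only a placeholder for whichever item falls below the communality threshold:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from factor_analyzer import FactorAnalyzer, calculate_bartlett_sphericity, calculate_kmo

items = pd.read_csv("survey_items.csv").dropna()  # hypothetical item data, as before

# Eliminate the flagged item and re-check KMO and Bartlett before re-extracting.
reduced = items.drop(columns=["item_7"])
print(calculate_bartlett_sphericity(reduced))
print("KMO:", calculate_kmo(reduced)[1])

# Re-extract and look at the eigenvalues of the reduced item set.
fa = FactorAnalyzer(rotation=None, method="principal")
fa.fit(reduced)
eigenvalues, _ = fa.get_eigenvalues()

# Kaiser criterion: number of factors with an eigenvalue above 1.
print("Eigenvalues > 1:", int(np.sum(eigenvalues > 1)))

# Scree plot: look for the "elbow" in the eigenvalue curve.
plt.plot(range(1, len(eigenvalues) + 1), eigenvalues, marker="o")
plt.xlabel("Factor number")
plt.ylabel("Eigenvalue")
plt.title("Scree plot")
plt.show()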
Choose a rotation method
Orthogonal rotation
o Widely used; ease of interpretation.
o Suited to an objective of data reduction or subsequent use of the factors in other multivariate
analyses / research contexts.
o Factors are not correlated.
Oblique rotation method
o Meaningfulness of correlated constructs for a specific context of the study.
o Factors can be correlated (rough threshold is > 0.30)
If you use oblique rotation, you get two factor loading matrices:
Structure matrix: indicates the factor loadings of the items, including the factor/factor
correlation.
Pattern matrix: indicates the factor loadings of the items, excluding the factor/factor
correlation. This is easier to interpret.
After choosing orthogonal rotation, check the thresholds for the factor loadings (> 0.30) and check
for double loaders (items loading above the threshold on more than one factor). Check the wording
of the questions (items). We can also check the results we would get if we chose oblique rotation instead.
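A sketch of both rotation options with factor_analyzer (varimax as the orthogonal method, promax as the oblique one; the two factors and the item data are assumptions carried over from above):

import pandas as pd
from factor_analyzer import FactorAnalyzer

items = pd.read_csv("survey_items.csv").dropna()  # hypothetical item data, as before

# Orthogonal rotation (varimax): factors are kept uncorrelated.
fa_varimax = FactorAnalyzer(n_factors=2, rotation="varimax", method="principal")
fa_varimax.fit(items)
print(fa_varimax.loadings_)   # check loadings > 0.30 and look for double loaders

# Oblique rotation (promax): factors are allowed to correlate.
fa_promax = FactorAnalyzer(n_factors=2, rotation="promax", method="principal")
fa_promax.fit(items)

pattern = fa_promax.loadings_   # pattern matrix: loadings excluding the factor/factor correlation
phi = fa_promax.phi_            # factor correlation matrix (check for correlations > 0.30)
structure = pattern @ phi       # structure matrix: loadings including the factor/factor correlation

print(pattern)
print(phi)
print(structure)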
It is good practice to run a reliability analysis afterwards. Check the different factors and look at
Cronbach’s Alpha (> 0.70). Also look at the item-total statistics and at the “Cronbach’s Alpha if
item deleted” values.
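Cronbach’s Alpha can be computed directly from its definition; factor1_items below is a hypothetical list of the items loading on one factor:

import pandas as pd

def cronbach_alpha(scale: pd.DataFrame) -> float:
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the summed scale)
    k = scale.shape[1]
    return (k / (k - 1)) * (1 - scale.var(ddof=1).sum() / scale.sum(axis=1).var(ddof=1))

items = pd.read_csv("survey_items.csv").dropna()          # hypothetical item data, as before
factor1_items = ["item_1", "item_2", "item_3", "item_4"]  # placeholder item names
scale = items[factor1_items]

print("Alpha:", cronbach_alpha(scale))  # rule of thumb: > 0.70

# "Cronbach's Alpha if item deleted": recompute alpha leaving out each item in turn.
for col in factor1_items:
    print(col, cronbach_alpha(scale.drop(columns=[col])))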
1.4 Recap exploratory factor analysis
Even random data gives factors. You therefore need to be sure you can also interpret the
meaning of these factors, otherwise they aren’t relevant. Interpretation of meaning is
subjective.
The answers you get are only as good as the data you have collected.
Adding other data can change your factor structure.
Your judgments should always be driven by theoretical considerations.
1.5 Confirmatory factor analysis
Confirmatory factor analysis is also used for PLS.
The exploratory factor model
o All the factors have linkages to all the variables in the data set. You put all the variables
in the factor analysis and let the analysis explore what factors emerge.
o The potential number of factors for exploratory factor analysis is between 1 and the
number of observed variables.
o All observed variables are allowed to correlate with every factor.
o Rotation for interpretation reasons (because it changes the correlations between the
factors and the indicators so the pattern of values is more distinct)
o Usually more data driven.
o All observed variables are standardized in the procedure.
o The correlation matrix is analyzed.
o Errors are assumed to be uncorrelated (because we have no information about them)
The confirmatory factor analysis model
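The notes stop here. As a contrast with the exploratory model above, a minimal sketch of how a confirmatory measurement model could be specified, assuming the semopy package and hypothetical item names (each indicator is assigned to exactly one factor in advance, and the hypothesized structure is then tested against the data):

import pandas as pd
import semopy

items = pd.read_csv("survey_items.csv").dropna()  # hypothetical indicator data, as before

# Hypothetical measurement model: item_1..item_3 load only on F1, item_4..item_6 only on F2.
model_desc = """
F1 =~ item_1 + item_2 + item_3
F2 =~ item_4 + item_5 + item_6
"""

model = semopy.Model(model_desc)
model.fit(items)

print(model.inspect())            # estimated loadings and factor covariance
print(semopy.calc_stats(model))   # fit indices (chi-square, CFI, RMSEA, ...)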