ANOVA vs. Factor Analysis vs. Cluster Analysis

Method
• ANOVA: Dependence method → models a causal relationship.
• Factor Analysis: Interdependence method → finds underlying constructs.
• Cluster Analysis: Interdependence method → groups rows (objects) based on predefined characteristics.
Variables
• ANOVA: Treatment variables on a nominal scale; outcome on a metric scale.
• Factor Analysis: Variables should be (treatable as) metric.
• Cluster Analysis: The method can deal with all types of data.
Sample size
• ANOVA: ≥20 observations per cell/subgroup.
• Factor Analysis: ≥5 observations per variable in the dataset (preferably 10).
• Cluster Analysis: No strict rules, but check for outliers!
Extra
• ANOVA: Covariates (max 3) can be metric.
• Factor Analysis: Exploratory factor analysis works from data → theory.
• Cluster Analysis: The order of magnitude should be similar across variables, measured in comparable units; otherwise, standardize first (see the sketch below).
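When variables are not measured in comparable units, the standard remedy (general practice, not something these notes prescribe) is to z-score them first. A minimal sketch in Python, with a hypothetical two-variable customer matrix:

```python
import numpy as np

# Hypothetical data: income (euros) and age (years) live on very different
# scales, so income would dominate any distance-based method.
X = np.array([
    [42_000.0, 25.0],
    [58_000.0, 31.0],
    [39_000.0, 47.0],
    [75_000.0, 52.0],
])

# Z-scoring puts every column on the same order of magnitude:
# mean 0, standard deviation 1.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_std.round(2))
```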
Assumptions
• ANOVA:
  1. Observations are independent → each observation falls in only one combination of treatment variables (between-subjects design).
  2. Homoscedasticity → Levene's Test of Equality of Error Variances (α > 0.05).
  3. Normality of the dependent variable → Test of Normality (α > 0.05).
• Factor Analysis: Multicollinearity is absolutely required! Three ways to check (see the sketch after this list):
  1. Visual inspection of the correlation matrix → is there a sufficient number of high (>0.3) correlations?
  2. Bartlett's Test of Sphericity → reject H0 that all correlations are equal to 0 (α < 0.05).
  3. Measure of Sampling Adequacy (MSA) → rule of thumb: an MSA value <0.5 is unacceptable.
• Cluster Analysis: No 'hard' statistical assumptions, but:
  1. Is the sample representative of the whole set of customers?
  2. Collinearity → not that big of an issue, but keeping both collinear variables in the analysis implicitly places more weight on them.
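The factor-analysis checks can be run by hand. A minimal sketch, assuming an illustrative data matrix `X` (function names and data are mine, not from the notes); Bartlett's statistic uses the standard χ² approximation, and the MSA is computed as the overall Kaiser-Meyer-Olkin value:

```python
import numpy as np
from scipy.stats import chi2

def bartlett_sphericity(X):
    """Chi-square test of H0: all correlations between variables are 0."""
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)
    statistic = -(n - 1 - (2 * p + 5) / 6) * np.log(np.linalg.det(R))
    df = p * (p - 1) / 2
    return statistic, chi2.sf(statistic, df)  # reject H0 if p-value < 0.05

def overall_msa(X):
    """Overall Kaiser-Meyer-Olkin measure (<0.5 = unacceptable)."""
    R = np.corrcoef(X, rowvar=False)
    R_inv = np.linalg.inv(R)
    # Partial correlations follow from the inverse correlation matrix.
    d = np.sqrt(np.diag(R_inv))
    partial = -R_inv / np.outer(d, d)
    mask = ~np.eye(R.shape[0], dtype=bool)  # off-diagonal elements only
    r2, p2 = (R[mask] ** 2).sum(), (partial[mask] ** 2).sum()
    return r2 / (r2 + p2)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 6))      # illustrative data: 100 rows, 6 variables
X[:, 1] = X[:, 0] + 0.3 * X[:, 1]  # induce some correlation
print(bartlett_sphericity(X), overall_msa(X))
```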
If homoscedasticity is rejected (ANOVA)
1. Check the sample size → if it is similar across treatment groups, don't worry.
2. Take the logarithm of the dependent variable and redo the analysis → if that solves the problem, don't worry (sketch below).
3. Adjust the cut-off for significance:
• Variance in the larger subsample is higher → use a lower cut-off; α=0.03 or α=0.01.
• Variance in the larger subsample is lower → use a higher cut-off; α=0.10.

Types of variance (Factor Analysis)
• Common variance: variance that the variables share/have in common.
• Specific variance: an aspect that doesn't come back in the other variables.
• Error variance: errors that could cause variation in the data.

Option 1: Common Factor Analysis (CFA) → how many and which groups of variables exist in the data (summarize the data).
Option 2: Principal Component Analysis (PCA) → represent as much information as possible with a minimum number of factors (data reduction; sketch below).

Deriving the clusters (Cluster Analysis)
Hierarchical approach:
• Agglomerative → start with all objects in separate clusters, then subsequently merge them until every object is in the same group (sketch below).
• Divisive → start with one cluster, then split up until every object is a separate cluster.

Non-hierarchical approach:
• K-means → start off with a fixed number of clusters (sketch below).
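A minimal sketch of the homoscedasticity check and remedy 2, with illustrative group data; `scipy.stats.levene` implements Levene's test:

```python
import numpy as np
from scipy.stats import levene

rng = np.random.default_rng(1)
# Illustrative outcome data for three treatment groups with unequal spread.
g1 = rng.lognormal(mean=2.0, sigma=0.4, size=40)
g2 = rng.lognormal(mean=2.3, sigma=0.8, size=40)
g3 = rng.lognormal(mean=2.1, sigma=1.2, size=40)

# Levene's test: p > 0.05 means equal error variances are plausible.
print("raw:", levene(g1, g2, g3).pvalue)

# Remedy 2: log-transform the dependent variable and test again.
print("log:", levene(np.log(g1), np.log(g2), np.log(g3)).pvalue)
```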
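A minimal PCA sketch with scikit-learn on standardized, illustrative data, showing the data-reduction reading: keep the few components that carry most of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 6))
X[:, 1] = X[:, 0] + 0.3 * X[:, 1]             # correlated pair, as before
X_std = (X - X.mean(axis=0)) / X.std(axis=0)  # PCA on standardized variables

pca = PCA().fit(X_std)
# Share of information each component retains; a common informal rule keeps
# components with eigenvalue > 1, i.e. explained variance ratio > 1/p here.
print(pca.explained_variance_ratio_.round(2))

# Data reduction: keep e.g. 2 components as the reduced representation.
scores = PCA(n_components=2).fit_transform(X_std)
print(scores.shape)  # (100, 2)
```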
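A minimal agglomerative sketch with SciPy on illustrative 2-D data: Ward linkage builds the full merge tree (every object starts alone; merging continues until one group remains), and cutting the tree yields cluster labels.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
# Two illustrative blobs of customers on standardized variables.
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

# Agglomerative: Ward's method merges the pair of clusters with the
# smallest increase in within-cluster variance at each step.
Z = linkage(X, method="ward")

# Cut the tree at 2 clusters and read off the labels.
labels = fcluster(Z, t=2, criterion="maxclust")
print(labels)
```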
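And a minimal non-hierarchical sketch: K-means, with the number of clusters fixed up front, on the same kind of illustrative data.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

# Non-hierarchical: the number of clusters is chosen before fitting.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
print(km.cluster_centers_.round(2))
```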