Dennis Klappe
February 24, 2025
General
Data Types and Examples
• Categorical Data: Represents distinct groups; non-numeric.
– Example: Browser type (e.g., Chrome, Firefox)
• Nominal Data: Categories without a meaningful order.
– Example: Types of cuisine (e.g., Italian, Chinese, Mexican)
– Nominal Scale: Size of number is not related to the amount of the characteristic measured.
– Example: Eye color.
• Ordinal Data: Ordered categories; the intervals between categories are not necessarily equal.
– Example: Education level (e.g., Bachelor’s, Master’s)
– Ordinal Scale: Larger numbers indicate more (or less) of the characteristic, but not how much
more (or less).
– Example: Military ranks, ranking of teams in a tournament.
• Continuous Data: Numeric values within a range; measured.
– Example: Height (e.g., 175 cm)
• Ratio Scale: Contains interval properties, with a natural zero point, allowing interpretation of ratios.
– Example: Height.
• Binary Data: Two possible outcomes.
– Example: Pass/Fail
• Discrete Data: Countable numeric values; distinct.
– Example: Number of children (0, 1, 2, ...)
• Interval Scale: Contains ordinal properties, with equal differences between scale points.
– Example: Temperature [°C].
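The difference between an interval scale and a ratio scale can be illustrated with a small sketch. The numbers are made up for illustration, and the helper `celsius_to_kelvin` is a hypothetical name, not something from the notes above.

```python
def celsius_to_kelvin(c):
    """Convert Celsius (interval scale) to Kelvin (ratio scale, natural zero)."""
    return c + 273.15

# Differences are meaningful on an interval scale.
diff_c = 20.0 - 10.0  # 10 degrees warmer

# Ratios are not: 20 °C is not "twice as hot" as 10 °C.
naive_ratio = 20.0 / 10.0  # 2.0, but not physically meaningful
true_ratio = celsius_to_kelvin(20.0) / celsius_to_kelvin(10.0)  # about 1.035

# Height is ratio-scaled, so ratios can be read directly.
height_ratio = 180.0 / 90.0  # 2.0: twice as tall
```

The Celsius ratio changes once the zero point is moved, which is why ratios are only interpretable when the scale has a natural zero.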
Topics Overview
• A/B Testing: Used to compare two versions (A vs. B) to determine which performs better on specific
metrics.
– Example: Comparing click-through rates between two webpage designs.
• ANOVA (Analysis of Variance): Tests for significant differences between means across 3+ groups.
– Example: Examining differences in mean test scores across multiple teaching methods.
• Regression Analysis: Models relationships between a dependent variable and one or more independent variables.
– Example: Predicting house prices based on size, location, and age.
• Cluster Analysis: Groups observations with similar characteristics into homogeneous groups.
– Example: Segmenting customers into distinct profiles based on purchasing behavior.
• Conjoint Analysis: Determines how consumers value different attributes of a product.
– Example: Evaluating trade-offs consumers make between price and features in smartphones.
A/B Testing
Core Concept
A/B testing compares two variants (A vs. B) to determine which performs better on specific metrics.
Use it when:
• Testing UI changes, pricing strategies, or marketing campaigns
• Evaluating algorithm changes or feature rollouts
• Needing statistical confidence before full implementation
Statistical Test Selection Guide
• Categorical Data (e.g., conversion rates):
– Small samples: Fisher’s Exact Test
– Large samples: Pearson’s Chi-Square Test
• Continuous Data (e.g., revenue):
– Normal distribution/large samples:
∗ Equal variances: Student’s t-test
∗ Unequal variances: Welch’s t-test
– Non-normal/small samples: Mann-Whitney U Test
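For the categorical branch, a minimal sketch of Pearson's chi-square test on a 2x2 conversion table, using only the standard library. The conversion counts are hypothetical, and `chi_square_2x2` is an illustrative helper name. Returning the smallest expected cell count is an assumption added here: a common rule of thumb is to prefer Fisher's Exact Test when any expected count falls below 5.

```python
import math

def chi_square_2x2(conv_a, n_a, conv_b, n_b):
    """Pearson chi-square test (no continuity correction) for a 2x2 table.

    Returns (statistic, p_value, min_expected_count).
    """
    observed = [
        [conv_a, n_a - conv_a],
        [conv_b, n_b - conv_b],
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)

    stat = 0.0
    min_expected = float("inf")
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            min_expected = min(min_expected, expected)
            stat += (observed[i][j] - expected) ** 2 / expected

    # With 1 degree of freedom, the chi-square survival function
    # reduces to erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, min_expected

# Hypothetical data: 120/1000 conversions for A, 150/1000 for B.
stat, p_value, min_expected = chi_square_2x2(120, 1000, 150, 1000)
```

The sketch does not guard against empty rows or columns, where an expected count of zero would cause a division by zero.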
Key Considerations
• Use Levene’s Test to check whether the group variances are equal:
– p-value > 0.05: No evidence of unequal variances; Student’s t-test is acceptable
– p-value ≤ 0.05: Variances differ; use Welch’s t-test
• Welch’s t-test is generally preferred over Student’s t-test (more robust to unequal variances)
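• The Mann-Whitney U test compares distributions (often read as a difference in medians), not means
• Large samples (rule of thumb: n > 30 per group) can use t-tests even with non-normal data, because the CLT makes the sample means approximately normal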
• For skewed data, consider transformations or median-based analysis
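To make the Welch variant concrete, here is a minimal sketch that computes Welch's t-statistic and the Welch-Satterthwaite degrees of freedom with the standard library. The revenue figures are hypothetical and `welch_t` is an illustrative name. The sketch stops short of a p-value, which needs the t distribution's CDF.

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Return Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # statistics.variance is the sample variance (n - 1 denominator).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    # Unlike Student's t-test, the variances are not pooled.
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t, df

# Hypothetical revenue per user for two variants.
revenue_a = [10.0, 12.0, 9.0, 11.0, 13.0]
revenue_b = [14.0, 18.0, 11.0, 20.0, 17.0]
t_stat, df = welch_t(revenue_a, revenue_b)
```

In practice, SciPy's `scipy.stats.ttest_ind(a, b, equal_var=False)` runs the same Welch test and also returns the p-value.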
Decision Tree
1. Categorical outcome?
• Small sample → Fisher’s Exact
• Large sample → Chi-Square
2. Continuous outcome?
• Normal/large sample:
– Equal variances → Student’s t-test
– Unequal variances → Welch’s t-test
• Non-normal/small sample → Mann-Whitney
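The decision tree above can be sketched as a small lookup function. The function name and parameters are hypothetical. It also assumes one particular reading of the tree: the t-test branch applies when the data are normal or the sample is large, and Mann-Whitney applies only when the data are both non-normal and small.

```python
def choose_ab_test(outcome, small_sample, normal=True, equal_variances=False):
    """Map the decision tree to a test name.

    outcome: "categorical" or "continuous"
    small_sample: True if the sample is considered small
    normal: whether the data look approximately normal (continuous only)
    equal_variances: e.g. Levene's test p-value > 0.05 (continuous only)
    """
    if outcome == "categorical":
        if small_sample:
            return "Fisher's Exact Test"
        return "Pearson's Chi-Square Test"
    if outcome == "continuous":
        # The t-test branch covers normal data or large samples.
        if normal or not small_sample:
            if equal_variances:
                return "Student's t-test"
            return "Welch's t-test"
        return "Mann-Whitney U Test"
    raise ValueError(f"unknown outcome type: {outcome!r}")

# Example: skewed revenue data from a large experiment.
recommended = choose_ab_test("continuous", small_sample=False, normal=False)
```

Defaulting `equal_variances` to `False` mirrors the note that Welch's t-test is generally the safer choice.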
Analysis of (Co-)Variance
Core Concept
• ANOVA: Tests differences between 3+ group means
• ANCOVA: ANOVA with continuous control variables (covariates)
• Sum of Squares (SS):
– SS Total: Total variation in the data (sum of squared deviations from the grand mean).
Interpretation: High SS Total indicates high overall variability in your data.
– SS Between: Variation due to differences between group means.
Interpretation: High SS Between means greater differences between groups, potentially significant
if the F-statistic is also high.
– SS Within: Variation within groups, i.e. the variation not explained by group membership.
Interpretation: High SS Within suggests more variability within each group, which can reduce the
likelihood of finding significant between-group effects.
– SS Covariate (ANCOVA): Variation explained by covariates.
Interpretation: High SS Covariate indicates that covariates account for a substantial portion of the
variation, improving the analysis by controlling for confounders.
• Key output: F-statistic, $F = \frac{\text{Between-group variance}}{\text{Within-group variance}}$
– Interpretation:
∗ High F-value: Greater between-group variance compared to within-group variance, indicating
significant differences.
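As a worked illustration of the sums of squares and the F-statistic, here is a minimal one-way ANOVA sketch using the standard library. The scores are hypothetical and `one_way_anova` is an illustrative name. The between-group and within-group variances are taken as mean squares, meaning each SS divided by its degrees of freedom.

```python
from statistics import mean

def one_way_anova(groups):
    """Return SS between, SS within, SS total, and the F-statistic."""
    all_values = [x for g in groups for x in g]
    grand_mean = mean(all_values)

    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ss_total = sum((x - grand_mean) ** 2 for x in all_values)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    # F is the ratio of the two mean squares.
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, ss_total, f_stat

# Hypothetical test scores under three teaching methods.
scores = [
    [70.0, 72.0, 68.0],
    [80.0, 82.0, 78.0],
    [90.0, 92.0, 88.0],
]
ss_between, ss_within, ss_total, f_stat = one_way_anova(scores)
```

For these made-up scores, SS Between plus SS Within equals SS Total, which is a useful sanity check on any ANOVA table.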