BUSINESS ANALYTICS HBS CORE
EXAM QUESTIONS AND ANSWERS.
VERIFIED 2025/2026.
A/B test - ANS An experiment that compares the value of a specified dependent variable
(such as the likelihood that a web site visitor purchases an item) across two different groups
(usually a control group and a treatment group). Members must be randomly assigned to the two
groups to ensure that the only systematic difference between the groups is the "manipulated"
independent variable (for example, the size of the font on two otherwise-identical web sites).
An A/B test is a hypothesis test of whether the mean of the dependent variable is the
same across the two groups. (An A/B test can also be used to test whether another parameter,
such as a standard deviation, is the same across two groups.)
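As an illustration, the sketch below runs a two-sided, two-proportion z-test on hypothetical purchase counts. A purchase rate is the mean of a 0/1 variable, so this tests whether the two group means are equal. The pooled z-test (rather than, say, a t-test), the function name `two_proportion_z_test`, and the data are assumptions chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided z-test of H0: p_A == p_B, using the pooled standard error."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Under H0 both groups share one purchase rate, estimated from the pooled data
    p_pool = (successes_a + successes_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Probability of seeing a |z| at least this large if H0 were true
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical data: control (A) 200 purchases out of 5,000 visitors,
# treatment (B) 250 purchases out of 5,000 visitors
z, p_value = two_proportion_z_test(200, 5000, 250, 5000)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these made-up numbers the p-value falls below 0.05, so at the 5% significance level we would reject the null hypothesis that the two purchase rates are equal.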
adjusted R-squared - ANS A measure of the explanatory power of a regression analysis.
Adjusted R-squared adjusts R-squared downward by a penalty that grows with each independent
variable added to a regression model: Adjusted R² = 1 - (1 - R²)(n - 1)/(n - k - 1), where n is
the number of observations and k is the number of independent variables. Unlike R-squared, which
can never decrease when a new independent variable is added to a regression model, Adjusted
R-squared drops when an added independent variable does not improve the model's fit by enough to
offset the penalty. Adjusted R-squared should always be used when comparing the explanatory
power of regression models that have different numbers of independent variables.
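A minimal Python sketch of the standard adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - k - 1), with n observations and k independent variables. The R-squared values and sample size below are hypothetical, chosen to show a case where R-squared rises but adjusted R-squared falls.

```python
def adjusted_r_squared(r_squared, n, k):
    """Adjusted R-squared for a model with n observations and k independent variables."""
    if n - k - 1 <= 0:
        raise ValueError("need more observations than independent variables plus one")
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)


# Hypothetical models fit to the same 50 observations:
# adding a 4th variable nudges R-squared up from 0.600 to 0.605 ...
three_vars = adjusted_r_squared(0.600, n=50, k=3)
four_vars = adjusted_r_squared(0.605, n=50, k=4)
# ... but adjusted R-squared falls, because the gain does not offset the penalty
print(round(three_vars, 4), round(four_vars, 4))
```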
alternative hypothesis - ANS An alternative hypothesis is the theory or claim we are trying to
substantiate, and is stated as the opposite of a null hypothesis. When our data allow us to
reject the null hypothesis, we find support for the alternative hypothesis.
1 @COPYRIGHT 2025/2026 ALLRIGHTS RESERVED.
asymmetric distribution - ANS A probability distribution that is not symmetric around the
mean.
average - ANS The most common statistic used to describe the center of the values in a data
set. The average is also known as the mean. For a data set of discrete values, the
mean is equal to the sum of the values of all the data points in the set, divided by the number of
data points.
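The arithmetic is short enough to show directly. This sketch uses a hypothetical data set and checks the sum-divided-by-count calculation against Python's `statistics.mean`.

```python
from statistics import mean

data = [4, 8, 15, 16, 23, 42]  # hypothetical data set

# Sum of all values divided by the number of values
manual_mean = sum(data) / len(data)

# The standard library's statistics.mean gives the same result
assert manual_mean == mean(data)
print(manual_mean)
```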
base case - ANS The category of a categorical variable for which a dummy variable is NOT
included in a regression model. A regression model with a categorical variable that has n
categories should have n-1 dummy variables. The coefficients of the dummy variables included
in the regression model are interpreted in relation to the base case. The analyst can select any
category to be excluded from the regression model; however, different base cases lead to
different interpretations of the dummy variables' coefficients. For example, suppose we are
trying to determine the average difference in height between men and women in a sample, and
suppose that on average men are 5 inches taller than women in the sample. If we use Female as
the base case, then the coefficient for the dummy variable for Male would be +5. If we use Male
as the base case, the coefficient for the dummy variable for Female would be -5.
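To see the base-case interpretation concretely, the sketch fits a one-dummy regression by ordinary least squares on hypothetical heights chosen so that men average 5 inches taller. `ols_fit` and `fit_with_base_case` are illustrative helpers, not part of any library. With a single dummy variable, the intercept is the base-case mean and the dummy's coefficient is the difference between the two group means.

```python
def ols_fit(x, y):
    """Least-squares intercept and slope for a single regressor."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical heights in inches (men average 5 inches taller)
heights = {"Female": [63, 64, 65, 64], "Male": [69, 70, 68, 69]}


def fit_with_base_case(base):
    """Regress height on one dummy variable, leaving `base` out of the model."""
    x, y = [], []
    for group, values in heights.items():
        for h in values:
            x.append(0 if group == base else 1)  # dummy = 1 for the non-base group
            y.append(h)
    return ols_fit(x, y)


print(fit_with_base_case("Female"))  # intercept = Female mean, slope = Male dummy coefficient
print(fit_with_base_case("Male"))    # intercept = Male mean, slope = Female dummy coefficient
```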
bias - ANS The tendency of a measurement process to over- or under-estimate the value of a
population parameter. Although a sample statistic will almost always differ from the population
parameter, for an unbiased sample, the difference will be random. In contrast, for a biased
sample, the statistic will differ in a systematic way (e.g., tend to be too high). Some common
reasons for bias include non-random sampling methods and non-neutral question phrasing.
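The difference between random and systematic error can be illustrated with a small simulation. This is a sketch under stated assumptions: the population (the integers 1 to 1,000), the sample size, and the "non-random" method (only the upper half of the population can be reached) are all hypothetical.

```python
import random
from statistics import mean

rng = random.Random(42)  # fixed seed so the run is reproducible
population = list(range(1, 1001))  # hypothetical population
true_mean = mean(population)  # the population parameter, 500.5


def random_sample_mean(size=100):
    # Every member of the population is equally likely to be chosen
    return mean(rng.sample(population, size))


def convenience_sample_mean(size=100):
    # Non-random sampling: only the upper half of the population can be reached
    return mean(rng.sample(population[500:], size))


unbiased_errors = [random_sample_mean() - true_mean for _ in range(1000)]
biased_errors = [convenience_sample_mean() - true_mean for _ in range(1000)]

# Random errors average out near zero; systematic errors do not
print(f"average error, random sampling:     {mean(unbiased_errors):+.1f}")
print(f"average error, non-random sampling: {mean(biased_errors):+.1f}")
```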
biased sample - ANS A sample that is not representative of the population from which it is
collected. Sampling practices that can introduce bias include poorly phrased survey questions
and non-random sampling.
bimodal distribution - ANS A multi-modal distribution with two clearly discernible peaks. The
two peaks may be of the same height (that is, have equal frequency), or one may be the true
mode while the other has a very high (but not the highest) frequency.
bin - ANS A range of values used to categorize data. In a histogram, observations are divided
into a set of non-overlapping bins, each corresponding to a range of values. The bins are
constructed to ensure that the set of bins contains all observations in the data set. The height of
the bar corresponding to a bin is equal to the number of observations in the data set that fall
within that bin's range. Typically, all bins in a given histogram are the same width (i.e., the
difference between the upper and lower boundaries of the range is the same for each bin). In an
Excel histogram, each bin is labeled by the value of the upper boundary of the bin's range. For
example, in a histogram with three bins (each of width 1), labeled 1, 2, and 3, the bin labeled 2
contains all observations greater than 1 and less than or equal to 2. See histogram.
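A minimal sketch of the upper-boundary labeling described in this entry: each value goes into the first bin whose label is greater than or equal to it, so the bin labeled 2 holds values greater than 1 and less than or equal to 2. The function name, the sample observations, and the extra overflow bin labeled "More" for values above the last boundary are assumptions for illustration.

```python
from bisect import bisect_left


def bin_counts(values, upper_bounds):
    """Count observations per bin; each bin is labeled by its upper boundary.

    upper_bounds must be sorted in ascending order. A value falls in the first
    bin whose upper boundary is >= the value, so the bin labeled b holds values
    greater than the previous boundary and <= b. Values above the last boundary
    go into an overflow bin labeled "More".
    """
    counts = {ub: 0 for ub in upper_bounds}
    counts["More"] = 0
    for v in values:
        i = bisect_left(upper_bounds, v)  # index of the first boundary >= v
        if i == len(upper_bounds):
            counts["More"] += 1
        else:
            counts[upper_bounds[i]] += 1
    return counts


# Three bins of width 1, labeled 1, 2, and 3 (hypothetical observations)
observations = [0.5, 1.0, 1.2, 2.0, 2.7, 3.0, 3.5]
counts = bin_counts(observations, [1, 2, 3])
print(counts)
```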
binomial distribution - ANS The probability distribution of the number of successes in a given
number of independent trials, where there are only two possible outcomes for each trial, and each
trial has the same probability of success (e.g., flipping a coin). For example, the binomial
distribution for the number of "heads" that result from flipping a coin 50 times specifies the
probability of each possible outcome, from observing 0 "heads" to observing 50 "heads". The
binomial distribution can be used to create confidence intervals for proportions.
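The probabilities for the 50-flip example can be computed from the binomial formula C(n, k) p^k (1 - p)^(n - k). This sketch assumes a fair coin (p = 0.5) and uses Python's `math.comb` for the number of combinations.

```python
from math import comb


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n, p = 50, 0.5  # 50 flips of a fair coin
distribution = [binomial_pmf(k, n, p) for k in range(n + 1)]

print(f"P(25 heads) = {distribution[25]:.4f}")
print(f"P(0 heads)  = {distribution[0]:.2e}")
print(f"total probability = {sum(distribution):.6f}")
```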
Central Limit Theorem - ANS A theorem stating that if we take sufficiently large randomly
selected samples from a population, the means of these samples will be approximately normally
distributed regardless of the shape of the underlying population. (Technically, the underlying
population must have a finite variance.)
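A quick simulation can make the theorem tangible. This sketch assumes a hypothetical, strongly right-skewed population (exponential with mean 1 and standard deviation 1), samples of size 50, and 2,000 repetitions; the sample means should center on 1, have a spread close to sigma/sqrt(n), and fall within 1.96 standard errors about 95% of the time, as a normal distribution would.

```python
import random
from math import sqrt
from statistics import mean, stdev

rng = random.Random(7)  # fixed seed for reproducibility
SAMPLE_SIZE = 50
NUM_SAMPLES = 2000

# Hypothetical skewed population: exponential with mean 1 and standard deviation 1
sample_means = [
    sum(rng.expovariate(1.0) for _ in range(SAMPLE_SIZE)) / SAMPLE_SIZE
    for _ in range(NUM_SAMPLES)
]

standard_error = 1.0 / sqrt(SAMPLE_SIZE)  # sigma / sqrt(n)
within_1_96 = sum(abs(m - 1.0) <= 1.96 * standard_error for m in sample_means) / NUM_SAMPLES

print(f"mean of sample means: {mean(sample_means):.3f} (population mean 1)")
print(f"sd of sample means:   {stdev(sample_means):.3f} (sigma/sqrt(n) = {standard_error:.3f})")
print(f"share within 1.96 standard errors: {within_1_96:.3f} (about 0.95 if normal)")
```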
coefficient of variation (CV) - ANS A measure of a data set's variability relative to its mean.
The coefficient of variation (CV) is particularly helpful when comparing the variability of two
data sets with different means. Calculated as the standard deviation divided by the mean, the
CV is typically expressed as a percentage. For example, the CV of a data set with mean = 100
hours and standard deviation = 15 hours is 15 hours/100 hours = 15%.
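A minimal sketch of the calculation, assuming the sample standard deviation (n - 1 denominator) and a positive mean. The two small data sets are hypothetical: the second has the smaller standard deviation but the larger variability relative to its mean.

```python
from statistics import mean, stdev


def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (assumes a positive mean)."""
    return stdev(data) / mean(data)


# The example from the definition: standard deviation 15 hours, mean 100 hours
print(f"{15 / 100:.0%}")

# Two hypothetical data sets with different means
data_a = [85, 100, 115]  # mean 100, standard deviation 15
data_b = [8, 10, 12]     # mean 10, standard deviation 2
print(f"CV of data_a: {coefficient_of_variation(data_a):.0%}")
print(f"CV of data_b: {coefficient_of_variation(data_b):.0%}")
```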
conditional mean - ANS A conditional mean is the mean (average) of a subset of data. We
apply a condition and calculate the mean for values that meet that condition. For example, in a
data set that contains data on both males and females, a conditional mean might be the mean
of the data pertaining to only the females in the data set.
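A short sketch of applying a condition before averaging. The (sex, height) records are hypothetical, and `conditional_mean` is an illustrative helper that takes the condition as a function.

```python
# Hypothetical data set of (sex, height in inches) records
records = [("F", 64), ("M", 69), ("F", 63), ("M", 70), ("F", 65)]


def conditional_mean(rows, condition):
    """Mean of the values whose label satisfies `condition`."""
    values = [value for label, value in rows if condition(label)]
    if not values:
        raise ValueError("no data points meet the condition")
    return sum(values) / len(values)


female_mean = conditional_mean(records, lambda sex: sex == "F")
overall_mean = sum(value for _, value in records) / len(records)
print(female_mean, overall_mean)
```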