Business Analytics HBS CORe Updated
2025 Edition. Questions & Correct
Answers. Graded A
A/B test - ANS An experiment that compares the value of a specified
dependent variable (such as the likelihood that a web site visitor purchases
an item) across two different groups (usually a control group and a
treatment group). The members of each group must be randomly selected
to ensure that the only difference between the groups is the "manipulated"
independent variable (for example, the size of the font on two otherwise-
identical web sites). An A/B test is a hypothesis test that tests whether the
means of the dependent variable are the same across the two groups. (An
A/B test can also be used to test whether another parameter, such as the
standard deviation, is the same across two groups.)
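As a rough illustration of the comparison of means described above, the sketch below computes a two-sample test statistic for a control and a treatment group. The data, the function name ab_test, and the use of a normal approximation for the p-value are assumptions made for this example, not part of the course material.

```python
import math
from statistics import NormalDist, mean, variance

def ab_test(control, treatment):
    """Return (difference in means, test statistic, two-sided p-value).

    Uses the unpooled (Welch) standard error and a normal approximation,
    which is reasonable only for fairly large groups.
    """
    mean_diff = mean(treatment) - mean(control)
    std_error = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    z = mean_diff / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return mean_diff, z, p_value

# Hypothetical data: 1 = visitor purchased, 0 = visitor did not purchase.
control = [1] * 40 + [0] * 360    # 10% of 400 visitors purchased
treatment = [1] * 60 + [0] * 340  # 15% of 400 visitors purchased

diff, z, p = ab_test(control, treatment)
print(f"difference = {diff:.3f}, z = {z:.2f}, p-value = {p:.3f}")
```

A small p-value suggests that the difference in purchase rates is unlikely to be due to chance alone, provided the groups were randomly assigned.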
adjusted R-squared - ANS A measure of the explanatory power of a
regression analysis. Adjusted R-squared is computed from R-squared by
applying an adjustment that penalizes each independent variable added to
a regression model. Unlike R-squared, which can
never decrease when a new independent variable is added to a regression
model, Adjusted R-squared drops when an independent variable is added
that does not improve the model's true explanatory power. Adjusted R-squared
should always be used when comparing the explanatory power of
regression models that have different numbers of independent variables.
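The behavior described above can be sketched with a commonly used textbook form of the adjustment, 1 - (1 - R-squared)(n - 1)/(n - k - 1), where n is the number of observations and k the number of independent variables. The formula and the numbers below are assumptions chosen for illustration; the definition above does not spell out the formula.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R-squared using the common form 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    degrees_of_freedom = n_observations - n_predictors - 1
    if degrees_of_freedom <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r_squared) * (n_observations - 1) / degrees_of_freedom

# Hypothetical case: a third variable raises R-squared only from 0.800 to 0.801.
two_vars = adjusted_r_squared(0.800, n_observations=50, n_predictors=2)
three_vars = adjusted_r_squared(0.801, n_observations=50, n_predictors=3)

print(f"two variables:   {two_vars:.4f}")
print(f"three variables: {three_vars:.4f}")
```

In this hypothetical case R-squared rises slightly, yet Adjusted R-squared falls, which signals that the third variable adds little real explanatory power.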
alternative hypothesis - ANS An alternative hypothesis is the theory or
claim we are trying to substantiate, and is stated as the opposite of a null
hypothesis. When our data allow us to reject the null hypothesis, we
substantiate the alternative hypothesis.
asymmetric distribution - ANS A probability distribution that is not
symmetric around the mean.
average - ANS The most common statistic used to describe the center of
the values in a data set. The mean is also known as the average. For a
distribution that has discrete values, the mean is equal to the sum of the values
of all the data points in the set, divided by the number of data points.
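The calculation above can be written out in a few lines. The data values and the function name average are made up for illustration.

```python
def average(values):
    """Sum of the values divided by the number of data points."""
    if not values:
        raise ValueError("the data set must contain at least one value")
    return sum(values) / len(values)

# Hypothetical data set with five data points.
data = [2, 4, 4, 6, 9]
result = average(data)
print(result)  # 25 / 5 = 5.0
```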
base case - ANS The category of a categorical variable for which a dummy
variable is NOT included in a regression model. A regression model with a
categorical variable that has n categories should have n-1 dummy
variables. The coefficients of the dummy variables included in the
regression model are interpreted in relation to the base case. The analyst
can select any category to be excluded from the regression model;
however, different base cases lead to different interpretations of the dummy
variables' coefficients. For example, suppose we are trying to determine the
average difference in height between men and women in a sample, and
suppose that on average men are 5 inches taller than women in the
sample. If we use Female as the base case then the coefficient for the
dummy variable for Male would be +5. If we use Male as the base case, the
coefficient for the dummy variable for Female would be -5.
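The height example above can be checked with a small sketch. The six heights are hypothetical values chosen so that men average 5 inches taller than women, and simple_ols is an illustrative helper for a regression with a single independent variable, not a library function.

```python
from statistics import mean

def simple_ols(x, y):
    """Least-squares intercept and slope for one independent variable."""
    x_bar, y_bar = mean(x), mean(y)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
        (xi - x_bar) ** 2 for xi in x
    )
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Hypothetical sample: women average 64 inches, men average 69 inches.
heights = [62, 64, 66, 67, 69, 71]
sexes = ["F", "F", "F", "M", "M", "M"]

# Base case Female: include a dummy variable for Male only.
male_dummy = [1 if s == "M" else 0 for s in sexes]
intercept_f_base, coef_male = simple_ols(male_dummy, heights)

# Base case Male: include a dummy variable for Female only.
female_dummy = [1 if s == "F" else 0 for s in sexes]
intercept_m_base, coef_female = simple_ols(female_dummy, heights)

print(f"Female base case: intercept {intercept_f_base}, Male coefficient {coef_male}")
print(f"Male base case:   intercept {intercept_m_base}, Female coefficient {coef_female}")
```

In each model the intercept is the average height of the base case, and the dummy coefficient is the average difference from that base case.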
bias - ANS The tendency of a measurement process to over- or under-
estimate the value of a population parameter. Although a sample statistic
will almost always differ from the population parameter, for an unbiased
sample, the difference will be random. In contrast, for a biased sample, the
statistic will differ in a systematic way (e.g., tend to be too high). Some
common reasons for bias include non-random sampling methods and non-
neutral question phrasing.
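The contrast between random and systematic error can be illustrated with a small simulation. The population, the sample sizes, and the biased sampling rule (larger values are more likely to be selected) are all assumptions invented for this sketch.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: the values 1 through 1000.
population = list(range(1, 1001))
population_mean = mean(population)  # 500.5

def average_estimate(draw_sample, repetitions=500):
    """Average of the sample means over many repeated samples."""
    return mean(mean(draw_sample()) for _ in range(repetitions))

# Unbiased: every member of the population is equally likely to be chosen.
unbiased = average_estimate(lambda: random.sample(population, 50))

# Biased: members with larger values are proportionally more likely to be chosen.
biased = average_estimate(
    lambda: random.choices(population, weights=population, k=50)
)

print(f"population mean:        {population_mean}")
print(f"random sampling:        {unbiased:.1f}")
print(f"non-random sampling:    {biased:.1f}")
```

Individual random samples still miss the population mean, but their errors average out. The non-random method overestimates in the same direction every time.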
biased sample - ANS A sample that is not representative of the population
from which it is collected. Sampling practices that can introduce bias
include poorly phrased survey questions and non-random sampling.
bimodal distribution - ANS A multi-modal distribution with two clearly
discernible peaks. The two peaks may be of the same height (that is, have
equal frequency), or one may be the true mode while the other has a very
high (but not the highest) frequency.
bin - ANS A range of values used to categorize data. In a histogram,
observations are divided into a set of non-overlapping bins, each
corresponding to a range of values. The bins are constructed to ensure that
the set of bins contains all observations in the data set. The height of the
bar corresponding to a bin is equal to the number of observations in the
data set that fall within that bin's range. Typically, all bins in a given
histogram are the same width (i.e., the difference between the largest value
and the smallest value is the same for each bin). In an Excel histogram,
each bin is labeled by the value of the upper boundary of the bin's range.
For example, in a histogram with three bins (each of width 1), labeled 1, 2,
3