Answers|Latest Update
Histogram Histogram, also called a bar chart.
A histogram's x-axis represents bins corresponding to ranges of data; its y-axis indicates the
frequency of observations falling into each bin.
The best bin size depends on what we are trying to learn from the data.
Using larger bins can simplify a histogram, but may make it difficult to see trends in the data.
Very small bins can have such low frequencies that make it difficult to discern patterns.
Descriptive Statistic We call such summary measures "descriptive statistics"—they
provide a quick overview of a data set without showing every data point. We encounter these
kinds of descriptive statistics every day: we talk about a baseball player's performance by
referring to his batting average; we measure stock market performance using the Dow Jones
Industrial Average or the Nikkei Index; we summarize our academic performance using our
grade point average.
Central Tendency "central tendency" of a data set—an indication of where the "center" of
the data set lies.
Mean Mean (average): The mean is equal to the sum of all of the data points in a set,
divided by n, the number of data points.
, HBX Business Analytics Questions And
Answers|Latest Update
We represent the mean by x¯ when we calculate the mean of a sample and by μ when we
calculate the mean of a population. In the following definitions, n is the number of data points in
a sample and N is the number of data points in a population.
x¯=∑i=1nxin=x1+x2+...+xnn and μ=∑i=1NxiN=x1+x2+...+xNN
For the data set {0.5, 0.5, 1.5, 3.0, 4.0}, x¯=(0.5+0.5+1.5+3.0+4.0)5=9.55=1.9
Median Median: The middle value of a data set. The same number of data points fall
above and below the median. To find the median, first arrange the values in order of magnitude.
If the total number of data points is odd, the median is the value that lies in the middle. If the
total number is even, the median is the average of the two middle values.
The median of the data set {0.5, 0.5, 1.5, 3.0, 4.0} is 1.5.
The median of the data set {0.5, 0.5, 1.5, 3.0} is the average of 0.5 and 1.5, which is 1.0.
Mode Mode: The value that occurs most frequently in a data set.
The mode of the data set {0.5, 0.5, 1.5, 3.0, 4.0} is 0.5.
multimodal If a data set has more than one value with the highest frequency, that data set
has more than one mode. A distribution is called bimodal if it has two clearly defined peaks (two
points with very high frequency). The two peaks may have equal frequency and hence be true
modes, or one peak may be a mode and the other peak may simply have a very high (but not the
highest) frequency. Distributions with multiple peaks are called multimodal.