C955 Applied Probability & Statistics
Module 4 Notes: Descriptive Statistics for a Single Variable
Lesson 4.02: Types of Data
● Quantitative data: numerical data; represent quantities that can be counted or measured
● Categorical (qualitative) data: non-numerical data; data that fall into groups, such as names or labels
○ Numbers can be categorical, such as area codes; these are numbers that can't be meaningfully added together
Lesson 4.03: Graphical Displays for Categorical Data
● Distribution: an arrangement of values that shows how often each value occurs
● Relative frequency: the number of times an event occurred in an experiment divided by the total number of trials; often expressed as a percentage
● Pie chart (circle graph): a graph that shows each category as a percentage (slice) of a circle
○ Categorical; best used to display the parts of a whole
○ Can be difficult to interpret when there are many categories with similar proportions
● Bar chart: a graph that displays data that is distributed over groups or categories
○ Categorical; best used to display the counts (frequencies) of categories
○ Can display either a frequency distribution or a relative frequency distribution
● Frequency distribution: a record of the number of times data values occur within each category
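As a minimal sketch of a frequency distribution and relative frequencies for categorical data, the Python below counts each category with `collections.Counter` and divides each count by the total. The `frequency_table` helper and the `colors` responses are hypothetical, made up only for illustration.

```python
from collections import Counter


def frequency_table(observations):
    """Return (category, frequency, relative frequency) rows, most common first.

    Hypothetical helper for illustration: relative frequency = count / total.
    """
    counts = Counter(observations)
    total = sum(counts.values())
    return [(category, count, count / total) for category, count in counts.most_common()]


# Hypothetical survey responses (illustrative data, not from the notes)
colors = ["red", "blue", "blue", "green", "blue", "red", "blue", "green"]
for category, freq, rel in frequency_table(colors):
    print(f"{category:<6} {freq:>2} {rel:.1%}")
```

The relative frequencies always add up to 1 (100%), which is why the same table can feed either a bar chart of counts or a pie chart of parts of a whole.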
Module 4 Cont:
● Dot plot: Best to display the distribution of data, particularly clusters, gaps, and outliers; most useful
for smaller data sets
● Stem plot: Best to display the distribution or shape of data according to place values; keeps
information about individual data points
○ Constructed by splitting each data value into a stem and a leaf: the leaf is the rightmost digit and the stem consists of all digits to the left of the leaf
● Box plot: a graph that represents the five-number summary; best to display the center, spread, and outliers of a data set
● Five-number summary: lists the min, 1st quartile (Q1), median, 3rd quartile (Q3), and max of a data set
○ Each interval between consecutive numbers in the summary contains about ¼ of the data
● Histogram: Best to display the distribution (shape and spread) of quantitative data; works well with
large data sets
○ Vertical bars show the counts, or numbers, in each interval
○ Symmetric distribution: left and right halves are roughly mirror images
■ Measure of center is the mean; measure of spread is the standard deviation
○ Right (positively) skewed distribution: long tail on the right of the graph; typically mean > median > mode
■ Measure of center is the median; measure of spread is the range or IQR
○ Left (negatively) skewed distribution: long tail on the left of the graph; typically mean < median < mode
■ Measure of center is the median; measure of spread is the range or IQR
○ U-shaped distribution: bars are high at both ends with a valley in the middle
○ Uniform distribution: All bars are roughly the same height
○ Bimodal distribution: has two peaks instead of one
● Describing distributions: shape of the graph, center & spread of the data, max & min values, and any
values that could be outliers
● Mean = average (sum of the values divided by their count); median = middle value of the sorted data; mode = most frequently occurring value
● Range: difference between the min & max of a data set; range = max - min
● Quartiles: values that divide a set into four equally sized groups; median is always Q2; quartiles are
the borders of each ¼ of the data set
● Interquartile range (IQR): the difference between the 3rd and 1st quartiles; IQR = Q3 - Q1; the IQR is the spread of the middle 50% of the data
● Outliers: values less than Q1 - 1.5(IQR) or greater than Q3 + 1.5(IQR)
● Standard deviation: tells you roughly how far, on average, the data points are from the mean
○ In a bell-shaped (normal) distribution, about 68% of the data is within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3
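The numerical summaries in these notes (mean, median, mode, range, five-number summary, IQR, the 1.5 × IQR outlier rule, standard deviation), the stem-plot recipe, and the normal-distribution percentages can be checked with a short Python sketch. Assumptions: the helper names and the data set are hypothetical; quartiles use the median-of-halves convention (the overall median is left out of both halves when n is odd), which many introductory courses use but other software, including `statistics.quantiles`, may not, so Q1 and Q3 can differ slightly; the standard deviation is the sample version (n - 1 divisor) from `statistics.stdev`; the stem plot handles non-negative integers only.

```python
import statistics


def median(values):
    """Middle value of the sorted data; the mean of the two middle values when n is even."""
    s = sorted(values)
    mid = len(s) // 2
    return s[mid] if len(s) % 2 == 1 else (s[mid - 1] + s[mid]) / 2


def five_number_summary(values):
    """(min, Q1, median, Q3, max).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper halves,
    with the overall median excluded from both halves when n is odd.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    s = sorted(values)
    n = len(s)
    lower, upper = s[: n // 2], s[(n + 1) // 2 :]
    return s[0], median(lower), median(s), median(upper), s[-1]


def describe(values):
    """Center, spread, and the 1.5 x IQR outlier check for one quantitative variable."""
    lo, q1, med, q3, hi = five_number_summary(values)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "mean": statistics.mean(values),
        "median": med,
        "modes": statistics.multimode(values),  # every most-frequent value
        "range": hi - lo,
        "five_number_summary": (lo, q1, med, q3, hi),
        "iqr": iqr,
        "outliers": [v for v in values if v < low_fence or v > high_fence],
        "sample_sd": statistics.stdev(values),  # n - 1 divisor; pstdev would use n
    }


def stem_plot(values):
    """Stem-and-leaf rows for non-negative integers: leaf = last digit, stem = digits before it."""
    leaves = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        leaves.setdefault(stem, []).append(str(leaf))
    stems = range(min(leaves), max(leaves) + 1)  # keep empty stems so gaps stay visible
    return [f"{stem} | {' '.join(leaves.get(stem, []))}".rstrip() for stem in stems]


def normal_share_within(k):
    """Fraction of a normal distribution lying within k standard deviations of the mean."""
    z = statistics.NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)


data = [2, 4, 4, 5, 7, 9, 10, 12, 13, 40]  # hypothetical data set (illustration only)
print(describe(data))
print("\n".join(stem_plot(data)))
for k in (1, 2, 3):
    print(k, round(normal_share_within(k), 3))
```

For this made-up data set the sketch gives a mean of 10.6, a median of 8.0, and a mode of 4, so mean > median > mode, the pattern the notes associate with right skew; 40 lies above Q3 + 1.5(IQR) = 24 and is flagged as an outlier, and the stem plot shows the gap as empty stems 2 and 3. The final loop prints values close to 0.683, 0.954, and 0.997 for 1, 2, and 3 standard deviations.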