University of Florida
Module 1
• Classify variables as either categorical or quantitative.
• Determine if data is cross-sectional or time series.
• Make, interpret, and describe histograms.
• Compute and describe measures of center. Be able to determine which
measures are more appropriate.
• Compute and describe measures of spread. Be able to determine which
measures are more appropriate.
• Compute z-scores and use them to compare observations.
• Make, interpret, and describe boxplots.
• Be able to compare distributions using boxplots and histograms.
• Briefly review the Normal distribution.
Statistics is the field of making decisions based on data.
Business analytics is the use of statistical methods to inform business practices.
Example 1
Question: Have prices of houses in Gainesville changed due to COVID?
Each semester, each student in the class collects data on one house for sale in a specified city in Florida. We will examine data from August 2019 and August 2020 to determine whether house prices have increased due to COVID.
Data:
• The data was collected by students in QMB 3250 and 5304 during the Fall
terms.
• The data was gathered from Realtor.com.
EAST_WEST  NORTH_SOUTH  BEDS + BATHS  SQUARE FOOTAGE  PRICE      Time
West       North        8             3550            963450.00  A2020
West       North        10.5          4083            675000.00  A2020
West       South        7             3000            589000.00  A2020
West       North        9             2144            550000.00  A2020
West       North        7             2935            499000.00  A2020
• The data set, called GNV_Housing_Aug2019_2020, is posted in Canvas for this module.
Relational Databases
• Business data is often stored in a relational database. For example, the Gainesville data might include an additional column identifying the real estate agent for each house. That column could then link to another table that stores information about these agents.
Example 1 (cont.): Here is the data set on houses in Gainesville, FL, with an additional Real Estate Agent Number column.
EAST_WEST  NORTH_SOUTH  BEDS + BATHS  SQUARE FOOTAGE  PRICE      Time   Real Estate Agent Number
West       North        8             3550            963450.00  A2020  0922
West       North        10.5          4083            675000.00  A2020  7877
West       South        7             3000            589000.00  A2020  0922
West       North        9             2144            550000.00  A2020  7877
West       North        7             2935            499000.00  A2020  7888
Real Estate Agents
Company           Street Address                   Contact Person  Real Estate Agent Number  Email Address
Freeman Realty    101 Pine Ave; Gainesville, FL    Y. Yaez         0922
Coldwater Realty  223 Oak Street; Gainesville, FL  J. Jans         07877                     jjans@diamonds.com
For our purposes, we will assume that the data has already been pulled from a relational database. We will then explore the resulting spreadsheet.
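To make the link between the two tables concrete, here is a minimal Python sketch of a "left join" on the agent number, keeping every house and attaching its agent's details when a match exists. It is only an illustration (the course itself uses JMP): the field names AGENT, Company, and Contact are hypothetical, only a few columns are kept, and comparing agent numbers as integers is an assumption. That comparison matters here because the houses table lists one agent as 7877 while the agents table lists 07877.

```python
# Minimal sketch of a relational "left join" on the agent number.
# Field names are hypothetical; only a few columns are kept for brevity.

houses = [  # Gainesville houses table (rows shown in the notes)
    {"PRICE": 963450.00, "Time": "A2020", "AGENT": "0922"},
    {"PRICE": 675000.00, "Time": "A2020", "AGENT": "7877"},
    {"PRICE": 589000.00, "Time": "A2020", "AGENT": "0922"},
    {"PRICE": 550000.00, "Time": "A2020", "AGENT": "7877"},
    {"PRICE": 499000.00, "Time": "A2020", "AGENT": "7888"},
]

agents = [  # Real Estate Agents table
    {"AGENT": "0922", "Company": "Freeman Realty", "Contact": "Y. Yaez"},
    {"AGENT": "07877", "Company": "Coldwater Realty", "Contact": "J. Jans"},
]


def agent_key(number: str) -> int:
    """Normalize an agent number so '7877' and '07877' compare equal (assumption)."""
    return int(number)


def left_join(houses, agents):
    """Attach agent details to every house; unmatched houses get None."""
    lookup = {agent_key(a["AGENT"]): a for a in agents}
    joined = []
    for house in houses:
        agent = lookup.get(agent_key(house["AGENT"]))
        row = dict(house)  # copy so the original rows are not modified
        row["Company"] = agent["Company"] if agent else None
        row["Contact"] = agent["Contact"] if agent else None
        joined.append(row)
    return joined


result = left_join(houses, agents)
for row in result:
    print(row["AGENT"], row["PRICE"], row["Company"])
```

Agent 7888 does not appear in the agents table shown, so the left join keeps that house but leaves its agent fields empty (None).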
• When was the data collected?
• Where was the data collected?
• What does each case represent?
• What was recorded for each case?
Before you analyze a set of data, answer the following questions.
• Why are you doing the analysis?
• Whom or what does each row of data refer to?
• What variables are recorded?
To choose the correct type of analysis, you need to know what type of data was collected.
Types of Data
• Identifiers
o Their only purpose is to label each case so that it can be identified.
▪ Ex.
• Categorical Data
o Each value places the case into a group (category).
▪ Ex.
o We usually summarize categorical data with percentages and/or proportions.
o If the categories have an intrinsic order, the variable is ordinal. (Identified in JMP with an icon that is a green bar chart.)
▪ Ex.
o If the categories don't have an intrinsic order, the variable is nominal. (Identified in JMP with an icon that is a red bar chart.)
▪ Ex.
• Quantitative Data
o Values are numerical amounts measured in units (for example, PRICE in dollars or SQUARE FOOTAGE in square feet).
o Averages of these values are meaningful.
o (Identified in JMP with an icon that is a blue histogram.)
• Data can be cross-sectional, which means that all of the data is collected at the same point in time, or it can be a time series, which means that the data is collected at successive points over time.
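The variable type determines which summaries make sense. The short Python sketch below is an illustration only (the course uses JMP, and the variable names are hypothetical): using the five rows shown earlier, it summarizes the categorical variable NORTH_SOUTH with proportions and the quantitative variables PRICE and SQUARE FOOTAGE with averages.

```python
from statistics import mean

# First five rows of the Gainesville data shown in the notes.
north_south = ["North", "North", "South", "North", "North"]  # categorical (nominal)
price = [963450.00, 675000.00, 589000.00, 550000.00, 499000.00]  # quantitative, dollars
sqft = [3550, 4083, 3000, 2144, 2935]  # quantitative, square feet


def proportions(values):
    """Summarize a categorical variable: share of cases in each category."""
    counts = {}
    for value in values:
        counts[value] = counts.get(value, 0) + 1
    return {category: count / len(values) for category, count in counts.items()}


print(proportions(north_south))  # {'North': 0.8, 'South': 0.2}
print(mean(price))               # average price in dollars
print(mean(sqft))                # average square footage
```

Averaging NORTH_SOUTH, or an identifier such as the Real Estate Agent Number, would not produce a meaningful quantity, even though agent numbers are written with digits.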
Example 1 (revisited): Go back to the Saratoga data.