MGSC 291 - HENDRIX - TEST 1 QUESTIONS
How many rows of ATM transaction data do you have? Note: This asks for the number
of observations (n), not the row number. - Answers -Highlighting the specific data in a
column will give you the count, or you can do some row math.
Check and clean your data. Did you find any problems? - Answers -Be sure you always
check! Data should be complete (not missing any values), correct (not miscoded), and
formatted correctly (numbers, text, etc.). This data is not yet ready to be analyzed if it
hasn't been carefully inspected!
How many transactions (#) occurred between 8:00 am and 10:59 am in total, across all
3 locations? (Whole number) - Answers -Sorting based on Hour (after making sure your
data is clean) is one way to solve this.
Based on the data, which of the following is a reasonable estimate for the probability
(%) that a randomly selected WITHDRAWAL amount is MORE than $85? (XX.X) -
Answers -Since you have the data, the empirical probability can be calculated as the (#
of data points that meet the criteria) / (total number of withdrawals in the data).
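The ratio described above can be sketched in R. This is only an illustration: the `atm` data frame, its column names (`Type`, `Amount`), and its values are made up and are not the assignment data.

```r
# Hypothetical ATM data; column names and values are assumptions for illustration.
atm <- data.frame(
  Type   = c("Withdrawal", "Deposit", "Withdrawal", "Withdrawal", "Deposit", "Withdrawal"),
  Amount = c(40, 200, 100, 90, 150, 60)
)

# Keep only the withdrawal amounts.
withdrawals <- atm$Amount[atm$Type == "Withdrawal"]

# Empirical probability = (# of withdrawals over $85) / (total # of withdrawals).
p_over_85 <- sum(withdrawals > 85) / length(withdrawals)

# Express as a percentage rounded to one decimal place (the XX.X format).
round(100 * p_over_85, 1)
```

Because TRUE counts as 1 and FALSE as 0, mean(withdrawals > 85) gives the same proportion in one step.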
How many DEPOSITS (#) were made on a Wednesday? (Whole number) - Answers -
Take both the transaction type and day into account. Here, again, make sure you've
checked the original data for entry errors.
What was the average DEPOSIT amount ($) made on Sunday? (XXX.XX) - Answers -
Take both the transaction type and day into account. Here, again, make sure you've
checked the original data for entry errors.
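If the ATM data were loaded into R, checking for entry errors and then filtering on both transaction type and day might look like the sketch below. The `atm` data frame, its columns, and the miscoded "wednesday" entry are all hypothetical.

```r
# Hypothetical ATM data with one miscoded day label ("wednesday").
atm <- data.frame(
  Type   = c("Deposit", "Withdrawal", "Deposit", "Deposit", "Withdrawal", "Deposit"),
  Day    = c("Wednesday", "Wednesday", "Sunday", "wednesday", "Sunday", "Sunday"),
  Amount = c(120, 60, 300, 80, 40, 100)
)

# Tabulate the distinct day labels to spot entry errors.
table(atm$Day)

# Correct the miscoded label before counting.
atm$Day[atm$Day == "wednesday"] <- "Wednesday"

# Combine both conditions with & so a row must satisfy each one.
is_deposit      <- atm$Type == "Deposit"
n_wed_deposits  <- sum(is_deposit & atm$Day == "Wednesday")
avg_sun_deposit <- mean(atm$Amount[is_deposit & atm$Day == "Sunday"])
```

Without the correction step, the Wednesday count in this toy data would come out one too low, which is why the check comes first.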
What are the dimensions of the "wine" dataframe? - Answers -dim(wine) or nrow(wine)
and ncol(wine)
How is the taster twitter handle column named in R? - Answers -str(wine) or head(wine)
The column is called "taster twitter handle" in the .csv file, but syntactically valid R
names cannot contain spaces or most special characters and cannot begin with a number.
By default, read.csv() replaces each space or special character with a period and
prefixes a name that starts with a number with an "X", so this column becomes
taster.twitter.handle. Always use str(), head(), and/or summary() to look at your data
in R.
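The renaming can be previewed with make.names(), the base R function that read.csv() applies to headers when check.names = TRUE (the default). In this small sketch the third header, "2nd opinion", is a made-up example of a name that starts with a number.

```r
# Headers as they might appear in a .csv file; "2nd opinion" is hypothetical.
csv_names <- c("taster twitter handle", "region 2", "2nd opinion")

# make.names() turns each header into a syntactically valid R name.
r_names <- make.names(csv_names)
r_names
```

The result shows spaces replaced by periods and an "X" added in front of the leading digit.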
Use the summary() function on the region 2 column to examine the levels of this
categorical variable. How does the Sonoma region of the Napa valley appear? -
Answers -Napa-Sonoma
What is the length of the "price" column? - Answers -length(price)
280,901 (with margin: 0)
How many observations do not have a price? - Answers -sum(is.na(price)) or
length(price[is.na(price)]) # count the NAs
[1] 22691
22,691 (with margin: 0)
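A toy vector shows why the two expressions agree. The values below are invented; only the functions match the answer above.

```r
# Hypothetical price vector with two missing values.
price <- c(15, NA, 30, NA, 45)

# is.na() returns TRUE/FALSE for each element; sum() counts the TRUEs.
n_missing_sum <- sum(is.na(price))

# Subsetting to the NA elements and taking the length gives the same count.
n_missing_len <- length(price[is.na(price)])

# length() counts every element, including the NAs.
n_total <- length(price)

# Observations that do have a price.
n_present <- sum(!is.na(price))
```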
What are the average points? - Answers -> mean(points)
88.1469 (with margin: 0.01)
What is the average price? - Answers -summary(price) reveals that there is missing
data (NAs) in the price column, so the na.rm=TRUE argument must be added to the mean()
function to tell R to ignore them. summary() handles NAs automatically, but mean()
returns NA unless they are removed.
> mean(price,na.rm=TRUE)
34.1772 (with margin: 0.01)
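The effect of na.rm can be seen on a small made-up vector (the values are not from the wine data):

```r
# Hypothetical price vector with one missing value.
price <- c(10, NA, 20, 30)

# Without na.rm, a single NA makes the whole mean NA.
mean_with_na <- mean(price)

# With na.rm = TRUE, the NA is dropped and the mean uses the remaining 3 values.
mean_no_na <- mean(price, na.rm = TRUE)

# summary() reports the NA count alongside the usual statistics.
summary(price)
```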
Considering region 2, what are the average points for the Napa-Sonoma region? -
Answers -> mean(points[region.2=="Napa-Sonoma"])
88.45203 (with margin: 0)
Considering region 2, what is the average price for the Finger Lakes region? - Answers
-summary(price) reveals that there is missing data (NAs) in the price column, so the
na.rm=TRUE argument must be added to the mean() function to tell R to ignore them.
summary() handles NAs automatically, but mean() returns NA unless they are removed.
> mean(price[region.2=="Finger Lakes"],na.rm=TRUE)
19.92021 (with margin: 0)
What are the average points for Chardonnay for the Sonoma region? - Answers ->
mean(points[region.2=="Sonoma"&variety=="Chardonnay"])
89.27433 (with margin: 0.01)
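The answers above refer to columns by bare name (points, region.2, variety), which presumably relies on the data frame having been attached; that is an assumption, since the source does not show it. A sketch of the same two-condition filter without attaching, on a made-up five-row `wine` data frame:

```r
# Hypothetical stand-in for the wine data; values are invented.
wine <- data.frame(
  points   = c(90, 88, 92, 85, 91),
  region.2 = c("Sonoma", "Sonoma", "Sonoma", "Napa", "Sonoma"),
  variety  = c("Chardonnay", "Pinot Noir", "Chardonnay", "Chardonnay", "Merlot")
)

# Option 1: qualify each column with wine$.
avg_dollar <- mean(wine$points[wine$region.2 == "Sonoma" & wine$variety == "Chardonnay"])

# Option 2: with() evaluates the expression using the data frame's columns.
avg_with <- with(wine, mean(points[region.2 == "Sonoma" & variety == "Chardonnay"]))
```

Both forms select only rows where the region is Sonoma and the variety is Chardonnay, then average their points.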
What type of variable is "House ID"? - Answers -Identifier
What type of variable is "Neighborhood"? - Answers -Categorical Nominal
What type of variable is "Zip Code"? - Answers -Categorical Nominal
What type of variable is "Size"? - Answers -Quantitative
What type of data is measured at one time frame (not in equally spaced time periods)? -
Answers -Cross-Sectional
If we calculated the average using the "Market Value" column, this would be a -
Answers -Statistic. The database would provide a very large sample, but not the
population of all homes.
A marketing company requested the daily sales for each day in the past year for all
locations of Zale's. - Answers -Time Series; daily sales are recorded at regular time
intervals.