BSNS 112 FINAL EXAM NEWEST 2024-2025 ACTUAL EXAM COMPLETE 100 QUESTIONS AND CORRECT
ANSWERS
What does quantitative data use? - (answer) means
What is discrete data? - (answer) Quantitative. Takes specific, countable values
What is continuous data? - (answer) Quantitative. Can take any of infinitely many values within a range
What does qualitative data use? - (answer) Proportions
What is ordinal data? - (answer) Qualitative. Conveys a ranking
What is nominal data? - (answer) Qualitative. Uses labels (no ranking)
confidence interval for one proportion - (answer) p-hat ± z × sqrt(p-hat × q-hat / n), where q-hat = (1 - p-hat)
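A minimal sketch of the one-proportion interval, assuming the standard textbook form p-hat ± z × sqrt(p-hat × q-hat / n) with q-hat = 1 - p-hat. The function name and the counts (40 successes out of 100) are made up for illustration; check the course formula sheet for the exact form it uses.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Return (lower, upper) for a one-proportion z confidence interval."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin


# Hypothetical sample: 40 successes out of 100 observations.
lower, upper = proportion_ci(successes=40, n=100)
print(round(lower, 3), round(upper, 3))
```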
hypothesis test for one proportion - (answer)
Confidence interval for the difference in two proportions - (answer)
sample size for estimating mean - (answer) E (the margin of error) = B (on formula sheet)
standard normal transformation formula - (answer) z = (x - mean) / sd. Used to calculate whichever value is
unknown (z, x or sd, but most likely z)
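The transformation can be sketched as below, assuming the usual form z = (x - mean) / sd. The helper names and the numbers (mean 100, sd 15) are illustrative only.

```python
def z_score(x, mean, sd):
    """Standardise x: how many standard deviations x lies from the mean."""
    return (x - mean) / sd


def x_from_z(z, mean, sd):
    """Rearranged form, for when x is the unknown instead of z."""
    return mean + z * sd


# Hypothetical values: mean 100, standard deviation 15.
z = z_score(x=130, mean=100, sd=15)
x = x_from_z(z=2.0, mean=100, sd=15)
print(z, x)
```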
right skewed distribution - (answer) mean > median, also known as a positive skew
left skewed distribution - (answer) mean < median, also known as a negative skew
90% confidence interval - (answer) critical z-value of 1.645
95% confidence interval - (answer) critical z-value of 1.96
99% confidence interval - (answer) critical z-value of 2.575
90% confidence interval of the proportion of all market goers who are students would be: - (answer)
Narrower than the 95% confidence interval
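The critical values on the cards above, and the reason a 90% interval is narrower than a 95% one, can be checked with a short sketch. The sample values (p-hat = 0.3, n = 200) are made up. The computed 99% value is about 2.5758, which tables commonly list as 2.575 or 2.576.

```python
from math import sqrt
from statistics import NormalDist


def z_critical(confidence):
    """Two-sided critical z-value for a given confidence level."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def ci_width(p_hat, n, confidence):
    """Full width of a one-proportion z interval (twice the margin of error)."""
    return 2 * z_critical(confidence) * sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample: 30% of 200 market goers are students.
widths = {c: ci_width(p_hat=0.3, n=200, confidence=c) for c in (0.90, 0.95, 0.99)}
for level, width in widths.items():
    print(level, round(z_critical(level), 4), round(width, 4))
```

A lower confidence level uses a smaller critical value, so with the same sample the interval shrinks.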
When do you reject the null hypothesis? - (answer) When the p-value is less than the level of significance,
e.g. less than 0.05 at the 5% level of significance (p-value low, reject that H0!)
What does the t-value represent? - (answer) the sample mean is (x) standard errors above or below
(depending on whether it is positive or negative) the hypothesised mean
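A small sketch of that interpretation, assuming the usual one-sample form t = (sample mean - hypothesised mean) / (s / sqrt(n)). The sample values and the hypothesised mean of 50 are invented for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, hypothesised_mean):
    """Number of standard errors between the sample mean and the hypothesised mean."""
    standard_error = stdev(sample) / sqrt(len(sample))
    return (mean(sample) - hypothesised_mean) / standard_error


# Made-up sample of 8 observations.
sample = [52, 48, 55, 51, 49, 53, 50, 54]
t = t_statistic(sample, hypothesised_mean=50)
print(round(t, 3))
```

A positive result means the sample mean sits above the hypothesised mean, a negative result means it sits below.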
What is a type one error? - (answer) Rejecting the null hypothesis when it is true
What is a type two error? - (answer) failing to reject a false null hypothesis
p-value - (answer) -is calculated based on the assumption that the null hypothesis is true
-the p-value is sample specific, meaning if you collected another random sample of the same size from
the same population, the p-value would likely be different.
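To illustrate that the p-value is computed under the null hypothesis, here is a hedged sketch of a two-sided one-proportion z-test. It assumes the standard textbook statistic z = (p-hat - p0) / sqrt(p0 × (1 - p0) / n), which may differ from the course formula sheet. The counts and p0 are made up.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_p_value(successes, n, p0):
    """Return (z, two-sided p-value), assuming the null proportion p0 is true."""
    p_hat = successes / n
    # The standard error uses p0, not p-hat, because H0 is assumed true.
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical sample: 60 successes out of 100, testing H0: p = 0.5.
z, p_value = one_proportion_p_value(successes=60, n=100, p0=0.5)
reject_null = p_value < 0.05  # 5% level of significance
print(round(z, 2), round(p_value, 4), reject_null)
```

A different random sample of the same size would give a different count of successes, and so a different p-value.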
Stratified random sample - (answer) For the same sample size, parameter estimates are usually more
accurate than for simple random sampling
Categorical variables (qualitative variables) - (answer) those that divide subjects into groups, but do not
allow any sort of mathematical operations to be performed on the data
Numerical Variables (Quantitative) - (answer) numbers
Ordinal - (answer) rank, order
nominal variables - (answer) variables measured in monetary units (currency). This is the economics sense; as a data type, nominal means labels with no ranking
Data quality and feature selection - (answer)
input variable - (answer) xi
- the explanatory (independent) variables; these are also called features
output variable - (answer) Yk
- the response (dependent) variables
issues with data - (answer) - data you tend to play with (the numbers you're given in a sample)
- features and examples (more of one, less of the other)
- large number of features
- very large number of examples
data quality - (answer) - data is most often presented as a table, but this is often not the case in the real world
- data can have holes and won't look clean (complex format)
poor data quality - (answer) - missing column variables
- missing values
- errors in the data entry
- mixed numeric and text
- inconsistent values
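Some of these problems can be detected automatically. The sketch below is one possible check, not a standard routine: the function name, the report keys and the sample rows are all hypothetical, and it only covers missing columns, missing values and mixed numeric/text columns.

```python
def quality_report(rows, expected_columns):
    """Count basic data-quality problems in a list of dict records."""
    report = {"missing_columns": set(), "missing_values": 0}
    types_seen = {col: set() for col in expected_columns}
    for row in rows:
        for col in expected_columns:
            if col not in row:
                # The column is absent from this record altogether.
                report["missing_columns"].add(col)
            elif row[col] is None or row[col] == "":
                # The column exists but holds no value.
                report["missing_values"] += 1
            else:
                is_number = isinstance(row[col], (int, float))
                types_seen[col].add("number" if is_number else "text")
    # A column that holds both numbers and text is flagged as mixed.
    report["mixed_type_columns"] = {c for c, t in types_seen.items() if len(t) > 1}
    return report


# Made-up records showing each problem once.
rows = [
    {"age": 21, "income": 40000},
    {"age": "twenty", "income": None},
    {"age": 35},
]
report = quality_report(rows, expected_columns=["age", "income"])
print(report)
```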
Transformations (scaling) - (answer) - reduce the scale (range) of the data
- transform the data so that the mean is 0 and the SD is 1
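A minimal sketch of this kind of scaling (often called standardisation), assuming the sample standard deviation is used. The input values are made up.

```python
from statistics import mean, stdev


def standardise(values):
    """Rescale values so they have mean 0 and standard deviation 1."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


# Made-up data on a large scale.
scaled = standardise([10, 20, 30, 40, 50])
print([round(v, 3) for v in scaled])
```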
Log transform - (answer) - some data is highly skewed and works better if you transform it to look more
normally distributed
- a square root transform can also help
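The effect can be seen by comparing a skewness measure before and after transforming. This sketch uses the simple moment-based skewness coefficient (an assumption, since the source names no particular measure) on made-up right-skewed data.

```python
from math import log, sqrt
from statistics import mean, pstdev


def skewness(values):
    """Moment-based skewness: mean of cubed standardised deviations."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


# Made-up right-skewed data: a few very large values stretch the right tail.
raw = [1, 2, 3, 4, 5, 8, 13, 40, 200]
logged = [log(v) for v in raw]   # log requires strictly positive values
rooted = [sqrt(v) for v in raw]  # square root requires non-negative values

print(round(skewness(raw), 2), round(skewness(rooted), 2), round(skewness(logged), 2))
```

A value closer to 0 indicates a more symmetric distribution. On this data both transforms reduce the skew, the log more strongly than the square root.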
Feature selection reduction - (answer) worst case, more explanatory variables than examples, so a
multiple linear regression cannot be constructed
- simple method
1. select the feature F that is most correlated with the response
2. remove all features that are highly correlated w/ F (above some given threshold)
3. repeat steps 1 and 2 until number of features is manageable
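The three steps above can be sketched as follows. This is one reading of the method, with assumptions: "manageable" is taken to mean a cap on the number of selected features (`max_features`), the threshold is 0.9, and correlation strength is judged by absolute value. All names and data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def select_features(features, response, threshold=0.9, max_features=2):
    """Greedy selection: pick the best feature, drop its near-duplicates, repeat."""
    remaining = dict(features)
    selected = []
    while remaining and len(selected) < max_features:
        # Step 1: the feature most correlated with the response.
        best = max(remaining, key=lambda name: abs(pearson(remaining[name], response)))
        selected.append(best)
        best_values = remaining.pop(best)
        # Step 2: remove features highly correlated with the one just chosen.
        remaining = {
            name: values
            for name, values in remaining.items()
            if abs(pearson(values, best_values)) <= threshold
        }
        # Step 3: the loop repeats until enough features are selected.
    return selected


# Made-up data: "a" and "a_copy" carry nearly the same information.
response = [1, 2, 3, 4, 5, 6]
features = {
    "a": [1.1, 2.0, 2.9, 4.2, 5.1, 5.9],
    "a_copy": [2.2, 4.1, 5.8, 8.3, 10.0, 12.1],
    "b": [3, 1, 4, 1, 5, 9],
}
selected = select_features(features, response)
print(selected)
```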
Correlation - (answer) Correlation is NOT causation
Corr(X, Y) = Cov(X, Y) / (σx σy)
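As a worked illustration, the correlation coefficient can be computed as the covariance divided by the product of the two standard deviations. The sketch assumes sample (n - 1) versions of both, and the data is made up.

```python
from statistics import mean, stdev


def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def correlation(xs, ys):
    """Covariance scaled by both standard deviations, so it lies in [-1, 1]."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))


# Made-up data: y is exactly twice x, so the relationship is perfectly linear.
r = correlation([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(r, 6))
```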
stepwise multiple regression model - (answer) multiple factors that impact the regression (multi-dimensional
model)
backward multiple regression model - (answer) - build the regression
- pick the worst variable
- take the variable out and confirm adjusted R-squared hasn't gone down (quality and fit of the model)
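The confirmation step relies on adjusted R-squared, which penalises extra predictors. A short sketch using the standard formula 1 - (1 - R²)(n - 1) / (n - k - 1) follows. The R-squared values, sample size and predictor counts are invented; a real check would take them from the fitted models.

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    """Adjusted R-squared for a model with n_predictors explanatory variables."""
    penalty = (n_observations - 1) / (n_observations - n_predictors - 1)
    return 1 - (1 - r_squared) * penalty


# Hypothetical fits: the full model has 5 predictors; the reduced model
# drops the worst one and loses only a little raw R-squared.
full = adjusted_r_squared(0.820, n_observations=30, n_predictors=5)
reduced = adjusted_r_squared(0.815, n_observations=30, n_predictors=4)

# The removal is confirmed if adjusted R-squared has not gone down.
removal_confirmed = reduced >= full
print(round(full, 4), round(reduced, 4), removal_confirmed)
```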
Principal component analysis - (answer) - very common method for dimensionality reduction (i.e.
reduce the number of features)
- rotates the data
- maps more complicated equations through transforming the data
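The "rotates the data" idea can be shown in two dimensions, where the rotation angle has a closed form. This is a simplified sketch under that two-feature assumption, not a general PCA routine; the points are made up.

```python
from math import atan2, cos, sin


def pca_2d(points):
    """Rotate centred 2-D points so the first axis has the most variance."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    sxx = sum(x * x for x, _ in centred) / (n - 1)
    syy = sum(y * y for _, y in centred) / (n - 1)
    sxy = sum(x * y for x, y in centred) / (n - 1)
    # Angle of the direction of greatest variance.
    theta = 0.5 * atan2(2 * sxy, sxx - syy)
    rotated = [
        (x * cos(theta) + y * sin(theta), -x * sin(theta) + y * cos(theta))
        for x, y in centred
    ]
    return theta, rotated


# Made-up points lying close to a straight line.
points = [(1, 1.2), (2, 1.9), (3, 3.1), (4, 3.8), (5, 5.2)]
theta, rotated = pca_2d(points)

# Keeping only the first coordinate reduces two features to one.
pc1 = [p for p, _ in rotated]
print([round(v, 3) for v in pc1])
```

After the rotation the two new coordinates are uncorrelated, and the first one carries most of the spread in the data.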