BSNS112 - Final Exam Prep Questions And Answers (2024/2025)
What does quantitative data use? - ✔✔means
What is discrete data? - ✔✔Quantitative. Takes specific, countable values
What is continuous data? - ✔✔Quantitative. Can take any of infinitely many values within a range
What does qualitative data use? - ✔✔Proportions
What is ordinal data? - ✔✔Qualitative. Conveys a ranking
What is nominal data? - ✔✔Qualitative. Uses labels (no ranking)
confidence interval for one proportion - ✔✔p-hat ± z × sqrt(p-hat × q-hat / n), where q-hat = (1 - p-hat)
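A minimal sketch of the one-proportion interval, assuming the usual large-sample z-interval p-hat ± z × sqrt(p-hat × q-hat / n). The function name `proportion_ci` and the sample counts are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Return (lower, upper) for a large-sample one-proportion z-interval."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Two-sided critical value, e.g. about 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(p_hat * q_hat / n)
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 40 students among 100 market goers.
lower, upper = proportion_ci(40, 100)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```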
hypothesis test for one proportion - ✔✔z = (p-hat - p0) / sqrt(p0 × (1 - p0) / n), where p0 is the hypothesised proportion
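A hedged sketch of a one-proportion z-test, assuming the standard large-sample test statistic z = (p-hat - p0) / sqrt(p0 × (1 - p0) / n) and a two-sided alternative. The function name and the counts are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Return (z, two-sided p-value) for H0: p = p0."""
    p_hat = successes / n
    # The standard error uses the hypothesised p0, not p-hat.
    standard_error = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 60 successes out of 100, testing H0: p = 0.5.
z, p_value = one_proportion_z_test(60, 100, 0.5)
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

With these made-up numbers the p-value falls just under 0.05, so the null hypothesis would be rejected at the 5% level of significance.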
Confidence interval for the difference in two proportions - ✔✔(p1-hat - p2-hat) ± z × sqrt(p1-hat × q1-hat / n1 + p2-hat × q2-hat / n2)
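A sketch of the two-proportion interval, assuming independent samples and the unpooled standard error sqrt(p1 × q1 / n1 + p2 × q2 / n2). The names and counts below are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_ci(x1, n1, x2, n2, confidence=0.95):
    """Return (lower, upper) for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    standard_error = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * standard_error, diff + z * standard_error

# Hypothetical data: 60/100 in group 1 versus 45/100 in group 2.
lower, upper = two_proportion_ci(60, 100, 45, 100)
print(f"95% CI for p1 - p2: ({lower:.3f}, {upper:.3f})")
```

If the interval excludes 0, the data suggest the two population proportions differ.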
sample size for estimating mean - ✔✔n = (z × σ / E)^2, where the margin of error E is written as B on the formula sheet
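A minimal sketch of the sample-size calculation, assuming the common formula n = (z × σ / E)^2 rounded up, where E is the margin of error (B on the formula sheet). The values of `sigma` and `margin` are made up.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, margin, confidence=0.95):
    """Smallest n so the margin of error is at most `margin`."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    # Always round up: rounding down would give a margin larger than requested.
    return ceil((z * sigma / margin) ** 2)

# Hypothetical inputs: population SD of 15, desired margin of error of 2.
n = sample_size_for_mean(sigma=15, margin=2)
print(n)
```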
standard normal transformation formula - ✔✔z = (x - μ) / σ. Used to solve for whichever value is
unknown (usually z, but sometimes x or the sd)
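The transformation can be sketched in both directions, assuming z = (x - μ) / σ. The helper names and the numbers (mean 100, sd 15) are illustrative.

```python
from statistics import NormalDist

def z_score(x, mean, sd):
    """Standardise x: how many standard deviations x lies from the mean."""
    return (x - mean) / sd

def x_from_z(z, mean, sd):
    """Rearranged form, for when x is the unknown."""
    return mean + z * sd

z = z_score(130, mean=100, sd=15)
# Probability of a value below x under the normal model.
prob_below = NormalDist().cdf(z)
print(z, round(prob_below, 4))
```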
right skewed distribution - ✔✔mean > median, also known as a positive skew
left skewed distribution - ✔✔mean < median, also known as a negative skew
90% confidence interval - ✔✔1.645
95% confidence interval - ✔✔1.96
99% confidence interval - ✔✔2.575
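These z-values can be checked from the standard normal distribution. A small sketch, assuming two-sided intervals; the helper name `z_critical` is made up.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical z-value for a given confidence level."""
    # Split the leftover probability equally between the two tails.
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: z = {z_critical(level):.3f}")
```

The 99% value computes to roughly 2.576; printed tables often round it to 2.575 or 2.58.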
90% confidence interval of the proportion of all market goers who are students would be: -
✔✔Narrower than the 95% confidence interval
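Why the 90% interval is narrower: a lower confidence level uses a smaller z-value, so the margin of error shrinks. A quick check with hypothetical numbers (p-hat = 0.4, n = 100):

```python
from math import sqrt
from statistics import NormalDist

def ci_width(p_hat, n, confidence):
    """Full width (upper minus lower) of a one-proportion z-interval."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return 2 * z * sqrt(p_hat * (1 - p_hat) / n)

width_90 = ci_width(0.4, 100, 0.90)
width_95 = ci_width(0.4, 100, 0.95)
print(f"90% width: {width_90:.3f}, 95% width: {width_95:.3f}")
```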
When do you reject the null hypothesis? - ✔✔When the p-value is less than the level of significance,
e.g. 0.05 at the 5% level of significance (p-value low, reject the H0!)
What does the t-value represent? - ✔✔the sample mean is (x) standard errors above or below the
hypothesised mean (above if the t-value is positive, below if it is negative)
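A sketch of the calculation behind that interpretation, assuming a one-sample t-test where t = (sample mean - hypothesised mean) / (s / sqrt(n)). The sample values are made up.

```python
from math import sqrt
from statistics import mean, stdev

def t_statistic(sample, hypothesised_mean):
    """Number of standard errors between the sample mean and the hypothesised mean."""
    n = len(sample)
    standard_error = stdev(sample) / sqrt(n)  # stdev uses the n - 1 divisor
    return (mean(sample) - hypothesised_mean) / standard_error

sample = [52, 48, 55, 51, 49, 53, 50, 54]  # hypothetical measurements
t = t_statistic(sample, hypothesised_mean=50)
print(f"t = {t:.3f}")
```

Here t is positive, so the sample mean sits about 1.73 standard errors above the hypothesised mean of 50.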
What is a type one error? - ✔✔Rejecting the null hypothesis when it is true
What is a type two error? - ✔✔failing to reject a false null hypothesis
p-value - ✔✔-is calculated based on the assumption that the null hypothesis is true
-the p-value is sample specific, meaning if you collected another random sample of the same size
from the same population, the p-value would likely be different.
Stratified random sample - ✔✔For the same sample size, parameter estimates are usually more
accurate than for simple random sampling
Categorical variables (qualitative variables) - ✔✔those that divide subjects into groups, but do
not allow any sort of mathematical operations to be performed on the data
Numerical Variables (Quantitative) - ✔✔numbers
Ordinal - ✔✔rank, order
nominal variables - ✔✔categories identified by labels, with no natural order (not the economics sense of values measured in monetary units)
Data quality and feature selection - ✔✔
input variable - ✔✔xi
- explanatory (independent) variables; these are also called features
output variable - ✔✔Yk
- the response (dependent) variables
issues with data - ✔✔- data you tend to play with (the numbers you're given in a sample)
- features and examples (more of one, less of the other)
- large number of examples or features
- very large number of examples
data quality - ✔✔- data is usually presented as a table, but this is often not the case in the real world
- data can have holes and may not look clean (complex format)
poor data quality - ✔✔-missing column variables
-missing values
- errors in the data entry
- mixed numeric and text
- inconsistent values
Transformations (scaling) - ✔✔- reduce the scale (range) of the data
- transform so that the mean is 0 and the SD is 1
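A minimal sketch of this scaling (standardisation), assuming the population SD is used as the divisor; the sample SD is an equally common choice. The function name and data are illustrative.

```python
from statistics import mean, pstdev

def standardise(values):
    """Rescale values so they have mean 0 and (population) SD 1."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]

scaled = standardise([10, 20, 30, 40, 50])
print([round(v, 3) for v in scaled])
```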
Log transform - ✔✔- some data is highly skewed and is better if you transform it to look more
normally distributed
- a square root transform also helps
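To illustrate, the sketch below compares a simple skewness measure before and after a log transform. The income figures are invented, and the moment-based `skewness` helper is just one of several possible definitions.

```python
from math import log
from statistics import mean, pstdev

def skewness(values):
    """Moment-based skewness: average cubed z-score (positive = right skew)."""
    mu = mean(values)
    sd = pstdev(values)
    return mean([((v - mu) / sd) ** 3 for v in values])

# Hypothetical right-skewed data, e.g. incomes with a few very large values.
incomes = [20, 22, 25, 27, 30, 35, 40, 60, 120, 400]
logged = [log(v) for v in incomes]  # log needs strictly positive values

print(f"raw skewness:    {skewness(incomes):.2f}")
print(f"logged skewness: {skewness(logged):.2f}")
```

The logged data are still somewhat right skewed, but noticeably less so than the raw values.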
Feature selection reduction - ✔✔worst case, more explanatory variables than examples, so a
multiple linear regression cannot be constructed
- simple method
1. select the feature F that is most correlated with the response
2. remove all features that are highly correlated with F (above some given threshold)
3. repeat steps 1 and 2 until number of features is manageable
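The three steps above can be sketched as follows. This is one possible reading, assuming "manageable" means stopping once `max_features` have been picked or nothing is left; the function names, the 0.9 threshold and the housing data are all hypothetical.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def select_features(features, response, threshold=0.9, max_features=None):
    """features: dict of name -> values. Returns the chosen names in order."""
    remaining = dict(features)
    selected = []
    while remaining and (max_features is None or len(selected) < max_features):
        # Step 1: pick the remaining feature most correlated with the response.
        best = max(remaining, key=lambda name: abs(pearson(remaining[name], response)))
        selected.append(best)
        best_values = remaining.pop(best)
        # Step 2: drop features that are highly correlated with the chosen one.
        remaining = {
            name: values
            for name, values in remaining.items()
            if abs(pearson(values, best_values)) <= threshold
        }
        # Step 3: the loop repeats steps 1 and 2 on whatever is left.
    return selected

# Hypothetical data: two near-duplicate size columns and an unrelated age column.
features = {
    "size_m2": [50, 60, 70, 80, 90, 100],
    "size_ft2": [538, 646, 753, 861, 969, 1076],
    "age_years": [30, 5, 20, 10, 25, 15],
}
price = [200, 260, 270, 330, 340, 400]
selected = select_features(features, price)
print(selected)
```

Only one of the two size columns survives, because the other is almost perfectly correlated with it.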
Correlation - ✔✔Correlation is NOT causation
r = Cov(X,Y) / (σx σy)
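A short sketch of the correlation formula, assuming the sample covariance and sample standard deviations (n - 1 divisor). The study-hours data are made up.

```python
from statistics import mean, stdev

def covariance(xs, ys):
    """Sample covariance of two equal-length lists."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """r = Cov(X, Y) / (sd of X * sd of Y); always between -1 and 1."""
    return covariance(xs, ys) / (stdev(xs) * stdev(ys))

hours = [1, 2, 3, 4, 5]        # hypothetical study hours
marks = [52, 55, 61, 64, 70]   # hypothetical exam marks
r = correlation(hours, marks)
print(f"r = {r:.3f}")
```

A value of r close to 1 shows a strong linear association, but on its own it says nothing about whether one variable causes the other.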
stepwise multiple regression model - ✔✔multiple factors that impact the regression (multi-dimensional
model)
backward multiple regression model - ✔✔- build regression
- pick worst variable
- take the variable out and confirm adjusted R-squared hasn't gone down (quality of fit of the model)
Principal component analysis - ✔✔- very common method for dimensionality reduction (i.e.
reduce the number of features)
- rotates the data
map more complicated equations through transforming the data
- non-linear models - ✔✔a straight line is sometimes not the best fit
- take the x-value and make new derived terms (x^2, x^3)
Y-hat = B0 + B1 X + B2 X^2 + B3 X^3
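A sketch of fitting the cubic above by ordinary least squares, using only the standard library. It assumes the normal equations (X^T X) B = X^T y are solved by Gaussian elimination, which is fine for a small illustration; real software typically uses more stable methods. All names and data are hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_polynomial(xs, ys, degree=3):
    """Least-squares coefficients [B0, B1, ..., B_degree]."""
    # New features derived from x: x^0, x^1, ..., x^degree.
    rows = [[x ** p for p in range(degree + 1)] for x in xs]
    k = degree + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated from an exact cubic with no noise.
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x - x ** 2 + 0.5 * x ** 3 for x in xs]
coefficients = fit_polynomial(xs, ys)
print([round(c, 4) for c in coefficients])
```

The model is still linear in the coefficients B0 to B3, which is why the same least-squares machinery as multiple linear regression applies.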