QMB3302 Final Exam 1
Which of the following is a common use case for the random forest algorithm in
machine learning?
a. Classifying data into categories based on input features
b. Predicting a continuous target variable based on input features.
c. Clustering similar data points into groups.
d. Finding the optimal hyperparameters for a model. - answer: a
Which of the following is a potential benefit of using decision trees in machine learning?
a. Easy to overfit the data
b. Great at predicting future data
c. Can handle both numerical and categorical data.
d. Can only handle numerical data. - answer: c
Which of the following statements best describes an ensemble method in machine
learning?
a. A technique that combines the results of multiple models to improve overall predictive
accuracy.
b. An algorithm that learns to find patterns and relationships in data without being
explicitly programmed.
c. A method that automatically groups similar data points into clusters based on their
characteristics.
d. A model that predicts the value of a dependent variable based on the values of one
or more independent variables. - answer: a
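As a rough illustration of option a, the sketch below combines three simple models by majority vote. The three threshold rules and the names `majority_vote` and `ensemble_predict` are hypothetical, made up for this example; real ensembles such as random forests combine trained decision trees rather than hand-written rules.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the largest number of models."""
    return Counter(predictions).most_common(1)[0][0]

# Three hypothetical "models", each a simple threshold rule on one number.
models = [
    lambda x: "high" if x > 5 else "low",
    lambda x: "high" if x > 7 else "low",
    lambda x: "high" if x > 3 else "low",
]

def ensemble_predict(x):
    """Ask every model for a label, then combine the labels by majority vote."""
    return majority_vote([model(x) for model in models])

print(ensemble_predict(6))  # two of the three rules say "high"
```

Any single rule can be wrong for a given input, but the combined vote only fails when most of the rules fail together.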
Which of the following best describes supervised learning?
a. A machine learning approach where the algorithm learns to optimize a performance
metric by adjusting its internal parameters.
b. A machine learning approach where the algorithm automatically groups similar data
points into clusters.
c. A machine learning approach where the algorithm receives labeled data and learns
to map inputs to outputs based on those labels.
d. A machine learning approach where the algorithm learns to find patterns and
relationships in data without being explicitly programmed. - answer: c
Which of the following statements best describes classification in machine learning?
a. A type of supervised learning where the goal is to predict a continuous target based
on input features.
b. A type of reinforcement learning where the goal is to learn an optimal policy for
making decisions in an environment.
c. A type of unsupervised learning where the goal is to group similar data points into
clusters.
d. A type of supervised learning where the goal is to assign input data points to
predefined categories or classes. - answer: d
We want the R-squared value for our regression model to be 100%.
a. True
b. False - answer: b
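For reference, R-squared is 1 minus the ratio of the residual sum of squares to the total sum of squares. The sketch below computes it by hand on made-up numbers; the function name `r_squared` and the data are assumptions for illustration. One plausible reading of the "False" answer is that a value of exactly 100% on training data often signals a model that has memorized the data rather than one that will generalize.

```python
def r_squared(actual, predicted):
    """R^2 = 1 - SS_res / SS_tot for paired actual and predicted values."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

actual = [1.0, 2.0, 3.0, 4.0]

# Predictions that reproduce the data exactly give R^2 = 1 (that is, 100%).
perfect = r_squared(actual, actual)

# Slightly imperfect predictions give a value below 1.
close = r_squared(actual, [1.5, 2.0, 3.0, 3.5])

print(perfect, close)
```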
One weakness of cross-validation discussed is that information can sometimes
_______ across different periods. A common situation in which this happens is when we
are looking at stock data.
a. Leak
b. Overfit
c. Not leak
d. Underfit - answer: a
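With time-ordered data such as stock prices, ordinary shuffled folds can place later observations in the training set and earlier ones in the test set, which is how information leaks across periods. A minimal sketch of one common workaround, a walk-forward split in which training rows always come before test rows, is shown below. The function name `walk_forward_splits` and the fold sizing are assumptions for illustration, not a method taken from the course.

```python
def walk_forward_splits(n_samples, n_splits):
    """Yield (train_indices, test_indices) pairs in time order.

    Every training index is earlier than every test index in the same
    split, so the model never sees data from the period it is tested on.
    """
    fold_size = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train_end = k * fold_size
        # The last split absorbs any leftover rows at the end.
        test_end = train_end + fold_size if k < n_splits else n_samples
        yield list(range(train_end)), list(range(train_end, test_end))

splits = list(walk_forward_splits(n_samples=10, n_splits=3))
for train, test in splits:
    print("train:", train, "test:", test)
```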
In which of these situations would you want to use a clustering algorithm?
a. You were given the financial data for the Federal Reserve of New York in 2023 and
want to determine where the discrepancy in accumulated depreciation came from
before you submit the financial statements.
b. You have a dataset containing past crimes of current defendants and you want to
determine the likelihood that they will commit another crime.
c. You have a dataset containing 2023 Charlotte, NC housing data and you want to
predict 2024 housing prices.
d. You have a dataset containing data for Cheesecake Factory and you want to look at
customer spending at the restaurant in order to find patterns among customers who
share similar characteristics. - answer: d
What is a potential downside of using linear regression models in machine learning?
a. They are not suitable for predicting continuous target variables.
b. They are too complex and difficult to interpret.
c. They are prone to overfitting the data.
d. They can only handle numerical data. - answer: c
What type of algorithm would you use to segment customers into groups?
Assume the groups ARE already labeled.
a. Decision trees.
b. Cluster regression
c. Random forest
d. Regression
e. All of the above. - answer: e
Which of the following is true about data validation and cross-validation in machine
learning?
a. Data validation and cross-validation are used to evaluate a model's performance and
prevent overfitting.
b. Data validation and cross-validation are the same thing.