Applied Quantitative Modelling
DSC2608
ASSIGNMENT 5 SEMESTER 1 2024
DO NOT SUBMIT THIS COPY, WRITE IN YOUR OWN WORDS.
Question 1
1.1 False. The k-Nearest Neighbour (k-NN) algorithm typically involves more
computation at test time than at training time. During training, k-NN simply
stores the dataset; during testing, it computes the distance from each test sample to
all training points.
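The point can be illustrated with a small R sketch using `knn()` from the `class` package (assumed to be installed); the data below are simulated purely for illustration. Notice that there is no separate fit step at all: the training data are simply passed in, and all distance computation happens inside the single prediction call.

```r
# Illustration: k-NN does essentially no work at "training" time.
# knn() takes the stored training data directly and computes every
# test-to-training distance at prediction time.
library(class)                                # provides knn(); assumed installed

set.seed(1)
train_x <- matrix(rnorm(100 * 2), ncol = 2)   # 100 stored training points
train_y <- factor(sample(c("A", "B"), 100, replace = TRUE))
test_x  <- matrix(rnorm(5 * 2), ncol = 2)     # 5 new points to classify

# All distance computation happens here, at test time:
pred <- knn(train = train_x, test = test_x, cl = train_y, k = 3)
pred
```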
1.2 False. Training a linear regression estimator with only half the data does not
necessarily mean that the predictive performance of the model will be better. It
depends on the quality and representativeness of the data: if the half you train
on is not representative of the whole, the model may well be worse.
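A quick R sketch (on simulated data, for illustration only) shows why the answer hinges on representativeness rather than on sample size alone: a random half gives coefficient estimates close to the full fit, while a biased half would not.

```r
# Sketch: fitting lm() on half the data is not automatically better or
# worse -- it depends on how representative that half is.
set.seed(2)
x <- runif(200)
y <- 2 + 3 * x + rnorm(200)              # true intercept 2, slope 3

full_fit <- lm(y ~ x)                    # fit on all 200 observations
half_idx <- sample(200, 100)             # a random (representative) half
half_fit <- lm(y[half_idx] ~ x[half_idx])

coef(full_fit)   # estimates near (2, 3)
coef(half_fit)   # similar here; a non-random half (e.g. only small x
                 # values) could give badly distorted estimates instead
```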
1.3 True. Splitting in a decision tree is indeed the process of dividing a node into two
or more sub-nodes, creating a branch for each outcome.
1.4 True. Both C4.5 and CART (Classification and Regression Trees) are well-known
decision tree algorithms extensively used in machine learning to build decision trees.
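For reference, a CART-style tree can be grown in R with the `rpart` package (assumed to be installed); the built-in `iris` data serve only as a convenient example. The printed output shows the binary splits described in 1.3, produced in the top-down, recursive manner described in 1.5.

```r
# CART-style classification tree with the rpart package (assumed installed).
# Each split divides a node into two sub-nodes, creating a branch per outcome.
library(rpart)

fit <- rpart(Species ~ ., data = iris, method = "class")
print(fit)   # displays the recursive, top-down sequence of splits
```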
1.5 False. The C4.5 technique builds a decision tree from the training data in a top-
down, recursive, divide-and-conquer manner, not bottom-up.
Question 2
2.1 Load the ISLR package, import the Carseats dataset and assign it to the
variable data_carseats.
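One possible R answer (the variable name `data_carseats` follows the question's wording; install the package first with `install.packages("ISLR")` if needed):

```r
library(ISLR)               # load the ISLR package, which contains Carseats
data_carseats <- Carseats   # assign the dataset to the variable data_carseats
head(data_carseats)         # inspect the first few rows
```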