Guide & Practice Questions
What is a decision tree, and how is it used in data classification tasks?
Answer: A decision tree is a flowchart-like structure used for decision-making and classification tasks. In classification, it recursively divides the dataset into smaller subsets based on feature values, and each final subset is assigned a class label.
Describe the components of a decision tree, including internal nodes, branches, and leaf nodes.
Answer:
Internal Nodes: Decision points that each test an attribute.
Branches: Outcomes of those tests, each leading to the next node.
Leaf Nodes: Final predictions or classifications.
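As a minimal sketch, the three components can be modeled directly. The class names `Leaf` and `InternalNode`, the `predict` helper, and the toy weather tree are hypothetical illustrations, not part of any library. An internal node stores the attribute it tests, each branch maps one outcome of that test to a child, and a leaf stores the final class label. Prediction follows branches from the root until it reaches a leaf.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Union


@dataclass
class Leaf:
    """Leaf node: holds the final predicted class label."""
    label: Any


@dataclass
class InternalNode:
    """Internal node: tests one attribute; each branch maps an outcome to a child."""
    attribute: str
    branches: Dict[Any, "Tree"] = field(default_factory=dict)


Tree = Union[Leaf, InternalNode]


def predict(tree: Tree, example: Dict[str, Any]) -> Any:
    """Follow the matching branch at each internal node until a leaf is reached."""
    node = tree
    while isinstance(node, InternalNode):
        value = example[node.attribute]  # outcome of the test at this node
        node = node.branches[value]      # follow the branch for that outcome
    return node.label


# Hypothetical hand-built tree for a toy "play tennis?" decision.
toy_tree = InternalNode(
    attribute="outlook",
    branches={
        "sunny": InternalNode(
            attribute="humidity",
            branches={"high": Leaf("no"), "normal": Leaf("yes")},
        ),
        "overcast": Leaf("yes"),
        "rainy": InternalNode(
            attribute="windy",
            branches={True: Leaf("no"), False: Leaf("yes")},
        ),
    },
)

print(predict(toy_tree, {"outlook": "sunny", "humidity": "normal", "windy": False}))  # yes
```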
Explain the steps involved in building a decision tree.
Answer: Steps include selecting the best attribute (using metrics like entropy/information gain), splitting the dataset on that attribute, and recursively repeating the process on each subset until all nodes are pure or a stopping criterion is met.
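These steps can be sketched as a short recursive function. This is an ID3-style illustration under stated assumptions: attributes are categorical, the tree is stored as nested dicts whose leaves are plain class labels, and the helper names (`entropy`, `information_gain`, `build_tree`) and the optional `max_depth` stopping criterion are hypothetical choices made for this example.

```python
import math
from collections import Counter
from typing import Any, Dict, List, Optional


def entropy(labels: List[Any]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows: List[Dict[str, Any]], labels: List[Any], attr: str) -> float:
    """Parent entropy minus the size-weighted entropy of the subsets produced by attr."""
    groups: Dict[Any, List[Any]] = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(labels)
    return entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())


def build_tree(rows, labels, attributes, depth=0, max_depth: Optional[int] = None):
    """Recursively build a tree as nested dicts; leaves are plain class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    # Stopping criteria: pure node, no attributes left, or (optional) depth limit.
    if len(set(labels)) == 1 or not attributes or (max_depth is not None and depth >= max_depth):
        return majority
    # Step 1: select the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    node = {"attribute": best, "branches": {}}
    # Step 2: split the dataset on each observed value of the chosen attribute.
    for value in {row[best] for row in rows}:
        idx = [i for i, row in enumerate(rows) if row[best] == value]
        # Step 3: recurse on each subset.
        node["branches"][value] = build_tree(
            [rows[i] for i in idx], [labels[i] for i in idx], remaining, depth + 1, max_depth
        )
    return node


# Hypothetical toy data: should we play outside?
rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "overcast", "windy": False},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "overcast", "windy": True},
]
labels = ["no", "no", "yes", "yes", "no", "yes"]

tree = build_tree(rows, labels, ["outlook", "windy"])
print(tree)
```

Here `outlook` is chosen at the root because it separates the labels far better than `windy`; only the `rainy` subset is still mixed, so it is split again on `windy`.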
Define entropy and information gain in the context of decision trees and explain how they influence the construction of the tree.
Answer:
Entropy: A measure of the impurity or disorder of the class labels in a set.
Information Gain: The reduction in entropy achieved by splitting the dataset on an attribute.
These concepts guide tree construction: at each node, the feature with the highest information gain is chosen as the split.
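For a set S with class proportions p_i, entropy is H(S) = -Σ p_i log2 p_i, and splitting S into subsets S_v gives IG = H(S) - Σ_v (|S_v| / |S|) H(S_v). The sketch below (function names are illustrative) computes both for a made-up parent node with four "yes" and four "no" labels, comparing a split that leaves mixed children with one that produces pure children.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """H(S) = -sum_i p_i * log2(p_i), over the classes present in S."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(parent: Sequence[Hashable], children: Sequence[Sequence[Hashable]]) -> float:
    """IG = H(parent) - sum_v (|S_v| / |S|) * H(S_v)."""
    n = len(parent)
    return entropy(parent) - sum(len(child) / n * entropy(child) for child in children)


parent = ["yes"] * 4 + ["no"] * 4
split_a = [["yes"] * 3 + ["no"], ["yes"] + ["no"] * 3]  # mixed children
split_b = [["yes"] * 4, ["no"] * 4]                     # pure children

print(round(entropy(parent), 3))                    # 1.0
print(round(information_gain(parent, split_a), 3))  # 0.189
print(round(information_gain(parent, split_b), 3))  # 1.0
```

The pure split removes all uncertainty, so its gain equals the parent entropy and it would be preferred.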
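What is the Gini index, and how is it used in building decision trees?
Answer: The Gini index is a metric that measures node impurity: the probability of incorrectly classifying a randomly chosen data point if it were labeled at random according to the node's class distribution. The split that yields the lowest weighted Gini impurity in the child nodes is chosen as the best split.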
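The Gini impurity of a node is 1 - Σ p_i², where p_i is the proportion of class i. In this minimal sketch (helper names and toy labels are hypothetical), each candidate split is scored by the size-weighted Gini impurity of its children, and the lowest score wins.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def gini(labels: Sequence[Hashable]) -> float:
    """Gini impurity: 1 - sum_i p_i^2."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def weighted_gini(children: List[Sequence[Hashable]]) -> float:
    """Size-weighted Gini impurity of the child nodes produced by a split."""
    n = sum(len(child) for child in children)
    return sum(len(child) / n * gini(child) for child in children)


parent = ["yes"] * 4 + ["no"] * 4
candidate_splits = {
    "split_a": [["yes"] * 3 + ["no"], ["yes"] + ["no"] * 3],  # mixed children
    "split_b": [["yes"] * 4, ["no"] * 4],                     # pure children
}

print(gini(parent))  # 0.5
for name, children in candidate_splits.items():
    print(name, weighted_gini(children))
best = min(candidate_splits, key=lambda name: weighted_gini(candidate_splits[name]))
print("best split:", best)  # best split: split_b
```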
Describe the ID3 algorithm and its role in decision tree construction.
Answer: ID3 (Iterative Dichotomiser 3) builds decision trees by recursively selecting, at each node, the attribute that maximizes information gain.
How are attributes selected for splitting at each node in a decision tree?
Answer: Attributes are selected based on metrics like information gain or the Gini index; the attribute with the best score (highest information gain or lowest weighted impurity) is chosen.
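A sketch of attribute selection on a hypothetical six-row weather table. Both criteria are expressed as an impurity decrease (parent impurity minus size-weighted child impurity), so passing `gini` instead of `entropy` changes the metric without changing the selection logic. The helper names are illustrative, not from a library.

```python
import math
from collections import Counter


def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def impurity_decrease(rows, labels, attr, impurity):
    """Parent impurity minus size-weighted child impurity after splitting on attr.
    With impurity=entropy this is the information gain."""
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[attr], []).append(label)
    n = len(labels)
    return impurity(labels) - sum(len(g) / n * impurity(g) for g in groups.values())


def best_attribute(rows, labels, attributes, impurity=entropy):
    """Pick the attribute whose split reduces impurity the most."""
    return max(attributes, key=lambda a: impurity_decrease(rows, labels, a, impurity))


rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": "sunny", "windy": True},
    {"outlook": "overcast", "windy": False},
    {"outlook": "rainy", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "overcast", "windy": True},
]
labels = ["no", "no", "yes", "yes", "no", "yes"]

print(best_attribute(rows, labels, ["windy", "outlook"]))        # outlook
print(best_attribute(rows, labels, ["windy", "outlook"], gini))  # outlook
```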
Discuss the techniques used to prevent overfitting in decision trees.
Answer: Use pruning, limit tree depth, or require a minimum number of samples at a node before it can be split.
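Explain the difference between pre-pruning and post-pruning in decision tree algorithms.
Answer:
Pre-Pruning: Halts tree growth early based on stopping criteria (e.g., maximum depth or minimum number of samples per node).
Post-Pruning: Grows the full tree first, then removes branches that do not improve performance (e.g., on a validation set).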
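The two pruning styles can be contrasted in a short sketch. Pre-pruning is a stopping test checked before each split; the `max_depth=3` and `min_samples=5` defaults are arbitrary assumptions. For post-pruning, the example uses reduced-error pruning, one common technique among several: working bottom-up, a subtree is replaced by a leaf predicting its training majority class whenever that does not lower accuracy on held-out validation rows. The nested-dict tree format (with a stored `majority` label), the function names, and the toy data are hypothetical.

```python
from typing import Any, Dict, List


def should_stop(labels: List[Any], depth: int, max_depth: int = 3, min_samples: int = 5) -> bool:
    """Pre-pruning: do not split a node that is pure, too deep, or too small."""
    return len(set(labels)) <= 1 or depth >= max_depth or len(labels) < min_samples


def predict(node: Any, example: Dict[str, Any]) -> Any:
    """Walk the tree; internal nodes are dicts, leaves are plain class labels."""
    while isinstance(node, dict):
        # Unseen attribute values fall back to the node's training majority class.
        node = node["branches"].get(example[node["attribute"]], node["majority"])
    return node


def n_correct(node: Any, rows: List[Dict[str, Any]], labels: List[Any]) -> int:
    return sum(predict(node, r) == y for r, y in zip(rows, labels))


def reduced_error_prune(node: Any, rows: List[Dict[str, Any]], labels: List[Any]) -> Any:
    """Post-pruning: bottom-up, replace a subtree with its majority-class leaf whenever
    that does not reduce accuracy on the validation rows reaching it (mutates node)."""
    if not isinstance(node, dict):
        return node
    attr = node["attribute"]
    for value, child in list(node["branches"].items()):
        idx = [i for i, r in enumerate(rows) if r[attr] == value]
        node["branches"][value] = reduced_error_prune(
            child, [rows[i] for i in idx], [labels[i] for i in idx]
        )
    leaf = node["majority"]
    return leaf if n_correct(leaf, rows, labels) >= n_correct(node, rows, labels) else node


# Hypothetical fully grown tree: the split under "overcast" was fit to a noisy row.
overfit_tree = {
    "attribute": "outlook", "majority": "yes",
    "branches": {
        "sunny": "no",
        "overcast": {"attribute": "windy", "majority": "yes",
                     "branches": {False: "yes", True: "no"}},
        "rainy": {"attribute": "windy", "majority": "yes",
                  "branches": {False: "yes", True: "no"}},
    },
}
val_rows = [
    {"outlook": "overcast", "windy": True},
    {"outlook": "overcast", "windy": False},
    {"outlook": "rainy", "windy": True},
    {"outlook": "rainy", "windy": False},
    {"outlook": "sunny", "windy": False},
]
val_labels = ["yes", "yes", "no", "yes", "no"]

pruned = reduced_error_prune(overfit_tree, val_rows, val_labels)
print(pruned["branches"]["overcast"])  # yes
```

The spurious `overcast` split is collapsed to a leaf, while the `rainy` split survives because it is correct on the validation rows.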
List and explain the advantages and limitations of using decision trees.
Answer:
Advantages: Simple, interpretable, and able to handle both categorical and continuous data.
Limitations: Prone to overfitting and unstable (small changes in the training data can produce a very different tree).
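Compare and contrast decision trees with Random Forests.
Answer: A random forest combines many decision trees, each trained on a bootstrap sample of the data (bagging) and typically considering a random subset of features at each split, and aggregates their predictions (e.g., by majority vote). This improves stability and accuracy compared with a single tree, at the cost of interpretability.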
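A simplified sketch of the idea, under loud assumptions: each "tree" is a one-level stump over categorical attributes rather than a fully grown tree, and the random feature subset is drawn once per stump (for a stump this coincides with drawing it at its only split). The names `fit_stump`, `fit_forest`, and `predict_forest` and the toy data are hypothetical. The ingredients shown are bootstrap sampling (bagging), random feature subsets, and majority voting.

```python
import random
from collections import Counter


def fit_stump(rows, labels, attributes):
    """One-level tree: pick the attribute whose values best separate the labels
    (fewest training errors when each value predicts its majority label)."""
    best = None
    for attr in attributes:
        groups = {}
        for r, y in zip(rows, labels):
            groups.setdefault(r[attr], []).append(y)
        rule = {v: Counter(ys).most_common(1)[0][0] for v, ys in groups.items()}
        errors = sum(rule[r[attr]] != y for r, y in zip(rows, labels))
        if best is None or errors < best[0]:
            best = (errors, attr, rule)
    default = Counter(labels).most_common(1)[0][0]  # for values unseen in training
    _, attr, rule = best
    return lambda row: rule.get(row[attr], default)


def fit_forest(rows, labels, attributes, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Bagging: bootstrap sample of size n drawn with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Random feature subset for this stump.
        feats = rng.sample(attributes, n_features)
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest


def predict_forest(forest, row):
    """Aggregate the individual predictions by majority vote."""
    return Counter(tree(row) for tree in forest).most_common(1)[0][0]


rows = [
    {"color": "red", "size": "small"},
    {"color": "red", "size": "large"},
    {"color": "blue", "size": "small"},
    {"color": "blue", "size": "large"},
    {"color": "red", "size": "small"},
    {"color": "blue", "size": "large"},
]
labels = ["yes", "yes", "no", "no", "yes", "no"]

forest = fit_forest(rows, labels, ["color", "size"])
print(predict_forest(forest, {"color": "red", "size": "large"}))
```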
How do decision trees handle missing values during the training process?
Answer: Methods include ignoring records with missing values, using surrogate splits, replacing missing values with the most frequent value of the attribute, and imputing missing values before training.
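Of the methods listed, imputing before training is the simplest to show. A minimal sketch, assuming rows are dicts and missing values are stored as `None` (both assumptions; `impute_most_frequent` is a hypothetical helper): each missing entry is replaced with that attribute's most frequent observed value. Surrogate splits are not shown.

```python
from collections import Counter


def impute_most_frequent(rows, attributes):
    """Return copies of rows with None replaced by each attribute's most frequent value."""
    modes = {}
    for attr in attributes:
        observed = [r.get(attr) for r in rows if r.get(attr) is not None]
        # An attribute with no observed values at all is left as None.
        modes[attr] = Counter(observed).most_common(1)[0][0] if observed else None
    filled = []
    for r in rows:
        new_row = dict(r)  # copy so the input rows are not modified
        for attr in attributes:
            if new_row.get(attr) is None:
                new_row[attr] = modes[attr]
        filled.append(new_row)
    return filled


rows = [
    {"outlook": "sunny", "windy": False},
    {"outlook": None, "windy": True},
    {"outlook": "sunny", "windy": None},
    {"outlook": "rainy", "windy": False},
]
print(impute_most_frequent(rows, ["outlook", "windy"]))
```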
Provide examples of real-world applications where decision trees are effectively used.
Answer: Fraud detection, medical diagnosis, and customer segmentation.
How does a decision tree algorithm handle continuous and categorical variables differently?
Answer: Continuous variables are split into ranges using a threshold (e.g., age ≤ 30 vs. age > 30); categorical variables are split by category.
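A sketch of both cases (helper names are illustrative). For a continuous attribute, candidate thresholds are taken as midpoints between consecutive distinct sorted values and scored by weighted Gini impurity. For a categorical attribute, rows are grouped into one branch per category, the multiway split used by ID3. The ages and labels are made-up toy data.

```python
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_threshold(values, labels):
    """Continuous attribute: return (threshold, weighted Gini) of the best split x <= t."""
    distinct = sorted(set(values))
    n = len(values)
    best_t, best_score = None, float("inf")
    for lo, hi in zip(distinct, distinct[1:]):
        t = (lo + hi) / 2  # candidate threshold halfway between neighbors
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        score = len(left) / n * gini(left) + len(right) / n * gini(right)
        if score < best_score:
            best_t, best_score = t, score
    return best_t, best_score


def split_categorical(values, labels):
    """Categorical attribute: one branch (label list) per category."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return groups


ages = [22, 25, 31, 35, 40, 52]
bought = ["no", "no", "yes", "yes", "yes", "yes"]
print(best_threshold(ages, bought))                                    # (28.0, 0.0)
print(split_categorical(["red", "blue", "red"], ["yes", "no", "yes"]))  # {'red': ['yes', 'yes'], 'blue': ['no']}
```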
What methods are used to evaluate the performance of a decision tree?
Answer: Use metrics like accuracy, precision, recall, and F1 score, or a confusion matrix.
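The listed metrics can all be derived from the four confusion-matrix counts. A minimal sketch with made-up labels; the helper names are hypothetical, and returning 0.0 when a denominator is zero is one common convention, assumed here.

```python
def confusion_matrix(y_true, y_pred, positive=1):
    """Binary confusion-matrix counts (TP, FP, FN, TN) for the given positive label."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return tp, fp, fn, tn


def classification_metrics(y_true, y_pred, positive=1):
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(confusion_matrix(y_true, y_pred))  # (2, 1, 1, 4)
print(classification_metrics(y_true, y_pred))
```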
What is the purpose of evaluation metrics in machine learning?
Answer: Assessing model performance, comparing models, handling trade-offs (for example, between precision and recall), and ensuring that the model generalizes to unseen data.
Define accuracy in the context of binary classification.
Answer: A metric that measures the proportion of correctly classified instances out of the total number of instances: (TP + TN) / (TP + TN + FP + FN).
Why might accuracy not be a good metric in cases of class imbalance?
Answer: In imbalanced datasets, accuracy can be misleading because it is dominated by the majority class and hides poor performance on the minority class.
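A small made-up example makes the problem concrete: on a test set with 95 negatives and 5 positives, a model that always predicts the majority class scores 95% accuracy while finding none of the positives.

```python
# Hypothetical imbalanced test set: 95 negatives (0) and 5 positives (1).
y_true = [0] * 95 + [1] * 5
# A trivial "model" that always predicts the majority class.
y_pred = [0] * 100

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = tp / sum(y_true)  # share of actual positives that were found

print(f"accuracy = {accuracy:.2f}")  # accuracy = 0.95
print(f"recall   = {recall:.2f}")    # recall   = 0.00
```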
Define precision and recall. Why are these metrics important in binary classification tasks?
Answer: Precision measures the proportion of correctly predicted positive instances out of all instances predicted as positive.