Sample Exam
Course: Data Science for Business
Lecturers: dr. C. Amrit
University of Amsterdam
Name:___________________________________________
Student nr:_________________________________________
Please return the question paper after the exam!
Closed Book Exam
This is a closed book exam: No course materials (slides, handouts, books, and papers) can be
used during the exam. One mark per question, except for the last question, which carries 2 bonus
marks. The grading
1. Which is not a reason why data mining technologies are attracting significant attention nowadays?
A. There is too much data for manual analysis
B. Data are difficult to transfer from databases
C. Data can be a resource for competitive advantage
D. Machine learning algorithms are easily available
E. None of the above
2. Regression is distinguished from classification by:
A. class probability estimation
B. numerical attributes
C. numerical target variable
D. hypothesis testing
E. None of the above
3. Entropy
A. is a measure of information gain
B. is used to calculate information gain
C. is a measure of correlation between numeric variables
D. denotes the amount of chaos in the data
E. describes the number of outliers in the data
4. Which of the following is not true about logistic regression?
A. Logistic regression can be used to predict the probability of membership in a certain class.
B. Logistic regression takes a categorical target variable in training data.
C. A logistic regression represents the odds of class membership as a linear function of the
attributes.
D. Logistic regression requires numeric attributes and categorical attributes should be converted
to numeric attributes.
E. A logistic regression represents the odds of class membership as a nonlinear function of the
attributes.
5. An example of a supervised learning algorithm is
A. Statistical analysis
B. Neural network
C. Clustering techniques
D. Naïve Bayesian algorithm
E. None of the above
6. A fitting curve plots:
A. True positive rate vs. false positive rate
B. True positive rate vs. false negative rate
C. Generalization performance vs. size of training set
D. Generalization performance vs. model complexity
E. None of the above
7. When the causal relation between the input and the output variables is too complex, one would use:
A. Statistical modeling
B. Supervised learning
C. Unsupervised learning
D. All of the above
E. None of the above
8. The variable marital status can be categorized using the codes (1) single, (2) married, and (3)
divorced. This is an example of a:
A. Ordinal variable
B. Nominal variable
C. Interval variable
D. Ratio variable
E. None of the above
9. Consider the following decision tree: