IBM Data Science New Exam With Complete Solutions 100% Verified
Descriptive tables share which of the following characteristics? - ANSWER
a. Measures of Central Tendency
b. Measures of Dispersion
c. Measures of Distribution
d. All of the above
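As a rough illustration of the three families of measures, here is a minimal Python sketch that summarises one numeric column using only the standard `statistics` module. The function name `describe` and the sample values are hypothetical, and population skewness is only one possible "measure of distribution" (quartiles or kurtosis would serve as well).

```python
import statistics

def describe(data):
    """Summarise a numeric column with central tendency, dispersion and shape measures."""
    mu = statistics.mean(data)
    sigma = statistics.pstdev(data)  # population standard deviation
    n = len(data)
    # Population skewness: > 0 means a longer right tail (0.0 for constant data).
    skew = 0.0 if sigma == 0 else sum((x - mu) ** 3 for x in data) / n / sigma ** 3
    return {
        # Central tendency
        "mean": mu,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        # Dispersion
        "range": max(data) - min(data),
        "std_dev": sigma,
        # Distribution (shape)
        "skewness": skew,
    }

summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
```

For this hypothetical column the mean is 5, the median 4.5, the mode 4, the range 7, the standard deviation 2.0 and the skewness about 0.656 (right-skewed).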
What does 'pure subset' mean with regard to decision trees? - ANSWER
a. All records in a leaf have "yes" as the answer.
b. All records in a leaf have "no" as the answer.
d. The leaf cannot be divided any further.
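A pure subset is one where every record shares the same class label, which is why it cannot usefully be split further. A common way to quantify this is Gini impurity, which is 0 for a pure node; the sketch below assumes Gini as the impurity measure (entropy is an equally common choice), and `gini` and `is_pure` are hypothetical helper names.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum(p_k^2) over the class proportions p_k."""
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def is_pure(labels):
    # A node is pure when only one class label is present.
    return len(set(labels)) == 1

print(gini(["yes", "yes", "yes"]), is_pure(["yes", "yes", "yes"]))  # pure leaf
print(gini(["yes", "no"]), is_pure(["yes", "no"]))                  # maximally mixed (2 classes)
```

The first line prints `0.0 True` and the second `0.5 False`.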
If you had to identify a single, overarching difference between these methodologies for Question 19, which of the following would best describe that difference in approach? - ANSWER a. Unlike KDD and SEMMA, CRISP-DM considers business understanding.
Which of the following is correct? - ANSWER
a. The data scientist transforms data into insights to solve the business problem.
b. The data journalist captures the domain knowledge needed for successful business alignment.
c. The data engineer architects how data should be organized and made operable.
d. All of the above
Data visualization comes in two broad categories. Which of the following depicts this distinction? - ANSWER Exploratory versus explanatory visualization
If you had to describe a Naïve Bayes classifier, which of the following would apply? - ANSWER
a. Prior probabilities are based on previous experience.
b. The classifier assumes that the presence of a particular feature in a class is unrelated to the presence of any other feature.
c. It is particularly suited to cases where the dimensionality of the inputs is high.
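The two ideas above (priors from past data, and features treated as independent given the class) can be sketched as a tiny categorical Naïve Bayes with Laplace smoothing. This is a minimal illustration, not a library API: `nb_train`, `nb_predict`, the features (marital status, commute length) and the training rows are all hypothetical.

```python
import math
from collections import Counter, defaultdict

def nb_train(rows, labels):
    """Count class priors and per-class feature-value frequencies."""
    priors = Counter(labels)
    counts = defaultdict(Counter)  # (label, feature_index) -> Counter of values
    values = defaultdict(set)      # feature_index -> set of values seen
    for row, label in zip(rows, labels):
        for i, v in enumerate(row):
            counts[(label, i)][v] += 1
            values[i].add(v)
    return priors, counts, values, len(labels)

def nb_predict(model, row, alpha=1.0):
    priors, counts, values, n = model
    scores = {}
    for label, label_count in priors.items():
        score = math.log(label_count / n)  # log prior P(label)
        for i, v in enumerate(row):
            # Naive assumption: each feature contributes independently given the label.
            # Laplace smoothing (alpha) avoids log(0) for unseen values.
            num = counts[(label, i)][v] + alpha
            den = label_count + alpha * len(values[i])
            score += math.log(num / den)
        scores[label] = score
    return max(scores, key=scores.get)

# Hypothetical training data: (marital status, commute) -> outcome
rows = [("single", "long"), ("single", "long"), ("married", "long"),
        ("married", "short"), ("married", "short"), ("single", "short")]
labels = ["leave", "leave", "leave", "stay", "stay", "stay"]
model = nb_train(rows, labels)
print(nb_predict(model, ("single", "long")), nb_predict(model, ("married", "short")))
```

With these rows the sketch prints `leave stay`. Working in log space keeps the product of many small probabilities from underflowing, which matters when there are many input features.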
Which of the following best describes a Decision Tree Classifier? - ANSWER b. A mapping of observations about an item (represented in the branches) to conclusions about the item's target value (represented in the leaves).
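To make "branches map observations to leaf values" concrete, here is a hand-built tree walked from root to leaf. The tree is hard-coded rather than learned, and its features and labels are hypothetical; a real classifier would learn the splits from data.

```python
# Internal node: (feature_name, {feature_value: subtree}); leaf: a plain string label.
TREE = ("commute", {
    "long": ("marital_status", {"single": "leave", "married": "stay"}),
    "short": "stay",
})

def tree_classify(tree, item):
    """Follow the branch matching each observed feature value until a leaf is reached."""
    while not isinstance(tree, str):
        feature, branches = tree
        tree = branches[item[feature]]
    return tree  # the leaf holds the predicted target value

print(tree_classify(TREE, {"commute": "long", "marital_status": "single"}))
print(tree_classify(TREE, {"commute": "short", "marital_status": "single"}))
```

The two calls print `leave` and `stay`.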
Consider the following: You want to find out why some employees leave while others stay. You have a CSV file containing columns (features) on metrics such as distance from home and age, plus categorical information such as male/female, level of education, and marital status. If you were to choose a model to study the problem of employee attrition, which of the following would be the best fit? - ANSWER a. Binary classification
A machine learning model has detected 80 true positive signals and 20 false positive signals (items it included as relevant data even though they are not). What is the precision of the system? - ANSWER a. 80%
Consider the following diagram: [a net containing 1 red fish]. If the red fish represent relevant data (signal) and the blue fish represent irrelevant data (noise), what is the precision of this system? - ANSWER b. 100%
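Both answers follow from precision = TP / (TP + FP): of everything the system flagged (or caught in the net), what share is actually relevant. A minimal sketch, with `precision` as a hypothetical helper name:

```python
def precision(true_positives, false_positives):
    """Share of flagged items that are truly relevant: TP / (TP + FP)."""
    flagged = true_positives + false_positives
    if flagged == 0:
        raise ValueError("precision is undefined when nothing is flagged")
    return true_positives / flagged

print(precision(80, 20))  # 80 true and 20 false positives
print(precision(1, 0))    # one red fish, no blue fish in the net
```

These print `0.8` and `1.0`, i.e. 80% and 100%. Note that precision ignores relevant items the system missed; that is what recall measures.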
Sometimes we do not have access to the entire data set (the population) and have to draw our conclusions from sample data. Which of the following approaches addresses using sample data to draw conclusions about the population? - ANSWER a. Inferential statistics
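A typical inferential step is estimating a population mean from a sample together with a confidence interval. The sketch below uses the normal approximation from the standard library's `statistics.NormalDist`; for small samples a t-distribution would be more appropriate. The function name and sample values are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev

def mean_confidence_interval(sample, confidence=0.95):
    """Normal-approximation confidence interval for the population mean."""
    n = len(sample)
    m = mean(sample)
    se = stdev(sample) / math.sqrt(n)               # standard error of the mean
    z = NormalDist().inv_cdf((1 + confidence) / 2)  # about 1.96 for 95%
    return m - z * se, m + z * se

sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]  # hypothetical measurements
print(mean_confidence_interval(sample))
```

Raising `confidence` widens the interval: more certainty that it contains the true population mean costs precision.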
A spam filtering engine has quarantined messages that were not spam, were not unsolicited, and were important to the user. How would you characterize those important yet automatically removed messages? - ANSWER b. False positive
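Treating "spam" as the positive class, the four possible outcomes can be spelled out as below; a legitimate message that gets quarantined lands in the false-positive cell. The function name is hypothetical.

```python
def outcome(predicted_spam: bool, actually_spam: bool) -> str:
    """Confusion-matrix cell for one message, with 'spam' as the positive class."""
    if predicted_spam and actually_spam:
        return "true positive"
    if predicted_spam and not actually_spam:
        return "false positive"   # legitimate mail wrongly quarantined
    if not predicted_spam and actually_spam:
        return "false negative"   # spam that slipped through
    return "true negative"

print(outcome(predicted_spam=True, actually_spam=False))
```

This prints `false positive`.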
How is isotonic regression different from linear regression? - ANSWER a. It fits a free-form line to the observations, and the fitted free-form line must be non-decreasing everywhere.
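One common way to compute an isotonic fit is the pool adjacent violators algorithm (PAVA). The sketch below is a minimal, unweighted version that assumes the observations are already sorted by x; `isotonic_fit` is a hypothetical name (scikit-learn offers `sklearn.isotonic.IsotonicRegression` for real use).

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y via pool adjacent violators (x assumed sorted)."""
    blocks = []  # each block is [sum_of_values, count]; its fitted value is the block mean
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while an earlier block's mean exceeds the newest block's mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = []
    for s, c in blocks:
        fitted.extend([s / c] * c)
    return fitted

print(isotonic_fit([1, 3, 2, 4]))
```

This prints `[1.0, 2.5, 2.5, 4.0]`: the violating pair (3, 2) is pooled to its mean, so the fitted values never decrease, whereas a linear fit would force a single straight slope through all points.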
The biggest risk of overfitting is that the model works well on training data but performs badly on new data. What should be done to reduce this problem? Select all that apply. - ANSWER
a. Use holdout data to evaluate the performance of the model on new data.
b. Do not use holdout data to select the model.
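The toy sketch below, with made-up 1-D data and two deliberately flipped training labels, shows why a training score alone can mislead: a 1-nearest-neighbour "memorizer" scores perfectly on the training data but worse on unseen holdout points, while a simpler threshold rule does the opposite. All names and data are hypothetical.

```python
# Hypothetical data: the true rule is "label 1 when x >= 10".
# Training labels at x=4 and x=14 are flipped on purpose to simulate noise.
TRAIN = [(0, 0), (2, 0), (4, 1), (6, 0), (8, 0), (10, 1), (12, 1), (14, 0), (16, 1), (18, 1)]
HOLDOUT = [(x, int(x >= 10)) for x in range(1, 20, 2)]  # unseen, noise-free points

def accuracy(predict, data):
    return sum(predict(x) == y for x, y in data) / len(data)

def memorizer(x):
    # 1-nearest-neighbour: copies the closest training label, noise included.
    return min(TRAIN, key=lambda p: abs(p[0] - x))[1]

def fit_threshold(data):
    # Simpler model: predict 1 when x >= t, picking t with the fewest training errors.
    best_t = min((x for x, _ in data),
                 key=lambda t: sum(int(x >= t) != y for x, y in data))
    return lambda x: int(x >= best_t)

threshold = fit_threshold(TRAIN)
for name, model in [("memorizer", memorizer), ("threshold", threshold)]:
    print(name, accuracy(model, TRAIN), accuracy(model, HOLDOUT))
```

It prints `memorizer 1.0 0.8` and `threshold 0.8 1.0`: the model that fits the training noise perfectly is the one that generalizes worse, which only the holdout score reveals.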
Consider the following diagram: [a net containing 1 blue fish and 3 red fish]. Given that the red fish are relevant data (signal) and the blue fish are irrelevant data (noise), what is the precision of this system? - ANSWER 0.75
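As a worked equation, counting each red fish in the net as a true positive (TP) and the blue fish as a false positive (FP):

```latex
\text{Precision} = \frac{TP}{TP + FP} = \frac{3}{3 + 1} = 0.75
```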
If you choose a multiclass classification tree in Watson Studio, which of the following estimators, or algorithms, are available to you? - ANSWER
a. Decision tree classifier
b. Random forest classifier
c. Naive Bayes
d. All of the above
When training models, you would typically place your data into three buckets: train, test, and holdout. What is the purpose of having holdout data? - ANSWER
a. A holdout sample is the part of the data you leave out of model building so it can be used to evaluate the model afterward.
b. A holdout sample helps you compare models and ensures that you will be able to generalize results to data the model has not seen.
c. Working with a holdout sample helps you choose the best-performing model.
d. All of the above are true.
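A minimal sketch of carving data into the three buckets with the standard library. The function name, default fractions, and seed are assumptions for illustration; in practice many teams use a library splitter applied twice.

```python
import random

def train_test_holdout_split(rows, test_frac=0.2, holdout_frac=0.2, seed=0):
    """Shuffle once, then cut disjoint holdout, test and train buckets."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible
    n_holdout = int(len(shuffled) * holdout_frac)
    n_test = int(len(shuffled) * test_frac)
    holdout = shuffled[:n_holdout]
    test = shuffled[n_holdout:n_holdout + n_test]
    train = shuffled[n_holdout + n_test:]
    return train, test, holdout

train, test, holdout = train_test_holdout_split(range(100))
print(len(train), len(test), len(holdout))
```

This prints `60 20 20`. A typical workflow fits candidate models on `train`, compares them on `test`, and scores the chosen model once on `holdout` to estimate how it will do on unseen data.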
Linear Regression attempts to fit a line while __________ the distance to each point. Fill in
the blank. - ANSWER b. Minimizing
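Strictly speaking, ordinary least squares minimizes the sum of squared vertical distances (residuals) between the points and the line. A minimal closed-form sketch for a single feature, with `fit_line` as a hypothetical name:

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # covariance term
    sxx = sum((x - mx) ** 2 for x in xs)                    # variance term
    slope = sxy / sxx
    return slope, my - slope * mx  # the line passes through (mean x, mean y)

print(fit_line([0, 1, 2, 3], [1, 3, 5, 7]))
```

For points lying exactly on y = 2x + 1 it prints `(2.0, 1.0)`.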