ISYE 6501 Midterm 2 Certification
Exam Questions and Answers
What is a key assumption of a Markov chain? - Answer-It is memoryless: state
transitions depend only on the current (most recent) state, though many real systems do not exhibit
this property
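A minimal sketch of the memoryless property in Python, assuming a hypothetical two-state chain with made-up transition probabilities (the names STATES, P, simulate, and long_run_distribution are illustrative). Each next state is sampled from the current state's row of P only, and the state distribution is advanced by repeatedly applying the same matrix.

```python
import random

# Hypothetical transition matrix: P[i][j] = probability of moving from state i to state j.
STATES = ["sunny", "rainy"]
P = [
    [0.9, 0.1],  # from sunny
    [0.5, 0.5],  # from rainy
]


def next_state(current: int, rng: random.Random) -> int:
    """Sample the next state using ONLY the current state (memoryless)."""
    return rng.choices(range(len(STATES)), weights=P[current])[0]


def simulate(start: int, steps: int, seed: int = 0) -> list[int]:
    rng = random.Random(seed)
    path = [start]
    for _ in range(steps):
        # Everything before path[-1] is ignored when choosing the next state.
        path.append(next_state(path[-1], rng))
    return path


def step_distribution(dist: list[float]) -> list[float]:
    """One step of pi_{t+1} = pi_t * P."""
    n = len(dist)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def long_run_distribution(dist: list[float], steps: int = 200) -> list[float]:
    for _ in range(steps):
        dist = step_distribution(dist)
    return dist


if __name__ == "__main__":
    print([STATES[s] for s in simulate(start=0, steps=10)])
    print(long_run_distribution([1.0, 0.0]))
```

With these numbers the long-run distribution settles near 5/6 sunny and 1/6 rainy, regardless of the starting state.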
How to deal with missing data? - Answer-1.) throw it away
2.) use categorical variables to indicate missing data
3.) imputation - mean/mode
4.) imputation - regression
5.) imputation - regression + perturbation
Pros and cons of throwing away missing data - Answer-Pros: does not risk introducing
errors; easy to implement
Cons: can lose too many data points; the missing data may be censored or biased
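A small sketch of throwing away incomplete data points (listwise deletion), using hypothetical records where None marks a missing value. The printout makes the main con visible: even a few missing fields can remove a large share of the data.

```python
# Hypothetical records; None marks a missing value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]


def drop_incomplete(records):
    """Listwise deletion: keep only records with no missing fields."""
    return [r for r in records if all(v is not None for v in r.values())]


complete = drop_incomplete(rows)
lost = len(rows) - len(complete)
print(f"kept {len(complete)} of {len(rows)} rows; lost {lost / len(rows):.0%}")
```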
Pros and cons of using categorical variables for missing data - Answer-Missing data
could be biased, so interaction terms between the missing-data indicator and all other factors are needed,
essentially creating two models: one for data points with the value present and one for data points with it missing
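A sketch of the indicator-variable approach for a continuous factor, assuming a hypothetical model with factors x1 (sometimes missing) and x2. The helper build_features and its column layout are illustrative, not taken from the course.

```python
def build_features(x1, x2):
    """Return design-matrix rows [1, x1_filled, x2, m, m*x2] (hypothetical layout).

    m = 1 when x1 is missing; x1 is set to 0 so its term drops out, and the
    m and m*x2 columns let the intercept and the x2 slope differ for rows
    with missing x1, i.e. effectively a second model for those rows.
    """
    rows = []
    for a, b in zip(x1, x2):
        m = 1 if a is None else 0
        x1_filled = 0.0 if a is None else float(a)
        rows.append([1.0, x1_filled, float(b), float(m), m * float(b)])
    return rows


X = build_features([2.5, None, 4.0], [1.0, 3.0, 2.0])
for row in X:
    print(row)
```

With coefficients a0..a4, rows where x1 is present use intercept a0 and x2 slope a2, while rows where x1 is missing use a0 + a3 and a2 + a4, which is the "two models" idea in the answer.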
Pros and cons of imputation with mean and mode - Answer-Pros: hedges against being
too wrong; easy to compute
Cons: imputation can be biased; if a certain group tends to have missing data, the overall mean may not be
representative of that group
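A minimal sketch of mean imputation (numeric factor) and mode imputation (categorical factor) using Python's statistics module; the data are hypothetical. Every missing entry receives the same fill value, which is why this approach can be biased when one group is more likely to be missing.

```python
from statistics import mean, mode


def impute_mean(values):
    """Replace None with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def impute_mode(values):
    """Replace None with the most common observed value."""
    observed = [v for v in values if v is not None]
    fill = mode(observed)
    return [fill if v is None else v for v in values]


ages = [30, None, 40, 50, None]
colors = ["red", "blue", None, "red"]
print(impute_mean(ages))
print(impute_mode(colors))
```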
Pros and cons of imputation through regression - Answer-Pros: reduces or eliminates the
problem of bias
Cons: complex to build, fit, validate, and test the imputation model; does not capture all the
variability (hence adding perturbation); and using the data twice (once to impute missing values and once
for modeling) can increase overfitting
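A sketch of regression imputation with optional perturbation, assuming one hypothetical predictor and a simple least-squares line. Using the residual standard deviation (computed with pstdev) as the noise level is a simplifying choice for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1*x (one predictor)."""
    x_bar, y_bar = statistics.mean(xs), statistics.mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def regression_impute(target, predictor, perturb=False, seed=0):
    """Fill None entries of `target` from a line fit on the complete rows.

    With perturb=True, add normal noise whose standard deviation equals the
    residual standard deviation, so imputed values are not all on the line.
    """
    pairs = [(p, t) for p, t in zip(predictor, target) if t is not None]
    xs, ys = zip(*pairs)
    b0, b1 = fit_line(xs, ys)
    residuals = [y - (b0 + b1 * x) for x, y in pairs]
    sigma = statistics.pstdev(residuals)
    rng = random.Random(seed)
    filled = []
    for p, t in zip(predictor, target):
        if t is not None:
            filled.append(t)
        else:
            guess = b0 + b1 * p
            if perturb:
                guess += rng.gauss(0.0, sigma)
            filled.append(guess)
    return filled


predictor = [1, 2, 3, 4, 5]
target = [2.0, 4.1, None, 7.9, 10.0]
print(regression_impute(target, predictor))
print(regression_impute(target, predictor, perturb=True, seed=1))
```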
Imputation should not be used when more than ___% of data is missing - Answer-5% (rule of thumb,
per factor); beyond that, use indicator or categorical variables
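A quick check of the 5% rule of thumb per factor, with hypothetical column data and a hypothetical helper missing_report:

```python
def missing_report(columns, threshold=0.05):
    """Map column name -> (fraction missing, OK to impute under the rule of thumb?)."""
    report = {}
    for name, values in columns.items():
        frac = sum(v is None for v in values) / len(values)
        report[name] = (frac, frac <= threshold)
    return report


data = {
    "age": [30, 41, None] + [35] * 97,     # 1 of 100 missing
    "income": [None] * 12 + [50000] * 88,  # 12 of 100 missing
}
for col, (frac, ok) in missing_report(data).items():
    action = "impute" if ok else "use an indicator/category instead"
    print(f"{col}: {frac:.0%} missing -> {action}")
```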
What are the three main components of optimization? - Answer-Variables, Constraints,
and Objective function
What are variables in an optimization model? - Answer-decisions that the optimization
solver picks the best values for; each must be something we can actually control or change
What are constraints in an optimization model? - Answer-restrictions on the variables'
values; important because the software does the math but doesn't understand reality; a constraint must contain
at least one variable, or else it is just a statement
What is the objective function in an optimization model? - Answer-a measure of the quality of
a solution (a set of variable values); we typically want to minimize or maximize it
Feasible vs Optimal solution - Answer-Feasible: a set of variable values that
satisfies all constraints
Optimal: the feasible solution with the best objective function value
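A toy illustration of the three components and of feasible vs optimal, assuming a hypothetical chair/table production problem: the variables are x and y, the constraints are two resource limits plus nonnegativity, and the objective is profit 3x + 5y. It enumerates every integer candidate, which only works for tiny problems; real solvers do not search this way.

```python
from itertools import product


def objective(x, y):
    """Hypothetical profit: 3 per chair (x), 5 per table (y)."""
    return 3 * x + 5 * y


def feasible(x, y):
    """Constraints: nonnegativity plus two hypothetical resource limits."""
    return x >= 0 and y >= 0 and x + 2 * y <= 8 and 3 * x + 2 * y <= 12


candidates = product(range(0, 9), repeat=2)
feasible_solutions = [(x, y) for x, y in candidates if feasible(x, y)]
optimal = max(feasible_solutions, key=lambda s: objective(*s))
print(len(feasible_solutions), "feasible; optimal:", optimal, "value", objective(*optimal))
```

Here 16 integer points are feasible, and (2, 3) is the optimal one with objective value 21.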
What are three ways to customize statistical and ML models? - Answer-1.) add custom
constraints - ex: force the intercept a0 = 0 when the response should be zero whenever all factors are zero
2.) select features - add a binary variable for each feature indicating whether it is used, and constrain their sum to equal the number of features to
include
3.) modify the objective function - ex: in a linear regression model, minimize the sum of |error|^(3/2) instead of squared error
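A sketch of customizations 1 and 3 for a one-factor model, using hypothetical data with one outlier. fit_through_origin imposes the constraint a0 = 0, and fit_min_loss_p minimizes the sum of |error|^1.5 over the slope by ternary search (a swapped-in 1-D method that works here because the loss is convex). Feature selection (2) is not shown.

```python
def fit_through_origin(xs, ys):
    """Least squares with the custom constraint a0 = 0: slope = sum(xy) / sum(x^2)."""
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


def loss_p(slope, xs, ys, p=1.5):
    """Modified objective: sum of |error|^p instead of squared error."""
    return sum(abs(y - slope * x) ** p for x, y in zip(xs, ys))


def fit_min_loss_p(xs, ys, p=1.5, lo=-100.0, hi=100.0, iters=200):
    """Minimize loss_p over the slope (still with a0 = 0) by ternary search.

    Valid because loss_p is convex in the slope for p >= 1.
    """
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if loss_p(m1, xs, ys, p) < loss_p(m2, xs, ys, p):
            hi = m2
        else:
            lo = m1
    return (lo + hi) / 2


xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 20.0]  # hypothetical data; the last point is an outlier
b_ls = fit_through_origin(xs, ys)
b15 = fit_min_loss_p(xs, ys)
print(f"squared-error slope {b_ls:.3f}, |error|^1.5 slope {b15:.3f}")
```

Because |error|^1.5 penalizes large errors less than squared error does, the outlier pulls the 1.5-power slope up less than it pulls the least-squares slope.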
How to account for randomness or uncertainty in optimization models? - Answer-1.)
model conservatively - add a buffer to the constraints, ex: in a call center model that minimizes the
number of workers, require enough workers to cover forecast demand plus an extra margin theta so that we are
not caught short if demand is underestimated
2.) Scenario Modeling - define multiple scenarios and optimize over all of them, weighting each by
the probability of that scenario occurring, ex: to optimize the expected value of costs
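A sketch comparing the two ideas for a hypothetical call center shift, with made-up wage, shortfall penalty, and demand scenarios (all names illustrative). conservative_staffing adds a buffer theta to a point forecast, while scenario_optimal picks the staffing level with the lowest probability-weighted cost over the scenarios.

```python
import math

WAGE = 100.0            # hypothetical cost per worker per shift
SHORTFALL_COST = 250.0  # hypothetical penalty per worker's worth of unmet demand

# Hypothetical demand scenarios: (workers needed, probability).
SCENARIOS = [(8, 0.3), (10, 0.5), (14, 0.2)]


def conservative_staffing(forecast, theta):
    """Staff to the point forecast plus a safety buffer theta."""
    return math.ceil(forecast + theta)


def expected_cost(workers):
    """Probability-weighted cost over all scenarios: wages plus shortfall penalty."""
    return sum(
        p * (WAGE * workers + SHORTFALL_COST * max(0, d - workers))
        for d, p in SCENARIOS
    )


def scenario_optimal(max_workers=30):
    """Staffing level with the lowest expected cost."""
    return min(range(max_workers + 1), key=expected_cost)


print("conservative:", conservative_staffing(10, 2))
best = scenario_optimal()
print("scenario-optimal:", best, "expected cost", expected_cost(best))
```

With these numbers the scenario model chooses 10 workers: an 11th worker costs 100 but saves only 250 * 0.2 = 50 in expected shortfall.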
Other approaches to optimization besides mathematical programming models - Answer-
1. Dynamic programming
2. Stochastic dynamic programming
3. Markov decision processes
What are the two main steps in optimization? - Answer-1.) Initialization: create a first
solution by picking values for all variables
2.) Repeat a two-stage process: find an improving direction t, then choose a step size theta to
move along it; new solution = old solution + theta*t
Stop when the solution no longer changes (or changes very little)
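A minimal sketch of the initialize-then-improve loop, assuming a hypothetical smooth objective f to minimize. The improving direction t is the negative gradient (steepest descent, one common choice) and the step size theta is fixed.

```python
def f(v):
    """Hypothetical objective with minimum at (3, -1)."""
    x, y = v
    return (x - 3) ** 2 + 2 * (y + 1) ** 2


def grad(v):
    x, y = v
    return (2 * (x - 3), 4 * (y + 1))


def optimize(start, theta=0.1, tol=1e-10, max_iter=10_000):
    v = start  # Step 1: initialization, pick values for all variables
    for _ in range(max_iter):
        t = tuple(-g for g in grad(v))  # improving direction
        new_v = tuple(vi + theta * ti for vi, ti in zip(v, t))  # new = old + theta * t
        if max(abs(a - b) for a, b in zip(new_v, v)) < tol:  # solution stopped changing
            return new_v
        v = new_v
    return v


solution = optimize((0.0, 0.0))
print(solution, f(solution))
```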
Examples of when data could be missing - Answer-software issues, data not being
collected, sensor failure, data entry errors, data not being available, data that is wrong and
not fixable, data removed for legal/privacy reasons, data purposely removed maliciously
If there are multiple variables with missing data, but most data points are missing only one
variable, how might you go about imputing them? - Answer-One approach is MICE
(multiple imputation by chained equations) - iteratively imputes one variable based on