ISYE 6501 Exam Preparation A Verified Study Guide To Approach Exam.
1-norm - answers Similar to rectilinear distance; measures the length of a vector from
the origin as the sum of the absolute values of its components. If z=(z1,z2,...,zm) is a
vector in an 𝑚-dimensional space, then its 1-norm is |𝑧1|+|𝑧2|+⋯+|𝑧𝑚| = Σ_{i=1}^{m} |𝑧𝑖|
A/B Testing - answers Testing two alternatives to see which one performs better.
2-norm - answers Similar to Euclidean distance; measures the straight-line length of a
vector from the origin. If z=(z1,z2,...,zm) is a vector in an 𝑚-dimensional space, then its
2-norm is the square root of the sum of the squared components:
sqrt(|𝑧1|^2+|𝑧2|^2+⋯+|𝑧𝑚|^2) = sqrt(Σ_{i=1}^{m} |𝑧𝑖|^2)
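The two norm definitions can be compared on the same vector. The following is a minimal sketch in plain Python; the function names `one_norm` and `two_norm` are illustrative and not part of the course material.

```python
import math

def one_norm(z):
    """Sum of the absolute values of the components (rectilinear length)."""
    return sum(abs(zi) for zi in z)

def two_norm(z):
    """Square root of the sum of squared components (Euclidean length)."""
    return math.sqrt(sum(zi * zi for zi in z))

z = [3, -4]
print(one_norm(z))  # 7
print(two_norm(z))  # 5.0
```

For any vector the 2-norm is never larger than the 1-norm, which matches the intuition that a straight line is the shortest path.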
Accuracy - answers Fraction of data points correctly classified by a model; equal to
(TP+TN) / (TP+FP+TN+FN)
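As a quick check of the accuracy formula, here is a small sketch that computes it from the four confusion-matrix counts. The counts used are made-up illustration values, and `accuracy` is a hypothetical helper name.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of points classified correctly: (TP + TN) / all points."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("no data points")
    return (tp + tn) / total

# Hypothetical counts: 85 of 100 points are classified correctly.
print(accuracy(tp=40, tn=45, fp=5, fn=10))  # 0.85
```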
Action - answers In ARENA, something that is done to an entity.
Additive Seasonality - answers Seasonal effect that is added to a baseline value (for
example, "the temperature in June is 10 degrees above the annual baseline").
Adjusted R-squared - answers Variant of R-squared that encourages simpler models by
penalizing the use of too many variables.
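The glossary entry does not state a formula. The commonly used one is 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of data points and p the number of predictors. The sketch below assumes that formula; the R-squared values are invented for illustration.

```python
def adjusted_r_squared(r_squared, n_points, n_predictors):
    """Standard adjusted R-squared: 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    denom = n_points - n_predictors - 1
    if denom <= 0:
        raise ValueError("need more data points than predictors plus one")
    return 1 - (1 - r_squared) * (n_points - 1) / denom

# Hypothetical fits on 21 data points: ten extra predictors barely raise R^2,
# so the adjusted value drops for the larger model.
small = adjusted_r_squared(0.90, n_points=21, n_predictors=5)
large = adjusted_r_squared(0.91, n_points=21, n_predictors=15)
print(small > large)  # True
```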
AIC - answers Akaike information criterion. Model selection technique that trades off
between model fit and model complexity. When comparing models, the model with
lower AIC is preferred. Generally penalizes complexity less than BIC.
Algorithm - answers Step-by-step procedure designed to carry out a task.
Analysis of Variance/ANOVA - answers Statistical method for dividing the variation in
observations among different sources.
Approximate dynamic program - answers Dynamic programming model where the value
functions are approximated.
Arc - answers Connection between two nodes/vertices in a network. In a network
model, there is a variable for each arc, equal to the amount of flow on the arc, and
(optionally) a capacity constraint on the arc's flow. Also called an edge.
Area under the curve (AUC) - answers Area under the ROC curve; an estimate of the
classification model's accuracy. Also called concordance index.
ARIMA - answers Autoregressive integrated moving average.
Arrival Rate - answers Expected number of arrivals of people, things, etc. per unit time
-- for example, the expected number of truck deliveries per hour to a warehouse.
Assignment Problem - answers Network optimization model with two sets of nodes that
finds the best way to assign each node in one set to a node in the other set.
Attribute - answers A characteristic or measurement - for example, a person's height or
the color of a car. Generally interchangeable with "feature", and often with "covariate" or
"predictor". In the standard tabular format, a column of data.
Autoregression - answers Regression technique using past values of time series data
as predictors of future values.
Autoregressive integrated moving average (ARIMA) - answers Time series model that
uses differences between observations when data is nonstationary. Also called Box-
Jenkins.
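The "differences between observations" in the ARIMA entry can be illustrated without any time series library. This is a minimal sketch of differencing only (the "integrated" part), not a full ARIMA fit; the series values are made up.

```python
def difference(series, lag=1):
    """Return the differences series[i] - series[i - lag]."""
    return [series[i] - series[i - lag] for i in range(lag, len(series))]

# Hypothetical series whose increments grow steadily.
series = [10, 12, 15, 19, 24]
first = difference(series)
second = difference(first)
print(first)   # [2, 3, 4, 5]
print(second)  # [1, 1, 1]
```

Here the second-order differences are constant, which is the kind of stationary behavior differencing aims to expose.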
Backward elimination - answers Variable selection process that starts with all variables
and then iteratively removes the least-immediately-relevant variables from the model.
Balanced Design - answers Set of combinations of factor values across multiple factors,
that has the same number of runs for all combinations of levels of one or more factors.
Balking - answers An entity arrives at the queue, sees the size of the line (or some other
attribute), and decides to leave the system.
Bayes' theorem/Bayes' rule - answers Fundamental rule of conditional probability:
𝑃(𝐴|𝐵)=𝑃(𝐵|𝐴)*𝑃(𝐴) / 𝑃(𝐵)
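A worked example may help with Bayes' rule. The sketch below expands P(B) with the law of total probability, P(B) = P(B|A)P(A) + P(B|not A)P(not A), which the glossary entry does not spell out. The probabilities are hypothetical numbers for a screening-test scenario.

```python
def posterior(prior_a, p_b_given_a, p_b_given_not_a):
    """P(A|B) = P(B|A) * P(A) / P(B), with P(B) from total probability."""
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1 - prior_a)
    return p_b_given_a * prior_a / p_b

# Hypothetical test: 1% prevalence, 95% true-positive rate,
# 5% false-positive rate.
p = posterior(prior_a=0.01, p_b_given_a=0.95, p_b_given_not_a=0.05)
print(round(p, 3))  # 0.161
```

Even with an accurate test, the low prior keeps the posterior well below one half.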
Bayesian Information criterion (BIC) - answers Model selection technique that trades off
model fit and model complexity. When comparing models, the model with lower BIC is
preferred. Generally penalizes complexity more than AIC.
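The difference in penalties can be seen from the usual definitions, AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), where k is the number of estimated parameters, n the number of data points, and L the maximized likelihood. These formulas are not given in the glossary; the sketch assumes them, and the inputs are invented.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_points):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_points) - 2 * log_likelihood

# Hypothetical model: log-likelihood -100, 3 parameters, 50 data points.
print(aic(-100.0, 3))            # 206.0
print(bic(-100.0, 3, 50) > 206)  # True
```

Under these definitions the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 once n is at least 8.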
Bayesian Regression - answers Regression model that incorporates estimates of how
coefficients and error are distributed.
Bellman's Equation - answers Equation used in dynamic programming that ensures
optimality of a solution.
Bernoulli Distribution - answers Discrete probability distribution where the outcome is
binary, either 0 or 1. Often, 1 represents success and 0 represents failure. The
probability of the outcome being 1 is 𝑝 and the probability of outcome being 0 is 𝑞 =
1−𝑝, where 𝑝 is between 0 and 1.
Bias - answers Systematic difference between a true parameter of a population and its
estimate.
Binary Data - answers Data that can take only two different values (true/false, 0/1,
black/white, on/off, etc.)
Binary integer program - answers Integer program where all variables are binary
variables.
Binary Variable - answers Variable that can take just two values: 0 and 1.
Binomial Distribution - answers Discrete probability distribution for the exact number of
successes, k, out of a total of n iid Bernoulli trials, each with success probability p:
Pr(𝑘) = C(n,k) p^k (1-p)^(n-k), where C(n,k) = n! / (k! (n-k)!) is the binomial
coefficient ("n choose k").
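The binomial probability can be evaluated directly with Python's standard library. This is a minimal sketch; `binomial_pmf` is an illustrative name.

```python
import math

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n iid Bernoulli(p) trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Exactly 2 heads in 4 fair coin flips: 6 * 0.25 * 0.25.
print(binomial_pmf(2, 4, 0.5))  # 0.375
```

Summing `binomial_pmf(k, n, p)` over k = 0..n gives 1 up to floating-point rounding, a handy sanity check.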
Blocking - answers Factor introduced to an experimental design that interacts with the
effect of the factors to be studied. The effect of the factors is studied within the same
level (block) of the blocking factor.
Box and whisker plot - answers Graphical representation of data showing the middle range
of data (the "box"), reasonable ranges of variability ("whiskers"), and points (possible
outliers) outside those ranges.
Box-Cox Transformation - answers Transformation of a non-normally-distributed
response to a normal distribution.
Branching - answers Splitting a set of data into two or more subsets, each to be
analyzed separately.
CART - answers Classification and regression trees.
Categorical Data - answers Data that classifies observations without quantitative
meaning (for example, colors of cars) or where quantitative amounts are categorized
(for example, "0-10, 11-20, ...").
Causation - answers Relationship in which one thing makes another happen (i.e., one
thing causes another).
Chance Constraint - answers A probability-based constraint. For example, a standard
linear constraint might be 𝐴x≤𝑏. A similar chance constraint might be Pr (𝐴x≤𝑏)≥0.95
Change Detection - answers Identifying when a significant change has taken place in a
process.
Classification - answers The separation of data into two or more categories, or (a point's
classification) the category a data point is put into.
Classification tree - answers Tree-based method for classification. After branching to
split the data, each subset is analyzed with its own classification model.
Classifier - answers A boundary that separates the data into two or more categories.
Also (more generally) an algorithm that performs classification.
Clique - answers A set of nodes where each pair is connected by an arc.
Cluster - answers A group of points identified as near/similar to each other.
Cluster Center - answers In some clustering algorithms (like k-means clustering), the
central point (often the centroid) of a cluster of data points.
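When the cluster center is the centroid, it is the component-wise mean of the cluster's points. The following sketch assumes points are given as equal-length tuples; the data is made up.

```python
def centroid(points):
    """Component-wise mean of a non-empty list of equal-length points."""
    if not points:
        raise ValueError("cluster has no points")
    n = len(points)
    dims = len(points[0])
    return tuple(sum(p[d] for p in points) / n for d in range(dims))

# Hypothetical cluster: the four corners of a 2-by-2 square.
cluster = [(0, 0), (2, 0), (2, 2), (0, 2)]
print(centroid(cluster))  # (1.0, 1.0)
```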
Clustering - answers Separation of data points into groups ("clusters") based on
nearness/similarity to each other. A common form of unsupervised learning.
Collective outlier - answers A set of data points that is (uncommonly) different from
others - for example, a missing heartbeat in an electrocardiogram; we don't know
exactly which millisecond it should've happened in, but collectively there's a set of
milliseconds that it's missing from.
Concave Function - answers A function f() where for every two points 𝑥 and 𝑦, 𝑓(𝑐x+
(1−𝑐)𝑦) ≥ 𝑐𝑓(𝑥) + (1−𝑐)𝑓(𝑦) for all 𝑐 between 0 and 1. In two dimensions, this means if the
points (𝑥,𝑓(𝑥)) and (𝑦,𝑓(𝑦)) are connected with a straight line, the line is always below
[or equal to] the function's curve between those two points. If 𝑓() is concave, then −𝑓() is
convex.
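The concavity inequality can be spot-checked numerically on sample points. Passing such a check does not prove a function is concave, but a single failure shows it is not. The helper name, sample points, and tolerance below are assumptions for illustration.

```python
def holds_on_samples(f, points, weights, tol=1e-12):
    """Check f(c*x + (1-c)*y) >= c*f(x) + (1-c)*f(y) on all samples."""
    for x in points:
        for y in points:
            for c in weights:
                lhs = f(c * x + (1 - c) * y)
                rhs = c * f(x) + (1 - c) * f(y)
                if lhs < rhs - tol:
                    return False
    return True

points = [-2, -1, 0, 1, 2]
weights = [0, 0.25, 0.5, 0.75, 1]
print(holds_on_samples(lambda x: -x * x, points, weights))  # True
print(holds_on_samples(lambda x: x * x, points, weights))   # False
```

The second call fails because x squared is convex: the connecting line lies above the curve, not below it.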
concordance index - answers Area under the ROC curve; an estimate of the
classification model's accuracy. Also called AUC.
Confusion matrix - answers Visualization of classification model performance.
Constant - answers A number that remains the same.
constraint - answers Part of an optimization model that describes a restriction on the
solution (the values of the variables).
Contextual outlier - answers A data point that is (uncommonly) far from other data
points related to it - for example, in Atlanta, a 90-degree (Fahrenheit) day in winter is an
outlier, but a 90-degree day in summer is not.
continuous-time simulation - answers A simulation that models a system continuously,
at every instant of time; continuous-time simulation models are often based on
differential equations.