ISYE 6501 Final| study set exam guide verified A+
Support Vector Machine - correct answer-A supervised learning classification model. It identifies the extreme points in the data (the support vectors), against which the margin boundaries are placed. The hyperplane between these margins is the classifier.

SVM Pros/Cons - correct answer-Pros: Works really well when there is a clear margin of separation; effective in high-dimensional spaces; effective when the number of dimensions is greater than the number of samples; uses only a subset of training points in the decision function (the support vectors), so it is also memory efficient.
Cons: Not good for very large data sets; not good when the data set is noisy, i.e. the target classes overlap; doesn't directly provide probability estimates.

K-nearest neighbor (K-NN) - correct answer-A supervised classification algorithm. It looks at the k closest points to a new point and assigns whichever class is most common among them.

K-nearest neighbor (K-NN) Pros/Cons - correct answer-Pros: No assumptions about the data; easy to understand and interpret; versatile.
Cons: Computationally expensive because the algorithm stores all training data; sensitive to irrelevant features and to the scale of the data.

k-fold cross validation - correct answer-Validation technique where the data is divided into k subsets. Each subset is then used for testing while the rest are used for training.
The algorithm then rotates through each subset and averages the results.

k-fold cross validation Pros/Cons - correct answer-Pros: Validates the performance of the model; can create balance across the predicted classes.
Cons: Doesn't work well with time series data; the aggregate scores of your model could miss some important extreme values or overpower them so they're harder to pick up on.

k-means clustering - correct answer-Unsupervised learning heuristic that starts by choosing k cluster centers, then assigns each data point to the nearest center based on distance. The center point of each cluster is then recalculated and all data points are re-clustered. The process repeats until no data points change clusters. A suitable number of clusters can be identified with an elbow diagram.

k-means pros and cons - correct answer-Pros: Simple to implement; scales well to large data sets; easily adaptable.
Cons: k must be chosen manually; results depend on the initial values; sensitive to outliers.

Grubbs Outlier Test - correct answer-A test that uses a suspected outlier's value, the mean of the data, and the standard deviation to determine whether the data point is within the confidence interval for a normal distribution or should be thrown out.

CUSUM - correct answer-Change detection model that keeps a running total of the amount by which observations exceed the expected value. When the running total exceeds a preset threshold value, it indicates there has been a change.

CUSUM Pros/Cons - correct answer-Pros: Well suited to detecting small shifts in the process mean, especially 0.5 to 2 SD from the target mean; shifts in the process mean are easy to identify visually.
Cons: Cumbersome to establish and maintain; tough to interpret the patterns.
Choosing C and T values is both a pro and a con: it can introduce bias but creates more flexibility.

Exponential Smoothing - correct answer-Technique for time series data in which older observations are assigned exponentially decreasing weights, so more emphasis is given to recent observations. Can include trends, seasonality, and cyclic patterns to account for expected differences in observations over time.

Exponential Smoothing Pros and Cons - correct answer-Pros: Easy to learn and apply; can produce accurate forecasts; can account for trends, seasonality, and cyclic effects; works well when the mean, variance, etc. are expected to remain relatively constant.
Cons: Forecasts can sometimes lag.

ARIMA (Auto Regressive Integrated Moving Average) - correct answer-A time series analysis method used for forecasting that combines three components: differencing (differences of differences, repeated d times) to obtain a stationary series when the data aren't stationary; autoregression, where the current value is predicted from the values of p previous time periods; and moving averages, where the errors from q previous time periods are incorporated.

GARCH (Generalized Autoregressive Conditional Heteroskedasticity) - correct answer-Time series method that estimates and forecasts variance. Helps determine how much a forecast may be higher or lower than the true value. Useful for estimating risk in investment portfolios.

Linear Regression - correct answer-A regression technique that describes the relationship between independent and dependent variables as a linear function.

AIC (Akaike information criterion) - correct answer-Model selection technique that balances model fit and complexity. Penalizes models with too much complexity in an attempt to avoid overfitting.

BIC (Bayesian information criterion) - correct answer-Model selection technique that balances model fit and complexity. Generally penalizes complexity more than AIC.
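As a rough illustration of the AIC and BIC cards above, the sketch below scores two least-squares polynomial fits on synthetic data. It assumes Gaussian errors, the standard definitions AIC = 2k - 2 ln(L) and BIC = k ln(n) - 2 ln(L), and counts the error variance as one of the k parameters (conventions differ). The helper names (gaussian_log_likelihood, aic, bic) and the data are made up for this example and do not come from the course materials.

```python
import numpy as np


def gaussian_log_likelihood(y, y_hat):
    """Maximized log-likelihood of a least-squares fit, assuming Gaussian errors."""
    n = len(y)
    rss = float(np.sum((y - y_hat) ** 2))  # residual sum of squares
    return -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)


def aic(log_lik, k):
    """Akaike information criterion: lower is better."""
    return 2 * k - 2 * log_lik


def bic(log_lik, k, n):
    """Bayesian information criterion: lower is better."""
    return k * np.log(n) - 2 * log_lik


# Synthetic data (an assumption for illustration): a noisy straight line.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=1.0, size=x.size)

scores = {}
for degree in (1, 3):
    coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x)
    k = degree + 2                      # degree + 1 coefficients, plus the error variance
    log_lik = gaussian_log_likelihood(y, y_hat)
    scores[degree] = (aic(log_lik, k), bic(log_lik, k, x.size))
    print(f"degree={degree}  AIC={scores[degree][0]:.2f}  BIC={scores[degree][1]:.2f}")
```

For the same fit, BIC exceeds AIC by k(ln(n) - 2), so once n is 8 or more each extra parameter costs more under BIC than under AIC. That is the sense in which BIC "generally penalizes complexity more".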
Box-Cox Transformation - correct answer-Power transformation technique (logarithmic in the special case λ = 0) used to reduce heteroskedasticity (uneq
Written for
- Institution: ISYE 6501
- Course: ISYE 6501
Document information
- Uploaded on: December 2, 2024
- Number of pages: 7
- Written in: 2024/2025
- Type: Exam (elaborations)
- Contains: Questions & answers