Evatee 7/13/24 ACTEX Exam PA Cheat Sheet (SOA Exam PA 2024): Questions with Graded Solutions
Reasons for converting a numeric variable to a factor Answer - 1. values are numeric labels with no meaningful order
2. small number of distinct values
3. complex (e.g., non-monotonic) relationship with target -> a factor gives the model more flexibility
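The flexibility gained from a factor can be sketched in plain Python: each non-baseline level gets its own indicator column, so a model can fit a separate effect per level instead of one slope. The data and the `to_dummies` helper are hypothetical, made up for illustration.

```python
# Hypothetical data: region codes are numeric labels with no natural order.
region_code = [1, 3, 2, 3, 1, 2]

def to_dummies(values):
    """One-hot encode values, dropping the first level as the baseline."""
    levels = sorted(set(values))
    baseline, others = levels[0], levels[1:]
    rows = [[1 if v == level else 0 for level in others] for v in values]
    return baseline, others, rows

baseline, columns, dummies = to_dummies(region_code)
print("baseline level:", baseline)
print("indicator columns:", columns)
for code, row in zip(region_code, dummies):
    print(code, row)
```

Kept as a number, `region_code` would force the effect of region 3 to be exactly twice the step from region 1 to region 2, which makes no sense for unordered labels.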
Reasons for keeping a numeric variable numeric Answer - 1. values have a sense of numeric order
2. large number of distinct values
3. simple monotonic relationship with target (effect can be captured by a GLM with a single coefficient)
4. future observations may take values not seen in the training data (e.g., year)
Common strategies for dealing with missing values Answer - 1. removing observations (if missing values affect only a small part of the data)
2. removing variables (if the majority of a variable's values are missing)
3. imputing missing values using the mean (numeric) or mode (categorical) (if observations with missing values have predictive power)
4. converting missing values to an "unknown" level (factors only)
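Three of the strategies above (dropping observations, imputing, and an "unknown" level) can be sketched in plain Python as follows. The records are hypothetical, with `None` standing in for a missing value.

```python
from statistics import mean, mode

# Hypothetical records; None marks a missing value.
ages = [34, None, 51, 29, None, 46]
colors = ["red", "blue", None, "red", "red", None]

# Strategy 1: drop observations whose age is missing.
complete_ages = [a for a in ages if a is not None]

# Strategy 3: impute a numeric variable with its mean ...
age_mean = mean(complete_ages)
ages_imputed = [a if a is not None else age_mean for a in ages]

# ... and a categorical variable with its mode (most frequent level).
color_mode = mode([c for c in colors if c is not None])
colors_imputed = [c if c is not None else color_mode for c in colors]

# Strategy 4: treat missingness as its own factor level.
colors_unknown = [c if c is not None else "unknown" for c in colors]

print(ages_imputed)
print(colors_imputed)
print(colors_unknown)
```

Strategy 4 keeps every observation and lets the model learn whether missingness itself is informative.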
Pro & con of log transformation Answer - Pro: remedies right skewness & symmetrizes the distribution -> can improve the fit of GLMs
Con: cannot be *directly* applied when the variable has non-positive (zero or negative) observations
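A minimal sketch of both points, using hypothetical claim amounts: the log pulls in the long right tail (lower skewness), and a zero value has to be handled before the log can be taken. The `log1p` shift shown here, log(1 + x), is one common workaround and an assumption of this sketch, not something the cheat sheet prescribes.

```python
import math

def skewness(xs):
    """Population skewness: third central moment over variance^1.5."""
    n = len(xs)
    m = sum(xs) / n
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Hypothetical right-skewed, strictly positive data.
claims = [1, 2, 2, 3, 4, 5, 8, 20, 100]
log_claims = [math.log(x) for x in claims]
print("skewness before:", round(skewness(claims), 2))
print("skewness after: ", round(skewness(log_claims), 2))

# math.log(0) raises ValueError, so zeros need special handling.
with_zeros = [0, 1, 5, 50]
shifted = [math.log1p(x) for x in with_zeros]  # log(1 + x)
print(shifted)
```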
Reasoning behind combining levels Answer - 1. create more representative groups
2. levels with low counts give unreliable estimates and reduce robustness
3. too many levels raise dimensionality and dilute predictive power
4. ensure each level has a sufficient number of observations
5. when combining, preserve the differences in the mean of the target variable among factor levels so they remain useful for prediction
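The count-based part of this reasoning can be sketched as follows. The data, the `MIN_COUNT` threshold, and the "other" label are all hypothetical. Note that this sketch merges purely by count; in line with point 5, sparse levels would ideally be merged with levels that have a similar target mean.

```python
from collections import Counter

# Hypothetical factor with two sparse levels.
occupation = ["clerk"] * 40 + ["nurse"] * 35 + ["pilot"] * 3 + ["chef"] * 2

MIN_COUNT = 5  # hypothetical minimum number of observations per level

counts = Counter(occupation)
rare = {level for level, n in counts.items() if n < MIN_COUNT}

# Fold every sparse level into a single "other" level.
combined = ["other" if level in rare else level for level in occupation]
combined_counts = Counter(combined)

print(dict(counts))
print(dict(combined_counts))
```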
Best graph to detect numeric - categorical predictor interaction Answer - Numeric target: scatterplot colored by categorical predictor
Categorical target: boxplot for numeric predictor split by target and faceted by categorical predictor
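What the colored scatterplot is meant to reveal (for a numeric target) can also be checked numerically: if the slope of the target against the numeric predictor differs across levels of the categorical predictor, an interaction is likely. A minimal sketch with hypothetical data and a hand-written least-squares `slope` helper:

```python
def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical rows: (age, smoker, charges).
data = [
    (20, "no", 2.0), (30, "no", 3.0), (40, "no", 4.0),
    (20, "yes", 5.0), (30, "yes", 8.0), (40, "yes", 11.0),
]

slopes = {}
for level in ("no", "yes"):
    xs = [age for age, smoker, _ in data if smoker == level]
    ys = [charges for _, smoker, charges in data if smoker == level]
    slopes[level] = slope(xs, ys)

print(slopes)
```

In this made-up data, charges rise three times as fast with age for smokers, which would show up in the plot as two point clouds with clearly different slopes.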
Best graph to detect categorical - categorical predictor interaction Answer - Numeric target: boxplot for target split and faceted by predictors
Categorical target: bar chart for one predictor filled by target and faceted by the other predictor
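For a numeric target, the faceted boxplots amount to comparing the target across the cells formed by the two predictors. A minimal sketch, assuming hypothetical data, compares cell means: if the effect of one predictor changes with the level of the other, that suggests an interaction.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (sex, smoker, target).
rows = [
    ("F", "no", 10), ("F", "no", 12), ("F", "yes", 20), ("F", "yes", 22),
    ("M", "no", 11), ("M", "no", 13), ("M", "yes", 35), ("M", "yes", 37),
]

# Group the target by each (sex, smoker) cell.
cells = defaultdict(list)
for sex, smoker, y in rows:
    cells[(sex, smoker)].append(y)
cell_means = {cell: mean(ys) for cell, ys in cells.items()}

# Effect of smoking within each sex; unequal effects hint at an interaction.
effect_f = cell_means[("F", "yes")] - cell_means[("F", "no")]
effect_m = cell_means[("M", "yes")] - cell_means[("M", "no")]
print(effect_f, effect_m)
```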
Best graph to detect numeric - numeric predictor interaction Answer - Numeric & categorical target: bin one of the predictors (cut it into several ranges) and use a boxplot or histogram split by the bins, or try a decision tree
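The binning idea can be sketched numerically for a numeric target: cut one predictor into ranges, then compare how the target responds to the other predictor within each range. The data are hypothetical (the target is built as `x1 * x2`, so an interaction exists by construction), and splitting at the median is just one possible choice of cut point.

```python
from statistics import median

def slope(xs, ys):
    """Least-squares slope of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical data with a built-in interaction: y = x1 * x2.
data = [(x1, x2, x1 * x2) for x1 in (1, 2, 3, 4) for x2 in (1, 2, 3)]

# Bin x1 into two ranges at its median.
cut = median(x1 for x1, _, _ in data)
bins = {"low": [], "high": []}
for x1, x2, y in data:
    bins["low" if x1 <= cut else "high"].append((x2, y))

# Slope of y on x2 within each x1 bin.
bin_slopes = {
    name: slope([x2 for x2, _ in pts], [y for _, y in pts])
    for name, pts in bins.items()
}
print(cut, bin_slopes)
```

The slope of the target on `x2` differs between the low and high `x1` bins, which is the pattern the binned plots are meant to expose.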
Model metrics for numeric target?