1. Take a look at the data.
Also run misstable summarize; when a variable has too many missing data points, you
should not use it in a regression.
2. Run a simple regression with all predictor variables (enter categorical variables as i.var).
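Steps 1 and 2 could look roughly like the sketch below. It is illustrative only: it assumes the auto example dataset that ships with Stata, and the choice of price as target and mpg, weight and foreign as predictors is arbitrary.

```stata
* Illustrative sketch; dataset and variable choice are assumptions
sysuse auto, clear

* Step 1: look at the data and at missing values
describe
summarize
misstable summarize

* Step 2: simple regression; foreign is categorical, so it enters as i.foreign
regress price mpg weight i.foreign
```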
3. Assumptions:
No multicollinearity: check pairwise correlations with pwcorr and variance inflation factors with vif (estat vif after regress).
All relevant predictor variables included: ovtest (estat ovtest after regress). If p>0.05 it is good; otherwise
you should look at transformations.
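A minimal sketch of these two checks, again assuming the auto example dataset (the variables are chosen only for illustration):

```stata
* Illustrative sketch; dataset and variable choice are assumptions
sysuse auto, clear

* Pairwise correlations between the predictors, with significance levels
pwcorr mpg weight length, sig

regress price mpg weight length i.foreign

* Variance inflation factors for the model just fitted
estat vif

* Ramsey RESET test for omitted variables; p>0.05 is the desired outcome
estat ovtest
```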
Homoscedasticity and linearity: estat imtest, white. If p<0.05 there is heteroscedasticity;
in that case use bootstrap or robust standard errors:
o bootstrap, reps(500) : reg…
o reg … , robust
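As a sketch, the test and the two remedies might be run as follows. The auto dataset, the model and the seed value are assumptions made only so that the example is complete and reproducible.

```stata
* Illustrative sketch; dataset, model and seed are assumptions
sysuse auto, clear
regress price mpg weight i.foreign

* White's test; the null hypothesis is homoscedasticity
estat imtest, white

* Remedy 1: robust standard errors
regress price mpg weight i.foreign, vce(robust)

* Remedy 2: bootstrapped standard errors with 500 replications
bootstrap, reps(500) seed(12345): regress price mpg weight i.foreign
```

The coefficients are the same in all three regressions; only the standard errors, and therefore the p-values and confidence intervals, change.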
Independent errors: Do you think that the data are clustered? You can check this with the commands:
xtset var (the variable that you think defines the clusters)
xtreg target-variable
Look at rho; if it is reasonably high, there is clustering.
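A minimal sketch of the clustering check, assuming the nlsw88 example dataset that ships with Stata, with wage as the target variable and industry as the suspected cluster variable (both picked only for illustration):

```stata
* Illustrative sketch; dataset and variable choice are assumptions
sysuse nlsw88, clear

* Drop cases without a cluster value before declaring the groups
drop if missing(industry)
xtset industry

* Random-effects model without predictors; rho is shown below the coefficient table
xtreg wage, re
```

Here rho is the share of the total variance that lies between the clusters.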
Noise (the residuals) should be normally distributed:
predict e, resid
sktest e
swilk e (p has to be >0.05 in both tests)
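Put together with the regression it belongs to, the normality check might look like this sketch (the auto dataset and the model are assumptions):

```stata
* Illustrative sketch; dataset and model are assumptions
sysuse auto, clear
regress price mpg weight i.foreign

* Store the residuals of the last regression in a new variable e
predict e, resid

* Skewness/kurtosis test and Shapiro-Wilk test; the null hypothesis is normality
sktest e
swilk e
```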
Not too many non-significant variables.
Diag2: it looks at the outlierness of cases. Look at the table at the bottom; if there are
cases with high outlierness, leave these out of the regression and see whether this improves the model. If it does,
look at why they could be outliers (check the data): if you have a reasonable explanation, leave them out;
otherwise leave them in.
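The sketch below is not Diag2. It shows a comparable check using Stata's built-in regression diagnostics (studentized residuals and Cook's distance). The auto dataset, the new variable names rstu and cook, and the cutoff of 2 are all assumptions chosen for illustration.

```stata
* Illustrative sketch using built-in diagnostics, not Diag2
sysuse auto, clear
regress price mpg weight i.foreign

* Studentized residuals and Cook's distance for each case
predict rstu, rstudent
predict cook, cooksd

* List candidate outliers; the cutoff of 2 is an illustrative assumption
list make price rstu cook if abs(rstu) > 2 & !missing(rstu)

* Refit without those cases and compare with the first model
regress price mpg weight i.foreign if abs(rstu) <= 2
```

The refit alone does not justify dropping the listed cases; they still need to be checked against the data as described above.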