A neural network takes an input vector of p variables X = (X1, X2, …, Xp) and builds a nonlinear
function f(X) to predict the response Y.
Neural networks derive new features by computing linear combinations of X, squash each
through a nonlinear activation function, and then use the transformed features as inputs: the
final model is a linear model in the derived variables.
The feed-forward neural network is the basic architecture for modeling a response with p predictors.
Specializations of deep learning
1. Convolutional Neural Networks: for image classification
2. Recurrent Neural Networks: for time series and other sequences
Advantage
It can fit almost any data! Very low bias.
Typically, a neural network will be an attractive choice when the training set is extremely
large and when interpretability of the model is not a high priority.
Disadvantage
Linear models are much easier to interpret than neural networks.
When faced with several methods that give roughly equivalent performance, pick the simplest
10.1: Single Layer Neural Networks
Structure
- Input layer: consists of the p features X1, …, Xp as input units
- Hidden layer: consists of K hidden units/activations:
     o each of the inputs from the input layer feeds into every hidden unit (we pick the
       number K of hidden units)
     o each hidden unit works as a different nonlinear transformation of the original features
- Weights: represented as w; each hidden unit k has its own weights wk1, …, wkp, one per input
- Bias: represented as B (the constant term wk0); added to the weighted input units before
  they are passed to the activation function
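Putting the pieces together, the single layer model can be written as (g is the activation function; this is the standard ISLR notation):

$$A_k = g\Big(w_{k0} + \sum_{j=1}^{p} w_{kj} X_j\Big), \quad k = 1, \dots, K, \qquad f(X) = \beta_0 + \sum_{k=1}^{K} \beta_k A_k.$$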
Activation functions
The nonlinearity in the activation function is essential to allow the model to capture complex
nonlinearities and interaction effects in the data.
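To see why the nonlinearity matters: if g were the identity, the model would collapse to an ordinary linear model in X,

$$f(X) = \beta_0 + \sum_{k=1}^{K} \beta_k \Big(w_{k0} + \sum_{j=1}^{p} w_{kj} X_j\Big),$$

which is linear in X with intercept $\beta_0 + \sum_k \beta_k w_{k0}$ and coefficient $\sum_k \beta_k w_{kj}$ on each $X_j$, so nothing beyond a linear model would be gained.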
Sigmoid function: the same function used in logistic regression to convert a linear function into
probabilities between 0 and 1.
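In formula form, the sigmoid is

$$g(z) = \frac{e^{z}}{1 + e^{z}} = \frac{1}{1 + e^{-z}}.$$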
ReLU (rectified linear unit) function: thresholds at zero, so g(z) = 0 for z below the threshold
and g(z) = z otherwise, i.e. g(z) = max(0, z).
- More efficient than the sigmoid in computation and storage, so it is the preferred
  activation function in modern neural networks
The constant term wk0 in the linear function shifts the threshold of the activation.
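A minimal NumPy sketch of both activation functions, illustrating the threshold shift (the function names sigmoid and relu here are my own labels):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real input into (0, 1), as in logistic regression
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # Zero below the threshold at 0; the identity above it
    return np.maximum(0.0, z)

z = np.linspace(-3, 3, 7)        # [-3, -2, -1, 0, 1, 2, 3]
print(sigmoid(z))                # smooth values strictly between 0 and 1
print(relu(z))                   # [0. 0. 0. 0. 1. 2. 3.]

# The constant term w_k0 shifts the threshold: with w_k0 = 1 the ReLU
# now activates for z > -1 instead of z > 0
w_k0 = 1.0
print(relu(w_k0 + z))            # [0. 0. 0. 1. 2. 3. 4.]
```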
Process: Prediction
1. The input observations x, together with the bias, are passed from the input layer to the
   hidden layer of hidden units/activation functions
   a. the inputs are multiplied by the weights and, together with the bias, fed into each
      activation unit
2. The weighted sums of the input units get a nonlinear transformation by the activation
   function
3. The results are multiplied by the beta coefficients and passed to the output layer: the
   output is a linear combination, with an intercept, of the nonlinearly transformed linear
   combinations of x (see the sketch below)
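A minimal NumPy sketch of this forward pass for a single observation, with made-up (randomly drawn) parameters standing in for estimated ones and ReLU assumed as the activation:

```python
import numpy as np

rng = np.random.default_rng(0)
p, K = 4, 3                      # p input features, K hidden units

x = rng.normal(size=p)           # one input observation
W = rng.normal(size=(K, p))      # weights w_kj: one row per hidden unit
w0 = rng.normal(size=K)          # biases w_k0, one per hidden unit
beta = rng.normal(size=K)        # beta coefficients for the output layer
beta0 = rng.normal()             # the output-layer intercept

def relu(z):
    return np.maximum(0.0, z)

# Steps 1-2: multiply inputs by weights, add the bias, then apply the
# nonlinear activation to get the hidden activations A_1, ..., A_K
A = relu(w0 + W @ x)

# Step 3: linear combination of the activations, plus an intercept
f_x = beta0 + beta @ A
print(f_x)                       # the prediction f(x)
```

In practice W, w0, beta, and beta0 would come from fitting the network to data, not from a random generator.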
Process: Estimating parameters
In a neural network, the weights, biases, beta coefficients, and intercept all have to be
estimated from the data. This is done by minimizing a loss function.
Quantitative response: minimize the squared-error loss
Qualitative response: minimize the negative multinomial log-likelihood (cross-entropy)
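In formulas, for n training observations (with M classes in the qualitative case and y_im = 1 if observation i belongs to class m, 0 otherwise), the two objectives are

$$\sum_{i=1}^{n}\big(y_i - f(x_i)\big)^2 \qquad \text{and} \qquad -\sum_{i=1}^{n}\sum_{m=1}^{M} y_{im}\log f_m(x_i).$$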