Answers
Modeling Error
Given a particular NN architecture, the true function that describes the real world may not lie in the
hypothesis space defined by that architecture.
As model complexity increases, modeling error decreases, but optimization error increases.
Estimation Error
Even if you find the best hypothesis (weights and parameters) that minimizes training error, it may not
generalize to the test set.
Optimization Error
Even if your NN architecture can perfectly model the world, your optimization algorithm may not find the
weights that realize that function.
Effectiveness of transfer learning under certain conditions
Remove the last FC layer of the CNN and replace it with a randomly initialized one, then run the new data
through the network to train only that layer.
To train the NN for transfer learning: freeze the CNN (or early) layers and learn only the parameters in the
FC layers.
Performs very well with a very small amount of training data, if the new data is similar to the original data
Does not work very well if the target task's dataset is very different
If you have enough data in the target domain and it is different from the source domain, it is better to just
train on the new data
Transfer learning = reuse features learned on a very large dataset for a completely new task
Steps:
Train on very large dataset
Take the custom dataset and initialize the network with the weights trained in Step 1 (replace the last fully
connected layer, since the classes in the new task will be different)
Final step -> continue training on new dataset
Can either retrain all weights ("fine-tune") or freeze (i.e., not update) the weights in certain layers (freezing
reduces the number of parameters that you need to learn); see the sketch below.
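A minimal sketch of these steps, assuming PyTorch and torchvision are available; the backbone (ResNet-18), the number of classes, and the optimizer settings are illustrative choices, not part of the original notes.

```python
import torch
import torch.nn as nn
from torchvision import models

# Steps 1-2: start from weights already trained on a very large dataset (ImageNet here)
model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

# Freeze the pretrained layers so their weights are not updated
for param in model.parameters():
    param.requires_grad = False

# Replace the last FC layer with a randomly initialized one sized for the new classes
num_classes = 10  # hypothetical number of classes in the custom dataset
model.fc = nn.Linear(model.fc.in_features, num_classes)

# Only the new FC layer's parameters are given to the optimizer (frozen ones are skipped)
optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)

# To fine-tune instead, skip the freezing loop and optimize model.parameters().
```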
AlexNet
2x(CONV=>MAXPOOL=>NORM)=>3xCONV=>MAXPOOL=>3xFC
ReLU, specialized normalization layers, PCA-based data augmentation, Dropout, Ensembling (used 7 NNs
with different random weights)
Critical development: More depth and ReLU
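A rough PyTorch sketch of the layer pattern above; the channel counts and kernel sizes are illustrative placeholders, and writing the normalization as local response normalization is an assumption not spelled out in the notes.

```python
import torch.nn as nn

# Illustrative AlexNet-style stack: 2x(CONV=>MAXPOOL=>NORM) => 3xCONV => MAXPOOL => 3xFC
alexnet_like = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=11, stride=4, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2), nn.LocalResponseNorm(5),
    nn.Conv2d(64, 192, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2), nn.LocalResponseNorm(5),
    nn.Conv2d(192, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(256, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    nn.Flatten(),
    nn.Linear(256 * 6 * 6, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(),
    nn.Linear(4096, 1000),  # 1000 output classes, as for ImageNet
)
```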
VGGNet
2x(2xCONV=>POOL)=>3x(3xCONV=>POOL)=>3xFC
Repeated application of 3x3 Conv (stride of 1, padding of 1) & 2x2 Max Pooling (stride 2) blocks
Very large number of parameters (most in the FC layers); most memory used in the Conv layers (you are
storing the activations produced in the forward pass)
Critical Development: Blocks of repeated structures
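A small PyTorch sketch of the repeated-block idea; the channel widths below are illustrative rather than the exact VGG configuration.

```python
import torch.nn as nn

def vgg_block(in_ch, out_ch, num_convs):
    """num_convs repetitions of 3x3 Conv (stride 1, padding 1) + ReLU, then 2x2 MaxPool (stride 2)."""
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, kernel_size=3, stride=1, padding=1),
                   nn.ReLU()]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# 2x(2xCONV=>POOL) => 3x(3xCONV=>POOL); the 3 FC layers would follow after flattening
features = nn.Sequential(
    vgg_block(3, 64, 2),
    vgg_block(64, 128, 2),
    vgg_block(128, 256, 3),
    vgg_block(256, 512, 3),
    vgg_block(512, 512, 3),
)
```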
Inception Net
Deeper and more complex than VGGNet
Average Pooling before FC Layer
Repeated blocks stacked over and over to form the NN
Blocks are made of simple layers: FC, Conv, MaxPool, and softmax
Parallel filters of different sizes to get features at multiple scales
Critical Development: Blocks of parallel paths
Uses the Network-in-Network concept, i.e., 1x1 convolution, as a sort of dimensionality reduction (see slide)
Downside: increased computational work
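A simplified PyTorch sketch of one such block; the per-path channel counts are illustrative. The 1x1 convolutions on the 3x3 and 5x5 paths reduce channel depth before the expensive filters, which is the Network-in-Network dimensionality-reduction idea and also what keeps the computational cost manageable.

```python
import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Parallel filters of different sizes over the same input, concatenated along channels."""
    def __init__(self, in_ch):
        super().__init__()
        self.path1 = nn.Conv2d(in_ch, 64, kernel_size=1)                         # 1x1 conv only
        self.path2 = nn.Sequential(nn.Conv2d(in_ch, 96, kernel_size=1),          # 1x1 reduces depth
                                   nn.ReLU(),
                                   nn.Conv2d(96, 128, kernel_size=3, padding=1)) # then 3x3
        self.path3 = nn.Sequential(nn.Conv2d(in_ch, 16, kernel_size=1),          # 1x1 reduces depth
                                   nn.ReLU(),
                                   nn.Conv2d(16, 32, kernel_size=5, padding=2))  # then 5x5
        self.path4 = nn.Sequential(nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                                   nn.Conv2d(in_ch, 32, kernel_size=1))          # pool then 1x1

    def forward(self, x):
        # All paths keep the spatial size, so their outputs can be concatenated channel-wise
        return torch.cat([self.path1(x), self.path2(x), self.path3(x), self.path4(x)], dim=1)
```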
ResNet
Allow information from a layer to propagate to a future layer
Passes the residual from the layer at depth x forward and adds it to the output of the layer at depth x+1
Global average pooling block at the end
Critical Development: Passing residuals of previous layers forward
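A minimal residual-block sketch in PyTorch; batch normalization, present in the real ResNet, is omitted here for brevity.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """The block's input is added back to its output, so information can skip forward."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.conv2(self.relu(self.conv1(x)))
        return self.relu(out + x)  # skip connection: add the identity back onto the residual path
```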
Convolutional layers and how they work (forward/backward)
https://www.youtube.com/watch?v=Lakz2MoHy6o&t=1299s
(Don't have a good short summary; a naive sketch of the forward and backward passes follows)
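Since the notes lack a summary, here is a naive, loop-based sketch of a single-channel 2D convolution (really cross-correlation, as in deep-learning frameworks) and its backward pass, assuming NumPy; real layers vectorize this and handle multiple channels, stride, and padding.

```python
import numpy as np

def conv2d_forward(x, w, b):
    """Slide filter w over x; each output value is a dot product of w with a patch of x, plus bias b."""
    H, W = x.shape
    kH, kW = w.shape
    out = np.zeros((H - kH + 1, W - kW + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kH, j:j + kW] * w) + b
    return out

def conv2d_backward(x, w, dout):
    """Each output position (i, j) contributed patch * dout[i, j] to dw and w * dout[i, j] to dx."""
    kH, kW = w.shape
    dw = np.zeros_like(w)
    dx = np.zeros_like(x)
    for i in range(dout.shape[0]):
        for j in range(dout.shape[1]):
            dw += x[i:i + kH, j:j + kW] * dout[i, j]
            dx[i:i + kH, j:j + kW] += w * dout[i, j]
    db = dout.sum()
    return dx, dw, db
```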
Equivariance
If the input changes, the output changes in the same way, i.e., f(g(x)) = g(f(x)). If the beak of a bird in a picture
moves a bit, the output values will move in the same way
A change to the input causes an equal change to the output
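A small check of this, assuming PyTorch: shifting the input shifts the convolution output by the same amount, away from the image borders where padding breaks the correspondence.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, padding=1, bias=False)
x = torch.randn(1, 1, 8, 8)
x_shifted = torch.roll(x, shifts=2, dims=3)   # translate the input 2 pixels to the right

out = conv(x)
out_shifted = conv(x_shifted)

# Away from the borders, shifting the output of x matches the output of the shifted x
print(torch.allclose(torch.roll(out, shifts=2, dims=3)[..., 3:-3],
                     out_shifted[..., 3:-3], atol=1e-5))  # True
```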
Invariance
If the input changes, the output stays the same. That is, f(g(x)) = f(x). E.g., rotating/scaling a number will
still result in that number being classified the same.
A change to the input does not affect the output
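And a matching check for invariance, again assuming PyTorch: with global max pooling after the convolution, the pooled response stays the same no matter where the feature appears.

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(1, 1, kernel_size=3, bias=False)
global_pool = nn.AdaptiveMaxPool2d(1)   # global max pooling over the whole feature map

x = torch.zeros(1, 1, 16, 16)
x[0, 0, 4:7, 4:7] = 1.0                                # a small "feature" in one place
x_moved = torch.roll(x, shifts=(6, 6), dims=(2, 3))    # the same feature, translated

# The pooled response does not change when the feature moves
print(torch.allclose(global_pool(conv(x)), global_pool(conv(x_moved))))  # True
```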