CS 7643 Quiz 1,2 & 3 Questions and Answers Latest 2025/2026 Guide Rated A- Georgia Institute of
Technology
Modeling Error - (ANSWER)Given a particular NN architecture, the actual model that represents the real
world may not be in that architecture's hypothesis space.
When model complexity increases, modeling error reduces, but optimization error increases.
Estimation Error - (ANSWER)Even if you find the best hypothesis (weights and parameters) that minimizes
training error, it may not generalize to the test set
Optimization Error - (ANSWER)Even if your NN can perfectly model the world, your optimization algorithm
may not find good weights that model the function.
When model complexity increases, modeling error reduces, but optimization error increases.
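These three terms can be read as a decomposition of the excess risk; the following is a sketch of that standard decomposition with notation assumed here (not from the cards): f* is the true function, f_F is the best function in the architecture's hypothesis class F, f-hat is the minimizer of training error within F, and f_alg is what the optimizer actually returns.

\[
\underbrace{\mathrm{err}(f_{\mathrm{alg}}) - \mathrm{err}(\hat{f})}_{\text{optimization error}}
\;+\;
\underbrace{\mathrm{err}(\hat{f}) - \mathrm{err}(f_{\mathcal{F}})}_{\text{estimation error}}
\;+\;
\underbrace{\mathrm{err}(f_{\mathcal{F}}) - \mathrm{err}(f^{*})}_{\text{modeling error}}
\;=\;
\mathrm{err}(f_{\mathrm{alg}}) - \mathrm{err}(f^{*})
\]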
Effectiveness of transfer learning under certain conditions - (ANSWER)Remove the last FC layer of the CNN
and initialize it randomly, then run the new data through the network to train only that layer
In order to train the NN for transfer learning, freeze the CNN layers (or the early layers) and learn only the
parameters in the FC layers.
Performs very well with a very small amount of training data, if it is similar to the original data
Does not work very well if the target task's dataset is very different
If you have enough data in the target domain and it is different from the source, it is better to just train on
the new data
Transfer learning = reuse features learned on a very large dataset for a completely new task
Steps:
Train on very large dataset
Take the custom dataset and initialize the network with the weights trained in Step 1 (replace the last fully
connected layer, since the classes in the new task will be different)
Final step -> continue training on new dataset
Can either retrain all weights ("finetune") or freeze (i.e., not update) the weights in certain layers (freezing
reduces the number of parameters that you need to learn)
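A minimal sketch of this recipe, assuming PyTorch/torchvision with a ResNet-18 backbone (any pretrained CNN would do); num_new_classes is a placeholder for the target task:

import torch.nn as nn
from torchvision import models

num_new_classes = 10  # placeholder: number of classes in the new dataset

# Step 1/2: start from weights trained on a very large dataset (ImageNet here).
model = models.resnet18(weights="IMAGENET1K_V1")  # recent torchvision API

# Freeze the pretrained layers so their weights are not updated.
for p in model.parameters():
    p.requires_grad = False

# Replace the last fully connected layer with a randomly initialized one
# sized for the new classes; only this layer will be trained.
model.fc = nn.Linear(model.fc.in_features, num_new_classes)

# Final step: continue training on the new dataset. To "finetune" instead,
# skip the freezing loop (or unfreeze later layers) and use a small learning rate.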
AlexNet - (ANSWER)2x(CONV=>MAXPOOL=>NORM)=>3xCONV=>MAXPOOL=>3xFC
ReLU, specialized normalization layers, PCA-based data augmentation, Dropout, Ensembling (used 7 NNs
with different random weights)
Critical development: More depth and ReLU
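A rough PyTorch sketch of that layer pattern; the channel counts follow the original paper, but the exact kernel sizes, strides, and norm placement here are illustrative (it assumes 227x227 inputs, and the paper actually normalizes before pooling):

import torch
import torch.nn as nn

alexnet_like = nn.Sequential(
    # 2x (CONV => MAXPOOL => NORM)
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2), nn.LocalResponseNorm(5),
    nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2), nn.LocalResponseNorm(5),
    # 3x CONV => MAXPOOL
    nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
    nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
    nn.MaxPool2d(kernel_size=3, stride=2),
    # 3x FC (with Dropout, as on the card)
    nn.Flatten(),
    nn.Dropout(), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
    nn.Dropout(), nn.Linear(4096, 4096), nn.ReLU(),
    nn.Linear(4096, 1000),
)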
VGGNet - (ANSWER)2x(2xCONV=>POOL)=>3x(3xCONV=>POOL)=>3xFC
Repeated application of 3x3 Conv (stride of 1, padding of 1) & 2x2 Max Pooling (stride 2) blocks
Very large number of parameters (most in the FC layers); most memory is in the Conv layers (you are
storing the activations produced in the forward pass)
Critical Development: Blocks of repeated structures
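A minimal sketch of that repeated block, assuming PyTorch (the helper name vgg_block and the channel counts in the usage comment are illustrative):

import torch.nn as nn

def vgg_block(in_channels, out_channels, num_convs):
    # num_convs 3x3 convs (stride 1, padding 1) keep the spatial size;
    # the 2x2 max pool with stride 2 then halves it.
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                   nn.ReLU()]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# Example: a VGG-16-style trunk is roughly
# vgg_block(3, 64, 2), vgg_block(64, 128, 2), vgg_block(128, 256, 3),
# vgg_block(256, 512, 3), vgg_block(512, 512, 3), followed by 3 FC layers.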
Inception Net - (ANSWER)Deeper and more complex than VGGNet
Average Pooling before FC Layer
Repeated blocks stacked over and over to form the NN
Blocks are made of simple layers: FC, Conv, MaxPool, and softmax
Parallel filters of different sizes to get features at multiple scales
Critical Development: Blocks of parallel paths
Uses the Network-in-Network concept, i.e., 1x1 convolutions, as a form of dimensionality reduction across
channels (see slide)
Downside: increased computational work
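A minimal sketch of one such block with parallel paths, assuming PyTorch (the class name InceptionBlock and the channel arguments are illustrative; the 1x1 convolutions act as the dimensionality reduction mentioned above):

import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    def __init__(self, in_ch, ch1x1, ch3x3_red, ch3x3, ch5x5_red, ch5x5, pool_proj):
        super().__init__()
        # Parallel filters of different sizes capture features at multiple scales.
        self.branch1 = nn.Sequential(nn.Conv2d(in_ch, ch1x1, 1), nn.ReLU())
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, ch3x3_red, 1), nn.ReLU(),              # 1x1 reduction
            nn.Conv2d(ch3x3_red, ch3x3, 3, padding=1), nn.ReLU())
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, ch5x5_red, 1), nn.ReLU(),              # 1x1 reduction
            nn.Conv2d(ch5x5_red, ch5x5, 5, padding=2), nn.ReLU())
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, 1), nn.ReLU())

    def forward(self, x):
        # Concatenate the parallel paths along the channel dimension.
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

# Example: InceptionBlock(192, 64, 96, 128, 16, 32, 32) roughly mirrors an early GoogLeNet block.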
ResNet - (ANSWER)Allows information from a layer to propagate to a future layer
Passes the output of the layer at depth x forward (a skip connection) and adds it to the output of the layer
at depth x+1, so the layers in between only need to learn the residual
Global average pooling block at the end
Critical Development: Passing residuals of previous layers forward
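A minimal sketch of a basic residual block, assuming PyTorch (the class name BasicResidualBlock and the BatchNorm placement are illustrative):

import torch.nn as nn

class BasicResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Skip connection: add the block's input to its output, so only the
        # residual has to be learned and information flows to later layers.
        return self.relu(out + x)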