Neural Networks
Learning from experience (machine learning): i) no formal rules of transformation, ii) no 'knowledge base', iii) no logical inference
- Hierarchical relationships: build complicated concepts out of simpler concepts
- Representation: what information the computer is given about the situation (per layer)
o Complex outcomes emerge from interactions between many simple steps
Representations move from visible layer → hidden (abstract) layers → object classification
- Feature: each piece of input information (what the network's 𝑓(𝑥) transformations pick up)
Machine learning: a computer program is said to learn from experience 𝐸 with respect to
some class of tasks 𝑇 and performance measure 𝑃 if its performance at tasks in 𝑇, as measured
by 𝑃, improves with experience 𝐸
- Task: 'classify which emails are wanted (not spam) vs unwanted (spam)'
- Experience: watching humans label emails (training set)
- Performance: the proportion of new emails (test set) classified correctly
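A minimal sketch of the performance measure 𝑃 as test-set accuracy (the labels below are made-up, and NumPy is assumed):

```python
import numpy as np

# Sketch of the performance measure P: the proportion of test-set
# emails classified correctly. Labels are hypothetical.
true_labels = np.array([0, 1, 0, 0, 1, 1, 0, 1])  # 0 = wanted, 1 = spam
predicted   = np.array([0, 1, 0, 1, 1, 1, 0, 0])  # classifier output

P = np.mean(true_labels == predicted)  # accuracy on held-out emails
print(f"P = {P:.2f}")                  # 0.75: 6 of 8 classified correctly
```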
Deep Network (brain/machine): a learning network that transforms/extracts features:
- Multiple nonlinear processing units
- Arranged in multiple layers with
o Hierarchical organisation + Different levels of representation and abstraction
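A minimal sketch of such a stack of non-linear processing units (layer sizes and random weights are illustrative assumptions, not a trained model; NumPy assumed):

```python
import numpy as np

# Sketch of a deep-network forward pass: each layer applies a linear
# transform (learned weights) followed by a non-linearity, so later
# layers operate on increasingly abstract representations.
rng = np.random.default_rng(0)

def layer(x, W, b):
    return np.maximum(0.0, W @ x + b)   # non-linear processing unit (ReLU)

x = rng.normal(size=16)                           # visible layer: raw input
W1, b1 = rng.normal(size=(8, 16)), np.zeros(8)    # hidden layer 1
W2, b2 = rng.normal(size=(4, 8)), np.zeros(4)     # hidden layer 2 (more abstract)
h1 = layer(x, W1, b1)
h2 = layer(h1, W2, b2)                  # representation fed to the classifier
print(h2.shape)                         # (4,)
```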
Necessary for object recognition, which is difficult because of varying viewpoints, sizes, positions, and lighting
- Generalisation: not recognising specific feature instances, but generalising across objects
- Hidden layer: responds to whatever transformation of features is optimal for deriving the object identity (not directly conceptualisable, but expressed as weights in matrices etc.)
- Non-linear functions: 𝑓(𝑥) ≠ 𝐴 ∙ 𝑥 + 𝐵; more flexible, yet sensitive to overfitting (see the sketch below)
o Filter, threshold, pool, normalise → needed because the relation from image to label is weak
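Why the non-linearity matters, as a sketch: stacked purely linear layers collapse into a single linear map, while an interleaved threshold breaks the collapse (random weights, NumPy assumed):

```python
import numpy as np

# Without a non-linearity, two stacked linear layers collapse into one
# linear map, so depth adds no flexibility.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(5, 5))
W2 = rng.normal(size=(5, 5))
x  = rng.normal(size=5)

linear_stack = W2 @ (W1 @ x)             # two linear layers...
single_layer = (W2 @ W1) @ x             # ...equal one linear layer
print(np.allclose(linear_stack, single_layer))   # True

with_relu = W2 @ np.maximum(0.0, W1 @ x)         # insert a threshold
print(np.allclose(with_relu, single_layer))      # False (generically)
```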
Filter (convolution, linear): matrix multiplication; the filter is multiplied element-wise with a group of input pixels at a particular position
o Filters can be hand-designed (often in the first layer) or learned by the computer (abstract layers)
- If the source pixels follow the filter pattern (e.g., light on the right, dark on the left) → high value
- If the input area is all the same brightness → zero value
- If the source pixels are opposite to the filter pattern → negative value (see the sketch below)
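A sketch of one filter position, using a hand-designed 'dark left, light right' edge filter (the filter and patch values are illustrative assumptions; NumPy assumed):

```python
import numpy as np

# One filter position: element-wise multiply-and-sum of a 3x3 filter
# with a 3x3 patch of source pixels.
edge_filter = np.array([[-1., 0., 1.],
                        [-1., 0., 1.],
                        [-1., 0., 1.]])

matching = np.array([[0., 0., 1.],      # light on the right,
                     [0., 0., 1.],      # dark on the left
                     [0., 0., 1.]])
uniform  = np.ones((3, 3))              # same brightness everywhere
opposite = matching[:, ::-1]            # light on the left

for patch in (matching, uniform, opposite):
    print(np.sum(edge_filter * patch))  # 3.0, 0.0, -3.0
```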
Feature maps: a large set of filters applied in parallel produces multiple maps
- Each pixel in a feature map is an abstraction of the pattern across a group of input pixels
o A pixel in a feature map represents the activity of one processing unit or artificial 'neuron'
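A sketch of parallel filtering into multiple feature maps (the two small filters are illustrative assumptions; SciPy's correlate2d is assumed for the sliding multiply-and-sum):

```python
import numpy as np
from scipy.signal import correlate2d   # assumes SciPy is available

# Sliding several filters over the whole image in parallel: each filter
# yields one map, and each map pixel is the activity of one artificial
# neuron looking at one image patch.
image = np.random.default_rng(2).normal(size=(8, 8))

vertical   = np.array([[-1., 1.], [-1., 1.]])    # illustrative filters
horizontal = np.array([[-1., -1.], [1., 1.]])

maps = np.stack([correlate2d(image, f, mode="valid")
                 for f in (vertical, horizontal)])
print(maps.shape)   # (2, 7, 7): two feature maps
```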
Threshold (rectification, non-linear): only activate a unit in the output feature map if its value reaches a certain level (if filters have mean 0, the threshold is typically 0); a sigmoid is an alternative
- Rectified linear unit (ReLU): 𝑓(𝑥) = max(0, 𝑥)
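A sketch of rectification versus the sigmoid alternative (the values are made-up filter outputs; NumPy assumed):

```python
import numpy as np

# ReLU zeroes every value below the threshold (0 here, since zero-mean
# filters give zero-mean responses); a sigmoid is a smooth alternative
# that squashes values into (0, 1).
values  = np.array([-2.0, -0.5, 0.0, 0.7, 3.0])   # raw filter outputs

relu    = np.maximum(0.0, values)                 # [0. 0. 0. 0.7 3.]
sigmoid = 1.0 / (1.0 + np.exp(-values))           # smooth alternative
print(relu)
print(np.round(sigmoid, 2))   # [0.12 0.38 0.5 0.67 0.95]
```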
Pool (non-linear): optional after filtering, because neighbouring units represent very similar information → down-samples the units to improve computational efficiency (at the cost of some information loss)
- Max operation (e.g., over 2×2 neighbouring units of a feature map) → similar effect to a stride in convolution
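A sketch of 2×2 max pooling on a small feature map (the values are made-up; NumPy assumed):

```python
import numpy as np

# 2x2 max pooling: keep the largest value in each 2x2 block, halving
# each spatial dimension (some information is discarded).
fmap = np.array([[1., 3., 0., 2.],
                 [4., 2., 1., 1.],
                 [0., 0., 5., 6.],
                 [1., 2., 7., 0.]])

h, w   = fmap.shape
pooled = fmap.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))
print(pooled)   # [[4. 2.]
                #  [2. 7.]]
```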