MNIST dataset - ANSWER 70,000 small images of digits handwritten by high
school students and employees of the US Census Bureau. Each image is labeled
with the digit it represents.
to fetch MNIST - ANSWER
>>> from sklearn.datasets import fetch_openml
>>> mnist = fetch_openml('mnist_784', version=1)
>>> mnist.keys()
dict_keys(['data', 'target', 'feature_names', 'DESCR', 'details',
'categories', 'url'])
DESCR key - ANSWER contains a description of the dataset
data key - ANSWER contains an array with one row per instance and one
column per feature
target key - ANSWER contains an array with the labels
features of MNIST - ANSWER 784 features, because each image is 28x28 pixels;
each feature represents one pixel's intensity, from white (0) to black (255)
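The 784-to-28x28 relationship can be sketched in plain Python. The `flat` list below is a hypothetical stand-in for one row of the data array, not real MNIST pixels, and it assumes the pixels are stored row by row. With NumPy the same step is usually written as `row.reshape(28, 28)`.

```python
SIDE = 28  # MNIST images are 28x28 pixels

# Hypothetical stand-in for one instance: 784 intensities in the range 0-255,
# assumed to be stored row by row.
flat = [i % 256 for i in range(SIDE * SIDE)]

# Cut the flat feature vector into 28 rows of 28 pixels each.
grid = [flat[r * SIDE:(r + 1) * SIDE] for r in range(SIDE)]

print(len(flat), len(grid), len(grid[0]))
```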
binary classifier (p. 90) - ANSWER distinguishes between only 2 classes, for
example 5 and not-5
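A minimal sketch of how the 5 versus not-5 targets might be built. The `labels` list is hypothetical sample data; it uses strings on the assumption that the targets loaded via `fetch_openml` arrive as strings such as '5'.

```python
# Hypothetical labels, written as strings (assumption: the loaded targets
# are strings rather than integers).
labels = ['5', '0', '4', '1', '9', '5']

# True for every 5, False for every other digit: only two classes remain.
is_five = [label == '5' for label in labels]

print(is_five)
```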
Stochastic gradient descent (SGD) classifier - ANSWER capable of handling
very large datasets efficiently. This is in part because SGD deals with training
instances independently, one at a time (which also makes SGD well suited for
online learning)
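The "one instance at a time" idea can be illustrated with a small pure-Python sketch. This is stochastic gradient descent on the perceptron loss over hypothetical toy data, not scikit-learn's implementation; the function names, learning rate, and epoch count are assumptions chosen for illustration.

```python
import random

# Hypothetical, linearly separable toy data: two features, labels -1 or +1.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 0.9], [0.8, 1.0], [0.9, 0.7]]
y = [-1, -1, -1, 1, 1, 1]


def predict(weights, bias, x):
    """Return +1 or -1 depending on the sign of the linear score."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1


def sgd_fit(X, y, epochs=50, learning_rate=0.1, seed=42):
    """Train a linear classifier, updating after every single instance."""
    rng = random.Random(seed)
    weights = [0.0] * len(X[0])
    bias = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit instances in a random order each epoch
        for i in order:
            if predict(weights, bias, X[i]) != y[i]:
                # Perceptron-loss gradient step using this one instance only.
                weights = [w + learning_rate * y[i] * xi
                           for w, xi in zip(weights, X[i])]
                bias += learning_rate * y[i]
    return weights, bias


weights, bias = sgd_fit(X, y)
predictions = [predict(weights, bias, x) for x in X]
print(predictions)
```

Because each update touches a single instance, the same loop could consume instances as they arrive, which is what makes the approach suitable for online learning. In scikit-learn the corresponding estimator is `SGDClassifier` from `sklearn.linear_model`, trained with `fit` and queried with `predict`.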
Cross-validation - ANSWER Evaluating a model by splitting the training set into
K folds, then training on K-1 folds and evaluating on the remaining fold,
repeating until every fold has served once as the validation set
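In machine learning, a common form of cross-validation is K-fold: the data is cut into K parts and each part is held out once for evaluation. The sketch below shows only the index bookkeeping; `k_fold_indices` is a hypothetical helper written for illustration, not a library function.

```python
def k_fold_indices(n_samples, n_folds):
    """Yield (train_indices, validation_indices) for each fold."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for fold in range(n_folds):
        stop = start + base + (1 if fold < extra else 0)
        validation = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, validation
        start = stop


folds = list(k_fold_indices(n_samples=10, n_folds=3))
for train, validation in folds:
    print(len(train), validation)
```

In scikit-learn, `cross_val_score` from `sklearn.model_selection` performs the splitting, training, and scoring in one call.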