"Batch" Gradient Descent (BGD or GD) Correct Answers Each
step of gradient descent uses all the training examples. batch GD
- This is different from (SGD - stochastic gradient descent or
MB-GD - mini batch gradient descent)
In GD optimization, we compute the cost gradient based on the
complete training set; hence, we sometimes also call it batch
GD. In case of very large datasets, using GD can be quite costly
since we are only taking a single step for one pass over the
training set -- thus, the larger the training set, the slower our
algorithm updates the weights and the longer it may take until it
converges to the global cost minimum (note that the SSE cost
function is convex).
3D Surface Plot - how can it be used to plot the cost function?
Correct Answers Theta 0 and Theta 1 in a univariate linear
regression can be plotted on the x and y axes. the Z axis will
indicate the actual cost
,Classifier? Correct Answers Classifier: A classifier is a special
case of a hypothesis (nowadays, often learned by a machine
learning algorithm). A classifier is a hypothesis or discrete-
valued function that is used to assign (categorical) class labels to
particular data points. In the email classification example, this
classifier could be a hypothesis for labeling emails as spam or
non-spam. However, a hypothesis must not necessarily be
synonymous to a classifier. In a different application, our
hypothesis could be a function for mapping study time and
educational backgrounds of students to their future SAT scores.
Clustering Correct Answers a method of unsupervised learning
- a good way of discovering unknown relationships in datasets.
Cluster analysis or clustering is the task of grouping a set of
objects in such a way that objects in the same group (called a
cluster) are more similar (in some sense or another) to each
other than to those in other groups (clusters). It is a main task of
exploratory data mining, and a common technique for statistical
data analysis, used in many fields, including machine learning,
pattern recognition, image analysis, information retrieval,
bioinformatics, data compression, and computer graphics.
Cluster analysis itself is not one specific algorithm, but the
general task to be solved. It can be achieved by various
algorithms that differ significantly in their notion of what
constitutes a cluster and how to efficiently find them. Popular
notions of clusters include groups with small distances among
the cluster members, dense areas of the data space, intervals or
particular statistical distributions. Clustering can therefore be
formulated as a multi-objective optimization problem. The
, appropriate clustering algorithm and parameter settings
(including values such as the distance function to use, a density
threshold or the number of expected clusters) depend on the
individual data set and intended use of the results. Cluster
analysis as such is not an automatic task, but an iterative process
of knowledge discovery or interactive multi-objective
optimization that involves trial and failure. It is often necessary
to modify data preprocessing and model parameters until the
result achieves the desired properties.
Cocktail party effect/problem Correct Answers The cocktail
party effect is the phenomenon of being able to focus one's
auditory attention on a particular stimulus while filtering out a
range of other stimuli, much the same way that a partygoer can
focus on a single conversation in a noisy room.
Example of source separation.
Convex functions? Correct Answers In mathematics, a real-
valued function defined on an interval is called convex (or
convex downward or concave upward) if the line segment
between any two points on the graph of the function lies above
or on the graph, in a Euclidean space (or more generally a vector
space) of at least two dimensions. Equivalently, a function is
convex if its epigraph (the set of points on or above the graph of
the function) is a convex set. Well-known examples of convex
functions include the quadratic function {\displaystyle x^{2}}
x^{2} and the exponential function {\displaystyle e^{x}} e^{x}
for any real number x.