What is a multivariate set of data? A data set in which a number of objects/samples are each characterised by several attributes or features. The attributes/features can also be thought of as measurements or observations, and the samples/points as the objects being measured.
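Such a data set is naturally stored as a samples-by-features matrix X. A minimal sketch in Python (the features and values below are hypothetical):

```python
import numpy as np

# Rows = objects/samples, columns = attributes/features.
# Hypothetical example: 3 people described by height (cm) and weight (kg).
X = np.array([[170.0, 65.0],
              [182.0, 80.0],
              [158.0, 52.0]])
print(X.shape)  # (3, 2): 3 samples, 2 features
```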
What are data streams? Data sets that are dynamically changing or evolving in time.
What is normalisation? If the measurement units of the feature values in the columns of X lie within different dynamic ranges, we need to normalise the data to make them comparable.
Usually done to the interval [0,1] or [-1,1].
It is important because clustering is based on proximity calculations, which can be distorted if the data is not normalised.
What is the formula to normalise? x_norm = (x - x_min) / (x_max - x_min), i.e. each value minus the feature's minimum, divided by the feature's range.
It requires the range (min and max) of each feature to be known and fixed, which is difficult for online streams: every time the range changes, all previously normalised values would have to be updated.
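A minimal sketch of min-max normalisation per feature, assuming the whole batch is available so that min and max are known (the function name and data are hypothetical):

```python
import numpy as np

def min_max_normalise(X):
    """Scale each feature (column) of X to the interval [0, 1]."""
    x_min = X.min(axis=0)
    x_max = X.max(axis=0)
    rng = x_max - x_min
    rng[rng == 0] = 1.0  # avoid division by zero for constant features
    return (X - x_min) / rng

# Two features with very different dynamic ranges
X = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 1500.0],
              [4.0, 4000.0]])
print(min_max_normalise(X))
```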
What is standardisation? An alternative transform which can be done online (for streams).
x_std = (x - mean) / (standard deviation).
The mean (and standard deviation) can easily be updated incrementally as new values arrive, which makes standardisation very convenient for online data.
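A sketch of how the running statistics can be maintained for a stream; using Welford's online algorithm for the variance is an assumption here (the card only states that the mean is easy to update online):

```python
import math

class OnlineStandardiser:
    """Standardise a single feature using a running mean and variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        # Welford's update: incorporate one new value from the stream
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def standardise(self, x):
        std = math.sqrt(self.m2 / self.n) if self.n > 1 else 1.0
        return (x - self.mean) / std if std > 0 else 0.0

s = OnlineStandardiser()
for value in [10.0, 12.0, 9.0, 15.0, 11.0]:  # a toy stream
    s.update(value)
    print(value, "->", round(s.standardise(value), 3))
```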
What is dimensionality reduction? The process of reducing the number of input variables, through either feature extraction or feature selection. Both reduce the resources required to describe a large set of data accurately.
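A minimal sketch contrasting the two approaches, using PCA as one common form of feature extraction and a highest-variance filter as a simple form of feature selection (both choices and the data are illustrative, not from the cards):

```python
import numpy as np

def pca_extract(X, k):
    """Feature extraction: project X onto its top-k principal components."""
    X_centred = X - X.mean(axis=0)
    # Rows of Vt are the principal directions of the centred data
    _, _, Vt = np.linalg.svd(X_centred, full_matrices=False)
    return X_centred @ Vt[:k].T

def variance_select(X, k):
    """Feature selection: keep the k original features with the highest variance."""
    top = np.argsort(X.var(axis=0))[::-1][:k]
    return X[:, np.sort(top)]

X = np.random.default_rng(0).normal(size=(100, 5))
print(pca_extract(X, 2).shape)      # (100, 2): new, derived features
print(variance_select(X, 2).shape)  # (100, 2): subset of original features
```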