Anomaly Detection - ANS-Identifies anomalies or outliers (e.g., errors, noise, fraud, extreme
events)
Anomaly, outliers (Knowledge View) - ANS-E.g., sensor errors, fraudulent activities, extreme events
Applications of cosine similarity? - ANS-Word-occurrence frequencies in text documents,
high-dimensional or sparse data
Asymmetric Binary Attribute - ANS-Y is much less likely than N
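
A minimal sketch of cosine similarity on term-frequency vectors, using only the standard library. The two document vectors are hypothetical word counts chosen for illustration; the function name is an assumption, not taken from the deck.

```python
import math

def cosine_similarity(x, y):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_x * norm_y)

# Hypothetical term-frequency vectors (one count per vocabulary word)
doc1 = [5, 0, 3, 0, 2, 0, 0, 2, 0, 0]
doc2 = [3, 0, 2, 0, 1, 1, 0, 1, 0, 1]
print(round(cosine_similarity(doc1, doc2), 2))  # 0.94
```

Because only the angle between the vectors matters, the many shared zero entries in sparse data do not inflate the score.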
Asymmetric Variables Equation - ANS-sim(i,j) = q / (q + r + s) = 1 - d(i,j) (Jaccard coefficient);
d(i,j) = (r + s) / (q + r + s)
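
The counts in the formula above can be sketched as follows, assuming q is the number of attributes where both objects are 1, r where only i is 1, and s where only j is 1 (negative matches are ignored). The function name and sample vectors are hypothetical.

```python
def asymmetric_binary(i, j):
    """Return (d, sim) for two equal-length 0/1 vectors, ignoring 0/0 matches."""
    q = sum(1 for a, b in zip(i, j) if a == 1 and b == 1)  # both 1
    r = sum(1 for a, b in zip(i, j) if a == 1 and b == 0)  # only i is 1
    s = sum(1 for a, b in zip(i, j) if a == 0 and b == 1)  # only j is 1
    if q + r + s == 0:
        raise ValueError("undefined when neither object has any 1 values")
    d = (r + s) / (q + r + s)
    return d, 1 - d  # dissimilarity, Jaccard similarity

# Hypothetical records with six binary attributes each
d, sim = asymmetric_binary([1, 0, 1, 0, 0, 0], [1, 0, 1, 0, 1, 0])
print(d, sim)  # q=2, r=0, s=1, so d = 1/3 and sim = 2/3
```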
Benefits of data mining - ANS-Scalability and efficiency
Binary Asymmetry - ANS-Y is less likely than N
Binary Symmetry - ANS-Equal likelihood of Y or N
Bottom-Up Computation - ANS-Top-down computation, Iceberg pruning (divide dimensions into
partitions and keep those that are above the threshold)
Categorization (Knowledge View) - ANS-E.g., similarity among users with certain purchases,
differences between patient groups
Changes over time (Knowledge View) - ANS-E.g., emerging new patterns, shift of user interest
Classification - ANS-Involves predefined classes, training data, and distinguishable
classes
Cloud-based Data Warehousing - ANS-Has scalability and elasticity (e.g., Snowflake
architecture)
Clustering - ANS-Involves no predefined classes, intra-cluster similarity, inter-cluster
dissimilarity
Correlation for nominal attributes - ANS-eij = (count(A=ai) * count(B=bj)) / n
Data Cube - ANS-Data structure used to store and manipulate data in a multidimensional DBMS.
The location of each data value within the data cube is based on its x-, y-, and z-axes.
Data cubes are static, which means they have to be created before they're used, so they
can't be created by an ad hoc query. Dimensions (cube attributes), Facts (numeric measures)
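
For the nominal-attribute correlation entry, a rough sketch of the expected frequency eij, assuming it is the chi-square expected count, that is, count(A=ai) times count(B=bj) divided by the number of tuples n. The function name and sample pairs are hypothetical.

```python
from collections import Counter

def expected_frequencies(pairs):
    """Map each (a, b) value combination to count(A=a) * count(B=b) / n."""
    n = len(pairs)
    count_a = Counter(a for a, _ in pairs)
    count_b = Counter(b for _, b in pairs)
    return {
        (a, b): count_a[a] * count_b[b] / n
        for a in count_a
        for b in count_b
    }

# Hypothetical (A, B) observations for four tuples
pairs = [("x", "p"), ("x", "q"), ("y", "p"), ("y", "p")]
expected = expected_frequencies(pairs)
print(expected[("x", "p")])  # 2 * 3 / 4 = 1.5
```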
Data Cube Operations - ANS-Roll-Up (aggregation), Drill-Down (opposite of roll-up), Pivot (rotate
visualization), Slicing (select along a single dimension), Dicing (select across multiple dimensions)
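
The cube operations can be illustrated with a small pure-Python sketch. The cube contents, the dimension names, and the helper functions are all hypothetical; a real OLAP engine would expose these operations differently.

```python
from collections import defaultdict

DIMS = ("quarter", "item", "city")

# Hypothetical cube cells: (quarter, item, city) -> units sold
cube = {
    ("Q1", "phone", "Paris"): 10,
    ("Q1", "laptop", "Paris"): 4,
    ("Q1", "phone", "Rome"): 7,
    ("Q2", "phone", "Paris"): 12,
    ("Q2", "laptop", "Rome"): 5,
}

def roll_up(cells, keep):
    """Aggregate the measure over every dimension not listed in `keep`."""
    idx = [DIMS.index(d) for d in keep]
    out = defaultdict(int)
    for key, value in cells.items():
        out[tuple(key[i] for i in idx)] += value
    return dict(out)

def dice(cells, **allowed):
    """Keep cells whose value on each named dimension is in the allowed set."""
    return {
        key: value
        for key, value in cells.items()
        if all(key[DIMS.index(d)] in vals for d, vals in allowed.items())
    }

def slice_(cells, dim, value):
    """Select a single value along one dimension."""
    return dice(cells, **{dim: {value}})

print(roll_up(cube, ["quarter"]))                    # {('Q1',): 21, ('Q2',): 17}
print(slice_(cube, "city", "Rome"))                  # two Rome cells
print(dice(cube, quarter={"Q1"}, item={"phone"}))    # Q1 phone cells only
```

Drill-down is the reverse of roll-up in this sketch: calling `roll_up` with more dimensions in `keep` returns a finer-grained view.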
Data Modeling - ANS-Step that involves the five method views of data mining
Data preprocessing - ANS-Preparing the data for the mining process; consists of the
following operations: Data Integration, Data Transformation, Data Reduction, Data Cleaning
Data Understanding - ANS-Answering questions like: What kinds of data? What do
they look like?
Includes statistics and visualization.
Observes similarity vs. dissimilarity.
Data Warehouse functions - ANS-Contains raw data, metadata, and summary data. Has
data marts (subsets with specific focuses). Supports analysis, reporting, data mining
Data Warehousing - ANS-The collection, storage, and retrieval of data in electronic files.
Includes operational data. Can involve a data cube and OLAP
Distance Measure Properties - ANS-d(i,j) <= d(i,k) + d(k,j) (triangle inequality)
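
A small check of the triangle inequality, offered as a sketch. It assumes Euclidean distance as the metric and uses squared distance as a counterexample; the helper names and points are hypothetical.

```python
import math
from itertools import permutations

def satisfies_triangle_inequality(points, dist, tol=1e-12):
    """True if d(i,j) <= d(i,k) + d(k,j) for every ordered triple of points."""
    return all(
        dist(i, j) <= dist(i, k) + dist(k, j) + tol
        for i, j, k in permutations(points, 3)
    )

def squared_euclidean(p, q):
    """Squared distance: a dissimilarity that is NOT a true metric."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

points_2d = [(0, 0), (3, 4), (6, 0)]
points_1d = [(0,), (1,), (2,)]

print(satisfies_triangle_inequality(points_2d, math.dist))          # True
print(satisfies_triangle_inequality(points_1d, squared_euclidean))  # False: 4 > 1 + 1
```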
Does correlation imply causality? - ANS-no
Fact Constellation Schema - ANS-Fact (Sales), Dimension (Item), Dimension (Shipping)
Feature engineering - ANS-The process of determining which features might be useful in
training a model, and then converting raw data from log files and other sources into said
features. In TensorFlow, feature engineering often means converting raw log file entries to
tf.Example protocol buffers. See also tf.Transform. Also sometimes called feature extraction.
Formula for mean normalization - ANS-v' = (v-mean)/(max-min)
Formula for min-max normalization - ANS-v' = (v-min)/(max-min) * (max' - min') + min'
Formula for standardized normalization - ANS-v' = (v-mean)/stdev
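
The three normalization formulas above can be sketched side by side. This assumes the population standard deviation for stdev (the deck does not say which variant is meant) and input values that are not all identical, since a zero range or zero stdev would divide by zero. Function names are hypothetical.

```python
import statistics

def min_max(values, new_min=0.0, new_max=1.0):
    """v' = (v - min) / (max - min) * (max' - min') + min'"""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]

def mean_normalize(values):
    """v' = (v - mean) / (max - min)"""
    mean = statistics.fmean(values)
    span = max(values) - min(values)
    return [(v - mean) / span for v in values]

def z_score(values):
    """v' = (v - mean) / stdev, using the population stdev (an assumption)."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    return [(v - mean) / stdev for v in values]

data = [10, 20, 30, 40, 50]
print(min_max(data))         # [0.0, 0.25, 0.5, 0.75, 1.0]
print(mean_normalize(data))  # [-0.5, -0.25, 0.0, 0.25, 0.5]
print([round(z, 3) for z in z_score(data)])
```

Min-max maps values into a chosen range, mean normalization centers them around zero within a span of one, and the z-score expresses each value in standard deviations from the mean.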
Frequent pattern, correlation (Knowledge View) - ANS-E.g., songs listened to together or in a
certain sequence