Describe a data cube - ANS-Ø Multi-dimensional data model
• Dimensions: cube attributes
• E.g., year, product, color
• Facts: numeric measures
• E.g., sales amount/cost
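A minimal sketch of the idea, assuming a cube can be modeled as a mapping from dimension values to an aggregated numeric fact. The records, the `build_cube` helper, and the field names are hypothetical and only illustrate the dimension/fact split.

```python
from collections import defaultdict

# Hypothetical fact records: (year, product, color, sales_amount).
# The first three fields are dimensions; the last one is the numeric fact.
records = [
    (2023, "shirt", "white", 120.0),
    (2023, "shirt", "silver", 80.0),
    (2023, "hat", "white", 50.0),
    (2024, "shirt", "white", 150.0),
    (2024, "hat", "pink", 30.0),
]

def build_cube(rows):
    """Map each (year, product, color) cell to the summed fact value."""
    cube = defaultdict(float)
    for year, product, color, amount in rows:
        cube[(year, product, color)] += amount
    return dict(cube)

cube = build_cube(records)
print(cube[(2023, "shirt", "white")])
```

Each key is one cell of the cube; aggregating over one of the key positions gives a coarser view along the remaining dimensions.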
Describe a data warehouse - ANS-William H. Inmon: "a subject-oriented,
integrated, time-variant, and nonvolatile collection of data in support of management's
decision-making process."
Describe ETL Staging - ANS-Ø Extract data from various data sources
Ø Transform data
Ø Load data into the data warehouse
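The three ETL steps can be sketched as follows, assuming two small CSV sources and an in-memory SQLite database standing in for the warehouse. The source strings, the `sales` table, and the function names are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical source data; a real pipeline would read files, APIs, or databases.
SOURCE_A = "customer,amount\nalice,10.50\nbob,20\n"
SOURCE_B = "customer,amount\nCAROL, 7.25\n"

def extract(*sources):
    """Extract: read raw rows from every source."""
    for text in sources:
        yield from csv.DictReader(io.StringIO(text))

def transform(rows):
    """Transform: normalize customer names and convert amounts to numbers."""
    for row in rows:
        yield row["customer"].strip().lower(), float(row["amount"])

def load(conn, rows):
    """Load: write the cleaned rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()

warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(SOURCE_A, SOURCE_B)))
loaded = warehouse.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```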
Describe facts & dimensions in a data warehouse - ANS-Examples:
Ø Fact: Sales
• Customer, item, time
Ø Dimension: Customer
• Name, address, DOB
Ø Dimension: Time
• Year, month, date
Describe OLTP & OLAP - ANS-Ø Online Transaction Processing (OLTP)
• Transaction-oriented tasks: bank transfer, purchase, ...
• Daily operations: insert, update, delete
Ø Online Analytical Processing (OLAP)
• Complex queries on historical data
• Data analysis for insights and decision making
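A small sketch of the contrast, using one in-memory SQLite table for both workloads purely for illustration. In practice OLAP queries usually run against a separate warehouse. The `orders` table and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, month TEXT, amount REAL)"
)

# OLTP-style work: short operations that insert, update, or delete single rows.
conn.execute("INSERT INTO orders (month, amount) VALUES ('2024-01', 10.0)")
conn.execute("INSERT INTO orders (month, amount) VALUES ('2024-01', 15.0)")
conn.execute("INSERT INTO orders (month, amount) VALUES ('2024-02', 40.0)")
conn.execute("INSERT INTO orders (month, amount) VALUES ('2024-02', 5.0)")
conn.execute("UPDATE orders SET amount = 20.0 WHERE id = 2")
conn.execute("DELETE FROM orders WHERE id = 3")
conn.commit()

# OLAP-style work: an aggregate query over the accumulated history.
monthly_totals = conn.execute(
    "SELECT month, SUM(amount) FROM orders GROUP BY month ORDER BY month"
).fetchall()
print(monthly_totals)
```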
Describe the Data Modeling phase - ANS-Ø Frequent pattern analysis
Ø Classification, prediction
Ø Clustering
Ø Anomaly detection
Ø Trend and evolution analysis
Describe the Data Preprocessing phase - ANS-Ø Potential problems with data
• E.g., missing values, errors, inconsistency
Ø Preparing data for the mining process
• Data cleaning, integration, transformation, reduction
Ø No good data, no good data mining!
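A minimal cleaning sketch, assuming records are plain dictionaries. The sample data, the rule that negative ages are errors, and mean imputation are illustrative choices, not the only way to handle these problems.

```python
from statistics import mean

# Hypothetical raw records with a missing value, an obvious error,
# and inconsistent city labels.
raw = [
    {"age": 34, "city": "Boulder"},
    {"age": None, "city": "boulder "},
    {"age": -5, "city": "Denver"},
    {"age": 41, "city": "DENVER"},
]

def clean(records):
    """Normalize labels, treat impossible ages as missing, then impute the mean."""
    cleaned = [
        {
            "age": r["age"] if r["age"] is not None and r["age"] >= 0 else None,
            "city": r["city"].strip().title(),
        }
        for r in records
    ]
    # Assumes at least one valid age exists to compute the fill value.
    fill = mean(r["age"] for r in cleaned if r["age"] is not None)
    for r in cleaned:
        if r["age"] is None:
            r["age"] = fill
    return cleaned

cleaned = clean(raw)
print(cleaned)
```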
Describe the Data Understanding phase - ANS-Ø What kinds of data?
Ø What do they look like?
Ø Statistics & visualization
Ø Similarity vs. dissimilarity
Ø General patterns vs. anomalies
Describe the Data Warehousing phase - ANS-Ø Data warehouse
• vs. operational data
Ø Data cube & OLAP
• Multi-dimensional data management
Ø Data warehouse structure
Describe the Pattern Evaluation phase - ANS-Ø Finding interesting patterns from data
• Novel, valid, generalizable, useful, explainable
Ø Evaluation metrics
• Accuracy, error rate
• False positive/negative rate
• Efficiency, latency, ...
Ø Model selection
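The accuracy, error rate, and false positive/negative rates listed above can be computed from binary labels as in this sketch. The `evaluation_metrics` function and the sample labels are hypothetical.

```python
def evaluation_metrics(actual, predicted):
    """Compute basic binary-classification metrics from 0/1 labels.

    Assumes both classes appear in `actual`, so no denominator is zero.
    """
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,
        "false_positive_rate": fp / (fp + tn),
        "false_negative_rate": fn / (fn + tp),
    }

# Made-up labels: 4 positives and 4 negatives, with one mistake of each kind.
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0]
metrics = evaluation_metrics(actual, predicted)
print(metrics)
```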
Does correlation mean causality? - ANS-• sleeping with one's shoes on is strongly correlated
with waking up with a headache
• the more firefighters fighting a fire, the more damage there is going to be
• as ice cream sales increase, the rate of drowning deaths increases sharply
• Correlation does NOT mean causality!
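One way to see this is with a hidden common cause. In the sketch below, temperature drives both series, and neither series is computed from the other. All numbers are made up for illustration, and `pearson` is a plain implementation of the Pearson correlation coefficient.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Made-up monthly values: temperature is the common cause of both series.
temperature = [5, 10, 15, 20, 25, 30]
ice_cream_sales = [2 * t + 10 for t in temperature]
drownings = [0.5 * t + 1 for t in temperature]

# The two series are strongly correlated, yet neither one causes the other.
r = pearson(ice_cream_sales, drownings)
print(round(r, 6))
```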
What are some ways to encode relationships among nominal attributes? - ANS-Ø Similarity
• s = 1 if x = y; otherwise s = 0
Ø Dissimilarity
• d = 0 if x = y; otherwise d = 1
Ø Customized
• E.g., color: white is more similar to silver than to pink
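The three encodings can be sketched as small functions. The lookup table and its scores are hypothetical. Only the ordering (white is closer to silver than to pink) comes from the card.

```python
def similarity(x, y):
    """s = 1 if x = y; otherwise s = 0."""
    return 1 if x == y else 0

def dissimilarity(x, y):
    """d = 0 if x = y; otherwise d = 1."""
    return 0 if x == y else 1

# Hypothetical customized scores for unordered pairs of colors.
CUSTOM_COLOR_SIMILARITY = {
    frozenset({"white", "silver"}): 0.8,
    frozenset({"white", "pink"}): 0.3,
}

def custom_similarity(x, y, table=CUSTOM_COLOR_SIMILARITY):
    """Look up a domain-specific score; unknown pairs default to 0.0."""
    if x == y:
        return 1.0
    return table.get(frozenset({x, y}), 0.0)

print(similarity("white", "silver"), custom_similarity("white", "silver"))
```

Using `frozenset` keys makes the lookup symmetric, so the argument order does not matter.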
What are some common schemas for data warehousing? - ANS-Ø Star schema
• one fact table, multiple dimension tables
Ø Snowflake schema
• one fact table, multiple levels of dimension tables
Ø Fact constellation schema
• multiple fact tables, shared dimension tables
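A minimal star schema sketch in SQLite: one fact table whose rows point at two dimension tables. The table and column names (`fact_sales`, `dim_customer`, `dim_time`) and the rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
-- The fact table holds the numeric measure plus one key per dimension.
CREATE TABLE fact_sales (
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    time_id INTEGER REFERENCES dim_time(time_id),
    amount REAL
);
INSERT INTO dim_customer VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO dim_time VALUES (1, 2024, 1), (2, 2024, 2);
INSERT INTO fact_sales VALUES (1, 1, 10.0), (1, 2, 5.0), (2, 1, 7.5);
""")

# Join the fact table to a dimension table and aggregate the measure.
sales_by_customer = conn.execute("""
    SELECT c.name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_customer AS c ON c.customer_id = f.customer_id
    GROUP BY c.name
    ORDER BY c.name
""").fetchall()
print(sales_by_customer)
```

A snowflake schema would further split a dimension table (for example, moving month details into their own table), and a fact constellation would add a second fact table that reuses `dim_customer` or `dim_time`.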
What are some data cube operations? - ANS-Ø Roll up: aggregation
• E.g., daily => monthly
Ø Drill down: opposite of roll up
• E.g., North America => USA, Mexico, Canada, ...
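Roll up and drill down along a time dimension can be sketched as follows, assuming daily values keyed by ISO date strings. The data and the helper names are hypothetical.

```python
from collections import defaultdict

# Hypothetical daily sales keyed by ISO date string (YYYY-MM-DD).
daily_sales = {
    "2024-01-30": 10.0,
    "2024-01-31": 20.0,
    "2024-02-01": 5.0,
}

def roll_up_to_month(daily):
    """Roll up: aggregate daily values into monthly totals."""
    monthly = defaultdict(float)
    for day, amount in daily.items():
        monthly[day[:7]] += amount  # "YYYY-MM" prefix identifies the month
    return dict(monthly)

def drill_down(daily, month):
    """Drill down: return the daily values behind one monthly total."""
    return {day: amount for day, amount in daily.items() if day.startswith(month)}

monthly_sales = roll_up_to_month(daily_sales)
january_days = drill_down(daily_sales, "2024-01")
print(monthly_sales, january_days)
```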