Table of contents
Lecture 1: introduction
Big data
Business decisions
Business analytics
Data mining
Cross Industry Standard Process for Data Mining (CRISP-DM) framework
Lecture 1: data visualization & preprocessing
Data understanding
Categorical data
Numerical data
Non-numerical data
Misleading visualizations
Data preparation
Data integration
Data cleaning
Data reduction
Data transformation
Lecture 2: supervised learning 1
Introduction to supervised learning
Classification models
K-nearest-neighbour classifier (KNN)
Naïve Bayes classifier
Decision trees
Classification performance measurement
Binary classification
Receiver Operating Characteristic (ROC) curve
Kappa coefficient
Regression models
Linear regression
Regression vs classification
Experimental setup
Lecture 3: supervised learning 2
Support Vector Machines (SVMs)
Non-linear SVMs
Bias-variance trade-off
1BM110 - course summary 1
Ensemble methods
Bagging
Boosting
Unsupervised learning (clustering)
Clustering
K-means clustering
Hierarchical clustering
Applying clustering algorithms
Lecture 4: temporal data
Grouping sequences & mapping
Mapping methods
Dynamic Time Warping (DTW)
Response features
Markov chains
Maximum likelihood estimation
Association analysis
Lecture 5: neural networks & Deep Learning (DL)
Perceptron & sigmoid neuron
Multi-layer perceptron (multi-layer neural network)
Training neural networks
Gradient descent
Momentum
Regularization
Lectures 6 & 7: Natural Language processing (NLP)
Domain & corpus
Corpus
Pre-processing
Linguistic processing
Knowledge resources
Text representation
Bag-of-Words (BoW) model
n-grams
Linguistic features model vs BoW model
Distributional Semantic Models (DSM)
Supervised NLP tasks
Unsupervised NLP tasks
Lecture 8: eXplainable Artificial Intelligence (XAI)
Interpretability vs explanations
Transparency
White boxes (intrinsically interpretable models)
Model-agnostic explanation methods
Model-specific explanation methods (for DNN)
Evaluation & measures
Lecture 1: introduction
Big data
Volume: quantity of generated and stored data
Variety: type and nature of the data
Velocity: speed at which the data is generated and processed
Business decisions
Decision Support System (DSS): computerized program used to support determinations, judgments,
and courses of action in an organization or a business.
Conventional decision support: emphasis on deduction.
Business Intelligence (BI): data-driven DSS; methods that facilitate decision-making by integrating
information and processes through tools that transform data into useful and actionable information.
Business intelligence: emphasis on induction.
Business analytics
Descriptive analytics: using data to understand past and current business performance.
Answers questions such as:
What has occurred?
How much did we sell in each region?
What type of customer returns products?
Techniques & methods: reporting, dashboards, summarization, visualization
Segmentation: clustering, association rules
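As a minimal sketch of the summarization behind a question such as "How much did we sell in each region?", the snippet below totals units sold per region. The records and the function name total_sales_per_region are made up for illustration and are not part of the course material.

```python
from collections import defaultdict

# Hypothetical sales records as (region, units sold); invented for illustration.
sales = [
    ("North", 120),
    ("South", 80),
    ("North", 30),
    ("East", 50),
    ("South", 20),
]


def total_sales_per_region(records):
    """Sum the units sold per region (a simple descriptive summary)."""
    totals = defaultdict(int)
    for region, units in records:
        totals[region] += units
    return dict(totals)


totals = total_sales_per_region(sales)
for region, units in sorted(totals.items()):
    print(f"{region}: {units}")
```

In practice, a reporting tool or dashboard would compute the same kind of aggregate over the full sales data.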
Predictive analytics: analyzes past performance in an effort to predict the future.
Answers questions such as:
What will occur?
How much will we sell in each region?
Techniques & methods:
Regression & classification
Text mining
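A question such as "How much will we sell in each region?" can be illustrated with a simple regression. The sketch below fits a straight line to past sales by ordinary least squares and extrapolates one period ahead. The quarterly figures and the helper fit_line are assumptions made for this example. A real forecast would need more data and a validated model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance of x and y divided by the variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical past sales of one region per quarter; invented for illustration.
quarters = [1, 2, 3, 4]
units_sold = [100, 110, 120, 130]

intercept, slope = fit_line(quarters, units_sold)

# Use the fitted line to predict sales for the next (fifth) quarter.
forecast_q5 = intercept + slope * 5
print(f"Forecast for quarter 5: {forecast_q5:.1f}")
```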
Prescriptive analytics: identifies the best alternatives to minimize or maximize some objective.
Answers questions such as:
What should occur?
How much should we produce to maximize profit?
Techniques & methods: mathematical optimization models, heuristics
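To make "How much should we produce to maximize profit?" concrete, the sketch below searches for the most profitable production plan for two products under two capacity limits. All numbers (profits, hours per unit, capacities) are hypothetical. The search is plain exhaustive enumeration over whole units, chosen for readability. It is not one of the mathematical optimization solvers that would be used on a realistic problem.

```python
# Hypothetical problem data; invented for illustration.
PROFIT_A, PROFIT_B = 30, 20          # profit per unit of product A and B
MACHINE_A, MACHINE_B = 2, 1          # machine hours needed per unit
LABOUR_A, LABOUR_B = 1, 1            # labour hours needed per unit
MACHINE_CAPACITY = 100               # available machine hours
LABOUR_CAPACITY = 80                 # available labour hours


def best_production_plan():
    """Return (profit, units_a, units_b) of the best feasible plan."""
    best = (0, 0, 0)
    max_a = MACHINE_CAPACITY // MACHINE_A
    max_b = LABOUR_CAPACITY // LABOUR_B
    for units_a in range(max_a + 1):
        for units_b in range(max_b + 1):
            machine_used = MACHINE_A * units_a + MACHINE_B * units_b
            labour_used = LABOUR_A * units_a + LABOUR_B * units_b
            # Skip plans that exceed either capacity.
            if machine_used > MACHINE_CAPACITY or labour_used > LABOUR_CAPACITY:
                continue
            profit = PROFIT_A * units_a + PROFIT_B * units_b
            if profit > best[0]:
                best = (profit, units_a, units_b)
    return best


best = best_production_plan()
print(f"Profit {best[0]} with {best[1]} units of A and {best[2]} units of B")
```

The objective (profit) and the constraints (capacities) are the two ingredients that distinguish prescriptive analytics from prediction: the result is a recommended decision, not a forecast.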
Data mining
Data mining: identifying patterns in data.
Figure: examples of data mining.
Challenges of real-world data mining:
Too much data → data might be polluted
Unclear which data attributes are important
Results may not make sense
Cross Industry Standard Process for Data Mining (CRISP-DM) framework
Steps in the CRISP-DM framework:
1. Business understanding
2. Data understanding
3. Data preparation
4. Modeling
5. Evaluation
6. Deployment
Figure: the CRISP-DM framework.