Machine Learning
AI: methods for improving the knowledge or performance of an intelligent agent over time, in
response to the agent's experience in the world
Link
• A computer interacts through data
o Learning from data leads to intelligence
▪ Big Data + Machine Learning = Artificial Intelligence
Machine Learning = automatic extraction of knowledge (patterns) from data
Data science = a set of fundamental principles that guide the extraction of knowledge from
data
Note: an initial set of data instances with a known target variable is needed!
• xij : the value of the input variable j for data instance i
o data instance xi : input vector, one row of the data matrix (n rows)
info of a user at a given moment
o input variable or feature xj : input vector, one column of the data matrix (m columns)
data variables : click patterns, IP address, ...
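A minimal sketch of this notation in Python, assuming a made-up data matrix with n = 3 data instances and m = 2 input variables. The values and the meaning of the columns are illustrative assumptions, not taken from the course material.

```python
# Hypothetical data matrix X: n = 3 data instances (rows), m = 2 input variables (columns).
# Assumed columns for illustration: number of clicks, session length in minutes.
X = [
    [12, 3.5],
    [4, 1.0],
    [30, 8.2],
]

n = len(X)     # number of data instances (rows)
m = len(X[0])  # number of input variables (columns)


def instance(X, i):
    """Return data instance x_i: row i of the matrix (0-based index)."""
    return X[i]


def feature(X, j):
    """Return input variable x_j: column j of the matrix (0-based index)."""
    return [row[j] for row in X]


x_1 = instance(X, 1)    # all variables of the second data instance
x_col0 = feature(X, 0)  # the first variable across all data instances
x_10 = X[1][0]          # x_ij with i = 1, j = 0
```

Note that Python indexes from 0, whereas the notation xij is usually read with indices starting at 1.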
Example:
Predict personality traits from Facebook likes
We can investigate the variables (likes) with the highest and lowest coefficients in the linear
model.
Predict political preference
Input features (X) = words found in the tweet, such as "American", "come together", "stronger",
etc. Each tweet is turned into a vector, where each entry represents whether a specific word is
present or absent (1 or 0).
Training the Model (Learning Coefficients):
Coefficients (β): The linear model assigns weights to each word, reflecting its importance in
classifying a tweet as pro-Biden or pro-Trump.
After learning the coefficients, the model can be used to predict how new people will vote
based on the presence of words in their tweets.
Prediction: The sum of the products of input features and their coefficients yields a
prediction, which the model uses to classify the tweet. If the predicted value is greater than or
equal to 0, the model predicts one class (e.g., Biden). If it is less than 0, the model predicts the
other class (e.g., Trump).
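The prediction rule above can be sketched as follows. This is only an illustration: the coefficient values are invented rather than learned, "great again" is a hypothetical extra vocabulary entry, and matching words by simple substring search is a simplifying assumption.

```python
# Vocabulary: the first three entries come from the example, the last one is hypothetical.
VOCABULARY = ["american", "come together", "stronger", "great again"]
# Assumed coefficients (beta), one per vocabulary entry. In practice they are learned
# from tweets with a known label.
COEFFICIENTS = [0.1, 1.2, 0.8, -1.5]


def to_binary_vector(tweet, vocabulary):
    """Encode a tweet as 1/0 per vocabulary entry (present/absent), via substring match."""
    text = tweet.lower()
    return [1 if word in text else 0 for word in vocabulary]


def score(tweet):
    """Sum of the products of input features and their coefficients."""
    x = to_binary_vector(tweet, VOCABULARY)
    return sum(beta * x_j for beta, x_j in zip(COEFFICIENTS, x))


def predict(tweet):
    """Score >= 0 predicts one class (Biden), score < 0 the other (Trump)."""
    return "Biden" if score(tweet) >= 0 else "Trump"


label_a = predict("Let us come together and be stronger")
label_b = predict("Make it great again")
```

With these assumed weights the first tweet scores 1.2 + 0.8 = 2.0 and the second scores -1.5, so they fall on opposite sides of the threshold.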
Deep Learning : involves large artificial neural networks (ANNs), loosely inspired by the way
the human brain works
The end-user is still the engine of discovery
Regardless of the tools and methods used, the person conducting the analysis (the end-user)
plays the crucial role in interpreting the data and deriving insights!
Querying and reporting : You know exactly what you are looking for.
Visualization : multidimensional analysis
Business Intelligence = Getting the right information to the right person at the right time
Data Warehousing (collect & store data from multiple sources) :
• Reporting : you know exactly what you are looking for
• Machine learning : you don’t know exactly what you are looking for! You are looking for new knowledge
Machine Learning Process : CRISP-DM
= science + craft + creativity + common sense
CRISP-DM: Cross Industry Standard Process for Data Mining
1. Business Understanding: understand business problem + define objective
2. Data Understanding: data is collected and explored to gain insights, and to ensure that the
data supports the business objectives
3. Data Preparation: cleaning, transforming, and formatting the data to get it ready for
modeling.
4. Modeling: Apply machine learning algorithms to build models using the prepared data.
5. Evaluation: Check performance
6. Deployment: Implement the model in the real world to solve the problem and monitor its
performance.
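Steps 3 to 5 can be sketched on a toy problem. Everything here is an assumption made for illustration: the records, the single input variable (monthly usage), the fixed train/test split, and the very simple threshold model, which assumes that low usage goes together with leaving.

```python
# Hypothetical records: (monthly_usage, churned). None marks a missing value.
raw = [(5, 1), (40, 0), (None, 0), (8, 1), (35, 0), (3, 1), (50, 0), (10, 1)]

# 3. Data preparation: drop records with missing values.
data = [(x, y) for x, y in raw if x is not None]

# Fixed split for reproducibility: first 5 records for training, the rest held out.
train, test = data[:5], data[5:]


# 4. Modeling: a one-variable threshold rule, placed midway between the class means.
def fit_threshold(rows):
    churned = [x for x, y in rows if y == 1]
    stayed = [x for x, y in rows if y == 0]
    return (sum(churned) / len(churned) + sum(stayed) / len(stayed)) / 2


threshold = fit_threshold(train)


def predict_churn(x):
    """Predict churn (1) when usage is below the learned threshold, else 0."""
    return 1 if x < threshold else 0


# 5. Evaluation: accuracy on the held-out records only.
accuracy = sum(predict_churn(x) == y for x, y in test) / len(test)
```

The point of the held-out records is that performance is checked on data the model has not seen during modeling.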
DDDM: Data-Driven Decision Making
How data science supports decision-making in organizations.
Data Engineering : processes and manages data (including big data)
Data Science : analyzes this data, creating insights
These insights drive DDDM across the organization
Result : Faster processing and better decisions
Importance
• Combination of business knowledge and data science skills is highly valued
• Does every data scientist need 10 managers?! The demand for more managers reflects
the need for leaders who can effectively interpret and apply data insights to their
specific business contexts, not to manage data scientists directly
Firms that adopt DDDM have
• Productivity that is 5-6% higher
• Higher ROE
• Higher market value
Data Science Roles & Tasks
Data Architect = designing and implementing data storage, such as databases
Data Analyst = analyzing data to generate insights
Data Scientist = works on predictive analytics and models data to extract insights
using ML algorithms
Data Engineer = provides the technical toolbox that a data scientist needs access
to: tools and skills for storing, accessing, and visualising data.
Machine Learning Engineer = Specializes in building and deploying ML models,
focusing on automation and scalability.
Cases
Churn :
Gather data (demographics, payment history, customer support interactions, etc.).
Identify patterns and factors associated with customers who are likely to leave,
and based on that, design churn management strategies
Fiscal fraud detection :
ML algorithms (e.g., decision trees, anomaly detection) can
analyze financial transactions & tax filings
to detect irregularities, such as under-reported income or suspicious deductions,
by comparing patterns with known fraud cases.
Social fraud detection :
ML models can analyze
social benefit claims
to identify inconsistencies, such as fraudulent claims for unemployment or disability benefits,
and flag anomalies.
Energy fraud detection :
Algorithms can monitor energy usage patterns
for signs of tampering (e.g., meter manipulation), and detect sudden drops or spikes in
usage that deviate from typical consumption trends in similar households
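One simple way to flag usage that deviates from similar households is a z-score against the peer group. This is a sketch under assumptions: the household names, the kWh values, and the cut-off of 2 standard deviations are all invented for illustration, and a real system would likely use more robust methods.

```python
from statistics import mean, stdev

# Hypothetical monthly usage (kWh) for a group of similar households.
usage = {
    "A": 310, "B": 295, "C": 305, "D": 300,
    "E": 290, "F": 315, "G": 298, "H": 40,
}


def flag_anomalies(usage_by_household, cutoff=2.0):
    """Return households whose usage lies more than `cutoff` standard
    deviations away from the mean of the peer group."""
    values = list(usage_by_household.values())
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        return []  # all households identical, nothing to flag
    return [
        household
        for household, kwh in usage_by_household.items()
        if abs(kwh - mu) / sigma > cutoff
    ]


flagged = flag_anomalies(usage)
```

A flagged household is only a candidate for inspection: a sudden drop can also have an innocent cause, such as a holiday or solar panels.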
privacy
FAT flow !
Fair (privacy, discrimination)
Accountable
Transparent
Sensitive
“Hey, you’re having a baby!” (Target)
They target you with ads for toys, diapers (Pampers), etc.
And beyond: data science ethics (importance of ethics)
Explaining versus Predicting
Steps of Explanatory modeling THVMP
1. Causal theory
2. Generate hypotheses based on constructs
3. Operationalize constructs in measurable variables
Find ways to measure each factor
4. Fit statistical model
statistical methods to assess the correlation between variables