📊
Data Science and Society
Created @October 30, 2024 10:10 AM
Class INFOMDSS
Part A: Data Science & Processes, Analytics
Data science involves analyzing data through specific frameworks and tools.
These choices shape how we perceive reality, and they bring challenges related
to time as well as pitfalls when an unsuitable tool is selected.
e.g. Twitter as a data source during a flood: the data may be biased, because
people without electricity cannot communicate and are therefore missing from it.
e.g. Using statistics to argue the odds of a disease in the baby case: the
calculation incorrectly assumed independence between events.
e.g. Recidivism risk assessment case: the model was biased against women and
Black people (bias/unfairness).
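The independence error in the baby case can be made concrete with a small calculation. Multiplying probabilities, P(A and B) = P(A) · P(B), is only valid when A and B are independent; if a second event becomes more likely once the first has happened, the correct formula is P(A and B) = P(A) · P(B | A). The probabilities below are hypothetical values chosen for illustration, not figures from the case.

```python
from fractions import Fraction

# Hypothetical probability that a single family experiences the event once.
p_first = Fraction(1, 1000)

# Hypothetical conditional probability of a second event given the first,
# e.g. because of shared genetic or environmental factors (assumed value).
p_second_given_first = Fraction(1, 100)

# Wrong: treats the two events as independent and simply squares p.
p_both_if_independent = p_first * p_first

# Correct: uses the conditional probability P(B | A).
p_both_dependent = p_first * p_second_given_first

print("Assuming independence:", p_both_if_independent)
print("Accounting for dependence:", p_both_dependent)
print("Underestimated by a factor of", p_both_dependent / p_both_if_independent)
```

With these assumed numbers, the independence assumption makes the combined event look ten times rarer than it actually is.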
Data science covers a wide range of tasks and models, including business
understanding, collecting data and deploying models. Data science models and
insights affect individuals and society, so data scientists should be aware of
risks such as those in the cases above.
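Awareness of these risks can be made concrete. One common check, relevant to the recidivism case, is to compare a model's error rates across groups. The sketch below computes the false positive rate (people flagged as high risk who did not reoffend) per group on a small, entirely hypothetical set of records; the group labels and values are invented for illustration.

```python
# Each record: (group, predicted_high_risk, actually_reoffended). Hypothetical data.
records = [
    ("A", True, False), ("A", True, True), ("A", False, False), ("A", True, False),
    ("B", False, False), ("B", True, True), ("B", False, False), ("B", True, False),
]

def false_positive_rate(rows):
    """Share of people who did not reoffend but were still flagged high risk."""
    negatives = [r for r in rows if not r[2]]
    if not negatives:
        return 0.0
    false_positives = [r for r in negatives if r[1]]
    return len(false_positives) / len(negatives)

# Compute the false positive rate separately for each group.
groups = sorted({g for g, _, _ in records})
fpr = {g: false_positive_rate([r for r in records if r[0] == g]) for g in groups}
print(fpr)
```

Here group A is wrongly flagged twice as often as group B. A gap like this is one signal of unfairness; which fairness metric is appropriate depends on the context.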
Analytics types
Descriptive analytics: answers the question of what happened, through a
retrospective analysis of historical data. Done using data visualisation,
dashboards and statistics.
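A minimal descriptive summary, assuming a hypothetical year of monthly sales figures: the statistics below only describe what already happened and make no claims about the future.

```python
import statistics

# Hypothetical monthly sales figures for one year (illustrative values only).
monthly_sales = [120, 135, 128, 150, 160, 155, 170, 165, 180, 175, 190, 200]

# Retrospective summary statistics of the historical data.
summary = {
    "total": sum(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "min": min(monthly_sales),
    "max": max(monthly_sales),
    "stdev": round(statistics.stdev(monthly_sales), 2),
}
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```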
Predictive analytics: answers the question of what is likely to happen in the
future, by looking at past data. Done using data mining, text mining and forecasting.
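As a minimal forecasting sketch, the code below fits a straight-line trend to hypothetical monthly sales with ordinary least squares and extrapolates it. This is only one of many forecasting methods, and it assumes the past trend simply continues.

```python
# Hypothetical monthly sales (illustrative values only).
monthly_sales = [120, 135, 128, 150, 160, 155, 170, 165, 180, 175, 190, 200]
months = list(range(1, len(monthly_sales) + 1))

def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

slope, intercept = fit_line(months, monthly_sales)

# Forecast the next three months by extrapolating the fitted trend.
forecast = {m: round(slope * m + intercept, 1) for m in range(13, 16)}
print(f"trend: {slope:.2f} per month")
print("forecast:", forecast)
```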
Prescriptive analytics: aims to determine the best possible decision based on
the data. Uses descriptive and predictive analytics to create alternative
decisions and determines the best one. Done using optimization, simulation and
heuristic programming.
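A minimal prescriptive sketch, using a hypothetical ordering decision (the classic newsvendor setting): a demand forecast, assumed here to be normally distributed, is turned into simulated scenarios, each alternative order quantity is scored by simulation, and the best one is picked by brute-force search. The prices, costs and forecast are invented values.

```python
import random

# Hypothetical business parameters (assumed for illustration).
UNIT_COST = 6.0       # cost of ordering one unit
UNIT_PRICE = 10.0     # revenue per unit sold
PREDICTED_MEAN = 200  # demand forecast, e.g. from a predictive model
PREDICTED_SD = 20     # assumed uncertainty around that forecast

rng = random.Random(42)  # fixed seed so the simulation is reproducible
demand_scenarios = [
    max(0, round(rng.gauss(PREDICTED_MEAN, PREDICTED_SD))) for _ in range(5000)
]

def expected_profit(order_qty):
    """Average profit of ordering order_qty units over all simulated scenarios."""
    total = 0.0
    for demand in demand_scenarios:
        sold = min(order_qty, demand)
        total += sold * UNIT_PRICE - order_qty * UNIT_COST
    return total / len(demand_scenarios)

# Evaluate each alternative decision and pick the one with the highest expected profit.
candidates = range(150, 251, 5)
best_qty = max(candidates, key=expected_profit)
print("best order quantity:", best_qty)
print("expected profit:", round(expected_profit(best_qty), 2))
```

Because unsold units cost more (6) than the margin lost on a missed sale (4), the best decision lands somewhat below the predicted mean demand.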
Business understanding
ML DevOps: the practice of integrating machine learning model development
with DevOps principles to streamline the deployment, monitoring, and
management of models in production, ensuring they remain efficient, reliable,
and scalable.
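Monitoring is one part of ML DevOps that is easy to sketch. The hypothetical check below flags data drift when the mean of a production batch lies more than a chosen number of standard errors from the training mean. The function name, threshold and data are assumptions for illustration; production monitoring usually relies on richer statistical tests and tooling.

```python
import statistics

def drift_alert(training_values, live_values, threshold=3.0):
    """Flag drift when the live mean lies more than `threshold` standard errors
    from the training mean (a simple heuristic, not a formal test suite)."""
    train_mean = statistics.mean(training_values)
    train_sd = statistics.stdev(training_values)
    standard_error = train_sd / len(live_values) ** 0.5
    z = (statistics.mean(live_values) - train_mean) / standard_error
    return abs(z) > threshold, round(z, 2)

# Hypothetical feature values seen during training and in two production batches.
training = [50, 52, 48, 51, 49, 50, 53, 47, 50, 50]
stable_batch = [51, 49, 50, 52, 48]
shifted_batch = [60, 62, 58, 61, 59]

print(drift_alert(training, stable_batch))
print(drift_alert(training, shifted_batch))
```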
CRISP-DM: Cross Industry Standard Process for Data Mining
Methodology for structuring and managing data mining and data science
projects.
Business Understanding: Clearly define the project’s objectives from a
business perspective. This means identifying the business problem,
understanding what the organization needs to achieve, and translating that
into a data science goal.
Data Understanding: Collect and assess the quality and characteristics of
the data. This means gathering data, exploring it to discover initial insights,
and identifying any quality issues or patterns relevant to the business goal.
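A minimal data-profiling sketch for this phase, assuming a small hypothetical set of customer records: it counts missing values per column, reports numeric ranges (which exposes an implausible age) and counts exact duplicate rows.

```python
# Hypothetical customer records as they might arrive from a source system.
rows = [
    {"id": 1, "age": 34, "city": "Utrecht", "spend": 120.0},
    {"id": 2, "age": None, "city": "Amsterdam", "spend": 80.5},
    {"id": 3, "age": 29, "city": None, "spend": 95.0},
    {"id": 3, "age": 29, "city": None, "spend": 95.0},        # exact duplicate of id 3
    {"id": 4, "age": 230, "city": "Utrecht", "spend": None},  # implausible age
]

def profile(records):
    """Summarise missing values, duplicates and numeric ranges per column."""
    columns = records[0].keys()
    report = {"missing": {}, "range": {}}
    for col in columns:
        values = [r[col] for r in records]
        report["missing"][col] = sum(v is None for v in values)
        numeric = [v for v in values if isinstance(v, (int, float))]
        if numeric:
            report["range"][col] = (min(numeric), max(numeric))
    # Rows with identical content collapse to the same tuple.
    unique_rows = {tuple(sorted(r.items())) for r in records}
    report["duplicates"] = len(records) - len(unique_rows)
    return report

print(profile(rows))
```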
Data Preparation: Clean, transform, and structure the data for analysis.
This involves selecting the relevant data, handling missing values, removing
noise, creating new variables (feature engineering), and preparing datasets
that are ready for modeling.
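The preparation steps above can be sketched on a small hypothetical dataset. The choices made here are assumptions for illustration: ages outside 0 to 120 are treated as noise, missing numbers are filled with the median, missing cities become "unknown", and `is_young` is an invented derived feature.

```python
import statistics

# Hypothetical raw records (illustrative values only).
raw = [
    {"id": 1, "age": 34, "city": "Utrecht", "spend": 120.0},
    {"id": 2, "age": None, "city": "Amsterdam", "spend": 80.5},
    {"id": 3, "age": 29, "city": None, "spend": 95.0},
    {"id": 3, "age": 29, "city": None, "spend": 95.0},
    {"id": 4, "age": 230, "city": "Utrecht", "spend": None},
]

def prepare(records):
    # 1. Remove exact duplicates while keeping the original order (copies each row).
    seen, deduped = set(), []
    for r in records:
        key = tuple(sorted(r.items()))
        if key not in seen:
            seen.add(key)
            deduped.append(dict(r))
    # 2. Noise removal: treat implausible ages (assumed valid range 0-120) as missing.
    for r in deduped:
        if r["age"] is not None and not 0 <= r["age"] <= 120:
            r["age"] = None
    # 3. Impute missing numeric values with the column median.
    for col in ("age", "spend"):
        median = statistics.median(r[col] for r in deduped if r[col] is not None)
        for r in deduped:
            if r[col] is None:
                r[col] = median
    # Missing categories get an explicit "unknown" label.
    for r in deduped:
        if r["city"] is None:
            r["city"] = "unknown"
    # 4. Feature engineering: a hypothetical derived variable.
    for r in deduped:
        r["is_young"] = r["age"] < 30
    return deduped

for row in prepare(raw):
    print(row)
```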
Modeling: Develop models using appropriate techniques. Modeling
techniques (e.g., regression, classification, clustering) are applied to the
data. It may involve trying out different algorithms, tuning parameters, and
selecting the best models based on performance metrics.
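A minimal sketch of trying different algorithms, tuning a parameter and selecting by a performance metric, assuming a tiny hypothetical dataset of study hours and exam results. Two simple "algorithms" are compared (a majority-class baseline and a threshold rule whose threshold `t` is tuned), and the model with the best validation accuracy is selected.

```python
# Hypothetical labelled data: (hours of study, passed exam?).
train = [(1, False), (2, False), (3, False), (4, True),
         (5, False), (6, True), (7, True), (8, True)]
valid = [(1.5, False), (3.5, False), (4.5, True), (6.5, True), (7.5, True)]

def accuracy(predict, data):
    """Fraction of examples where the prediction matches the label."""
    return sum(predict(x) == y for x, y in data) / len(data)

def majority_baseline(data):
    """Algorithm 1: always predict the most common label in the training data."""
    positives = sum(y for _, y in data)
    label = positives * 2 >= len(data)
    return lambda x: label

def threshold_model(t):
    """Algorithm 2: predict a pass when hours >= t (t is the tuned parameter)."""
    return lambda x: x >= t

candidates = {"majority": majority_baseline(train)}
for t in (2, 3, 4, 5, 6):
    candidates[f"threshold>={t}"] = threshold_model(t)

# Score every candidate on held-out validation data and keep the best one.
scores = {name: accuracy(model, valid) for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores)
print("selected model:", best)
```

With only five validation examples the scores are very noisy; in practice a larger validation set or cross-validation would be used.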
Evaluation: Evaluate the model and ensure it meets the business objectives.
The model is assessed on both its accuracy and its relevance to the business
objectives, ensuring that its performance is aligned with the business problem
rather than just technically good.
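The gap between "technically good" and "good for the business" can be shown with hypothetical confusion-matrix counts for two models, plus assumed costs per error type (here a churn setting where missing a churning customer is expensive). The model with the higher accuracy turns out to be the more costly one.

```python
# Hypothetical confusion-matrix counts on the same 1,000 test cases.
models = {
    "model_A": {"tp": 40, "fp": 10, "fn": 60, "tn": 890},
    "model_B": {"tp": 85, "fp": 80, "fn": 15, "tn": 820},
}

# Assumed business costs: a missed churner (false negative) costs far more
# than an unnecessary retention offer (false positive).
COST_FN = 500
COST_FP = 50

def accuracy(m):
    """Share of all cases classified correctly."""
    return (m["tp"] + m["tn"]) / sum(m.values())

def business_cost(m):
    """Total cost of the model's errors under the assumed cost figures."""
    return m["fn"] * COST_FN + m["fp"] * COST_FP

for name, m in models.items():
    print(f"{name}: accuracy={accuracy(m):.3f}, cost={business_cost(m)}")
```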
Deployment: Implement the model in the real-world environment. This involves
automating the model, integrating it into software systems, or presenting
results through reports or dashboards.
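A minimal deployment sketch, assuming a hypothetical model that is just a threshold rule: the model is serialized to JSON so another system could load it, and a scoring function applies it to new records as an automated batch job would. The model fields, function names and records are all invented for illustration; file storage and scheduling are omitted.

```python
import json

# Hypothetical trained model: a simple threshold rule (assumed for illustration).
model = {"name": "pass_predictor", "version": 1, "feature": "hours", "threshold": 4}

# Persist the model as JSON so another system can load it (file I/O omitted here).
serialized = json.dumps(model)

def load_model(payload):
    """Restore the model specification from its JSON form."""
    return json.loads(payload)

def score_batch(model_spec, records):
    """Apply the deployed model to new records, e.g. in a scheduled batch job."""
    feature, threshold = model_spec["feature"], model_spec["threshold"]
    return [
        {**r, "prediction": r[feature] >= threshold, "model_version": model_spec["version"]}
        for r in records
    ]

deployed = load_model(serialized)
new_students = [{"id": "s1", "hours": 3}, {"id": "s2", "hours": 6}]
for result in score_batch(deployed, new_students):
    print(result)
```

Recording the model version with each prediction makes it easier to trace results back to the model that produced them once the model is updated.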