Important terms
Chapter 1
Algorithm: A sequence of steps designed to process data and generate insights
or predictions. It automates tasks like data analysis, classification, and pattern
recognition.
CRISP-DM (Cross Industry Standard Process for Data Mining): A structured
process for solving data mining problems, consisting of iterative steps from business
understanding to deployment.
Pre-processing: Steps taken to clean, transform, and prepare data for analysis,
including handling missing values, encoding categorical data, and treating outliers.
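As an illustration of the pre-processing steps just listed, here is a minimal sketch in plain Python. The records, the field names (age, city), and the cap of 100 are hypothetical and chosen only for this example. Mean imputation, capping, and one-hot encoding are each just one of several possible choices.

```python
from statistics import mean

# Hypothetical raw records: None marks a missing value.
rows = [
    {"age": 25, "city": "Paris"},
    {"age": None, "city": "Lyon"},
    {"age": 35, "city": "Paris"},
    {"age": 120, "city": "Lyon"},
]

def preprocess(rows, age_cap=100):
    """Impute missing ages, cap outliers, and one-hot encode the city."""
    # Mean of the ages that are present and not above the cap (assumed rule).
    known = [r["age"] for r in rows if r["age"] is not None and r["age"] <= age_cap]
    fill = mean(known)
    cities = sorted({r["city"] for r in rows})
    cleaned = []
    for r in rows:
        # Missing value -> mean; outlier -> capped at age_cap.
        age = fill if r["age"] is None else min(r["age"], age_cap)
        # Categorical value -> one 0/1 column per category.
        encoded = {f"city_{c}": int(r["city"] == c) for c in cities}
        cleaned.append({"age": age, **encoded})
    return cleaned

cleaned = preprocess(rows)
```

After this step every record has a numeric age and numeric city columns, which is the form most learning algorithms expect.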
Data Science: A set of fundamental principles that guide the extraction of
knowledge from data.
Data Mining: The extraction of knowledge from data, using technologies that
incorporate the principles of data science.
Big Data: Data that is so large and complex that traditional data storage and
processing systems are inadequate.
Machine Learning (ML): A subset of AI techniques that allow systems to learn
and improve from experience without being explicitly programmed.
Deep Learning (DL): A subset of machine learning that uses neural networks with
many layers to learn increasingly abstract representations of the data.
Artificial Intelligence (AI): Techniques that allow machines to display intelligent
behaviour.
Supervised Learning: A type of machine learning where the model is trained on
labelled data, i.e., data that includes both input and the desired output.
Unsupervised Learning: A type of machine learning where the model is trained on
data without labelled responses and must find patterns and relationships in the data.
Reinforcement Learning: A type of machine learning where an agent learns to
make decisions by performing certain actions and receiving rewards or penalties.
Data as a Strategic Asset: The concept that data can lead to better decision-
making and is a valuable asset for businesses.
Model: An abstract representation of reality in data science, often created through
machine learning algorithms based on data.
Training Data: Data used to train a machine learning model.
Testing Data: Data used to evaluate the performance of a trained machine
learning model.
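The distinction between training and testing data can be sketched as a simple random split. The function name train_test_split, the 25% test fraction, and the fixed seed are assumptions made for this example, not part of the definitions above.

```python
import random

def train_test_split(records, test_fraction=0.25, seed=0):
    """Shuffle a copy of the records and split it into (train, test)."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # fixed seed for repeatability
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

data = list(range(20))  # stand-in for 20 labelled examples
train, test = train_test_split(data)
```

The model is fitted on train only. Keeping test unseen until evaluation is what makes the measured performance a fair estimate.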
Classification: A type of supervised learning where the output is a discrete
category, such as 'spam' or 'not spam'.
Regression: A type of supervised learning where the output is a continuous value,
such as predicting house prices.
Clustering: An unsupervised learning task that involves grouping similar data points
together.
Anomaly Detection: Identifying data points that do not fit the normal pattern of
the data.
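One simple way to flag points that do not fit the normal pattern is a z-score rule, sketched below. The readings and the threshold of 2 standard deviations are hypothetical. Real anomaly detection often uses more robust methods, so treat this as an illustration only.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 50]  # hypothetical sensor readings
anomalies = find_anomalies(readings)
```

Here only the reading of 50 lies far enough from the rest to be flagged.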
Querying and Reporting: Techniques used to retrieve specific information from
databases, such as SQL queries.
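A minimal sketch of querying and reporting, using Python's built-in sqlite3 module with an in-memory database. The sales table and its rows are invented for the example.

```python
import sqlite3

# In-memory database with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 30.0)],
)

# Report: total sales per region.
report = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

The query retrieves and summarises existing facts. Unlike data mining, it does not discover patterns that the analyst did not already ask for.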
OLAP (Online Analytical Processing): Tools used to analyse multidimensional data
interactively, often drawing on multiple database systems at once; widely used in
business intelligence.