Introduction
Data science is a broad field of study concerned with turning data into knowledge
Data science = “the art and science of acquiring knowledge through data.”
When we understand how to apply data to achieve our goals, we turn it into knowledge
Key concepts:
Data = raw facts
Information = data processed into meaning
Knowledge = understanding how to apply information to achieve goals
Example: We used Excel charts to generate information; data science goes a step further and connects that information to business goals
Data Science Process (5 Steps)
1. Formulate a question that you need an answer for
The process begins with identifying a business problem or question
Examples:
– “How can we improve customer satisfaction?”
– “How can we increase production plant performance?”
2. Collect data
Gather data needed to answer the question
Types of data sources:
– Primary data: collected specifically for the problem (e.g., surveys, interviews)
– Secondary data: pre-existing data (e.g., reports, databases)
Data cleaning is essential:
– Fix missing values, remove duplicates or invalid entries, ensure reliability
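A minimal sketch of the cleaning tasks listed above, using only the Python standard library. The records, the field names (customer_id, score) and the valid 1-5 score range are all hypothetical, and filling gaps with the median is just one common choice, not the only one.

```python
from statistics import median

# Hypothetical survey responses; field names and the 1-5 scale are assumptions.
raw_rows = [
    {"customer_id": 1, "score": 4},
    {"customer_id": 2, "score": None},   # missing value
    {"customer_id": 1, "score": 4},      # duplicate of the first row
    {"customer_id": 3, "score": 9},      # invalid: outside the 1-5 scale
    {"customer_id": 4, "score": 2},
]

def clean(rows, low=1, high=5):
    """Remove duplicates and invalid scores, then fill missing scores with the median."""
    seen = set()
    deduped = []
    for row in rows:
        key = (row["customer_id"], row["score"])
        if key not in seen:              # keep only the first copy of each record
            seen.add(key)
            deduped.append(dict(row))
    # Keep rows whose score is missing (filled below) or inside the valid range.
    valid = [r for r in deduped if r["score"] is None or low <= r["score"] <= high]
    known = [r["score"] for r in valid if r["score"] is not None]
    fill = median(known)                 # assumes at least one known score remains
    for r in valid:
        if r["score"] is None:
            r["score"] = fill
    return valid

cleaned = clean(raw_rows)
for row in cleaned:
    print(row)
```

Whether to fill, drop, or flag a missing value depends on the question from step 1, so treat the median fill here as a placeholder decision.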
3. Explore the collected data (EDA)
Exploratory Data Analysis (EDA) = analyzing and investigating data sets, often using
visualization
Helps understand what the data represents
Output = information (but not yet actionable knowledge)
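A small illustration of EDA on made-up numbers: summary statistics plus a text histogram as a stand-in for a chart. The scores are hypothetical, and a real analysis would more likely use a plotting or data-frame library.

```python
from collections import Counter
from statistics import mean, median, stdev

# Hypothetical cleaned satisfaction scores on a 1-5 scale (illustrative values only).
scores = [4, 3, 5, 2, 4, 4, 1, 5, 3, 4]

# Summary statistics describe the centre and spread of the data.
summary = {
    "count": len(scores),
    "mean": mean(scores),
    "median": median(scores),
    "stdev": round(stdev(scores), 2),
    "min": min(scores),
    "max": max(scores),
}

# Frequency of each score, used to draw a simple text histogram.
counts = Counter(scores)

for name, value in summary.items():
    print(f"{name:>6}: {value}")
for value in sorted(counts):
    print(f"{value} | {'#' * counts[value]}")
```

The output tells us what the scores look like (information), but not yet what to do about them, which is why modelling and communication follow.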
4. Create a model using the collected data
Apply machine learning or statistical techniques
Models can predict or classify
Workflow: build model → evaluate model performance → improve where necessary
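The build → evaluate → improve loop can be sketched with a deliberately tiny classifier: a single threshold on waiting time that predicts whether a customer is satisfied. The data, the feature (wait_minutes) and the threshold rule are all hypothetical; the rule stands in for a real machine learning or statistical technique.

```python
# Hypothetical labelled data: (wait_minutes, satisfied). Values are illustrative only.
train = [(2, True), (4, True), (5, True), (7, True), (9, False), (12, False), (15, False)]
test = [(3, True), (6, True), (10, False), (14, False)]

def predict(wait_minutes, threshold):
    """Classify a customer as satisfied when the wait is at or below the threshold."""
    return wait_minutes <= threshold

def accuracy(rows, threshold):
    """Share of rows whose predicted label matches the recorded label."""
    hits = sum(predict(wait, threshold) == label for wait, label in rows)
    return hits / len(rows)

# Build: start from an arbitrary first guess and evaluate it on held-out data.
baseline_threshold = 4
baseline_acc = accuracy(test, baseline_threshold)

# Improve: choose the candidate threshold that fits the training data best.
candidates = sorted({wait for wait, _ in train})
best_threshold = max(candidates, key=lambda t: accuracy(train, t))

# Evaluate again on data that was not used to choose the threshold.
improved_acc = accuracy(test, best_threshold)

print(f"baseline threshold {baseline_threshold}: test accuracy {baseline_acc:.2f}")
print(f"tuned threshold {best_threshold}: test accuracy {improved_acc:.2f}")
```

Keeping a separate test set matters: accuracy measured on the same rows used to tune the model would overstate how well it performs on new customers.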
5. Create a visual representation of the results
Communicate results to business stakeholders clearly
Use visuals to present findings for better understanding
Once understood, stakeholders can implement actions based on results