PM Science
Introduction to data science
Module I
What Is Data Science?
Data science is the domain of study that deals with vast volumes of data
using modern tools and techniques to find unseen patterns, derive meaningful
information, and make business decisions. Data science uses complex
machine learning algorithms to build predictive models.
The data used for analysis can come from many different sources and be
presented in various formats.
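As a minimal sketch of what "different sources and formats" can mean in practice, the snippet below loads records from a CSV source and a JSON source into one common shape. The sample data, the field names (name, age), and the helper names load_csv and load_json are made up for illustration; only Python's standard library is used.

```python
import csv
import io
import json

# Hypothetical sample data: similar records arriving in two formats.
csv_text = "name,age\nAda,36\nLinus,29\n"
json_text = '[{"name": "Grace", "age": 45}]'

def load_csv(text):
    """Parse CSV text into a list of dicts, converting age to int."""
    rows = csv.DictReader(io.StringIO(text))
    return [{"name": r["name"], "age": int(r["age"])} for r in rows]

def load_json(text):
    """Parse a JSON array of objects into the same list-of-dicts shape."""
    return [{"name": r["name"], "age": int(r["age"])} for r in json.loads(text)]

# Once both sources share one shape, they can be analyzed together.
records = load_csv(csv_text) + load_json(json_text)
print(records)
```

Converting every source to one agreed record shape early on keeps the later analysis code independent of where the data came from.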
Now that you know what data science is, let’s see why data science is essential
to today’s IT landscape.
The Data Science Lifecycle
The data science lifecycle consists of five distinct stages, each with its own tasks:
1. Capture: Data Acquisition, Data Entry, Signal Reception, Data Extraction.
This stage involves gathering raw structured and unstructured data.
2. Maintain: Data Warehousing, Data Cleansing, Data Staging, Data Processing,
Data Architecture. This stage covers taking the raw data and putting it in a
form that can be used.
3. Process: Data Mining, Clustering/Classification, Data Modeling, Data
Summarization. Data scientists take the prepared data and examine its
patterns, ranges, and biases to determine how useful it will be in predictive
analysis.
4. Analyze: Exploratory/Confirmatory, Predictive Analysis, Regression, Text
Mining, Qualitative Analysis. Here is the real meat of the lifecycle. This
stage involves performing the various analyses on the data.
5. Communicate: Data Reporting, Data Visualization, Business
Intelligence, Decision Making. In this final step, analysts prepare the
analyses in easily readable forms such as charts, graphs, and reports.
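The five stages above can be sketched as a tiny pipeline, one function per stage. This is only an illustration under stated assumptions: the temperature readings are invented, the function names simply mirror the stage names, and a real project would use far richer tooling at each step.

```python
import csv
import io
import statistics

# 1. Capture: raw data arrives as text (a hypothetical sensor export).
RAW = """city,temp_c
Oslo,4.0
Oslo,
Cairo,30.5
Cairo,29.5
Oslo,6.0
"""

def capture(text):
    """Read the raw rows exactly as they were delivered."""
    return list(csv.DictReader(io.StringIO(text)))

# 2. Maintain: drop rows with missing readings and convert types.
def maintain(rows):
    return [
        {"city": r["city"], "temp_c": float(r["temp_c"])}
        for r in rows
        if r["temp_c"]
    ]

# 3. Process: group the cleaned readings by city.
def process(rows):
    groups = {}
    for r in rows:
        groups.setdefault(r["city"], []).append(r["temp_c"])
    return groups

# 4. Analyze: compute a summary statistic for each group.
def analyze(groups):
    return {city: statistics.mean(temps) for city, temps in groups.items()}

# 5. Communicate: render the result as a short, readable report.
def communicate(means):
    return "\n".join(f"{city}: {mean:.1f} C" for city, mean in sorted(means.items()))

report = communicate(analyze(process(maintain(capture(RAW)))))
print(report)
```

Keeping each stage as its own function makes it easy to rerun or replace a single stage, for example swapping the cleaning rule, without touching the others.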
Prerequisites for Data Science
Here are some of the technical concepts you should know before you start
learning data science.
1. Machine Learning
Machine learning is the backbone of data science. Data scientists need a
solid grasp of ML in addition to basic knowledge of statistics.
2. Modeling
Mathematical models enable you to make quick calculations and predictions
based on what you already know about the data. Modeling is also a part of
Machine Learning and involves identifying which algorithm is the most suitable
to solve a given problem and how to train these models.
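To make "identifying which algorithm is the most suitable" concrete, here is a minimal sketch that trains two candidate models on the same data and keeps the one with the lower error on held-out data. The data (hours studied vs. exam score) and the function names are hypothetical, and the two candidates (a mean baseline and a least-squares line) are chosen only because they are simple to write by hand.

```python
# Hypothetical data: hours studied (x) and exam score (y).
train = [(1, 52), (2, 55), (3, 61), (4, 64), (5, 70)]
test = [(6, 74), (7, 79)]

def fit_mean(data):
    """Baseline model: always predict the average y seen in training."""
    avg = sum(y for _, y in data) / len(data)
    return lambda x: avg

def fit_line(data):
    """Least-squares line: y = slope * x + intercept."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mean_abs_error(model, data):
    """Average absolute difference between predictions and true values."""
    return sum(abs(model(x) - y) for x, y in data) / len(data)

# Train each candidate on the training set, then score it on unseen data.
candidates = {"mean baseline": fit_mean(train), "linear": fit_line(train)}
errors = {name: mean_abs_error(model, test) for name, model in candidates.items()}
best = min(errors, key=errors.get)
print(errors, "best:", best)
```

Scoring on data the models never saw during training is what makes the comparison meaningful; a model can look good on its own training data and still predict poorly.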
3. Statistics
Statistics is at the core of data science. A solid grasp of statistics can help
you extract more insight and obtain more meaningful results.
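As a small illustration of why statistical judgment matters, the sketch below summarizes an invented week of daily order counts that contains one unusual day. It uses Python's standard statistics module; the numbers are made up for the example.

```python
import statistics

# Hypothetical sample: daily order counts for one week, with one outlier (90).
orders = [12, 15, 11, 14, 90, 13, 12]

mean = statistics.mean(orders)      # pulled upward by the outlier
median = statistics.median(orders)  # middle value, robust to the outlier
stdev = statistics.stdev(orders)    # sample standard deviation (spread)

print(f"mean={mean:.2f} median={median} stdev={stdev:.2f}")
```

Here the mean suggests a typical day has about 24 orders, while the median of 13 is much closer to what most days look like. Knowing which summary to trust is exactly the kind of skill statistics provides.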
4. Programming
Some level of programming is required to execute a successful data science
project. The most common programming languages are Python and R. Python
is especially popular because it’s easy to learn and has many libraries
for data science and ML.
5. Databases
A capable data scientist needs to understand how databases work, how to
manage them, and how to extract data from them.
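Extracting data from a database usually means writing a query. The following is a minimal sketch using Python's built-in sqlite3 module with an in-memory database; the sales table, its columns, and the rows are hypothetical and exist only for this example.

```python
import sqlite3

# An in-memory SQLite database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Manage: define a table and load a few hypothetical rows.
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("north", 120.0), ("south", 80.0), ("north", 60.0)],
)

# Extract: let the database aggregate the data before it reaches Python.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
conn.close()

print(rows)
```

The ? placeholders pass values separately from the SQL text, which is the safe way to insert data, and the GROUP BY shows how much summarizing can be done inside the database itself.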
So, if you’re looking for an exciting career that offers stability and
generous compensation, then look no further!
Where Do You Fit in Data Science?
Data science offers you the opportunity to focus on and specialize in one aspect
of the field. Here’s a sample of different ways you can fit into this exciting, fast-
growing field.
Data Scientist
• Job role: Determine what the problem is, what questions need answers,
and where to find the data. Data scientists also mine, clean, and present
the relevant data.
• Skills needed: Programming skills (SAS, R, Python), storytelling and data
visualization, statistical and mathematical skills, knowledge of Hadoop,
SQL, and Machine Learning.
Data Analyst
• Job role: Analysts bridge the gap between the data scientists and the business
analysts, organizing and analyzing data to answer the questions the
organization poses. They take the technical analyses and turn them into
qualitative action items.
• Skills needed: Statistical and mathematical skills, programming skills (SAS,
R, Python), plus experience in data wrangling and data visualization.
Data Engineer
• Job role: Data engineers focus on developing, deploying, managing, and
optimizing the organization’s data infrastructure and data pipelines.
Engineers support data scientists by helping to transfer and transform data
for queries.
• Skills needed: NoSQL databases (e.g., MongoDB, Cassandra),
programming languages such as Java and Scala, and frameworks such as
Apache Hadoop.
Data Science Tools
The data science profession is challenging, but fortunately, there are plenty of
tools available to help the data scientist succeed at their job.