• What is Data Science?
➢ Data science is the process of extracting insights from data
using technology. It uses programming languages such as
Python and R, along with other specialized tools, to analyze
data sets that are too large for traditional tools like
Microsoft Excel. A typical data science project starts with a
business problem and then moves through data collection,
data cleaning and exploration, model building, and
deployment of the model to production.
• Example: Restaurant Business
➢ For instance, let's consider a restaurant business with
locations in New York and San Diego. The local managers
send monthly sales data in an Excel file to the business
owner. By plotting a side-by-side bar chart of the monthly
sales numbers, the business owner can identify that the New
York restaurant is not performing well during summer
months. This visualization can help the owner make a
business decision to run a special promotion during the
summer in the New York restaurant.
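➢ A minimal sketch of this comparison in Python follows. The sales
figures are invented for illustration, and the names `SALES`,
`weakest_months`, and `plot_side_by_side` are hypothetical. It assumes
Matplotlib is installed for the plotting step.

```python
# Hypothetical monthly sales (in thousands of dollars); these numbers are
# invented for illustration and do not come from a real restaurant.
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
SALES = {
    "New York":  [52, 50, 55, 53, 48, 31, 28, 30, 47, 54, 56, 60],
    "San Diego": [40, 41, 43, 45, 47, 50, 52, 51, 46, 44, 42, 41],
}


def weakest_months(sales, n=3):
    """Return the n months with the lowest sales, in calendar order."""
    lowest = sorted(range(len(sales)), key=lambda i: sales[i])[:n]
    return [MONTHS[i] for i in sorted(lowest)]


def plot_side_by_side(sales_by_city, path="monthly_sales.png"):
    """Draw one group of bars per month, with one bar per city."""
    # Imported here so the data step above runs even without Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display window needed
    import matplotlib.pyplot as plt

    n_cities = len(sales_by_city)
    width = 0.8 / n_cities
    fig, ax = plt.subplots(figsize=(10, 4))
    for k, (city, sales) in enumerate(sales_by_city.items()):
        # Shift each city's bars sideways so they sit next to each other.
        positions = [i + k * width for i in range(len(MONTHS))]
        ax.bar(positions, sales, width=width, label=city)
    # Center the month labels under each group of bars.
    ax.set_xticks([i + width * (n_cities - 1) / 2 for i in range(len(MONTHS))])
    ax.set_xticklabels(MONTHS)
    ax.set_ylabel("Sales (thousands of dollars)")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


print(weakest_months(SALES["New York"]))
```

➢ Calling `plot_side_by_side(SALES)` writes the chart to
`monthly_sales.png`. With these made-up numbers, the summer dip in New
York is visible both in the chart and in the printed list of weakest
months.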
• Tools Used in Data Science
➢ For data storage and distributed computing, data scientists
use technologies like Apache Hadoop and Apache Spark. For
visualization, they use tools like Tableau, Power BI,
Matplotlib, and Jupyter Notebook. For machine learning and deep
learning, they might use scikit-learn, TensorFlow, or PyTorch.
• Real-Life Examples
➢ Data science is used heavily in various industries, including:
➢ Amazon: product recommendations and video
recommendations
➢ Healthcare: analyzing data from wearable devices to predict
heart disease or respiratory problems
➢ Finance: fraud detection using advanced algorithms
➢ Shipping: finding the best route between two destinations
• Join the Conversation
➢ We want to hear your definition of data science! Post a
comment below with your creative definition, and we might
feature it in a future video. Also, check out our Calma
campaign and fill out the Google form for a chance to win a
phone call with us.
• Can You Learn Data Science as a Mechanical
Engineer or a Non-Technical Graduate?
➢ The answer is yes. Many successful data scientists and software
engineers come from non-technical backgrounds, and you can follow
in their footsteps. In this video, I will show you a step-by-step
approach to learning data science effectively.
• The Topics You Need to Learn for Data Science
➢ Data science is a process of drawing insights from data, and you
need a programming language to operate on this data. Python and R
are the two most popular programming languages used by data
scientists, and I prefer Python because it allows for full-stack
application development. To master Python, focus on the following
topics:
➢ Variables and numbers
➢ Basic data types such as strings, lists, and dictionaries
➢ Control flow structures such as if statements and for loops
➢ Functions
➢ Basic understanding of Python modules
➢ Reading and writing files
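➢ The topics above fit together in even a very small program. The
sketch below is only an illustration; the function name `word_counts`,
the file name `notes.txt`, and the sample text are all made up.

```python
import tempfile                 # modules from the standard library
from pathlib import Path


def word_counts(text):
    """Count how often each word appears in a string."""
    counts = {}                            # dictionary: word -> count
    for word in text.lower().split():      # for loop over a list of strings
        word = word.strip(".,!?")
        if word:                           # if statement: skip empty strings
            counts[word] = counts.get(word, 0) + 1
    return counts


# Variables and numbers
visits = 3
average = (10 + 20 + 30) / visits

# Writing and then reading a file; a temporary directory keeps the
# example self-contained and leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "notes.txt"
    path.write_text("Data cleaning, data exploration, data modeling.")
    counts = word_counts(path.read_text())

print(average, counts["data"])
```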
➢ Once you know these topics, move on to NumPy and Pandas, which
allow you to do data cleaning and data exploration, two essential
tasks for data scientists. Focus on the following topics:
➢ Data frame and series basics
➢ Creating a data frame from various files such as Excel, CSV, or SQL
table
➢ Handling NA (missing) values in data frames
➢ Merging and concatenating data frames
➢ Grouping data frames and doing different SQL-type operations on
them
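➢ As a rough sketch of these Pandas topics, the example below loads a
small CSV, fills a missing value, merges, concatenates, and groups.
The data, column names, and manager names are invented, and replacing
a missing value with 0 is just one possible choice (dropping the row
with `dropna` is another).

```python
import io

import pandas as pd

# Hypothetical data; in practice pd.read_csv would be given a file path.
sales_csv = io.StringIO(
    "city,month,sales\n"
    "New York,Jun,31\n"
    "New York,Jul,\n"
    "San Diego,Jun,50\n"
    "San Diego,Jul,52\n"
)
sales = pd.read_csv(sales_csv)

# Handle NA values: here the missing figure is replaced with 0.
sales["sales"] = sales["sales"].fillna(0)

# Merge with a second data frame on a shared column (like a SQL JOIN).
managers = pd.DataFrame({"city": ["New York", "San Diego"],
                         "manager": ["Ana", "Ben"]})
merged = sales.merge(managers, on="city", how="left")

# Concatenate rows from another data frame with the same columns.
august = pd.DataFrame({"city": ["New York"], "month": ["Aug"], "sales": [30.0]})
all_sales = pd.concat([sales, august], ignore_index=True)

# Group and aggregate (like SQL GROUP BY).
totals = all_sales.groupby("city")["sales"].sum()
print(totals)
```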
➢ You also need a code editor, such as PyCharm or Visual Studio Code,
and Jupyter Notebook for interactive data visualization. For data
visualization, use Matplotlib or Seaborn, and know the basic chart
types, such as line, bar, and pie charts.
➢ As a data scientist, you need a good understanding of SQL and
relational databases, as you will be retrieving data from multiple
tables. You also need a good understanding of math and statistics,
linear algebra, and differential calculus.
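➢ To illustrate retrieving data from multiple tables, here is a small
sketch using Python's built-in `sqlite3` module with an in-memory
database. The table names, columns, and values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")   # throwaway database held in memory
conn.executescript("""
    CREATE TABLE restaurants (id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE sales (restaurant_id INTEGER, month TEXT, amount REAL);
    INSERT INTO restaurants VALUES (1, 'New York'), (2, 'San Diego');
    INSERT INTO sales VALUES (1, 'Jun', 31), (1, 'Jul', 28),
                             (2, 'Jun', 50), (2, 'Jul', 52);
""")

# JOIN the two tables on the restaurant id, then total sales per city.
rows = conn.execute("""
    SELECT r.city, SUM(s.amount) AS total
    FROM sales AS s
    JOIN restaurants AS r ON r.id = s.restaurant_id
    GROUP BY r.city
    ORDER BY r.city
""").fetchall()
conn.close()
print(rows)
```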
➢ Finally, you will need to learn about machine learning and deep
learning, using libraries and frameworks such as scikit-learn,
TensorFlow, PyTorch, and Theano. You will also need to know how to
use Microsoft Excel for data analysis when your data size is small and
you don't want to write code.
• Enterprise-Level Tools
➢ While there are enterprise-level tools such as Tableau and Power BI
for data visualization, Hadoop and Spark for storing big data, and
Amazon SageMaker and Google Cloud Compute for cloud-based