1) Data Sciences - a concept that unifies statistics, data analysis, machine learning and their related
methods in order to understand and analyse actual phenomena with data. It employs techniques and theories
drawn from many fields within the context of Mathematics, Statistics, Computer Science, and Information Science.
2) Applications of Data Sciences
a) Fraud and Risk Detection
b) Genetics & Genomics
c) Internet Search
d) Targeted Advertising
e) Website Recommendations
f) Airline Route Planning
3) Sources of Data
4) Points to keep in mind while accessing data from any of the data sources:
a) Only data which is available for public usage should be taken up.
b) Personal datasets should only be used with the consent of the owner.
c) One should never breach someone’s privacy to collect data.
d) Data should only be taken from reliable sources, as data collected from random sources can be wrong
or unusable.
e) Reliable sources of data ensure the authenticity of data which helps in proper training of the AI model.
5) Types of Data
Usually, the data is collected in the form of tables. These tabular datasets can be stored in different
formats.
i) CSV: stands for Comma-Separated Values. It is a simple file format used to store tabular data. Each
line of the file is a data record, and each record consists of one or more fields which are separated by
commas.
ii) Spreadsheet: is used for accounting and recording data using rows and columns into which
information can be entered, e.g. Microsoft Excel.
iii) SQL: stands for Structured Query Language. It is a domain-specific language used in programming
and is designed for managing data held in different kinds of DBMS (Database Management System).
It is useful in handling structured data.
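To make the CSV and SQL formats above concrete, here is a minimal sketch. The sample data, the table name students and the helper names read_records and top_scorers are all made up for illustration. It uses Python's built-in csv and sqlite3 modules, with an in-memory database standing in for a real DBMS.

```python
import csv
import io
import sqlite3

# Hypothetical sample data: the text of a small CSV file.
# The first line holds the field names; every later line is one record.
CSV_TEXT = "name,marks\nAsha,82\nRavi,74\nMeena,91\n"

def read_records(csv_text):
    """Return each data record of the CSV text as a dict of its fields."""
    return list(csv.DictReader(io.StringIO(csv_text)))

def top_scorers(records, minimum):
    """Load the records into an in-memory SQL table and query it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, marks INTEGER)")
    conn.executemany(
        "INSERT INTO students VALUES (?, ?)",
        [(r["name"], int(r["marks"])) for r in records],
    )
    rows = conn.execute(
        "SELECT name FROM students WHERE marks >= ? ORDER BY marks DESC",
        (minimum,),
    ).fetchall()
    conn.close()
    return [name for (name,) in rows]

records = read_records(CSV_TEXT)
print(records[0])                # the first record, fields keyed by column name
print(top_scorers(records, 80))  # names of students with marks of 80 or more
```

Note that every field read from a CSV file arrives as text, which is why the marks are converted with int() before being stored in the SQL table.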
6) Data Access - After collecting the data, to be able to use it for programming purposes, we should know
how to access it in Python code. There are various Python packages which help us in
accessing structured data (in tabular form) inside the code. The advantage of using Python packages is that
we do not need to write our own formulas or equations to find the results.
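As a rough illustration of this point, the sketch below reads a column from tabular data and computes its mean twice: once with a hand-written formula and once with a ready-made function. The sample data and the helper name load_marks are hypothetical. It uses only the built-in csv and statistics modules so that it runs without installing anything; packages such as NumPy and pandas are commonly used for the same kind of work.

```python
import csv
import io
import statistics

# Hypothetical sample data standing in for a CSV file on disk.
CSV_TEXT = "name,marks\nAsha,82\nRavi,74\nMeena,91\n"

def load_marks(csv_text):
    """Read the 'marks' column of the CSV text as a list of integers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [int(row["marks"]) for row in reader]

marks = load_marks(CSV_TEXT)

# Writing the formula ourselves: sum of the values divided by their count.
manual_mean = sum(marks) / len(marks)

# Letting a module do the work: no formula to write or get wrong.
package_mean = statistics.mean(marks)
package_median = statistics.median(marks)

print(manual_mean, package_mean, package_median)
```

For a real file, open("file.csv", newline="") would take the place of io.StringIO(CSV_TEXT); the file name here is only a placeholder.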