Summary

Summary Data science

Pages
17
Uploaded on
28-11-2024
Written in
2024/2025

Summary of 17 pages for the course AD3491 at Anna University (Data science)

INTRODUCTION TO DATA SCIENCE
UNIT I - INTRODUCTION
Data Science: Benefits and uses – Facets of data – Data Science Process: Overview – Defining research goals –
Retrieving data – Data preparation – Exploratory data analysis – Build the model – Presenting findings and
building applications.
Data
In computing, data is information that has been translated into a form that is efficient for movement or
processing.

Data Science
Data Science involves applying scientific methods, statistical techniques, computational tools, and domain
expertise to explore, analyze, and extract insights from data. The term emphasizes the rigorous and systematic
approach taken to understand and derive value from vast and complex datasets.
Data science is an evolutionary extension of statistics capable of dealing with the massive amounts
of data produced today. It adds methods from computer science to the repertoire of statistics.

Benefits and uses of data science
Data science and big data are used almost everywhere in both commercial and noncommercial settings.
 Commercial companies in almost every industry use data science and big data to gain insights into their
customers, processes, staff, competition, and products.
 Many companies use data science to offer customers a better user experience, as well as to cross-sell,
up-sell, and personalize their offerings.
 Governmental organizations are also aware of data’s value. Many governmental organizations not only
rely on internal data scientists to discover valuable information, but also share their data with the public.
 Nongovernmental organizations (NGOs) use it to raise money and defend their causes.
 Universities use data science in their research but also to enhance the study experience of their students.
The rise of massive open online courses (MOOCs) produces a lot of data, which allows universities to
study how this type of learning can complement traditional classes.

Facets of data
In data science and big data you’ll come across many different types of data, and each of them tends to require
different tools and techniques. The main categories of data are these:
 Structured
 Unstructured
 Natural language
 Machine-generated
 Graph-based
 Audio, video, and images
 Streaming
Let’s explore all these interesting data types.

Structured data
 Structured data is data that depends on a data model and resides in a fixed field within a record. As such,
it’s often easy to store structured data in tables within databases or Excel files.
 SQL, or Structured Query Language, is the preferred way to manage and query data that resides in
databases.
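As a minimal sketch of the point above, the following uses Python's built-in `sqlite3` module to store records with fixed fields in a table and query them with SQL. The table name, column names, and rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the table, columns, and rows are hypothetical examples.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT, spend REAL)"
)
rows = [
    (1, "Asha", "Chennai", 120.0),
    (2, "Ravi", "Madurai", 80.5),
    (3, "Meena", "Chennai", 45.0),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)

# Every record has the same fixed fields, so a query can address them by name.
totals = conn.execute(
    "SELECT city, SUM(spend) FROM customers GROUP BY city ORDER BY city"
).fetchall()
print(totals)
conn.close()
```

Because each row follows the same data model, grouping and summing by a field needs only one declarative query.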

Unstructured data
Unstructured data is data that isn’t easy to fit into a data model because the content is context-specific or
varying. One example of unstructured data is your regular email.
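The email example can be illustrated with a short sketch using Python's standard `email` module. The message below is made up. Note that an email does carry a few structured elements (sender, subject), while the body is free text whose meaning depends on context.

```python
from email import message_from_string

# A made-up message; addresses and wording are hypothetical.
raw = """From: alice@example.com
To: bob@example.com
Subject: Project update

Hi Bob, the report is nearly done. Can we meet on Friday to review it?
"""

msg = message_from_string(raw)

# Headers behave like fixed fields and can be read by name.
sender = msg["From"]
subject = msg["Subject"]

# The body is free text: no field tells us who must do what, or by when.
body = msg.get_payload()

print(sender, "|", subject)
print(body)
```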




Natural language
 Natural language is a special type of unstructured data; it’s challenging to process because it requires
knowledge of specific data science techniques and linguistics.
 The natural language processing community has had success in entity recognition, topic recognition,
summarization, text completion, and sentiment analysis, but models trained in one domain don’t
generalize well to other domains.
 Even state-of-the-art techniques aren’t able to decipher the meaning of every piece of text.
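The domain problem described above can be sketched with a deliberately naive word-list sentiment scorer. This is a toy illustration, not a real NLP model: the word lists are hypothetical and imagined to come from product reviews, and the function `sentiment` is invented for this example.

```python
import re

# Hypothetical word lists, imagined to be derived from product reviews.
POSITIVE = {"good", "great", "excellent", "love"}
NEGATIVE = {"bad", "poor", "terrible", "sick"}

def sentiment(text):
    """Label text by counting positive and negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

review = "Great phone, I love the camera"
slang = "That skateboard trick was sick"  # praise in informal speech

print(sentiment(review))
print(sentiment(slang))
```

The scorer labels the slang sentence as negative even though a human reads it as praise, because the meaning of "sick" changes with the domain.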

Machine-generated data
 Machine-generated data is information that’s automatically created by a computer, process,
application, or other machine without human intervention.
 Machine-generated data is becoming a major data resource and will continue to do so.
 The analysis of machine data relies on highly scalable tools, due to its high volume and speed.
Examples of machine data are web server logs, call detail records, network event logs, and telemetry.
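As a small sketch of working with one kind of machine data, the following parses web server log lines with a regular expression. The log lines are invented and follow the widely used Common Log Format; real logs may use a different layout, so the pattern is an assumption.

```python
import re
from collections import Counter

# Invented sample lines in Common Log Format.
LOG_LINES = [
    '192.168.1.10 - - [28/Nov/2024:10:15:32 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '192.168.1.11 - - [28/Nov/2024:10:15:33 +0000] "GET /missing HTTP/1.1" 404 512',
    '192.168.1.10 - - [28/Nov/2024:10:15:35 +0000] "POST /login HTTP/1.1" 200 256',
]

# Named groups turn each line into a record with fixed fields.
PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+)'
)

records = [m.groupdict() for m in map(PATTERN.match, LOG_LINES) if m]

# Count responses per HTTP status code.
status_counts = Counter(r["status"] for r in records)
print(status_counts)
```

At real volumes the same parsing step would run on a scalable tool rather than in a single Python process.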




Graph-based or network data
 “Graph data” can be a confusing term because any data can be shown in a graph.
 Graph or network data is, in short, data that focuses on the relationship or adjacency of objects.
 The graph structures use nodes, edges, and properties to represent and store graphical data.
 Graph-based data is a natural way to represent social networks, and its structure allows you to
calculate specific metrics such as the influence of a person and the shortest path between two people.
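The two metrics mentioned above can be sketched on a tiny, made-up friendship network. Here the number of direct connections (degree) stands in as a crude proxy for a person's influence, and breadth-first search finds a shortest path between two people. The names and the function `shortest_path` are hypothetical.

```python
from collections import deque

# Hypothetical social network: each person (node) maps to their friends (edges).
friends = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Ann", "Dev"],
    "Cara": ["Ann", "Dev"],
    "Dev": ["Bob", "Cara", "Eli"],
    "Eli": ["Dev"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [neighbour])
    return None

path = shortest_path(friends, "Ann", "Eli")

# Degree: how many direct connections each person has.
degree = {person: len(others) for person, others in friends.items()}
most_connected = max(degree, key=degree.get)

print(path)
print(most_connected, degree[most_connected])
```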




Audio, image, and video
 Audio, image, and video are data types that pose specific challenges to a data scientist.
 Tasks that are trivial for humans, such as recognizing objects in pictures, turn out to be challenging for
computers.
 MLBAM (Major League Baseball Advanced Media) announced in 2014 that it would increase video
capture to approximately 7 TB per game for the purpose of live, in-game analytics.
 Recently a company called DeepMind succeeded at creating an algorithm that’s capable of
learning how to play video games.
 This algorithm takes the video screen as input and learns to interpret everything via a complex
process of deep learning.
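One way to see why recognizing objects is hard for a computer is that it receives only a grid of numbers. The toy sketch below builds two tiny grayscale images containing the same bright square, shifted by one pixel. The image size and the helper `make_image` are invented for illustration.

```python
def make_image(top, left, size=5):
    """Return a size x size grid of pixel values with a bright 2x2 square."""
    img = [[0] * size for _ in range(size)]
    for r in range(top, top + 2):
        for c in range(left, left + 2):
            img[r][c] = 255
    return img

a = make_image(1, 1)
b = make_image(1, 2)  # the same square, moved one pixel to the right

# Naive pixel-by-pixel comparison of the two grids.
differing = sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))

for row in a:
    print(row)
print("pixels that differ:", differing)
```

A person sees the same object in both images, yet half of the square's pixels no longer match, so a direct comparison of raw values is not enough to recognize it.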