Exam (elaborations)

Introduction to data science

Written for

Institution: Introduction to data science
Course: Introduction to data science

Document information

Uploaded on: December 13, 2024
Number of pages: 32
Written in: 2024/2025
Type: Exam (elaborations)
Contains: Questions & answers

CS3352 – Foundations of Data Science

UNIT I INTRODUCTION

Data Science: Benefits and uses – Facets of data – Data Science Process: Overview – Defining
research goals – Retrieving data – Data preparation – Exploratory data analysis – Building the
model – Presenting findings and building applications – Data Mining – Data Warehousing –
Basic statistical descriptions of data

Big Data:

Big data is a blanket term for any collection of data sets so large or complex that it becomes
difficult to process them using traditional data management techniques such as relational
database management systems (RDBMS).

I. Data Science:

• Data science involves using methods to analyze massive amounts of data and extract
the knowledge it contains.
• The characteristics of big data are often referred to as the three Vs:
o Volume: how much data is there?
o Variety: how diverse are the different types of data?
o Velocity: at what speed is new data generated?
• A fourth V is often added:
o Veracity: how accurate is the data?
• Data science is an evolutionary extension of statistics capable of dealing with the
massive amounts of data produced today.
• What sets a data scientist apart from a statistician is the ability to work with big data
and experience in machine learning, computing, and algorithm building. Typical tools
include Hadoop, Pig, Spark, R, Python, and Java, among others.

II. Benefits and uses of data science and big data

• Data science and big data are used almost everywhere in both commercial and
noncommercial settings.
• Commercial companies in almost every industry use data science and big data to
gain insights into their customers, processes, staff, competition, and products.
• Many companies use data science to offer customers a better user experience.
o E.g., Google AdSense collects data from internet users so that relevant
commercial messages can be matched to the person browsing the internet.
o MaxPoint is another example of real-time personalized advertising.
• Human resource professionals use:
o people analytics and text mining to screen candidates,
o monitor the mood of employees, and
o study informal networks among coworkers.
• Financial institutions use data science:
o to predict stock markets, determine the risk of lending money, and
o learn how to attract new clients for their services.
• Governmental organizations:
o rely on internal data scientists to discover valuable information, and
o share their data with the public.
o E.g., Data.gov is but one example; it’s the home of the US Government’s open
data.
o Government organizations have also collected 5 billion data records from
widespread applications such as Google Maps, Angry Birds, email, and text
messages, among many other data sources.
• Nongovernmental organizations:
o The World Wildlife Fund (WWF), for instance, employs data scientists to increase
the effectiveness of its fundraising efforts.
o E.g., DataKind is one such group of data scientists that devotes its time to the
benefit of mankind.
• Universities:
o Use data science in their research but also to enhance the study experience of
their students.
o Massive open online courses (MOOCs) produce a lot of data, which allows
universities to study how this type of learning can complement traditional
classes.
o E.g., Coursera, Udacity, and edX

III. Facets of data:
The main categories of data are these:
■ Structured
■ Unstructured
■ Natural language
■ Machine-generated
■ Graph-based
■ Audio, video, and images
■ Streaming

Structured data:
• Structured data is data that depends on a data model and resides in a fixed field
within a record.
• Structured data is easy to store in tables within databases or Excel files, and it can
be managed and queried with Structured Query Language (SQL).
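
To make the idea of fixed fields concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name, column names, and rows are hypothetical and are not taken from the notes; they only illustrate how records with the same fields can be stored in a table and queried with SQL.

```python
import sqlite3

# In-memory database; the "customers" table and its rows are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, city TEXT)"
)
rows = [
    (1, "Asha", 34, "Chennai"),
    (2, "Ben", 27, "Mumbai"),
    (3, "Chitra", 41, "Chennai"),
]
conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", rows)

# Every record has the same fixed fields, so filtering and aggregating are one query each.
chennai_names = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM customers WHERE city = ? ORDER BY id", ("Chennai",)
    )
]
average_age = conn.execute("SELECT AVG(age) FROM customers").fetchone()[0]

print(chennai_names, average_age)
```

The point of the sketch is that the data model (the table definition) is what makes these questions easy to ask.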

Unstructured data:

• Unstructured data is data that isn’t easy to fit into a data model because
the content is context-specific or varying.
• E.g., e-mail.
• Although an email contains structured elements such as the sender, title, and body
text, it’s a challenge to find the number of people who have written an email
complaint about a specific employee, because so many ways exist to refer to a
person.
• The thousands of different languages and dialects complicate this further.
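
The email example can be sketched with Python's standard email module. The message, the addresses, and the name patterns below are invented for illustration. The headers parse cleanly into fields, while finding references to a person in the body relies on hand-written patterns that are easy to get wrong or leave incomplete.

```python
import re
from email import message_from_string

# A made-up complaint email; addresses and names are hypothetical.
raw = """From: customer@example.com
Subject: Complaint about service
To: support@example.com

Hello, I was unhappy with how J. Smith handled my order.
Mr Smith never called back, and John did not reply either.
"""

msg = message_from_string(raw)

# Structured part: header fields can be read directly by name.
sender = msg["From"]
subject = msg["Subject"]

# Unstructured part: the body is free text, so references to one employee
# must be guessed with patterns. This list is a naive assumption and would
# miss nicknames, misspellings, or other languages.
body = msg.get_payload()
patterns = [r"\bJohn\b", r"\bJ\. Smith\b", r"\bMr\.? Smith\b"]
mentions = sum(len(re.findall(p, body)) for p in patterns)

print(sender, subject, mentions)
```

Here all three references happen to be caught, but only because the patterns were written with this exact text in mind.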

Natural language:
• A human-written email is also a perfect example of natural language data.
• Natural language is a special type of unstructured data.
• It’s challenging to process because it requires knowledge of specific data science
techniques and linguistics.
• Topics in natural language processing (NLP): entity recognition, topic recognition,
summarization, text completion, and sentiment analysis.
• Human language is ambiguous in nature.
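
As a rough illustration of why sentiment analysis is hard, the sketch below scores text by counting words from two tiny hand-made word lists. The word lists and sentences are assumptions made up for this example; real sentiment analysis uses far richer models. The second sentence shows the ambiguity problem: the word "good" is counted as positive even though it is negated.

```python
import re

# Tiny hypothetical lexicons, for illustration only.
POSITIVE = {"good", "great", "happy", "excellent"}
NEGATIVE = {"bad", "poor", "unhappy", "terrible"}

def sentiment_score(text):
    """Return (# positive words) minus (# negative words) in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

score_plain = sentiment_score("The service was great and the staff were excellent")
score_negated = sentiment_score("The service was not good")

# The negated sentence still gets a positive score, because word counting
# ignores context such as "not".
print(score_plain, score_negated)
```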

Machine-generated data:
• Machine-generated data is information that’s automatically created by a computer,
process, application, or other machines without human intervention.
• Machine-generated data is becoming a major data resource.
• E.g., Wikibon has forecast that the market value of the industrial Internet will be
approximately $540 billion in 2020.
• International Data Corporation has estimated that there will be 26 times more
connected things than people in 2020.
• This network of connected things is commonly referred to as the internet of things.
• Examples of machine data are web server logs, call detail records, network event
logs, and telemetry.
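
Web server logs are a common kind of machine data. The following sketch parses a few made-up log lines and counts responses per status code. It assumes the widely used Common Log Format layout; the IP addresses, paths, and timestamps are invented for the example.

```python
import re
from collections import Counter

# Pattern for lines in Common Log Format (assumed layout for this sketch).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<status>\d{3}) (?P<size>\d+|-)'
)

# Hypothetical log lines, as a web server might write them without human input.
log_lines = [
    '192.0.2.1 - - [13/Dec/2024:10:00:00 +0000] "GET /index.html HTTP/1.1" 200 1024',
    '192.0.2.2 - - [13/Dec/2024:10:00:01 +0000] "GET /missing HTTP/1.1" 404 512',
    '192.0.2.1 - - [13/Dec/2024:10:00:02 +0000] "POST /login HTTP/1.1" 200 256',
]

# Count how many requests ended with each HTTP status code.
status_counts = Counter()
for line in log_lines:
    match = LOG_PATTERN.match(line)
    if match:  # skip lines that do not follow the expected layout
        status_counts[match.group("status")] += 1

print(dict(status_counts))
```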

Graph-based or network data:
• “Graph” in this case points to mathematical graph theory. In graph theory, a graph
is a mathematical structure to model pair-wise relationships between objects.
• Graph or network data is, in short, data that focuses on the relationship or
adjacency of objects.
• The graph structures use nodes, edges, and properties to represent and store
graphical data.
• Graph-based data is a natural way to represent social networks, and its structure
allows you to calculate the shortest path between two people.
• Graph-based data can be found on many social media websites.
• E.g., LinkedIn, Twitter, movie interests on Netflix
• Graph databases are used to store graph-based data and are queried with
specialized query languages such as SPARQL.
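
The shortest path between two people can be sketched with a breadth-first search over an adjacency list. The small friendship network below is hypothetical, and breadth-first search is one standard choice for unweighted graphs rather than the method any particular website uses.

```python
from collections import deque

# Hypothetical social network: each person (node) maps to their friends (edges).
friends = {
    "Ann": ["Bob", "Cara"],
    "Bob": ["Ann", "Dev"],
    "Cara": ["Ann", "Dev"],
    "Dev": ["Bob", "Cara", "Eli"],
    "Eli": ["Dev"],
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        person = path[-1]
        if person == goal:
            return path
        for neighbor in graph.get(person, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None  # no connection between the two people

path = shortest_path(friends, "Ann", "Eli")
print(path)
```

Because breadth-first search explores people in order of distance from the start, the first path that reaches the goal has the fewest hops.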

Audio, image, and video:
• Audio, image, and video are data types that pose specific challenges to a data
scientist.
• Tasks that are trivial for humans, such as recognizing objects in pictures, turn out to
be challenging for computers.
• E.g., Major League Baseball Advanced Media announced video capture of
approximately 7 TB per game for the purpose of live, in-game analytics.
• High-speed cameras at stadiums capture ball and athlete movements so that
metrics can be calculated in real time.
• DeepMind succeeded at creating an algorithm that’s capable of learning how to
play video games.
• This algorithm takes the video screen as input and learns to interpret everything
via a complex process of deep learning.
• Google acquired DeepMind for its own artificial intelligence development plans.

Streaming data:
• The data flows into the system when an event happens instead of being loaded into
a data store in a batch.
• Examples are the “What’s trending” on Twitter, live sporting or music events, and
the stock market.
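
The difference between streaming and batch loading can be sketched with a Python generator that handles one event at a time. The price values and function names are hypothetical; a real stream would come from a network feed or message queue rather than a fixed list.

```python
def price_stream():
    """Hypothetical event source: yields one stock price per event."""
    for price in [100.0, 102.0, 101.0, 105.0]:
        yield price

def running_average(events):
    """Update the average as each event arrives, without storing the full batch."""
    count = 0
    total = 0.0
    for value in events:
        count += 1
        total += value
        yield total / count

# Each incoming price immediately produces an updated average.
averages = list(running_average(price_stream()))
print(averages)
```

Only the running count and total are kept in memory, which is the usual reason to process events as they arrive instead of waiting for a complete batch.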

The data science process:

• The data science process typically consists of six steps:
o Setting the research goal
o Retrieving data
o Data preparation
o Exploratory data analysis
o Building the model
o Presenting findings and building applications
