Lesson #3
UNDERSTANDING BIG DATA AND DATA ANALYTICS

Business Analytics, BI, Big Data, Data Mining: What’s the Difference?
Business Analytics – tools to explore past data to gain insights into future business decisions.
BI – tools and techniques to turn data into meaningful information.
Big Data – data sets that are so large or complex that traditional data processing applications are inadequate.
Data Mining – tools for discovering patterns in large data sets.

Businesses Need Support for Decision-Making
• Uncertain economics
• Rapidly changing environments
• Global competition
• Demanding customers
Taking advantage of information acquired by companies is a Critical Success Factor.

Characteristics of Data for Good Decision-Making
Better Quality Data: Characteristics

ACCURACY – Accurate enough for intended purposes:
• Balance with use, cost, effort, timeliness
• Capture close to point of activity
• Make accuracy compromises clear

VALIDITY – Compliance with requirements:
• Application of definitions
• Consistency over time
• Consistency with others

RELIABILITY – Collection processes consistent:
• …over time
• …for multiple collection points
• …between collection systems

TIMELINESS:
• Capture quickly after the event
• Available quickly enough
• Available frequently enough

RELEVANCE – Data relevant to intended purpose:
• Periodic review of requirements
• Quality assurance and feedback process
• Use carefully for other purposes

COMPLETENESS – Monitor quality to match data needs:
• Missing data
• Invalid data
• Incomplete data

The Information Gap
The shortfall between gathering information and using it for decision making.
➢ Firms have inadequate data warehouses.
➢ Business analysts spend 2 days a week gathering and formatting data, instead of performing analysis (Data Warehousing Institute).
➢ Business Intelligence (BI) seeks to bridge the information gap.

What is Big Data?
Massive sets of unstructured/semi-structured data from web traffic, social media, sensors, etc.
Petabytes, exabytes of data
Volumes too great for typical DBMS
Information from multiple internal and external sources:
- Transactions
- Social Media
- Enterprise Content
- Sensors
- Mobile Devices

In the last minute there were…
- 204 million emails sent
- 61,000 hours of music listened to on Pandora
- 20 million photo views
- 100,000 tweets
- 6 million views & 277,000 Facebook logins
- 2+ million Google searches
- 3 million uploads on Flickr
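
To get a feel for the volumes behind the per-minute figures listed in this lesson, the short Python sketch below scales a few of them to a full day. It is only an illustration: it assumes each rate stays constant for all 1,440 minutes of the day, which real traffic does not, and the names PER_MINUTE and per_day are made up for this example.

```python
MINUTES_PER_DAY = 24 * 60  # 1,440 minutes in a day

# Per-minute counts copied from the list in this lesson.
PER_MINUTE = {
    "emails sent": 204_000_000,
    "photo views": 20_000_000,
    "tweets": 100_000,
    "Facebook logins": 277_000,
}


def per_day(per_minute_count: int) -> int:
    """Scale a per-minute count to one day, assuming a constant rate."""
    return per_minute_count * MINUTES_PER_DAY


if __name__ == "__main__":
    for activity, count in PER_MINUTE.items():
        print(f"{activity}: {per_day(count):,} per day")
```

Under that constant-rate assumption, 204 million emails a minute comes to roughly 294 billion emails a day, which is one way to see why such volumes are described as too great for a typical DBMS.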