What two problems arise from data storage growth? - -handling explosive growth with constrained budgets
-exploiting all that data
SSD - solid state drive
-faster than a magnetic hard drive
-lower latency
-uses less power and produces less heat
automatic data tiering - match storage performance to access frequency
-Current working data --> recently used --> historical
-top tier --> mid tier --> bottom tier
-SSDs --> hard drives --> tape
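A minimal sketch of a tiering policy, assuming the tier is chosen purely by days since last access. The thresholds (7 and 90 days), the function name `choose_tier`, and the file names are hypothetical; real tiering products use their own, usually richer, policies.

```python
from datetime import date

# Hypothetical thresholds, for illustration only.
HOT_DAYS = 7     # accessed within a week -> top tier
WARM_DAYS = 90   # accessed within about three months -> mid tier

def choose_tier(last_access: date, today: date) -> str:
    """Map access recency to a storage tier (SSD, hard drive, tape)."""
    age = (today - last_access).days
    if age <= HOT_DAYS:
        return "SSD"          # current working data
    if age <= WARM_DAYS:
        return "hard drive"   # recently used data
    return "tape"             # historical data

today = date(2024, 6, 30)
files = {
    "orders_today.db": date(2024, 6, 30),
    "q1_report.xlsx": date(2024, 4, 15),
    "archive_2019.tar": date(2019, 12, 31),
}
for name, last_access in files.items():
    print(name, "->", choose_tier(last_access, today))
```

Frequently accessed data lands on the fast, expensive tier; rarely touched data moves to cheap, slow storage.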
DeDupe - data deduplication
-the same unstructured data is often stored repeatedly on a system
-duplicate instances are replaced by "pointers" to a single stored copy
-pointers are small and take up almost no storage
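A toy sketch of deduplication, assuming whole-file dedup keyed by a SHA-256 content hash (many real systems dedupe at the block level instead). The class `DedupStore` and the file names are hypothetical: each file name maps to a small hash "pointer", and identical content is stored only once.

```python
import hashlib

class DedupStore:
    """Toy deduplicating store: identical content is kept once, keyed by its hash."""

    def __init__(self):
        self.blocks = {}   # content hash -> the single stored copy
        self.files = {}    # file name -> content hash (the small "pointer")

    def save(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        # Store the bytes only if this content has not been seen before.
        self.blocks.setdefault(digest, data)
        self.files[name] = digest

    def load(self, name: str) -> bytes:
        # Follow the pointer back to the single stored copy.
        return self.blocks[self.files[name]]

store = DedupStore()
attachment = b"quarterly sales deck " * 1000
for user in ("alice", "bob", "carol"):
    store.save(f"{user}/deck.pptx", attachment)

print(len(store.files), "files,", len(store.blocks), "stored copy")  # 3 files, 1 stored copy
```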
data silos - data collections are kept separate, with no sharing
-caused by obsolete legacy systems
-and by incompatible systems and data
-result: missed opportunities to see patterns and trends
How do inconsistent data formats impact a business? - garbage in, garbage out: analysis built on inconsistent data gives unreliable results
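A small illustration of "garbage in, garbage out", assuming two hypothetical systems that record dates in different formats (MM/DD/YYYY and YYYY-MM-DD). Sorting the raw strings gives a wrong timeline; normalizing to one format first fixes it. The records and the helper `normalize` are made up for the example.

```python
from datetime import datetime

# Hypothetical records from two systems that store dates differently.
records = [
    ("store_a", "03/15/2024"),   # MM/DD/YYYY
    ("store_b", "2024-02-01"),   # YYYY-MM-DD
    ("store_a", "01/20/2024"),
]

# Sorting raw strings mixes the formats: February sorts after March.
naive = sorted(d for _, d in records)
print(naive)  # ['01/20/2024', '03/15/2024', '2024-02-01']

def normalize(raw: str) -> datetime:
    """Parse either known format into one consistent representation."""
    for fmt in ("%m/%d/%Y", "%Y-%m-%d"):
        try:
            return datetime.strptime(raw, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {raw!r}")

clean = sorted(normalize(d) for _, d in records)
print([d.date().isoformat() for d in clean])  # ['2024-01-20', '2024-02-01', '2024-03-15']
```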
transaction processing system - respond rapidly
respond consistently
How does the analysis of operational data compete with customers? - searching for trends puts additional load on the operational systems and slows them down for customers
What can a company do about operational data? - -keep separate data repositories:
-operational: handles day-to-day transactions
-reporting and analytics: combines data from many sources, including historical data, through periodic imports from the operational systems
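A minimal sketch of keeping separate repositories, using two in-memory SQLite databases as stand-ins for the operational and analytics stores. The table names, columns, and the `periodic_import` helper are hypothetical; the point is that trend queries run against the imported copy rather than the live transaction system.

```python
import sqlite3

# Two separate repositories (in-memory here for illustration).
operational = sqlite3.connect(":memory:")
analytics = sqlite3.connect(":memory:")

operational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
operational.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 120.0), ("west", 80.0), ("east", 45.5)],
)
operational.commit()
analytics.execute("CREATE TABLE orders_history (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")

def periodic_import() -> int:
    """Copy orders not yet in the analytics store; return how many were copied."""
    last_id = analytics.execute("SELECT COALESCE(MAX(id), 0) FROM orders_history").fetchone()[0]
    rows = operational.execute(
        "SELECT id, region, amount FROM orders WHERE id > ?", (last_id,)
    ).fetchall()
    analytics.executemany("INSERT INTO orders_history VALUES (?, ?, ?)", rows)
    analytics.commit()
    return len(rows)

print(periodic_import())  # 3
operational.execute("INSERT INTO orders (region, amount) VALUES ('west', 200.0)")
operational.commit()
print(periodic_import())  # 1

# Trend queries hit the analytics copy, not the live operational system.
for region, total in analytics.execute(
    "SELECT region, SUM(amount) FROM orders_history GROUP BY region ORDER BY region"
):
    print(region, total)
```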
data warehouse and characteristics - a collection of databases that supports decision making
-sources: operational systems and historical data
- fast queries
- exploration
How is a data mart different from a data warehouse? - a data mart focuses on a specific problem or a specific business unit, rather than the whole organization
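A rough sketch of the warehouse/mart distinction, assuming a hypothetical `sales` table in an in-memory SQLite "warehouse". For brevity the marketing data mart is modeled as a SQL view; in practice a data mart is often a separate, smaller database.

```python
import sqlite3

# In-memory stand-in for an organization-wide data warehouse (hypothetical schema).
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (dept TEXT, product TEXT, year INTEGER, amount REAL)")
warehouse.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("marketing", "ads", 2023, 1000.0),
        ("marketing", "ads", 2024, 1500.0),
        ("logistics", "trucks", 2024, 5000.0),
    ],
)

# Data mart: only the slice one business unit needs for its specific problem.
warehouse.execute(
    "CREATE VIEW marketing_mart AS "
    "SELECT product, year, amount FROM sales WHERE dept = 'marketing'"
)

print(warehouse.execute("SELECT COUNT(*) FROM sales").fetchone()[0])           # 3 (whole company)
print(warehouse.execute("SELECT COUNT(*) FROM marketing_mart").fetchone()[0])  # 2 (marketing only)
```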
Big Data (three V's) - Volume: the data is too big
Velocity: the data arrives too fast
-rapid arrival and feedback loop
Variety: the data comes in too many types and formats
Hadoop - open source system that can take in any kind of data
-distributed computing platform
Hadoop advantages - scalable
cost-effective
flexible
fault-tolerant
Hadoop MapReduce - Map: process input data in parallel
Reduce: combine the map output to create the final results
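A single-machine sketch of the MapReduce idea using the classic word-count example. A thread pool stands in for cluster nodes, and the grouping by key that Hadoop's shuffle phase performs across nodes is folded into `reduce_phase` here; the function names and input splits are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

def map_phase(chunk: str) -> list[tuple[str, int]]:
    """Map: turn one input split into (key, value) pairs."""
    return [(word.lower(), 1) for word in chunk.split()]

def reduce_phase(pairs) -> dict[str, int]:
    """Reduce: group the pairs by key and combine their values."""
    totals = defaultdict(int)
    for word, count in pairs:
        totals[word] += count
    return dict(totals)

splits = ["big data big", "data velocity", "big variety"]
with ThreadPoolExecutor() as pool:
    mapped = pool.map(map_phase, splits)   # each split is mapped independently
    all_pairs = [pair for part in mapped for pair in part]

print(reduce_phase(all_pairs))  # {'big': 3, 'data': 2, 'velocity': 1, 'variety': 1}
```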
Hadoop HDFS - Hadoop Distributed File System
-files are split into blocks distributed across servers
-data is replicated because server failure is assumed
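A toy model of HDFS-style storage: a file is split into fixed-size blocks, each block is copied to several servers, and a read survives a failed server by using another replica. The tiny block size, the round-robin placement, and the node names are simplifications (real HDFS uses much larger blocks, 128 MB by default, and rack-aware placement); the default replication factor of 3 matches HDFS.

```python
BLOCK_SIZE = 8    # bytes for the demo; HDFS's default block size is 128 MB
REPLICATION = 3   # HDFS's default replication factor

def split_into_blocks(data: bytes) -> list[bytes]:
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]

def place_replicas(num_blocks: int, servers: list[str]) -> dict[int, list[str]]:
    """Round-robin placement: each block goes on REPLICATION different servers."""
    n = len(servers)
    return {b: [servers[(b + k) % n] for k in range(REPLICATION)] for b in range(num_blocks)}

def read_block(block_id: int, placement: dict[int, list[str]], alive: set[str]) -> str:
    """Return the first live server holding a replica of the block."""
    for server in placement[block_id]:
        if server in alive:
            return server
    raise OSError(f"all replicas of block {block_id} lost")

data = b"customer clickstream log 2024"   # 29 bytes -> 4 blocks of up to 8 bytes
blocks = split_into_blocks(data)
servers = ["node1", "node2", "node3", "node4"]
placement = place_replicas(len(blocks), servers)
alive = {"node2", "node3", "node4"}       # node1 has failed
print([read_block(b, placement, alive) for b in range(len(blocks))])
```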
Hadoop Pig - high-level programming environment for Hadoop
-scripts are written in Pig Latin
Examples of big data - -predictive policing: used crime and earthquake data
-Tesco grocery chain: data from its refrigerators used to lower energy use
-"Actions speak louder than words": improving therapy diagnoses with a virtual therapist able to interact with patients
canned reports pros - -predefined format
-answer specific questions
-easy for users
canned reports cons - -inflexible
-IT overhead to create and maintain each report
ad hoc reports - users define their own reports
pro - powerful and flexible
con - demanding of users: steep learning curve, requires business knowledge
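A small contrast between the two report styles, using hypothetical sales rows. The canned report is fixed and always answers one question; the ad hoc report lets the user choose the grouping and filter, which is more flexible but requires knowing the data's fields. Both function names are made up for the example.

```python
from collections import defaultdict

SALES = [
    {"region": "east", "product": "tv", "amount": 500},
    {"region": "west", "product": "tv", "amount": 300},
    {"region": "east", "product": "radio", "amount": 50},
]

def canned_sales_by_region(rows):
    """Canned report: written once by IT, always answers the same question."""
    totals = defaultdict(int)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)

def ad_hoc_report(rows, group_by, where=lambda r: True):
    """Ad hoc report: the user picks the grouping field and the filter."""
    totals = defaultdict(int)
    for r in rows:
        if where(r):
            totals[r[group_by]] += r["amount"]
    return dict(totals)

print(canned_sales_by_region(SALES))  # {'east': 550, 'west': 300}
print(ad_hoc_report(SALES, group_by="product", where=lambda r: r["region"] == "east"))  # {'tv': 500, 'radio': 50}
```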
dashboards - graphical view of what is happening inside the business
-some customization
OLAP - online analytical processing
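A minimal in-memory sketch of typical OLAP operations (roll-up, drill-down, slice) over a tiny sales "cube". The fact rows, the dimension names, and the `aggregate` helper are hypothetical; real OLAP engines serve these multidimensional queries over far larger, often pre-aggregated, data.

```python
from collections import defaultdict

# Hypothetical fact rows: (year, quarter, region, product, sales)
FACTS = [
    (2023, "Q1", "east", "tv", 100),
    (2023, "Q1", "west", "tv", 80),
    (2023, "Q2", "east", "radio", 30),
    (2024, "Q1", "east", "tv", 120),
    (2024, "Q2", "west", "radio", 40),
]
DIMS = ("year", "quarter", "region", "product")

def aggregate(facts, dims, where=None):
    """Sum sales over the chosen dimensions, optionally slicing on fixed values first."""
    where = where or {}
    idx = {d: i for i, d in enumerate(DIMS)}
    cube = defaultdict(int)
    for row in facts:
        if all(row[idx[d]] == v for d, v in where.items()):
            key = tuple(row[idx[d]] for d in dims)
            cube[key] += row[-1]   # last column is the sales measure
    return dict(cube)

print(aggregate(FACTS, ["year"]))                                     # roll-up to year
print(aggregate(FACTS, ["year", "quarter"]))                          # drill down to quarter
print(aggregate(FACTS, ["region", "product"], where={"year": 2023}))  # slice: 2023 only
```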