Big Data & Hadoop Notes
What is Hadoop? Advantages
Hadoop is an open-source framework for distributed storage and processing of big data. It includes:
- HDFS (storage)
- MapReduce (processing)
- YARN (resource management)
- Common libraries
Advantages:
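- Scalable: capacity grows by adding nodes to the cluster
- Cost-effective: runs on commodity hardware
- Fault-tolerant: HDFS replicates data blocks across nodes, so a single node failure does not lose data
- High throughput: data is processed in parallel across the cluster
- Flexible data handling: stores structured, semi-structured, and unstructured data
- Open-source: free to use, with a large community
- Data locality: computation runs on the nodes that hold the data, which reduces network transfer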
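The MapReduce processing model listed above can be illustrated with a small word count. This is a minimal single-process sketch in plain Python, not the Hadoop API: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical, and on a real cluster the framework would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (done by the framework in Hadoop)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input: two short lines of text.
counts = word_count(["big data big cluster", "data node"])
print(counts)  # {'big': 2, 'data': 2, 'cluster': 1, 'node': 1}
```

Each mapper only sees its own line and each reducer only sees one key, which is what lets the real framework spread the work across nodes.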
What is Big Data? 5 Vs
Big Data refers to datasets that are too large or complex to store and process with traditional data-management tools.
5 Vs:
- Volume: Large amounts of data
- Velocity: Fast data generation
- Variety: Different formats
- Veracity: Accuracy and reliability
- Value: Usefulness of data insights
Hadoop vs RDBMS
Hadoop vs Traditional RDBMS:
- Data type: Hadoop handles all types (structured, semi-structured, unstructured); an RDBMS handles structured data only