• Introduction to Apache Hadoop and the Hadoop infrastructure.
Big data processing necessitates a new strategy: Requirements
Support for partial failure
The failure of one component shouldn't mean the system as a whole fails.
Recoverability of data
Another functional unit should take over the workload of a malfunctioning
component.
Recovery of components
Without the need for a complete system restart, a recovered component should
rejoin the system.
Consistency
Failures of components during the execution of the job shouldn't have an impact on
the result.
Scalability
If the system is overloaded, the performance of individual jobs should degrade
gradually rather than the system failing outright.
A corresponding increase in load capacity should be supported by increased
resources.
Graceful degradation of individual task performance, rather than outright system
failure, is one of Hadoop's core properties.
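The partial-failure and failover requirements above can be sketched as a small simulation: each task is retried on another worker when its assigned worker is down. This is illustrative only; real Hadoop scheduling (YARN) tracks node heartbeats and containers, and the names and structure here are hypothetical.

```python
def run_job(tasks, workers, max_retries=3):
    """Run each task, reassigning it to another worker on failure.

    Hypothetical sketch of failover: a dead worker's task is simply
    retried on the next worker in the list.
    """
    results = {}
    for task in tasks:
        for attempt in range(max_retries):
            worker = workers[(task + attempt) % len(workers)]
            if worker["alive"]:
                results[task] = f"task-{task} done on {worker['name']}"
                break
        else:
            raise RuntimeError(f"task-{task} failed on all workers")
    return results

workers = [
    {"name": "node1", "alive": True},
    {"name": "node2", "alive": False},  # simulated failed node
    {"name": "node3", "alive": True},
]
out = run_job(range(4), workers)  # every task completes despite node2 being down
```

The point mirrors the requirement: one dead component costs a retry, not the job.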
Applications are created using high-level programming languages.
Work is done on a cluster of commodity machines; communication among nodes is
kept to a minimum.
Data is distributed in advance; the computation is brought to the data.
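"Bringing the computation to the data" can be sketched as a scheduler that prefers a node already holding the input block. A minimal sketch, assuming a hypothetical block-location map; real HDFS locality decisions also account for racks and node load.

```python
def pick_local_node(block_locations, block_id, free_nodes):
    """Prefer a free node that already stores the block (data locality);
    fall back to any free node otherwise. Hypothetical helper names."""
    for node in block_locations.get(block_id, []):
        if node in free_nodes:
            return node  # local read: no network transfer of the block
    return next(iter(free_nodes))  # remote fallback

locations = {"blk_1": ["node1", "node2"], "blk_2": ["node3"]}
free = {"node2", "node3"}
chosen = pick_local_node(locations, "blk_1", free)  # node2 holds blk_1 and is free
```

Scheduling the task on `node2` avoids shipping the block over the network, which is why locality matters more than raw CPU placement.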
For greater availability and reliability, data is replicated.
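Replication for availability can be sketched with an HDFS-style placement rule: first copy on the writing node, the next copies in a different rack so a rack failure cannot take out all replicas. This is a simplified sketch; the real HDFS placement policy has more cases, and the rack/node names here are invented.

```python
def place_replicas(local, nodes_by_rack):
    """Sketch of rack-aware placement for 3 replicas: first copy on the
    local node, the other two on nodes in a different rack.
    Simplified relative to the real HDFS policy."""
    local_rack = next(r for r, ns in nodes_by_rack.items() if local in ns)
    remote_rack = next(r for r in nodes_by_rack if r != local_rack)
    remote = nodes_by_rack[remote_rack][:2]
    return [local] + remote

racks = {"rackA": ["node1", "node2"], "rackB": ["node3", "node4"]}
placement = place_replicas("node1", racks)  # spans two racks, three distinct nodes
```

Losing any single node, or even all of `rackA`, still leaves a readable copy, which is the availability guarantee the bullet describes.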
Hadoop is designed to be fault-tolerant and scalable.