Apache Hadoop New Exam With Complete Solutions
What is Hadoop's basic design assumption? - ANSWER Hardware failures are a common
occurrence and should be handled automatically by the framework
Hadoop Core - ANSWER Storage: Hadoop Distributed File System (HDFS)
Processing: MapReduce
Hadoop splits files into - ANSWER Large blocks and distributes them across nodes in a
cluster
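The splitting idea can be sketched in a few lines of Python. This is a toy illustration only: the function names, the 256-byte block size, the node names, and the round-robin placement policy are all assumptions made for the example. Real HDFS blocks are far larger (128 MB by default in recent releases), and HDFS also replicates each block.

```python
from itertools import cycle


def split_into_blocks(data: bytes, block_size: int) -> list[bytes]:
    """Cut a byte string into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_blocks(blocks: list[bytes], nodes: list[str]) -> dict[int, str]:
    """Assign each block index to a node in round-robin order.

    Round-robin is an assumed, simplified policy for illustration;
    it is not how HDFS actually chooses DataNodes.
    """
    return {index: node for index, node in zip(range(len(blocks)), cycle(nodes))}


data = b"x" * 1000  # stand-in for a file's contents
blocks = split_into_blocks(data, block_size=256)
placement = place_blocks(blocks, ["node-a", "node-b", "node-c"])
```

Here the 1000-byte "file" becomes four blocks (three full blocks and one partial block), spread across three hypothetical nodes.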
The base Hadoop Framework - ANSWER Hadoop Common, HDFS, YARN, and
MapReduce
Hadoop Common: It contains the libraries and utilities that are used by other Hadoop
modules.
HDFS: It is a Distributed File System that stores data on commodity machines to provide
very high aggregate bandwidth across the cluster.
YARN: Short for Yet Another Resource Negotiator, YARN is a resource
management platform that manages computing resources in clusters and
uses them to schedule users' applications.
MapReduce - ANSWER A programming model and an associated implementation for
processing and generating large data sets with a parallel, distributed algorithm on a
cluster
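A minimal single-process sketch of the model, using the classic word-count example. The function names (map_phase, shuffle, reduce_phase) are hypothetical and chosen for illustration; this is plain Python, not the Hadoop API, and a real job would run the map and reduce steps in parallel on many nodes.

```python
from collections import defaultdict


def map_phase(line: str) -> list[tuple[str, int]]:
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


lines = ["big data big cluster", "data node"]
mapped = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

Because each line is mapped independently and each key is reduced independently, both steps can be distributed across machines without changing the result.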
Most of the Hadoop Framework was written in - ANSWER Java
Hadoop Ecosystem - ANSWER Pig, Hive, HBase, Phoenix, Spark, Flume, Sqoop, Oozie,
Storm, and Zookeeper
Other Hadoop technologies - ANSWER Impala, Hue, and Cassandra
Pig - ANSWER A high-level platform for creating programs that run on Hadoop. Executes
its Hadoop jobs in MapReduce, Tez, or Spark.
Hive - ANSWER A data warehouse infrastructure built on top of Hadoop for providing
data summarization, query, and analysis. Queries execute as jobs on an engine such as
MapReduce, scheduled through YARN; it is batch-based, disk-based, and fault-tolerant.
HBase - ANSWER A non-relational, scalable, distributed database.
HBase tables can serve as the input for and output from MapReduce jobs run in Hadoop.
Used for real-time querying of Big Data.
A NoSQL database.
Intended for data lake use cases.
Data Lake - ANSWER Storage repository of raw data in its native format until it's
needed
Phoenix - ANSWER An MPP relational database engine supporting OLTP (Online
Transaction Processing) for Hadoop, using HBase as its backing store.
Unlike Impala, Phoenix can use HBase directly.
Spark - ANSWER A cluster computing framework.
Typically faster than MapReduce because it can keep intermediate data in memory.
Flume - ANSWER A distributed and reliable service for efficiently collecting,
aggregating, and moving large amounts of log data
Sqoop - ANSWER A command-line interface application that transfers data between
relational databases and Hadoop.
Oozie - ANSWER A server-based workflow scheduling system to manage Hadoop jobs.
Storm - ANSWER A distributed stream processing computation framework.
Written mostly in the Clojure programming language.
ZooKeeper - ANSWER A centralized service for maintaining configuration information,
naming, and distributed synchronization for Hadoop applications.
Cloudera Impala - ANSWER An MPP SQL query engine for data stored in a computer
cluster running Hadoop.
1. Does not use MapReduce or YARN
2. In-memory (faster)
3. Requires Hive to use HBase
4. Not fault tolerant
Hue - ANSWER A web interface that supports Hadoop and its ecosystem
Cassandra - ANSWER A distributed database management system.
A NoSQL database.
Can be used for always-on applications, such as web and mobile, a use case HBase is not suited to.
Hadoop requires - ANSWER The Java Runtime Environment (JRE) and Secure Shell
(ssh)
A small Hadoop cluster includes - ANSWER A single master and multiple worker
nodes
The master node consists of:
- Job Tracker
- Task Tracker
- NameNode
- DataNode
In a typical deployment, a slave or worker node is both a DataNode and a Task Tracker.
NameNode - ANSWER The centerpiece of an HDFS file system. It keeps the directory tree
of all files in the file system and tracks where on the cluster each file's data is kept.
DataNode - ANSWER The node that stores the actual HDFS data.
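The split between metadata (NameNode) and data (DataNode) can be pictured with a toy lookup table. ToyNameNode, its methods, the block IDs, and the DataNode names below are all hypothetical and exist only for this sketch; they do not mirror the real HDFS classes or protocol.

```python
class ToyNameNode:
    """Holds only metadata: which blocks make up a file and which nodes store them."""

    def __init__(self) -> None:
        # file path -> ordered list of (block_id, [datanode names])
        self.block_map: dict[str, list[tuple[str, list[str]]]] = {}

    def add_file(self, path: str, block_locations: list[tuple[str, list[str]]]) -> None:
        """Record the blocks of a file and the DataNodes that hold each block."""
        self.block_map[path] = list(block_locations)

    def locate(self, path: str) -> list[tuple[str, list[str]]]:
        """Return block locations; a client would then read from those DataNodes."""
        return self.block_map[path]

    def list_dir(self, directory: str) -> list[str]:
        """List every file whose path falls under the given directory."""
        prefix = directory.rstrip("/") + "/"
        return sorted(path for path in self.block_map if path.startswith(prefix))


name_node = ToyNameNode()
name_node.add_file("/logs/a.txt", [("blk_1", ["dn1", "dn2"]), ("blk_2", ["dn2", "dn3"])])
locations = name_node.locate("/logs/a.txt")
listing = name_node.list_dir("/logs")
```

Note that the toy NameNode never holds file contents, only the mapping from paths to block locations; the bytes themselves would live on the DataNodes.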