Cloudera Certified Associate (CCA) Administrator Exam
Verified Questions, Correct Answers, and Detailed Explanations
for Computer Science Students (Already Graded A+)
1. What is the primary purpose of HDFS?
A. Real-time processing
B. Distributed storage and fault tolerance
C. Data visualization
D. Data modeling
Answer: B. Distributed storage and fault tolerance
HDFS (Hadoop Distributed File System) is designed to store large
datasets reliably across a cluster and provides fault tolerance through
data replication.
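For example, a minimal command-line interaction with HDFS (the file and directory names here are illustrative):
hdfs dfs -put sales.csv /data/sales.csv   # copy a local file into HDFS, where it is split into replicated blocks
hdfs dfs -ls /data                        # list the directory to confirm the file landed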
2. Which command is used to start all Hadoop daemons on a
Cloudera cluster?
A. start-all.sh
B. hadoop start
C. cloudera-scm start
D. hdfs start
Answer: A. start-all.sh
The start-all.sh script starts the HDFS daemons (NameNode, DataNodes) and
the YARN daemons (ResourceManager, NodeManagers). It is deprecated in
recent Hadoop releases in favor of start-dfs.sh and start-yarn.sh, and
production Cloudera clusters are typically managed through Cloudera
Manager instead.
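A minimal sketch, assuming the scripts are on the PATH and passwordless SSH to the worker nodes is set up:
start-all.sh      # deprecated convenience wrapper
start-dfs.sh      # preferred: starts NameNode, SecondaryNameNode, and DataNodes
start-yarn.sh     # preferred: starts ResourceManager and NodeManagers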
3. How does Hadoop ensure data reliability in HDFS?
A. Checkpointing
B. Data replication across nodes
C. Periodic backups to NAS
D. RAID configuration
Answer: B. Data replication across nodes
HDFS replicates data blocks (default replication factor is 3) across
different nodes to ensure fault tolerance.
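You can confirm a file's replication factor from the shell (the path is hypothetical):
hdfs dfs -stat %r /data/sales.csv   # prints the replication factor of the file, e.g. 3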
4. Which port does the HDFS NameNode web UI default to?
A. 50010
B. 50070
C. 8020
D. 8088
Answer: B. 50070
Port 50070 is the default NameNode web UI (HTTP) port in Hadoop 2.x,
used to monitor HDFS status and browse the file system. In Hadoop 3.x
the web UI moved to port 9870, and the NameNode's client RPC service
defaults to port 8020.
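A quick terminal check that the web UI is up (the hostname is an assumption; substitute your NameNode host):
curl http://namenode.example.com:50070/      # fetches the UI landing page on Hadoop 2.x
curl http://namenode.example.com:50070/jmx   # returns NameNode metrics as JSON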
5. What is the function of a DataNode in HDFS?
A. Store metadata
B. Store actual data blocks
C. Manage job scheduling
D. Monitor cluster health
Answer: B. Store actual data blocks
DataNodes are responsible for storing the physical blocks of HDFS
files and serving read/write requests from clients.
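To see which DataNodes are live and how much data each holds (typically run as the HDFS superuser):
hdfs dfsadmin -report   # per-DataNode capacity, usage, and last-contact status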
6. Which Hadoop component handles resource management for
cluster applications?
A. HDFS
B. MapReduce
C. YARN
D. Hive
Answer: C. YARN
YARN (Yet Another Resource Negotiator) manages cluster resources
and schedules jobs on nodes efficiently.
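Two quick ways to inspect YARN from a terminal:
yarn node -list          # NodeManagers registered with the ResourceManager
yarn application -list   # applications currently submitted or running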
7. Which file contains HDFS configuration parameters such as the
block replication factor?
A. hadoop-env.sh
B. core-site.xml
C. hdfs-site.xml
D. yarn-site.xml
Answer: C. hdfs-site.xml
hdfs-site.xml holds HDFS-specific properties such as dfs.replication
and NameNode/DataNode settings. core-site.xml holds cluster-wide
defaults such as the filesystem URI (fs.defaultFS), while memory
settings belong in yarn-site.xml and mapred-site.xml.
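A minimal hdfs-site.xml fragment setting the replication factor (the value 3 is simply the common default):
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
Note that this default applies to files created after the change; existing files keep their factor unless it is changed with -setrep.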
8. What is the purpose of hdfs fsck /?
A. List HDFS directories
B. Check HDFS health and identify corrupt files
C. Start NameNode
D. Delete HDFS files
Answer: B. Check HDFS health and identify corrupt files
The fsck command inspects HDFS and reports under-replicated or
corrupt blocks.
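For example, a more detailed health report (typically run as the HDFS superuser):
hdfs fsck / -files -blocks -locations   # every file, its blocks, and the DataNodes holding them
hdfs fsck / | grep -i under-replicated  # quick filter for the under-replicated block count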
9. Which command displays HDFS disk usage by directory?
A. hdfs dfs -ls
B. hdfs dfs -du
C. hdfs dfsadmin -report
D. hdfs dfs -count
Answer: B. hdfs dfs -du
The -du option summarizes disk usage of files and directories in HDFS.
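For example (the /user path is illustrative):
hdfs dfs -du -h /user      # per-entry usage in human-readable units
hdfs dfs -du -s -h /user   # a single summarized total for the directory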
10. How do you increase replication for an existing file in HDFS?
A. hdfs dfs -setrep
B. hdfs dfs -chmod
C. hdfs dfs -copyToLocal
D. hdfs dfs -mkdir
Answer: A. hdfs dfs -setrep
-setrep allows administrators to modify the replication factor of a file
after it has been created.
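A short example (the path is hypothetical):
hdfs dfs -setrep -w 2 /data/sales.csv   # change replication to 2; -w waits for re-replication to finish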
11. What is the default replication factor in HDFS?
A. 1
B. 2
C. 3
D. 4
Answer: C. 3
HDFS defaults to 3 copies of each block across different nodes for
fault tolerance.
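You can verify the effective default on a node with getconf:
hdfs getconf -confKey dfs.replication   # prints the configured value of dfs.replication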
12. Which command is used to view the HDFS NameNode web UI
from the terminal?