Big Data Foundations
4.0 Credits
Objective Assessment Review (Questions & Answers)
2025
©2025

Multiple Choice Questions
Question 1:
An organization receives a continuous, high‑velocity data stream from
IoT devices and requires near‑real‑time analytics. Which characteristic of
big data does this scenario primarily demonstrate?
- A. Volume
- B. Velocity
- C. Variety
- D. Veracity
Correct ANS: B. Velocity
Rationale:
Velocity refers to the speed at which data is generated and must be
processed. In scenarios involving IoT devices, rapid ingestion and real-
time processing of data are critical, making velocity the key
characteristic.
---
Question 2:
A company is evaluating a solution to process both historical and
streaming data for its analytics needs. They decide to use Apache Kafka
for ingestion and Apache Spark for processing. What primary advantage
does this combination provide over a traditional batch processing
framework like MapReduce?
- A. Improved disk I/O throughput
- B. Lower latency and real‑time processing capability
- C. Stricter schema enforcement
- D. Simplified data modeling
Correct ANS: B. Lower latency and real‑time processing capability
Rationale:
Using Apache Kafka with Apache Spark allows for real‑time streaming
data ingestion and in‑memory processing, drastically reducing latency
compared to traditional batch processing methods like MapReduce.
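The latency difference can be illustrated with a minimal sketch in plain Python (hypothetical names, not the actual Kafka or Spark APIs): a streaming consumer emits a result as soon as each event arrives, whereas a batch job produces output only after the entire dataset has been collected.

```python
def event_stream():
    # Hypothetical stand-in for a Kafka topic: events arrive one at a time.
    for reading in [3, 1, 4, 1, 5]:
        yield reading

def streaming_averages(stream):
    # Streaming model: emit a running average after every event,
    # so each result is available as soon as its event arrives.
    total = count = 0
    for reading in stream:
        total += reading
        count += 1
        yield total / count

results = list(streaming_averages(event_stream()))
# A batch job would instead produce a single average only after
# reading the whole dataset, adding end-to-end latency.
```

Spark's micro-batch execution generalizes this idea: small groups of events are processed in memory and their results pushed downstream immediately, rather than waiting for a full batch job to finish.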
---
Question 3:
In a distributed file system such as HDFS, fault tolerance is achieved
primarily through which method?
- A. Data partitioning
- B. Data replication
- C. Data sharding
- D. Data caching
Correct ANS: B. Data replication
Rationale:
HDFS ensures fault tolerance by replicating each data block across
multiple nodes (three replicas by default). This replication strategy
allows the system to continue operating even if some nodes fail,
thereby maintaining data availability.
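The replication idea can be sketched in a few lines of Python (hypothetical names, not the HDFS API): each block is placed on several distinct nodes, and the block remains readable as long as at least one of those nodes survives.

```python
import random

REPLICATION_FACTOR = 3  # HDFS's default replication factor

def place_replicas(block_id, nodes, factor=REPLICATION_FACTOR):
    # Choose `factor` distinct nodes to hold copies of the block.
    return random.sample(nodes, factor)

def readable(block_locations, failed_nodes):
    # A block stays readable while at least one replica survives.
    return any(n not in failed_nodes for n in block_locations)

nodes = ["node1", "node2", "node3", "node4", "node5"]
locations = place_replicas("blk_0001", nodes)
# Even if the node holding one replica fails, the block is still available:
assert readable(locations, failed_nodes={locations[0]})
```

Real HDFS placement is rack-aware rather than purely random, but the failure-tolerance property is the same: only the loss of every replica makes a block unavailable.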
---
Question 4:
When executing a MapReduce job, one challenge is data skew, where
certain reducers receive disproportionately large amounts of data. Which
strategy can best help mitigate this issue?
- A. Custom data partitioning strategies
- B. Increasing the number of mappers
- C. Using an external database for intermediate storage
- D. Enforcing strict schema-on-read
Correct ANS: A. Custom data partitioning strategies
Rationale:
Data skew is addressed by devising custom partitioning logic (and, if