Cloudera Certified Professional (CCP) – Data Engineer level Verified Questions, Correct Answers, and Detailed Explanations for Computer Science Students | Already Graded A+

Type: Exam (solutions)
Pages: 27
Grade: A+
Uploaded on: 16-12-2025
Written in: 2025/2026
Institution: Computer Science
Course: Computer science

1.
Which of the following file formats supports columnar storage and is optimized
for analytics workloads in Hadoop/Spark environments?
A. JSON
B. CSV
C. Parquet
D. Plain text
Parquet is a columnar storage format designed for efficient analytic queries,
compression, and schema evolution — making it ideal for big data workloads.
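The pure-Python sketch below is a toy model, not Parquet's actual file format (which adds row groups, column chunks, encodings, and compression). It pivots some made-up row records into one list per column and counts how many values a `SUM(amount)`-style query has to touch in each layout.

```python
# Toy illustration of row-oriented vs column-oriented layout.
# This is NOT Parquet's on-disk format; it only models the idea that a
# columnar layout stores each column's values contiguously.
rows = [
    {"user_id": 1, "country": "NL", "amount": 10.0},
    {"user_id": 2, "country": "DE", "amount": 25.5},
    {"user_id": 3, "country": "NL", "amount": 7.25},
]


def to_columnar(records):
    """Pivot a list of row dicts into one list per column."""
    columns = {name: [] for name in records[0]}
    for record in records:
        for name, value in record.items():
            columns[name].append(value)
    return columns


columnar = to_columnar(rows)

# Row layout: answering SUM(amount) means walking every field of every row.
row_values_touched = sum(len(r) for r in rows)
# Columnar layout: only the "amount" column has to be read.
col_values_touched = len(columnar["amount"])

total_amount = sum(columnar["amount"])
print(total_amount, row_values_touched, col_values_touched)  # 42.75 9 3
```

Storing similar values together (for example the repeated country codes) is also what lets columnar formats compress well.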
2.
In Apache Spark, which RDD action returns the first element of the dataset?
A. collect()
B. take(1)
C. first()
D. head()
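The first() action returns the first element of the RDD itself, while take(1) returns a list (an array in Scala) containing one element. head() is not an RDD method; it belongs to the DataFrame/Dataset API.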
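To make the return-type difference concrete, here is a hypothetical `ToyRDD` class (not part of PySpark) whose `first()` is built on `take(1)` and unwraps the single element. It assumes, as in PySpark, that calling `first()` on an empty dataset raises an error.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD; used only to contrast first() and take(n)."""

    def __init__(self, partitions):
        # A list of partitions, each a list of elements.
        self.partitions = partitions

    def take(self, n):
        """Return a list with up to n elements, scanning partitions in order."""
        result = []
        for part in self.partitions:
            for item in part:
                if len(result) == n:
                    return result
                result.append(item)
        return result

    def first(self):
        """Return the first element itself, not wrapped in a list."""
        taken = self.take(1)
        if not taken:
            raise ValueError("RDD is empty")
        return taken[0]


rdd = ToyRDD([[10, 20], [30]])
print(rdd.first())  # 10
print(rdd.take(1))  # [10]
```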
3.
Which concept ensures that a transaction in HDFS (or a Hive/Impala table) is
atomic and either fully committed or not at all?
A. Replication factor
B. ACID support via transactional tables (ORC/Parquet with Hive
transactional tables)
C. Block size
D. Compression
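Hive transactional tables with ACID enabled make each INSERT, UPDATE, or DELETE atomic and consistent. Note that full ACID tables (supporting UPDATE and DELETE) must be stored as ORC; other formats such as Parquet can be used only for insert-only transactional tables.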
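A toy model of the all-or-nothing behaviour. `ToyTransactionalTable` is a hypothetical class: real Hive ACID tables track transaction IDs in the metastore and write base/delta directories, none of which is modelled here. The sketch only shows that readers see either none or all of a transaction's rows.

```python
class ToyTransactionalTable:
    """Hypothetical table whose readers only ever see committed rows."""

    def __init__(self):
        self.committed = []  # rows visible to readers

    def begin(self):
        # A transaction is modelled as a private buffer of pending rows.
        return []

    def commit(self, pending):
        # Publish all pending rows in one step by swapping in a new list.
        self.committed = self.committed + pending

    def abort(self, pending):
        # Drop the pending rows; readers never saw them.
        pending.clear()


table = ToyTransactionalTable()

txn = table.begin()
txn.extend([("a", 1), ("b", 2)])
print(table.committed)  # [] (nothing is visible before commit)
table.commit(txn)
print(table.committed)  # [('a', 1), ('b', 2)]

failed = table.begin()
failed.append(("c", 3))
table.abort(failed)
print(table.committed)  # [('a', 1), ('b', 2)] (the aborted row never appears)
```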

4.
Which of these settings in Spark will allow a job to use only 2 cores on a
cluster?
A. spark.executor.memory
B. spark.executor.cores=2
C. spark.driver.memory
D. spark.memory.fraction
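spark.executor.cores controls the number of CPU cores each executor can use; setting it to 2 restricts each executor to 2 cores. The job as a whole is limited to 2 cores only if it runs a single executor, because the total is the number of executors times the cores per executor.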
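Because `spark.executor.cores` is a per-executor setting, the cores a job actually gets depend on how many executors it runs (on YARN this comes from `spark.executor.instances` or `--num-executors`; in standalone mode `spark.cores.max` caps the application total). The helper functions below are hypothetical, a back-of-the-envelope sketch that also assumes each task reserves `spark.task.cpus` cores (default 1).

```python
def total_executor_cores(num_executors: int, executor_cores: int) -> int:
    """Cores available to a job's tasks across all executors (driver not included)."""
    return num_executors * executor_cores


def concurrent_task_slots(num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Tasks that can run at once if each task reserves task_cpus cores."""
    return num_executors * (executor_cores // task_cpus)


# spark.executor.cores=2 with a single executor: the job gets 2 cores.
print(total_executor_cores(num_executors=1, executor_cores=2))  # 2
# The same setting with four executors: the job gets 8 cores.
print(total_executor_cores(num_executors=4, executor_cores=2))  # 8
```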
5.
In Apache Kafka, what is the purpose of a “topic”?
A. To manage user authentication
B. To define the cluster’s configuration
C. To categorize and route streams of messages
D. To store messages on disk
A topic is a named stream or category to which messages are published and
from which consumers read.
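The in-memory sketch below models a topic as a named, partitioned stream. `ToyBroker` is hypothetical and shares no code with Kafka; in particular it picks a partition with CRC32 of the key, whereas Kafka's Java producer uses a murmur2 hash by default. What carries over is the behaviour: messages with the same key land in the same partition, and consumers read by topic name.

```python
import zlib
from collections import defaultdict


class ToyBroker:
    """Hypothetical in-memory stand-in for a Kafka cluster: topic -> partitions -> messages."""

    def __init__(self, partitions_per_topic=3):
        self.partitions_per_topic = partitions_per_topic
        # A topic is created on first use with a fixed number of empty partitions.
        self.topics = defaultdict(lambda: [[] for _ in range(self.partitions_per_topic)])

    def produce(self, topic, key, value):
        # Assumption: CRC32 stands in for Kafka's default key-hashing partitioner.
        partition = zlib.crc32(key.encode()) % self.partitions_per_topic
        self.topics[topic][partition].append((key, value))
        return partition

    def consume(self, topic):
        # A consumer reads by topic name and sees only that topic's messages.
        return [msg for part in self.topics[topic] for msg in part]


broker = ToyBroker()
p1 = broker.produce("orders", "user-42", "order created")
p2 = broker.produce("orders", "user-42", "order paid")
broker.produce("clicks", "user-7", "page view")
print(p1 == p2)                   # True (same key, same partition)
print(broker.consume("orders"))   # [('user-42', 'order created'), ('user-42', 'order paid')]
```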
6.
Which command-line tool is used to submit Spark jobs to a cluster?
A. spark-shell
B. pyspark
C. spark-submit
D. spark-cli
The spark-submit script launches an already packaged application (a JAR or Python files) on a cluster, in either cluster or client deploy mode.
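The helper below only assembles a spark-submit command line and prints it; it does not launch anything. `etl_job.py` and the date argument are hypothetical placeholders. The option names (`--master`, `--deploy-mode`, `--executor-cores`, `--num-executors`) are standard spark-submit flags, but which ones apply depends on the cluster manager, so check `spark-submit --help` for your Spark version.

```python
import shlex


def build_spark_submit(app_file, master="yarn", deploy_mode="cluster",
                       executor_cores=2, num_executors=2, app_args=()):
    """Assemble (but do not run) a spark-submit command as an argument list."""
    cmd = [
        "spark-submit",
        "--master", master,                      # cluster manager, e.g. yarn
        "--deploy-mode", deploy_mode,            # "cluster" or "client"
        "--executor-cores", str(executor_cores),
        "--num-executors", str(num_executors),   # not honoured by every cluster manager
        app_file,                                # packaged application (hypothetical name)
    ]
    cmd.extend(app_args)                         # arguments passed to the application
    return cmd


cmd = build_spark_submit("etl_job.py", app_args=["2025-12-16"])
print(shlex.join(cmd))
```

Building the command as a list rather than one string avoids quoting mistakes if it is later handed to a process launcher.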
7.
If you want to store event data that arrives in real-time and then run batch
analytics on it later, which design pattern best fits?
A. Lambda architecture
B. Kappa architecture
C. Monolithic application
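D. Master–slave replication
The Kappa architecture treats all data as a stream: events are kept in a replayable log (e.g., Kafka) and processed by a single streaming pipeline (e.g., Spark Streaming). Analytics over historical data are done by replaying the log through the same pipeline, which avoids maintaining a separate batch layer.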
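A minimal sketch of the Kappa idea, assuming an append-only Python list stands in for a retained Kafka topic. The same hypothetical `process` function updates state as events stream in, and later "batch" analytics are just a replay of the full log through that same code path.

```python
from collections import Counter

# Append-only event log standing in for a retained Kafka topic (assumption).
event_log = []


def process(events, state=None):
    """Single processing function used for both live and replayed data."""
    counts = Counter() if state is None else state
    for event in events:
        counts[event["type"]] += 1
    return counts


# Streaming: each event is appended to the log and updates running state.
live_state = Counter()
for incoming in [{"type": "click"}, {"type": "view"}, {"type": "click"}]:
    event_log.append(incoming)
    live_state = process([incoming], live_state)

# Later "batch" analytics: replay the whole log through the same code.
replayed_state = process(event_log)
print(live_state == replayed_state)  # True
print(replayed_state["click"])       # 2
```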
8.
Which of the following is NOT a benefit of using partitioning in a Hive table?
A. Improved query performance on partitioned columns
B. Faster metadata operations for partitioned data
C. Automatic compression of data
D. Easier data management by date or category
Partitioning helps with query pruning and management but does not inherently
compress data.
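A toy model of partition pruning, assuming a table partitioned by a `dt` column and stored as one list of rows per partition value (loosely mirroring Hive's `dt=...` directories). Filtering on the partition column reads a single partition; the rows themselves are stored unchanged, which is why partitioning alone compresses nothing.

```python
# Hypothetical table partitioned by "dt": one list of rows per partition value.
partitions = {
    "2025-12-14": [{"order_id": 1, "amount": 5}, {"order_id": 2, "amount": 8}],
    "2025-12-15": [{"order_id": 3, "amount": 2}],
    "2025-12-16": [{"order_id": 4, "amount": 9}, {"order_id": 5, "amount": 1}],
}


def query_by_date(table, dt):
    """Filter on the partition column: only the matching partition is scanned."""
    scanned = table.get(dt, [])
    return scanned, len(scanned)


def query_full_scan(table, dt):
    """Without pruning, every row of every partition must be examined."""
    rows_scanned = 0
    matches = []
    for part_dt, rows in table.items():
        for row in rows:
            rows_scanned += 1
            if part_dt == dt:
                matches.append(row)
    return matches, rows_scanned


pruned_rows, pruned_scanned = query_by_date(partitions, "2025-12-16")
full_rows, full_scanned = query_full_scan(partitions, "2025-12-16")
print(pruned_rows == full_rows, pruned_scanned, full_scanned)  # True 2 5
```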
9.
In Spark SQL, which clause allows you to remove duplicate rows from a result
set?
A. DISTINCT
B. GROUP BY
C. UNION
D. ORDER BY
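DISTINCT removes duplicate rows from the result set. GROUP BY groups rows by the grouping columns and is normally paired with aggregate functions such as COUNT or SUM; although grouping by every selected column also yields unique rows, DISTINCT is the clause designed for removing duplicates.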
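SQLite (from Python's standard library) stands in for Spark SQL in this sketch; for a simple case like this, DISTINCT and GROUP BY follow standard SQL semantics. The `visits` table and its rows are made up for illustration.

```python
import sqlite3

# SQLite is used as a stand-in for Spark SQL; the SQL itself is standard.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE visits (user_id INTEGER, page TEXT)")
conn.executemany(
    "INSERT INTO visits VALUES (?, ?)",
    [(1, "home"), (1, "home"), (2, "cart"), (1, "cart")],
)

# DISTINCT removes duplicate rows from the result set.
distinct_rows = conn.execute(
    "SELECT DISTINCT user_id, page FROM visits ORDER BY user_id, page"
).fetchall()

# GROUP BY is typically paired with an aggregate such as COUNT(*).
grouped_counts = conn.execute(
    "SELECT user_id, page, COUNT(*) FROM visits "
    "GROUP BY user_id, page ORDER BY user_id, page"
).fetchall()

print(distinct_rows)   # [(1, 'cart'), (1, 'home'), (2, 'cart')]
print(grouped_counts)  # [(1, 'cart', 1), (1, 'home', 2), (2, 'cart', 1)]
conn.close()
```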
10.
Which of the following best describes “wide dependency” in Spark?
A. One parent partition maps to one child partition
B. One parent partition maps to many child partitions
C. No shuffle is required between stages
D. Narrow dependency with pipeline
Wide dependencies require a shuffle: data from one parent partition is redistributed across many child partitions, which is expensive because the shuffled data has to be written out and moved across the network.
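The following pure-Python model contrasts the two dependency types, assuming a parent dataset of two partitions of (key, value) pairs. The hash used to pick a child partition (sum of the key's bytes) is a made-up stand-in for Spark's partitioner. In the result, parent partition 0 feeds two child partitions, which is the one-to-many pattern of a wide dependency.

```python
# Parent dataset modelled as a list of partitions of (key, value) pairs.
parent = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
]


def narrow_map(partitions, fn):
    """Narrow dependency: each child partition comes from exactly one parent partition."""
    return [[fn(record) for record in part] for part in partitions]


def wide_shuffle(partitions, num_children):
    """Wide dependency: records are re-partitioned by key (a shuffle).

    Returns the child partitions and, for each parent partition, the set of
    child partitions it sent data to.
    """
    children = [[] for _ in range(num_children)]
    targets = [set() for _ in partitions]
    for parent_index, part in enumerate(partitions):
        for key, value in part:
            # Simple deterministic hash (assumption, not Spark's partitioner).
            child = sum(key.encode()) % num_children
            children[child].append((key, value))
            targets[parent_index].add(child)
    return children, targets


narrow = narrow_map(parent, lambda kv: (kv[0], kv[1] * 10))
children, targets = wide_shuffle(parent, 2)
print(narrow)    # [[('a', 10), ('b', 20)], [('a', 30), ('c', 40)]]
print(children)  # [[('b', 2)], [('a', 1), ('a', 3), ('c', 4)]]
print(targets)   # [{0, 1}, {1}]
```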
11.
Which command in Hive enables you to see the structure (schema) of a Hive
table?
A. SHOW TABLES
B. DESCRIBE DATABASE