Exam (elaborations)

Databricks-Certified-Professional-Data-Engineer Practice Exam Questions and Answers

Pages: 11
Grade: A+
Uploaded on: 26-05-2024
Written in: 2023/2024



Content preview

Question # 1

A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes
that the Min, Median, and Max Durations for tasks in a particular stage show the
minimum and median time to complete a task as roughly the same, but the max
duration for a task to be roughly 100 times as long as the minimum.

Which situation is causing increased duration of the overall job?

Options:

A.Task queueing resulting from improper thread pool assignment.

B.Spill resulting from attached volume storage being too small.

C.Network latency due to some cluster nodes being in different regions from the
source data

D.Skew caused by more data being assigned to a subset of spark-partitions.

E.Credential validation errors while pulling data from an external system.
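The symptom in Question # 1 (min and median task durations similar, max roughly 100x longer) is the classic signature of data skew. A minimal, Spark-free sketch of the idea — one "hot" key dominates the dataset, so whichever partition receives it gets far more records than the rest (the key names and partition count here are illustrative assumptions, not from the question):

```python
from collections import Counter

# Simulate a skewed dataset: one hot key accounts for almost all records.
records = ["hot_key"] * 10_000 + [f"key_{i}" for i in range(100)]
num_partitions = 8

# Hash-partition by key, as a shuffle would; count records per partition.
partition_sizes = Counter(hash(k) % num_partitions for k in records)
largest = max(partition_sizes.values())
smallest = min(partition_sizes.values())

# The partition holding the hot key dwarfs the others, mirroring the
# min-vs-max task duration gap the Spark UI shows in the question.
print(f"largest partition: {largest}, smallest partition: {smallest}")
```

Because all records with the same key hash to the same partition, the task processing the hot partition runs orders of magnitude longer than its siblings, stretching the whole stage.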

Question # 2

The downstream consumers of a Delta Lake table have been complaining about data
quality issues impacting performance in their applications. Specifically, they have
complained that invalid latitude and longitude values in the activity_details table have
been breaking their ability to use other geolocation processes.

A junior engineer has written the following code to add CHECK constraints to the
Delta Lake table:

[Code snippet not included in this preview.]

A senior engineer has confirmed the above logic is correct and the valid ranges for
latitude and longitude are provided, but the code fails when executed.

Which statement explains the cause of this failure?

Options:

A.Because another team uses this table to support a frequently running application,
two-phase locking is preventing the operation from committing.

B.The activity details table already exists; CHECK constraints can only be added
during initial table creation.

C.The activity details table already contains records that violate the constraints; all
existing data must pass CHECK constraints in order to add them to an existing table.

D.The activity details table already contains records; CHECK constraints can only be
added prior to inserting values into a table.

E.The current table schema does not contain the field valid coordinates; schema
evolution will need to be enabled before altering the table to add a constraint.
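The code referenced in Question # 2 is not shown in this preview, but DDL of the kind it describes typically looks like the following sketch. The constraint names and exact predicates are assumptions; on Databricks these statements would be submitted via `spark.sql(...)`, while here they are only assembled as strings:

```python
# Hypothetical sketch of the CHECK-constraint DDL the question alludes to.
# Constraint names (valid_latitude, valid_longitude) are assumed, not
# taken from the source; the activity_details table name is.
stmts = [
    "ALTER TABLE activity_details "
    "ADD CONSTRAINT valid_latitude CHECK (latitude >= -90 AND latitude <= 90)",
    "ALTER TABLE activity_details "
    "ADD CONSTRAINT valid_longitude CHECK (longitude >= -180 AND longitude <= 180)",
]
for s in stmts:
    print(s)
```

Note that when a CHECK constraint is added to an existing Delta table, all rows already in the table must satisfy the predicate, otherwise the ALTER TABLE statement fails — which is the behavior at issue in this question.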

Question # 3

Which statement describes integration testing?

Options:

A.Validates interactions between subsystems of your application

B.Requires an automated testing framework

C.Requires manual intervention

D.Validates an application use case

E.Validates behavior of individual elements of your application
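To make the unit-vs-integration distinction in Question # 3 concrete, here is a minimal sketch. The `parse` function and `Store` class are hypothetical stand-ins for two subsystems of an application:

```python
def parse(line):
    """Subsystem 1: parse a 'name = value' config line (hypothetical)."""
    name, value = line.split("=")
    return name.strip(), int(value)

class Store:
    """Subsystem 2: a trivial key-value store (hypothetical)."""
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

# Unit test: validates the behavior of one element in isolation.
assert parse("retries = 3") == ("retries", 3)

# Integration test: validates the interaction between the two subsystems,
# feeding the parser's output into the store.
store = Store()
key, value = parse("timeout = 30")
store.put(key, value)
assert store.data == {"timeout": 30}
print("tests passed")
```

The last block is the integration test: it exercises the seam where the subsystems meet, rather than either component alone.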

Question # 4

Which statement describes Delta Lake Auto Compaction?

Options:

A. An asynchronous job runs after the write completes to detect if files could be
further compacted; if yes, an optimize job is executed toward a default of 1 GB.

B.Before a Jobs cluster terminates, optimize is executed on all tables modified during
the most recent job.

C.Optimized writes use logical partitions instead of directory partitions; because
partition boundaries are only represented in metadata, fewer small files are written.
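For context on Question # 4: Auto Compaction is controlled by the Delta table property `delta.autoOptimize.autoCompact`. A sketch of enabling it, assembled as a plain string (the `events` table name is an assumption; on a cluster the statement would run through `spark.sql(...)`):

```python
# Sketch: enable Delta Lake Auto Compaction on a (hypothetical) table
# via its documented table property. Assembled as a string only; this
# script does not talk to a Spark cluster.
enable_auto_compact = (
    "ALTER TABLE events "
    "SET TBLPROPERTIES (delta.autoOptimize.autoCompact = true)"
)
print(enable_auto_compact)
```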