Question # 1
A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes
that the Min, Median, and Max Durations for tasks in a particular stage show that the
minimum and median times to complete a task are roughly the same, but the max
duration is roughly 100 times as long as the minimum.
Which situation is causing the increased duration of the overall job?
Options:
A. Task queueing resulting from improper thread pool assignment.
B. Spill resulting from attached volume storage being too small.
C. Network latency due to some cluster nodes being in different regions from the
source data.
D. Skew caused by more data being assigned to a subset of Spark partitions.
E. Credential validation errors while pulling data from an external system.
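The pattern described (min and median nearly equal, max roughly 100x longer) is the classic signature of data skew: most partitions are sized evenly, but one receives far more data and its task becomes a straggler. The check the engineer performs by eye in the Spark UI can be sketched as a small, hypothetical heuristic over per-task durations (the function name and threshold are illustrative, not part of Spark):

```python
import statistics

def looks_skewed(task_durations_s, straggler_ratio=10):
    """Flag a stage as skewed when the slowest task dwarfs the typical one.

    Skew signature: min and median durations are close together (most
    partitions are balanced), while the max is far larger (one partition
    got a disproportionate share of the data).
    """
    mn = min(task_durations_s)
    med = statistics.median(task_durations_s)
    mx = max(task_durations_s)
    return (mx / med) > straggler_ratio and (med / mn) < 2

# 199 well-balanced tasks plus one straggler ~100x longer,
# mirroring the Spark UI summary described in the question.
durations = [1.0] * 199 + [100.0]
print(looks_skewed(durations))  # True: one partition is overloaded
```

If all tasks ran in roughly the same time, the max-to-median ratio would stay near 1 and the heuristic would not fire; a uniformly slow stage points at a different cause (e.g., spill or I/O), not skew.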
Question # 2
The downstream consumers of a Delta Lake table have been complaining about data
quality issues impacting performance in their applications. Specifically, they have
complained that invalid latitude and longitude values in the activity_details table
have been breaking their ability to use other geolocation processes.
A junior engineer has written the following code to add CHECK constraints to the
Delta Lake table:
A senior engineer has confirmed the above logic is correct and the valid ranges for
latitude and longitude are provided, but the code fails when executed.
Which statement explains the cause of this failure?
Options:
A. Because another team uses this table to support a frequently running application,
two-phase locking is preventing the operation from committing.
B. The activity_details table already exists; CHECK constraints can only be added
during initial table creation.
C. The activity_details table already contains records that violate the constraints; all
existing data must pass CHECK constraints in order to add them to an existing table.
D. The activity_details table already contains records; CHECK constraints can only be
added prior to inserting values into a table.
E. The current table schema does not contain the field valid_coordinates; schema
evolution will need to be enabled before altering the table to add a constraint.
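In Delta Lake, a constraint such as `ALTER TABLE activity_details ADD CONSTRAINT valid_coordinates CHECK (latitude BETWEEN -90 AND 90 AND longitude BETWEEN -180 AND 180)` is validated against every row already in the table, and the command fails if any existing row violates it. A minimal plain-Python model of that behaviour (the function, sample rows, and constraint name here are hypothetical, for illustration only):

```python
def add_check_constraint(rows, predicate, name):
    """Toy model of Delta Lake's ALTER TABLE ... ADD CONSTRAINT ... CHECK:
    the engine scans all EXISTING rows and rejects the constraint if any
    row violates it, which is why the command in the question fails."""
    violations = [row for row in rows if not predicate(row)]
    if violations:
        raise ValueError(
            f"Cannot add CHECK constraint '{name}': "
            f"{len(violations)} existing row(s) violate it"
        )
    return name

def valid_coords(row):
    """Valid geolocation ranges, as confirmed by the senior engineer."""
    return -90.0 <= row["latitude"] <= 90.0 and -180.0 <= row["longitude"] <= 180.0

# The table already holds an out-of-range latitude, so the constraint
# cannot be added until the offending records are fixed or removed.
activity_details = [
    {"latitude": 45.5, "longitude": -73.6},
    {"latitude": 999.0, "longitude": 0.0},  # invalid value already present
]

try:
    add_check_constraint(activity_details, valid_coords, "valid_coordinates")
except ValueError as err:
    print(err)
```

Once the violating rows are cleaned up, the same `ALTER TABLE` succeeds, and Delta then rejects any future write that breaks the constraint.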
Question # 3
Which statement describes integration testing?
Options:
A. Validates interactions between subsystems of your application
B. Requires an automated testing framework
C. Requires manual intervention
D. Validates an application use case
E. Validates behavior of individual elements of your application
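The distinction the question is probing: unit tests validate individual elements in isolation, while integration tests validate the interaction between subsystems. A minimal sketch with two hypothetical components, where the integration test exercises the seam between them by feeding one component's output directly into the other:

```python
# Subsystem 1: parse a raw "name,amount" line into a (name, float) pair.
def parse_record(line):
    name, amount = line.split(",")
    return name.strip(), float(amount)

# Subsystem 2: aggregate parsed pairs into totals per name.
def total_by_name(pairs):
    totals = {}
    for name, amount in pairs:
        totals[name] = totals.get(name, 0.0) + amount
    return totals

def test_integration():
    """Integration test: validates the INTERACTION between the two
    subsystems end to end, not each piece in isolation."""
    lines = ["a,1.5", "b,2.0", "a,0.5"]
    assert total_by_name(parse_record(l) for l in lines) == {"a": 2.0, "b": 2.0}

test_integration()
print("integration test passed")
```

A unit test, by contrast, would call `parse_record` or `total_by_name` alone with hand-built inputs; neither style inherently requires a particular framework or manual intervention.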
Question # 4
Which statement describes Delta Lake Auto Compaction?
Options:
A. An asynchronous job runs after the write completes to detect if files could be
further compacted; if yes, an OPTIMIZE job is executed toward a default of 1 GB.
B. Before a Jobs cluster terminates, OPTIMIZE is executed on all tables modified during
the most recent job.
C. Optimized writes use logical partitions instead of directory partitions; because
partition boundaries are only represented in metadata, fewer small files are written.