100% satisfaction guaranteed · Immediately available after payment · Both online and in PDF · No strings attached · 4.6 on TrustPilot
Exam

Databricks-Certified-Professional-Data-Engineer Practice Exam Questions and Answers

Pages: 11
Grade: A+
Uploaded on: 26-05-2024
Written in: 2023/2024



Content preview

Question # 1

A Spark job is taking longer than expected. Using the Spark UI, a data engineer notes
that the Min, Median, and Max Durations for tasks in a particular stage show the
minimum and median time to complete a task as roughly the same, but the max
duration for a task to be roughly 100 times as long as the minimum.

Which situation is causing increased duration of the overall job?

Options:

A.Task queueing resulting from improper thread pool assignment.

B.Spill resulting from attached volume storage being too small.

C.Network latency due to some cluster nodes being in different regions from the
source data

D.Skew caused by more data being assigned to a subset of Spark partitions.

E.Credential validation errors while pulling data from an external system.
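The symptom described in Question 1 (min and median roughly equal, max roughly 100× the min) is the classic fingerprint of partition skew. A minimal pure-Python sketch, using hypothetical task timings rather than real Spark UI data, of how that min/median/max pattern flags a skewed stage:

```python
from statistics import median

def looks_skewed(task_durations, ratio_threshold=10.0):
    """Flag the Question 1 symptom: min and median are close, but the max
    task duration dwarfs both -> a few partitions hold most of the data."""
    lo, mid, hi = min(task_durations), median(task_durations), max(task_durations)
    return mid < ratio_threshold * lo and hi > ratio_threshold * mid

# Hypothetical stage timings (seconds): one straggler task holds the skewed partition.
durations = [2, 2, 2, 3, 2, 2, 200]
print(looks_skewed(durations))  # prints True for this skewed stage
```

In practice the remedy is to redistribute the hot key (e.g. salting) or to let adaptive query execution split oversized partitions, rather than adding threads or network capacity.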

Question # 2

The downstream consumers of a Delta Lake table have been complaining about data
quality issues impacting performance in their applications. Specifically, they have
complained that invalid latitude and longitude values in the activity_details table
have been breaking their ability to use other geolocation processes.

A junior engineer has written the following code to add CHECK constraints to the
Delta Lake table:

[Code block not included in this preview.]

A senior engineer has confirmed the above logic is correct and the valid ranges for
latitude and longitude are provided, but the code fails when executed.

Which statement explains the cause of this failure?

Options:

A.Because another team uses this table to support a frequently running application,
two-phase locking is preventing the operation from committing.

B.The activity details table already exists; CHECK constraints can only be added
during initial table creation.

C.The activity details table already contains records that violate the constraints; all
existing data must pass CHECK constraints in order to add them to an existing table.

D.The activity details table already contains records; CHECK constraints can only be
added prior to inserting values into a table.

E.The current table schema does not contain the field valid coordinates; schema
evolution will need to be enabled before altering the table to add a constraint.

Question # 3

Which statement describes integration testing?

Options:

A.Validates interactions between subsystems of your application

B.Requires an automated testing framework

C.Requires manual intervention

D.Validates an application use case

E.Validates behavior of individual elements of your application
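Question 3 turns on the distinction between unit tests (individual elements in isolation) and integration tests (interactions between subsystems). A minimal sketch with two hypothetical subsystems, a parser and a store, showing both kinds of test:

```python
def parse_record(line):
    """Subsystem 1: parse a 'key = value' line."""
    key, value = line.split("=")
    return key.strip(), int(value)

class Store:
    """Subsystem 2: a trivial key-value store."""
    def __init__(self):
        self.data = {}

    def put(self, key, value):
        self.data[key] = value

# Unit test: validates the behavior of one element in isolation.
assert parse_record("lat = 42") == ("lat", 42)

# Integration test: validates the interaction between the two subsystems.
store = Store()
store.put(*parse_record("lon = 7"))
assert store.data == {"lon": 7}
print("tests passed")
```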

Question # 4

Which statement describes Delta Lake Auto Compaction?

Options:

A.An asynchronous job runs after the write completes to detect if files could be
further compacted; if so, an OPTIMIZE job is executed toward a default of 1 GB.

B.Before a Jobs cluster terminates, optimize is executed on all tables modified during
the most recent job.

C.Optimized writes use logical partitions instead of directory partitions; because
partition boundaries are only represented in metadata, fewer small files are written.
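For context on Question 4: Auto Compaction is enabled through a Delta table property or a session setting. The fragment below is a non-runnable sketch based on Databricks' documented configuration names, not code from this exam.

```python
# Config fragment (needs a Databricks/Spark session; not runnable standalone).
#
# Per table, via a Delta table property:
#   ALTER TABLE my_table
#   SET TBLPROPERTIES (delta.autoOptimize.autoCompact = true);
#
# Per session, via a Spark conf:
#   spark.conf.set("spark.databricks.delta.autoCompact.enabled", "true")
#
# Once enabled, Databricks checks after each write whether the table has
# accumulated enough small files and, if so, runs a compaction pass.
```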

Meet the seller

munyuabeatrice92
Items sold: 1
Member since: 2 years
Followers: 1
Documents: 347
Last sale: 1 year ago
Rating: 0.0 (0 reviews)

