Exam (elaborations)

Certified Generative AI Engineer Associate Exam (Databricks) Latest Version: 6.0 questions with verified answers and rationales | instant pdf download

Rating
-
Sold
-
Pages
66
Grade
A+
Uploaded on
03-02-2026
Written in
2025/2026

Certified Generative AI Engineer Associate Exam (Databricks) 2026–2028 | Latest Version: 6.0 | Questions with Verified Answers and Rationales | Instant PDF Download is the ultimate preparation resource for aspiring AI engineers targeting the Databricks Certified Generative AI Engineer Associate credential. This expertly curated PDF includes real-exam-style questions aligned with the latest Version 6.0 objectives, accompanied by verified answers and detailed rationales to deepen your understanding of generative AI concepts, prompt engineering, foundation models, fine-tuning, deployment on the Databricks Lakehouse, MLOps fundamentals, ethical AI practices, and cloud integration. Designed for data engineers, ML practitioners, AI enthusiasts, and professionals upskilling to stay ahead in the era of generative AI, this downloadable PDF enables targeted, efficient study and boosts confidence for success on the official Databricks certification exam. Get instant access and start mastering generative AI engineering today!

Institution
AI & Data Science Programs
Course
AI & Data Science programs

Preview of the content

Databricks Certified Associate Developer for Apache Spark 3.5 – Python Exam | Complete 200-Question Practice Exam with Answers & Explanations | PDF
Question 1

A data scientist at an e-commerce company is working with user data obtained from its subscriber database and has stored the data in a DataFrame df_user. Before processing the data further, the data scientist wants to create another DataFrame, df_user_non_pii, containing only the non-PII columns. The PII columns in df_user are first_name, last_name, email, and birthdate. Which code snippet meets this requirement?

A. df_user_non_pii = df_user.drop("first_name", "last_name", "email", "birthdate")
B. df_user_non_pii = df_user.drop("first_name, last_name, email, birthdate")
C. df_user_non_pii = df_user.dropFields("first_name", "last_name", "email", "birthdate")
D. df_user_non_pii = df_user.dropFields("first_name, last_name, email, birthdate")

Answer: A

Explanation:
The PySpark drop() method removes the specified columns and returns a new DataFrame; multiple column names are passed as separate string arguments. A single comma-separated string would be treated as one (nonexistent) column name, and dropFields() is a Column method for struct fields, not a DataFrame method.
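The drop() semantics can be illustrated in plain Python (a toy sketch with hypothetical column data, not PySpark): removing the named columns yields a new result and leaves the original untouched.

```python
# Toy illustration of DataFrame.drop() semantics: rows as dicts, a drop()
# helper that returns new rows without the named columns.
def drop(rows, *columns):
    """Return new rows without the named columns; the input is not mutated."""
    to_drop = set(columns)
    return [{k: v for k, v in row.items() if k not in to_drop} for row in rows]

df_user = [{"first_name": "Ada", "email": "a@x.io", "user_id": 1}]
df_user_non_pii = drop(df_user, "first_name", "last_name", "email", "birthdate")
print(df_user_non_pii)  # [{'user_id': 1}]
```

As in PySpark, columns that do not exist in the data (here last_name and birthdate) are simply ignored rather than raising an error.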



Question 2

A data engineer is working on a Streaming DataFrame streaming_df with
unbounded streaming data.

Which operation is supported with streaming_df?

A. streaming_df.select(countDistinct("Name"))
B. streaming_df.groupby("Id").count()
C. streaming_df.orderBy("timestamp").limit(4)
D. streaming_df.filter(col("count") < 30).show()

Answer: B

Explanation:
Structured Streaming supports incremental aggregations over a grouping key, so groupBy("Id").count() can be maintained as running state across micro-batches. Global operations such as countDistinct(), orderBy(), and limit() are not supported on a streaming DataFrame without windowing or watermarks, and show() is a batch-only action.
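Why a keyed count streams well can be sketched in plain Python (stdlib only, not Spark): per-key state is updated one record at a time and never needs the full dataset, whereas a global sort or distinct count would.

```python
# Incremental per-key counting, the essence of groupBy("Id").count()
# in Structured Streaming: state survives between (micro-)batches.
from collections import defaultdict

counts = defaultdict(int)  # aggregation state kept across batches

def process_batch(batch):
    for record in batch:
        counts[record["Id"]] += 1  # incremental update per record

process_batch([{"Id": 1}, {"Id": 2}, {"Id": 1}])
process_batch([{"Id": 2}])
print(dict(counts))  # {1: 2, 2: 2}
```

Each new batch only touches the keys it contains, which is exactly what makes the aggregation feasible over unbounded input.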



Question 3

An MLOps engineer is building a Pandas UDF that applies a language model
translating English strings to Spanish. The initial code loads the model on every
call to the UDF:

def in_spanish_inner(df: pd.Series) -> pd.Series:
    model = get_translation_model(target_lang='es')
    return df.apply(model)

in_spanish = sf.pandas_udf(in_spanish_inner, StringType())

How can the engineer reduce how many times the model is loaded?

A. Convert the Pandas UDF to a PySpark UDF
B. Convert the Pandas UDF from Series→Series to Series→Scalar UDF
C. Run the in_spanish_inner() function in a mapInPandas() call
D. Convert the Pandas UDF from Series→Series to
Iterator[Series]→Iterator[Series] UDF

Answer: D

Explanation:
An Iterator[pd.Series] -> Iterator[pd.Series] Pandas UDF is invoked once per partition rather than once per batch, so the model can be loaded a single time and reused for every batch the iterator yields, greatly reducing load overhead.



Question 4

A Spark DataFrame df is cached using MEMORY_AND_DISK, but it is too large to
fit entirely in memory. What is the likely behavior?

A. Spark duplicates the DataFrame in both memory and disk. If it doesn't fit in
memory, the DataFrame is stored and retrieved from disk entirely.
B. Spark splits the DataFrame evenly between memory and disk.
C. Spark stores as much as possible in memory and spills the rest to disk when
memory is full, continuing processing with performance overhead.
D. Spark stores frequently accessed rows in memory and less frequently accessed
rows on disk.

Answer: C

Explanation:
MEMORY_AND_DISK caches as much data as possible in memory and spills the
remainder to disk to continue processing.
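The spill behavior can be sketched as a two-tier store in plain Python (hypothetical sizes, stdlib only, not Spark's block manager): blocks fill "memory" until the budget runs out, the rest spill to "disk", and reads still succeed from either tier.

```python
# Toy MEMORY_AND_DISK sketch: a fixed memory budget, overflow goes to disk,
# and lookups fall back from the fast tier to the slow one.
MEMORY_BUDGET = 2  # number of blocks that fit in "memory"

memory, disk = {}, {}

def cache_block(block_id, data):
    if len(memory) < MEMORY_BUDGET:
        memory[block_id] = data   # fast tier
    else:
        disk[block_id] = data     # spilled: still cached, but slower to read

def read_block(block_id):
    return memory.get(block_id, disk.get(block_id))

for i in range(4):
    cache_block(i, f"partition-{i}")

print(len(memory), len(disk))  # 2 2
print(read_block(3))           # partition-3 (served from the disk tier)
```

The key point matching answer C: nothing is lost and processing continues, only reads from the spilled blocks carry extra I/O cost.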

Question 5

A data engineer is building a Structured Streaming pipeline and wants it to recover
from failures or intentional shutdowns by continuing where it left off. How can this
be achieved?

A. Configure checkpointLocation during readStream
B. Configure recoveryLocation during SparkSession initialization
C. Configure recoveryLocation during writeStream
D. Configure checkpointLocation during writeStream

Answer: D

Explanation:
Setting checkpointLocation in writeStream allows Spark to store
streaming progress and recover from failures.
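What the checkpoint buys you can be sketched with a plain file (stdlib only, hypothetical paths; Spark's actual checkpoint directory stores offsets and state, not a single integer): progress is committed after each batch, so a restart resumes where the last run stopped instead of reprocessing everything.

```python
# Toy checkpoint/resume sketch: persist the last committed offset to a file,
# read it back on startup, and continue from there.
import os
import tempfile

checkpoint = os.path.join(tempfile.mkdtemp(), "offset.txt")

def read_offset():
    if os.path.exists(checkpoint):
        with open(checkpoint) as f:
            return int(f.read())
    return 0  # fresh start: no checkpoint yet

def commit_offset(offset):
    with open(checkpoint, "w") as f:  # Spark writes this under checkpointLocation
        f.write(str(offset))

start = read_offset()
for offset in range(start, 5):        # process records 0..4 of the stream
    commit_offset(offset + 1)         # commit progress after each one

print(read_offset())  # 5  -> a restarted job would continue at record 5
```

This is why the location is configured on writeStream: the sink is where progress is committed, and recovery replays only what was never committed.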




Question 6

A Spark DataFrame df contains a column event_time of type timestamp.
You want to calculate the time difference in seconds between consecutive rows,
partitioned by user_id and ordered by event_time. Which function should
you use?

A. lag()
B. lead()
C. row_number()
D. dense_rank()

Answer: A

Explanation:
The lag() function returns the value of a column from a previous row in a
window. Combined with window partitioning and ordering, it allows you to
calculate differences between consecutive rows.
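The lag() pattern can be emulated with stdlib Python (hypothetical event data, not a Spark window): within each user_id, ordered by event_time, each row looks at the previous row's timestamp to get the gap in seconds.

```python
# Emulating lag("event_time").over(Window.partitionBy("user_id")
#                                        .orderBy("event_time")):
# sort by (partition key, order key), then carry the previous value per group.
from itertools import groupby
from operator import itemgetter

events = [
    {"user_id": "u1", "event_time": 100},
    {"user_id": "u1", "event_time": 160},
    {"user_id": "u2", "event_time": 50},
    {"user_id": "u2", "event_time": 95},
]

events.sort(key=itemgetter("user_id", "event_time"))  # partition + order
gaps = []
for _, rows in groupby(events, key=itemgetter("user_id")):
    prev = None                      # lag() is NULL on each partition's first row
    for row in rows:
        gaps.append(None if prev is None else row["event_time"] - prev)
        prev = row["event_time"]

print(gaps)  # [None, 60, None, 45]
```

The None entries mirror lag() returning NULL for the first row of each partition, which is why results are typically filtered or coalesced afterwards.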


