Exam (elaborations)

Certified Generative AI Engineer Associate Exam (Databricks) | Latest Version: 6.0 | Questions with Verified Answers and Rationales | Instant PDF Download

Pages: 66
Grade: A+
Uploaded on: 03-02-2026
Written in: 2025/2026

Certified Generative AI Engineer Associate Exam (Databricks) 2026–2028 | Latest Version: 6.0 | Questions with Verified Answers and Rationales | Instant PDF Download is the ultimate preparation resource for aspiring AI engineers targeting the Databricks Certified Generative AI Engineer Associate credential. This expertly curated PDF includes real-exam-style questions aligned with the latest Version 6.0 objectives, accompanied by verified answers and detailed rationales to deepen your understanding of generative AI concepts, prompt engineering, foundation models, fine-tuning, deployment on the Databricks Lakehouse, MLOps fundamentals, ethical AI practices, and cloud integration. Designed for data engineers, ML practitioners, AI enthusiasts, and professionals upskilling to stay ahead in the era of generative AI, this downloadable PDF enables targeted, efficient study and boosts confidence for success on the official Databricks certification exam. Get instant access and start mastering generative AI engineering today!

Institution: AI & Data Science Programs
Course: AI & Data Science Programs

Content preview

Databricks Certified Associate Developer for Apache Spark 3.5 – Python Exam | Complete 200-Question Practice Exam with Answers & Explanations | PDF
Question 1

A data scientist at an e-commerce company is working with user data obtained from its subscriber database and has stored the data in a DataFrame df_user. Before processing the data further, the data scientist wants to create another DataFrame, df_user_non_pii, containing only the non-PII columns. The PII columns in df_user are first_name, last_name, email, and birthdate. Which code snippet meets this requirement?

A. df_user_non_pii = df_user.drop("first_name", "last_name", "email", "birthdate")
B. df_user_non_pii = df_user.drop("first_name", "last_name", "email", "birthdate")
C. df_user_non_pii = df_user.dropfields("first_name", "last_name", "email", "birthdate")
D. df_user_non_pii = df_user.dropfields("first_name, last_name, email, birthdate")

Answer: A

Explanation:
The PySpark drop() method removes specified columns and returns a new
DataFrame. Multiple column names are passed as separate arguments.
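
For reference, a minimal PySpark sketch of the correct pattern (the SparkSession setup and sample rows are illustrative only, not part of the question):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Placeholder data with the PII columns named in the question plus one non-PII column.
df_user = spark.createDataFrame(
    [("Ana", "Diaz", "ana@example.com", "1990-01-01", 42)],
    ["first_name", "last_name", "email", "birthdate", "user_id"],
)

# drop() takes multiple column names as separate arguments and returns a new DataFrame.
df_user_non_pii = df_user.drop("first_name", "last_name", "email", "birthdate")
df_user_non_pii.show()  # only user_id remains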



Question 2

A data engineer is working on a Streaming DataFrame streaming_df with unbounded streaming data. Which operation is supported with streaming_df?

A. streaming_df.select(countDistinct("Name"))
B. streaming_df.groupby("Id").count()
C. streaming_df.orderBy("timestamp").limit(4)
D. streaming_df.filter(col("count") < 30).show()

Answer: B

Explanation:
Structured Streaming supports keyed aggregations such as groupBy(...).count(). In contrast, countDistinct, orderBy, and limit are not supported on an unbounded stream without additional constructs such as windows or watermarks, and show() cannot be called on a streaming DataFrame at all; results must be written out through writeStream.
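
As an illustrative sketch of the supported pattern (the built-in rate source, the derived Id key, and the console sink are placeholders used only to make the example self-contained):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# Unbounded streaming source; the derived Id column is a placeholder key.
streaming_df = (
    spark.readStream.format("rate").option("rowsPerSecond", 10).load()
    .withColumn("Id", F.col("value") % 5)
)

# A keyed aggregation is supported on a streaming DataFrame.
counts = streaming_df.groupBy("Id").count()

query = (
    counts.writeStream
    .outputMode("complete")  # streaming aggregations use complete or update mode
    .format("console")
    .start()
)
# query.awaitTermination()  # uncomment to keep the stream running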



Question 3

An MLOps engineer is building a Pandas UDF that applies a language model
translating English strings to Spanish. The initial code loads the model on every
call to the UDF:

import pandas as pd
from pyspark.sql import functions as sf
from pyspark.sql.types import StringType

def in_spanish_inner(df: pd.Series) -> pd.Series:
    # The model is reloaded on every batch the UDF processes.
    model = get_translation_model(target_lang='es')
    return df.apply(model)

in_spanish = sf.pandas_udf(in_spanish_inner, StringType())

How can the engineer reduce the number of times the model is loaded?

A. Convert the Pandas UDF to a PySpark UDF
B. Convert the Pandas UDF from Series→Series to Series→Scalar UDF
C. Run the in_spanish_inner() function in a mapInPandas() call
D. Convert the Pandas UDF from Series→Series to Iterator[Series]→Iterator[Series] UDF

Answer: D

Explanation:
An Iterator[pd.Series] → Iterator[pd.Series] Pandas UDF loads the model once per partition and reuses it for every batch in that partition, instead of reloading it for each batch, which significantly reduces overhead.
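
A sketch of the iterator-based form is shown below; get_translation_model() is the hypothetical loader from the question, and the column name in the usage comment is a placeholder:

from typing import Iterator
import pandas as pd
from pyspark.sql import functions as sf
from pyspark.sql.types import StringType

@sf.pandas_udf(StringType())
def in_spanish(batches: Iterator[pd.Series]) -> Iterator[pd.Series]:
    # Loaded once per partition, then reused for every batch in that partition.
    model = get_translation_model(target_lang="es")  # hypothetical helper from the question
    for batch in batches:
        yield batch.apply(model)

# Usage (illustrative): df.withColumn("greeting_es", in_spanish("greeting"))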



Question 4

A Spark DataFrame df is cached using MEMORY_AND_DISK, but it is too large to
fit entirely in memory. What is the likely behavior?

A. Spark duplicates the DataFrame in both memory and disk. If it doesn't fit in
memory, the DataFrame is stored and retrieved from disk entirely.
B. Spark splits the DataFrame evenly between memory and disk.
C. Spark stores as much as possible in memory and spills the rest to disk when
memory is full, continuing processing with performance overhead.
D. Spark stores frequently accessed rows in memory and less frequently accessed
rows on disk.

Answer: C

Explanation:
MEMORY_AND_DISK caches as much data as possible in memory and spills the
remainder to disk to continue processing.
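
A short sketch of setting this storage level (the data below is a placeholder):

from pyspark.sql import SparkSession
from pyspark import StorageLevel

spark = SparkSession.builder.getOrCreate()
df = spark.range(10_000_000)  # placeholder data

# Partitions that do not fit in memory are spilled to disk and read back when needed.
df.persist(StorageLevel.MEMORY_AND_DISK)
df.count()              # materializes the cache
print(df.storageLevel)  # shows the effective storage level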

Question 5

A data engineer is building a Structured Streaming pipeline and wants it to recover
from failures or intentional shutdowns by continuing where it left off. How can this
be achieved?

A. Configure checkpointLocation during readStream
B. Configure recoveryLocation during SparkSession initialization
C. Configure recoveryLocation during writeStream
D. Configure checkpointLocation during writeStream

Answer: D

Explanation:
Setting checkpointLocation in writeStream allows Spark to store
streaming progress and recover from failures.
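
A minimal sketch of the pattern (the rate source, sink format, and paths are placeholders):

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

streaming_df = spark.readStream.format("rate").load()

query = (
    streaming_df.writeStream
    .format("parquet")
    .option("path", "/tmp/stream_output")              # placeholder sink path
    .option("checkpointLocation", "/tmp/checkpoints")  # enables recovery after restart
    .start()
)
# query.awaitTermination()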




Question 6

A Spark DataFrame df contains a column event_time of type timestamp.
You want to calculate the time difference in seconds between consecutive rows,
partitioned by user_id and ordered by event_time. Which function should
you use?

A. lag()
B. lead()
C. row_number()
D. dense_rank()

Answer: A

Explanation:
The lag() function returns the value of a column from a previous row in a
window. Combined with window partitioning and ordering, it allows you to
calculate differences between consecutive rows.
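
A minimal sketch of the pattern (the sample rows are placeholders; the column names come from the question):

from pyspark.sql import SparkSession, functions as F
from pyspark.sql.window import Window

spark = SparkSession.builder.getOrCreate()

df = spark.createDataFrame(
    [(1, "2024-01-01 10:00:00"), (1, "2024-01-01 10:00:30")],
    ["user_id", "event_time"],
).withColumn("event_time", F.to_timestamp("event_time"))

w = Window.partitionBy("user_id").orderBy("event_time")

# Casting timestamps to long (seconds since epoch) turns the difference into seconds.
df_with_diff = df.withColumn(
    "seconds_since_prev",
    F.col("event_time").cast("long") - F.lag("event_time").over(w).cast("long"),
)
df_with_diff.show()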
