Exam (elaborations)

Databricks exam Questions with Solutions

Pages: 36
Grade: A+
Uploaded on: 07-04-2025
Written in: 2024/2025

Exam of 36 pages for the course Databricks Lakehouse at Databricks Lakehouse (Databricks exam)

Institution: Databricks Lakehouse
Course: Databricks Lakehouse












Content preview

Databricks exam

What are the two main components of the Databricks architecture? - answer
1. The control plane consists of the backend services that Databricks manages in its own cloud account.
2. The data plane is where the data is processed.

What are the three Databricks services? - answer
Databricks Data Science and Engineering workspace
Databricks SQL
Databricks Machine Learning

What is a cluster? - answer A Databricks cluster is a set of computation resources and configurations on which you run data engineering, data science, and data analytics workloads.

What are the two cluster types? - answer
All-purpose clusters analyze data collaboratively using interactive notebooks. You can create all-purpose clusters from the workspace, through the command-line interface, or through the REST APIs that Databricks provides. You can terminate and restart an all-purpose cluster, and multiple users can share all-purpose clusters to do collaborative interactive analysis.

Job clusters run automated jobs in an expeditious and robust way. The Databricks job scheduler creates job clusters when you run jobs and terminates them when the associated job is complete. You cannot restart a job cluster. These properties ensure an isolated execution environment for each and every job.

What are the three cluster modes? - answer
1. Standard clusters are ideal for processing large amounts of data with Apache Spark.
2. Single Node clusters are intended for jobs that use small amounts of data or non-distributed workloads such as single-node machine learning libraries.
3. High Concurrency clusters are ideal for groups of users who need to share resources or run ad-hoc jobs. Administrators usually create High Concurrency clusters. Databricks recommends enabling autoscaling for High Concurrency clusters.

What is cluster size and autoscaling? - answer When you create a Databricks cluster, you can either provide a fixed number of workers for the cluster or provide a minimum and maximum number of workers. When you provide a range for the number of workers, Databricks chooses the appropriate number of workers required to run your job.
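As a sketch of what this looks like in practice, the snippet below builds a request payload for the Databricks Clusters REST API (`POST /api/2.0/clusters/create`), using an `autoscale` block instead of a fixed `num_workers`. The runtime version and node type strings are placeholders, not recommendations; valid values depend on your workspace and cloud provider.

```python
import json

def build_cluster_spec(cluster_name, min_workers, max_workers):
    """Build a Clusters API payload that requests autoscaling.

    Instead of a fixed "num_workers", the "autoscale" block gives
    Databricks a range, and it picks the worker count needed for the
    current load. The spark_version and node_type_id values are
    placeholders for illustration only.
    """
    return {
        "cluster_name": cluster_name,
        "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
        "node_type_id": "i3.xlarge",          # placeholder instance type
        "autoscale": {
            "min_workers": min_workers,
            "max_workers": max_workers,
        },
    }

spec = build_cluster_spec("etl-autoscaling", min_workers=2, max_workers=8)
print(json.dumps(spec, indent=2))
```

The resulting JSON would be POSTed to the workspace's `/api/2.0/clusters/create` endpoint with a personal-access-token bearer header; the payload shape above follows the Clusters API reference, but check your workspace's API version before relying on it.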

What is local disk encryption? - answer Databricks generates an encryption key locally that is unique to each cluster node and is used to encrypt all data stored on local disks.

What are the cluster security modes? - answer
None: no isolation. Does not enforce workspace-local table access control or credential passthrough. Cannot access Unity Catalog data.

Single User: can be used only by a single user. Automated jobs should use single-user clusters.

User Isolation: can be shared by multiple users. Only SQL workloads are supported. Library installation, init scripts, and DBFS FUSE mounts are disabled to enforce strict isolation among the cluster users.
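In the Clusters API, these three modes correspond to the `data_security_mode` field of a cluster spec (with values like `NONE`, `SINGLE_USER`, and `USER_ISOLATION`); the field name and the companion `single_user_name` field are assumptions based on the current API and may differ on older workspaces. A minimal helper sketch:

```python
# The three security modes map onto the "data_security_mode" field of a
# cluster spec. The field names here are assumptions based on the
# current Clusters API; verify them against your workspace's API docs.
ALLOWED_MODES = {"NONE", "SINGLE_USER", "USER_ISOLATION"}

def with_security_mode(spec: dict, mode: str) -> dict:
    """Return a copy of a cluster spec with the security mode set."""
    if mode not in ALLOWED_MODES:
        raise ValueError(f"unknown security mode: {mode}")
    out = dict(spec)
    out["data_security_mode"] = mode
    if mode == "SINGLE_USER":
        # A single-user cluster also names the one principal allowed to
        # attach; the address below is a hypothetical example.
        out.setdefault("single_user_name", "jobs-sp@example.com")
    return out

job_spec = with_security_mode({"cluster_name": "nightly-job"}, "SINGLE_USER")
print(job_spec["data_security_mode"])
```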

What is a pool? - answer To reduce cluster start time, you can attach a cluster to a predefined pool of idle instances for the driver and worker nodes. The cluster is created using instances in the pool. If a pool does not have sufficient idle resources to create the requested driver or worker nodes, the pool expands by allocating new instances from the instance provider. When an attached cluster is terminated, the instances it used are returned to the pool and can be reused by a different cluster.
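A rough sketch of the two payloads involved: one to create a pool via the Instance Pools API (`/api/2.0/instance-pools/create`), and one for a cluster that attaches to it via `instance_pool_id` rather than naming a node type. The node type, runtime version, and pool id strings are placeholders.

```python
import json

# Payload for /api/2.0/instance-pools/create: the pool keeps
# "min_idle_instances" warm so attached clusters start faster.
pool_spec = {
    "instance_pool_name": "warm-pool",
    "node_type_id": "i3.xlarge",   # placeholder instance type
    "min_idle_instances": 2,       # instances kept idle and ready
    "max_capacity": 20,            # hard cap on pool size
    "idle_instance_autotermination_minutes": 30,
}

# A cluster attaches to the pool by id instead of naming a node type;
# the id below is a hypothetical value returned by the create call.
cluster_spec = {
    "cluster_name": "pool-backed-cluster",
    "spark_version": "13.3.x-scala2.12",  # placeholder runtime version
    "num_workers": 4,
    "instance_pool_id": "POOL-ID-FROM-CREATE-RESPONSE",
}
print(json.dumps(cluster_spec, indent=2))
```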

What happens when you terminate, restart, or delete a cluster? - answer When a cluster terminates (i.e. stops), all resources previously associated with the compute environment are completely removed, but the cluster configuration settings are maintained.

The Restart button allows us to manually restart our cluster.

The Delete button stops our cluster and removes the cluster configuration.

Which magic command do you use to run a notebook from another notebook? - answer
%run ../Includes/Classroom-Setup-1.2

What is Databricks Utilities, and how can you use it to list out directories of files from Python cells? - answer Databricks notebooks provide a number of utility commands for configuring and interacting with the environment. For example:

display(dbutils.fs.ls("/databricks-datasets"))

What function should you use when you have tabular data returned by a Python cell? - answer display()

What is the definition of Delta Lake? - answer Delta Lake is the technology at the heart of the Databricks Lakehouse platform. It is an open-source technology that enables building a data lakehouse on top of existing storage systems.

While Delta Lake was initially developed exclusively by Databricks, it has been open source for almost three years.

Delta Lake builds upon standard data formats like Parquet and JSON.

Delta Lake is optimized for cloud object storage.

Delta Lake is built for scalable metadata handling.

How does Delta Lake address the data lake pain points to ensure reliable, ready-to-go data? - answer
ACID Transactions
Schema Management
Scalable Metadata Handling
Unified Batch and Streaming Data
Data Versioning and Time Travel

Describe how Delta Lake brings ACID transactions to object storage. - answer
Difficult to append data: Delta Lake provides guaranteed consistency for the state at the time an append begins, as well as atomic transactions and high durability. As such, appends will not fail due to conflict, even when writing from many data sources simultaneously.

Difficult to modify existing data: upserts allow us to apply updates and deletes with simple syntax as a single atomic transaction.

Jobs failing midway: changes won't be committed until a job has succeeded; jobs either fail or succeed completely.

Real-time operations are not easy: Delta Lake allows atomic micro-batched transaction processing in near real time through a tight integration with Structured Streaming, meaning that you can use both real-time and batch operations on the same set of Delta Lake tables.

Costly to keep historical data versions: the transaction logs used to guarantee atomicity, consistency, and isolation allow snapshot queries, which easily enable time travel on your Delta Lake tables.
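The upsert and time-travel behavior described above can be sketched in Delta Lake SQL. The table and column names below are hypothetical; only the MERGE INTO, VERSION AS OF, and TIMESTAMP AS OF constructs are Delta Lake syntax.

```sql
-- Upsert: apply updates and inserts as one atomic transaction.
MERGE INTO customers AS target
USING customer_updates AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;

-- Time travel: query an earlier snapshot recorded in the transaction log.
SELECT * FROM customers VERSION AS OF 5;
SELECT * FROM customers TIMESTAMP AS OF '2024-06-01';
```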

Is Delta Lake the default for all tables created in Databricks? - answer Yes

What data objects are in the Databricks Lakehouse? - answer
Catalog: a grouping of databases.

Database or schema: a grouping of objects in a catalog. Databases contain tables,
views, and functions.

Table: a collection of rows and columns stored as data files in object storage.

View: a saved query typically against one or more tables or data sources.

Function: saved logic that returns a scalar value or set of rows.

What is a metastore? - answer The metastore contains all of the metadata that defines data objects in the lakehouse. Databricks provides the following metastore options:

Unity Catalog: you can create a metastore to store and share metadata across multiple
Databricks workspaces. Unity Catalog is managed at the account level.

Hive metastore: Databricks stores all the metadata for the built-in Hive metastore as a managed service. An instance of the metastore deploys to each cluster and securely accesses metadata from a central repository for each customer workspace.

External metastore: you can also bring your own metastore to Databricks.

What is a catalog? - answer A catalog is the highest abstraction (or coarsest grain) in the Databricks Lakehouse relational model. Every database is associated with a catalog, and catalogs exist as objects within a metastore. Before the introduction of Unity Catalog, Databricks used a two-tier namespace; the catalog is the third tier in the Unity Catalog namespacing model:

catalog_name.database_name.table_name

What is a Delta Lake table? - answer A Databricks table is a collection of structured
data. A Delta table stores data as a directory of files on cloud object storage and
registers table metadata to the metastore within a catalog and schema.

As Delta Lake is the default storage provider for tables created in Databricks, all tables
created in Databricks are Delta tables, by default.

Because Delta tables store data in cloud object storage and provide references to data through a metastore, users across an organization can access data using their preferred APIs; on Databricks, these include SQL, Python, PySpark, Scala, and R.

How do relational objects work in Delta Live Tables? - answer Delta Live Tables uses the concept of a "virtual schema" during logic planning and execution.
