Databricks Lakehouse Platform Questions and Answers with complete solution
What is a data lakehouse?
- A data lakehouse combines the ACID transactions and data governance of a data warehouse with the flexibility and cost-efficiency of a data lake.

How does the Databricks lakehouse differ from a traditional data warehouse?
- It enables both batch and streaming analytics.

What storage system is Databricks built on top of?
- DBFS (Databricks File System). DBFS is an abstraction layer over cloud object storage that is available on Databricks clusters. The actual data is stored in cloud object storage.

What are the 2 components of a Delta table?
- The data files and the transaction log.

What are the benefits of Delta tables?
- ACID transactions on object storage
- Provides an audit trail of all changes
- Scalable metadata handling
- Builds on standard formats (Parquet and JSON)

What 2 methods can you use for time travel?
- Specify the version number (VERSION AS OF)
- Specify the timestamp (TIMESTAMP AS OF)

What does the OPTIMIZE command do?
- Compacts small files into larger files in order to improve query speed and table performance.

What is Z-ORDER indexing?
- As part of OPTIMIZE, you can specify one or more columns to Z-order by. The compacted files are then laid out so that rows with similar values in those columns are stored in the same files. When a query filters on such a column, Databricks can skip the files that cannot contain matching records. For example, if the record can only be in file 1, only that file is read, which saves a lot of time.

What does the VACUUM command do?
- VACUUM deletes data files that are no longer referenced by the current version of a table and are older than a retention period, which you can specify (the default is 7 days). Note: this means you can only time-travel back to the beginning of the retention period, because older files no longer exist.

What is the Hive metastore?
- The Hive metastore is a metadata repository that stores the metadata (such as schemas and storage locations) for the databases and tables registered in it.

What's the difference between managed and external tables?
- For managed tables, Databricks manages both the metadata and the data files, and the data is stored in the database's default storage location. External tables keep the actual data at an external storage location. This means that if an external table is dropped, only the metadata is removed and the actual data is not deleted, unlike managed tables.

What's the difference between the CREATE TABLE command and the CTAS command?
- CTAS (CREATE TABLE AS SELECT) creates and populates a table based on the result of a SELECT query. CTAS infers the schema from the query result, whereas a plain CREATE TABLE command requires manual declaration of the schema.

What's the difference between deep and shallow cloning?
- A deep clone copies both the metadata and the actual data to the target. A shallow clone copies only the metadata and keeps referencing the data files of the source table.

What is a view?
- It is a virtual representation of the data, defined by a query, that does not store any data itself.

What are the 3 types of views?
- Normal (stored) view: persisted in a database, like a table, and available across sessions
- Global temporary view: accessible from any Spark session on the same cluster, through the global_temp database
- Temporary view: exists only for the specific Spark session that created it

Which of the following developer operations in a CI/CD flow can be implemented in Databricks Repos?
- Merge when code is committed
- Pull request and review process
- Trigger Databricks Repos API to pull the latest version of code into production folder
- Resolve merge conflicts
- Delete a branch
Answer: Trigger Databricks Repos API to pull the latest version of code into production folder

If you run the command VACUUM transactions RETAIN 0 HOURS, what is the outcome of this command?
- Command will be successful, but no data is removed
- Command will fail if you have an active transaction running
- Command will fail, you cannot run the command with retentionDurationCheck enabled
- Command will be successful, but historical data will be removed
- Command runs successfully and compacts all of the data in the table
Answer: Command will fail, you cannot run the command with retentionDurationCheck enabled
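To make the relationship between data files, the transaction log, time travel, OPTIMIZE and VACUUM more concrete, here is a minimal sketch of a toy in-memory table in plain Python. It is not Delta Lake's implementation and does not use any Databricks API: the class `ToyDeltaTable` and its methods `commit`, `read`, `optimize` and `vacuum` are hypothetical names invented for illustration, times are plain numbers of hours, and a file's age is measured from when it was written, which is a simplification.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDeltaTable:
    """Toy model of a table as data files plus an append-only log (illustrative only)."""
    files: dict = field(default_factory=dict)  # file name -> (write time in hours, rows)
    log: list = field(default_factory=list)    # index = version, value = live file names

    def commit(self, now, add=None, remove=()):
        """Write new files and record the resulting live-file list as a new version."""
        previous = self.log[-1] if self.log else []
        live = [name for name in previous if name not in remove]
        for name, rows in (add or {}).items():
            self.files[name] = (now, list(rows))
            live.append(name)
        self.log.append(live)
        return len(self.log) - 1  # the new version number

    def read(self, version=None):
        """Read the latest version, or an older one (analogous to VERSION AS OF)."""
        if version is None:
            version = len(self.log) - 1
        rows = []
        for name in self.log[version]:
            if name not in self.files:
                raise FileNotFoundError(f"{name} was removed, version {version} is unreadable")
            rows.extend(self.files[name][1])
        return sorted(rows)

    def optimize(self, now):
        """Compact all live files into one larger file (analogous to OPTIMIZE)."""
        live = list(self.log[-1])
        merged = [row for name in live for row in self.files[name][1]]
        new_name = f"compacted-v{len(self.log)}"
        return self.commit(now, add={new_name: merged}, remove=tuple(live))

    def vacuum(self, now, retain_hours):
        """Delete files that the latest version no longer references and that are
        older than the retention period (analogous to VACUUM)."""
        live = set(self.log[-1])
        stale = [name for name, (written, _) in self.files.items()
                 if name not in live and now - written > retain_hours]
        for name in stale:
            del self.files[name]
        return sorted(stale)


if __name__ == "__main__":
    table = ToyDeltaTable()
    table.commit(now=0, add={"part-0": [1, 2]})   # version 0
    table.commit(now=1, add={"part-1": [3]})      # version 1
    table.optimize(now=2)                         # version 2: one compacted file
    print(table.read(version=0))                  # old version is still readable
    print(table.vacuum(now=200, retain_hours=168))
    print(table.read())                           # latest version is unaffected
```

In this sketch, reading version 0 works before `vacuum` runs and raises `FileNotFoundError` afterwards, which mirrors the note above that time travel only reaches back as far as the retention period.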
Document information
- Uploaded on: December 15, 2023
- Number of pages: 10
- Written in: 2023/2024
- Type: Exam (elaborations)
- Contains: Questions & answers