Exam Questions And Answers Latest Update
With Exposures you can CORRECT ANSWERS run, test, and list resources that feed into your exposure, and populate a dedicated page in the auto-generated documentation site with context relevant to data consumers.
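As a sketch, a minimal exposure definition lives in a YAML file under models/; the exposure name, owner email, and the fct_orders model it depends on are all hypothetical:

version: 2

exposures:
  - name: weekly_jaffle_metrics
    type: dashboard   # one of: dashboard, notebook, analysis, ml, application
    owner:
      email: data-team@example.com
    depends_on:
      - ref('fct_orders')

You could then run dbt run --select +exposure:weekly_jaffle_metrics to build every resource that feeds into it.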
Variables can be used to CORRECT ANSWERS configure timezones and avoid hardcoding table names.
vars can be scoped globally, or to a specific package imported in your project.
CORRECT ANSWERS
name: my_dbt_project
version: 1.0.0
config-version: 2

vars:
  # The `start_date` variable will be accessible in all resources
  start_date: '2016-06-01'
  # The `platforms` variable is only accessible to resources in the my_dbt_project project
  my_dbt_project:
    platforms: ['web', 'mobile']
  # The `app_ids` variable is only accessible to resources in the snowplow package
  snowplow:
    app_ids: ['marketing', 'app', 'landing-page']

models: ...
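As a usage sketch, a model reads these values with the var() Jinja function; the events table and event_date column below are hypothetical:

select *
from events
where event_date >= '{{ var("start_date") }}'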
partial parsing CORRECT ANSWERS In dbt Cloud, partial parsing of a project can lead to issues. If you find that your dbt project is not compiling with the values you have set, deleting the target/partial_parse.msgpack file in your project can help. Doing so will force dbt to re-parse your entire project and may help resolve any issues caused by partial parsing.
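As a sketch, either of the following forces a full parse; --no-partial-parse is a global flag in the dbt Core CLI:

rm target/partial_parse.msgpack   # delete the cached parse state
dbt run                           # the next invocation re-parses the whole project

# or skip the cache for a single invocation:
dbt --no-partial-parse run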
Just like SQL models, there are three ways to configure Python models: CORRECT ANSWERS In dbt_project.yml, where you can configure many models at once
In a dedicated .yml file, within the models/ directory
Within the model's .py file, using the dbt.config() method
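For the third option, a minimal sketch of a Python model that configures itself inline; the upstream model name stg_orders is hypothetical:

def model(dbt, session):
    # set configs from within the model's .py file
    dbt.config(materialized="table", tags=["python"])
    # dbt.ref() returns the upstream model as a DataFrame
    df = dbt.ref("stg_orders")
    return df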
To enable tracking of hard deletes, the "invalidate_hard_deletes" option should be
turned CORRECT ANSWERS ON in the dbt configuration file. This is done by setting
"invalidate_hard_deletes: true".
This will cause dbt to mark rows as invalid if they are deleted from the source query.
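As a sketch, the flag sits alongside the other snapshot configs; the snapshot name, source, and timestamp column below are hypothetical:

{% snapshot orders_snapshot %}

{{
    config(
      target_schema='snapshots',
      unique_key='id',
      strategy='timestamp',
      updated_at='updated_at',
      invalidate_hard_deletes=True
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}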
Include as many columns as possible in the snapshot (True or False) CORRECT ANSWERS True. It is recommended to include as many columns as possible in the snapshot, even if they do not seem useful at the moment, as snapshots cannot be recreated. Reference:
https://docs.getdbt.com/docs/build/snapshots
Filter CORRECT ANSWERS Some databases can have tables where a filter over certain columns is required in order to prevent a full scan of the table, which could be costly. In order to do a freshness check on such tables, a filter argument can be added to the configuration, e.g. filter: _etl_loaded_at >= date_sub(current_date(), interval 1 day). dbt will include this filter in the WHERE clause of the freshness query it issues against the table.
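A minimal configuration sketch; the source, table, and column names are hypothetical, and the filter nests under the table's freshness block:

version: 2

sources:
  - name: events
    database: raw
    loaded_at_field: _etl_loaded_at
    freshness:
      warn_after: {count: 12, period: hour}
      error_after: {count: 24, period: hour}
    tables:
      - name: page_views
        freshness:
          filter: _etl_loaded_at >= date_sub(current_date(), interval 1 day)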
CORRECT ANSWERS By default, dbt will not quote the database, schema, or identifier
for the source tables that you've specified.
To force dbt to quote one of these values, use the quoting property.
https://docs.getdbt.com/docs/build/sources
version: 2

sources:
  - name: jaffle_shop
    database: raw
    quoting:
      database: true
      schema: true
      identifier: true
    tables:
      - name: order_items
      - name: orders
        # This overrides the `jaffle_shop` quoting config
        quoting:
          identifier: false
What are the arguments passed to the model() function when running a dbt project with
dbt run --select python_model? CORRECT ANSWERS When running a dbt project with
dbt run --select python_model, dbt will prepare and pass in both arguments (dbt and
session) to the model() function.
- dbt: A class compiled by dbt Core, unique to each model, that enables you to run your Python code in the context of your dbt project and DAG.
- session: A class representing your data platform's connection to the Python backend. The session is needed to read in tables as DataFrames, and to write DataFrames back to tables. In PySpark, by convention, the SparkSession is named spark and is available globally. For consistency across platforms, we always pass it into the model function as an explicit argument called session.
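As a sketch of the signature in use (PySpark flavor; the raw table name is hypothetical):

def model(dbt, session):
    dbt.config(materialized="table")
    # `session` is the SparkSession here; read a table directly as a DataFrame
    df = session.table("raw.app_events")
    return df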
The order of precedence for variable declaration is as follows (highest priority first): CORRECT ANSWERS 1. The variables defined on the command line with --vars.
2. The package-scoped variable declaration in the dbt_project.yml file.
3. The global variable declaration in the dbt_project.yml file.
4. The variable's default argument (if one is provided).
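For example, a command-line value overrides everything else, while the second argument to var() is the last-resort default; the dates below are hypothetical:

dbt run --vars '{"start_date": "2024-01-01"}'

# in a model, the default only applies if no other declaration exists:
# {{ var("start_date", "2016-06-01") }}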
The dbt source freshness command will output: CORRECT ANSWERS 1. a pass/warning/error status for each table selected in the freshness snapshot.
2. dbt will write the freshness results to a file in the target/ directory called sources.json by default.
3. You can also override this destination; use the -o flag to the dbt source freshness command, e.g. dbt source freshness -o custom_output_directory/freshness_report.json
What types of incremental strategies do dbt Python models support? CORRECT ANSWERS Incremental dbt Python models support all the same incremental strategies as their SQL counterparts.
The specific strategies supported (e.g. merge or insert_overwrite) depend on the adapter or data platform used.
Reference: https://docs.getdbt.com/docs/build/python-models
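A sketch of an incremental Python model (Snowpark/PySpark style; the upstream model stg_orders and the updated_at column are hypothetical):

def model(dbt, session):
    dbt.config(materialized="incremental", incremental_strategy="merge", unique_key="id")
    df = dbt.ref("stg_orders")
    if dbt.is_incremental:
        # only process rows newer than the current max in this model's existing table
        max_ts = session.sql(f"select max(updated_at) from {dbt.this}").collect()[0][0]
        df = df.filter(df["updated_at"] > max_ts)
    return df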
The Python model in dbt has the capability to incorporate additional functions:
CORRECT ANSWERS either through importing external functions or by defining its
own. This allows for the creation of non-dbt functions within the same Python model file
for use in the model. However, it's currently not possible to import and reuse Python
functions defined in one dbt model in other models. For more information, see the
section on Code Reuse for potential patterns under consideration. Additionally, the
Python model in dbt allows for the definition of functions that utilize third-party
packages, provided that these packages are installed and accessible to the Python
runtime on your data platform.
Reference: https://docs.getdbt.com/docs/build/python-models
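As an illustration of a locally defined helper plus a third-party import in the same .py file; the upstream model and the to_pandas() conversion assume a Snowpark-backed project:

import pandas as pd  # third-party package; must be installed on the data platform

def clean_code(value):
    # helper defined alongside the model; it cannot be imported by other dbt models
    return value.strip().upper()

def model(dbt, session):
    dbt.config(materialized="table", packages=["pandas"])
    df = dbt.ref("stg_customers").to_pandas()
    df["country_code"] = df["country_code"].map(clean_code)
    return df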
To check the snapshot freshness for a specific table in a source schema CORRECT
ANSWERS This command allows you to ensure that your data is up-to-date and in line
with the freshness expectations defined in your dbt project.
dbt source freshness --select source:source_schema_name.table_name
Reference: https://docs.getdbt.com/docs/build/sources
You are working on implementing source freshness for one of your source tables; you have defined source freshness for the source in your source's schema.yml file. How will dbt implement this? CORRECT ANSWERS the loaded_at_field is required to calculate freshness for a table. If a loaded_at_field is not provided, then dbt will not calculate