Databricks ML Associate Latest Update Graded A+
Driver Nodes vs Worker Nodes
A cluster consists of one driver node and zero or more worker nodes. Worker nodes run the Spark executors and other services required for a properly functioning cluster. The driver node maintains state information for all notebooks attached to the cluster.

Describe Standard clusters.
- Now called "No Isolation Shared" clusters
- Admins can prevent users from creating or starting this cluster type by enforcing isolation
- Require at least one Spark worker node in addition to the driver node to execute Spark jobs

What are the different cluster access levels available?
- No Isolation Shared (Standard)
- Shared
- Single User
- Custom

What are the two cluster modes available?
- Multi-Node
- Single-Node

Which cluster access level is NOT compatible with the Databricks Runtime for Machine Learning?
Shared.

Describe Single-Node cluster mode.
- Runs Spark locally
- No worker nodes
- Large-scale data processing will exhaust its resources quickly
- Not designed to be shared

What are policies in relation to clusters?
A set of rules used to limit the configuration options available to users when they create a cluster.

True or False: All Databricks Runtime versions include Apache Spark.
True. New versions add components and updates that improve usability, performance, and security.

True or False: To run a Spark job, you need at least one worker node.
True. If a cluster has zero workers, you can run non-Spark commands on the driver node, but Spark commands will fail.

True or False: You can connect a repo from an external Git provider to Databricks Repos.
True. Workspace > Repos > Add Repo > enter the details.

What orchestrates multi-task ML workflows using Databricks Jobs?
Databricks Runtime for Machine Learning. Each cluster has a Databricks Runtime; this can vary by cluster, but the ML runtime already has the needed ML libraries installed.
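For illustration, here is a minimal sketch of what a cluster policy definition could look like. The attribute structure (`"type": "fixed"`, `"range"`, `"allowlist"`) follows Databricks cluster-policy definition syntax, but the specific runtime version, worker cap, and node types below are hypothetical examples, not values from this document:

```json
{
  "spark_version": { "type": "fixed", "value": "13.3.x-cpu-ml-scala2.12" },
  "num_workers": { "type": "range", "maxValue": 4 },
  "node_type_id": { "type": "allowlist", "values": ["i3.xlarge", "i3.2xlarge"] }
}
```

A policy like this would force every cluster created under it onto one runtime version, cap cluster size at four workers, and restrict the instance types users can pick.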
What is the difference between Databricks Runtime for Machine Learning and non-ML runtimes?
ML runtimes have built-in, compatible versions of the most common deep learning libraries, such as TensorFlow, PyTorch, and Keras, and supporting libraries such as Petastorm, Hyperopt, and Horovod. Databricks Runtime ML clusters also include pre-configured GPU support with drivers and supporting libraries.

What are Jobs in Databricks and where can you manage them?
Jobs orchestrate your data processing, machine learning, or data analytics pipelines on the Databricks platform. Manage them under Workflows > Jobs.

Where do you create a cluster with the Databricks Runtime for Machine Learning?
Compute > Create > Databricks Runtime Version > ML > ### LTS ML

How do you install a Python library so it is available to all notebooks that run on a cluster?
Compute > Create > Advanced Options > Init Scripts > add /databricks/python/bin/pip install newPackage
OR
Compute > Create > Libraries > Install New (recommended)

What is the code to impute variables?

    from pyspark.ml.feature import Imputer

    imputer = Imputer(strategy="median", inputCols=impute_cols, outputCols=impute_cols_out)
    imputer_model = imputer.fit(doubles_df)
    imputed_df = imputer_model.transform(doubles_df)

Identify the steps of the machine learning workflow completed by AutoML.
1. Prepares the dataset for model training; carries out imbalanced-data detection for classification problems prior to model training.
2. Iterates to train and tune multiple models.
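The Imputer with `strategy="median"` fills nulls in a column with that column's median. As a plain-Python sketch of the same idea, without Spark (the sample ages are made-up data, not from this document):

```python
def impute_median(values):
    """Replace None entries with the median of the non-null values,
    mirroring what pyspark.ml.feature.Imputer(strategy="median") does
    to a single column."""
    present = sorted(v for v in values if v is not None)
    mid = len(present) // 2
    if len(present) % 2 == 1:
        median = present[mid]
    else:
        median = (present[mid - 1] + present[mid]) / 2
    return [median if v is None else v for v in values]

# Hypothetical column with missing entries:
ages = [23.0, None, 31.0, 27.0, None]
print(impute_median(ages))  # median of [23.0, 27.0, 31.0] is 27.0
```

Spark's Imputer does this per column across the distributed DataFrame, writing results to the `outputCols` so the original columns are preserved.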
Document information
- Uploaded on: April 22, 2024
- Number of pages: 24
- Written in: 2023/2024
- Type: Exam (elaborations)
- Contains: Questions & answers