AWS Glue Job Knowledge UPDATED ACTUAL Exam Questions
and CORRECT Answers
What is the transformation of data? - The process of applying operations such as cleansing,
re-formatting, aggregating, and mapping to the extracted data, for example in a Glue job script.
The objective is to prepare and optimize the data so that it can be easily consumed.
What is loading data? - The process of inserting the transformed data into the target database,
data warehouse, or data store, from which users and various processes can consume it.
What is it called when the extraction, transformation, and load processes work together? - An
ETL job
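The three stages above can be sketched in a few lines of plain Python. This is only an illustration of the extract, transform, and load flow, not a Glue job script: the sample data, the function names, and the in-memory SQLite target are all assumptions made for the example.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in a real job this would come from the source system.
RAW_CSV = "name,amount\n alice ,10\nBOB,5\n alice ,7\n"


def extract(text):
    """Extract: read raw rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cleanse names, cast amounts to int, aggregate per name."""
    totals = {}
    for row in rows:
        name = row["name"].strip().lower()
        totals[name] = totals.get(name, 0) + int(row["amount"])
    return sorted(totals.items())


def load(records, conn):
    """Load: insert the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS totals (name TEXT PRIMARY KEY, amount INTEGER)"
    )
    conn.executemany("INSERT INTO totals VALUES (?, ?)", records)
    conn.commit()


def run_etl(text, conn):
    """Run the three stages together as one ETL job."""
    load(transform(extract(text)), conn)


conn = sqlite3.connect(":memory:")  # stand-in for the real target store
run_etl(RAW_CSV, conn)
result = conn.execute("SELECT name, amount FROM totals ORDER BY name").fetchall()
```

Keeping each stage in its own function makes it possible to test the transformation logic without touching the source or the target.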
Benefits of using AWS and Terraform for cloud data? - -AWS: Using Amazon Web Services
(AWS) for cloud data offers benefits like scalability, reliability, and a wide range of cloud
services and tools. AWS provides managed data storage, processing, and analytics services,
making it easier to build and manage data solutions at scale.
-Terraform: Terraform is an Infrastructure as Code (IaC) tool that offers benefits such as
automation, version control, and consistency. When used in conjunction with AWS, Terraform
allows you to define and provision cloud data resources and configurations in a structured and
repeatable manner.
Together, AWS and Terraform streamline cloud data management by providing cloud
infrastructure and automation capabilities.
What is the Glue Console? - The Glue Console is an interface that allows you to interact with and
manage various aspects of AWS Glue, such as creating and managing ETL jobs, defining data
catalogs, setting up crawlers, and monitoring and orchestrating data workflows. The Glue
Console is a visual tool for simplifying data preparation and transformation tasks in AWS.
Why is AWS Glue a serverless ETL? - AWS Glue is a fully managed serverless environment
where you can extract, transform, and load (ETL) data at scale. With AWS Glue, you can
categorize data, clean it, enrich it, and move it reliably across various data stores and streams in a
cost-effective manner.
AWS Glue is serverless, so you don't have to worry about provisioning or managing servers.
With AWS Glue, you pay only for the resources you use, and you can scale up or down as
needed.
What is the difference between, and purpose of, raw and trust S3 buckets? - -Raw S3 Bucket:
stores unprocessed or minimally processed data.
-Trust S3 Bucket: stores data that has been processed, refined, and trusted for specific use cases.
It's often used to store clean, structured, and reliable data, making it suitable for direct
consumption by applications or analytics.
What is Terraform? - - An Infrastructure as Code (IaC) tool for provisioning and managing
infrastructure.
- Provides a way to create, modify, and version infrastructure as if it were software.
- Supports multiple cloud providers (like AWS), making it a versatile choice for infrastructure
automation.
What is the difference between CSV and Parquet format? - -CSV (Comma-Separated Values): A
plain text format where data is represented as a series of values separated by commas. It's
human-readable but less efficient for large datasets. Suitable for simple data exchange.
-Parquet: A columnar storage format designed for analytics. It stores data in columns rather than
rows, which provides better compression and performance. Ideal for big data and data
warehousing, particularly when querying specific columns.
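The row versus column distinction can be illustrated with plain Python. This sketch does not read or write real Parquet files; it only mimics the two layouts in memory, and the sample data and function names are made up for the example.

```python
import csv
import io

# Hypothetical sample data in CSV form (row-oriented text).
CSV_TEXT = "id,city,price\n1,Oslo,10.5\n2,Lima,7.0\n3,Oslo,3.5\n"


def read_rows(text):
    """Row layout: one dict per record, as a CSV reader produces."""
    return list(csv.DictReader(io.StringIO(text)))


def to_columns(rows):
    """Column layout: one list of values per column name."""
    columns = {}
    for row in rows:
        for key, value in row.items():
            columns.setdefault(key, []).append(value)
    return columns


rows = read_rows(CSV_TEXT)
columns = to_columns(rows)

# Row-oriented access: every full record is visited to reach one field.
total_from_rows = sum(float(row["price"]) for row in rows)

# Column-oriented access: only the "price" column is touched.
total_from_columns = sum(float(price) for price in columns["price"])
```

In the column layout, values of the same type sit next to each other (note the repeated "Oslo" in the city column), which is the property that columnar formats exploit for compression and for reading only the columns a query needs.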
What does Jules do? - Jules allows you to set up builds using a .jenkins file and view their status
from its UI.
What is a Data Lake? - A Data Lake is a data storage system that stores raw data in its native format,
for various types of analysis and processing.
How do Python tests work? - Python testing involves writing test cases using libraries like
`unittest` or `pytest`. Test functions are created to check if a specific part of the code works as
expected. The tests are executed, and if any assertion fails, it indicates a problem in the code.
Automated testing helps ensure software correctness and maintainability.
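A minimal sketch of such a test using the standard-library `unittest` module follows. The function under test, `normalize_name`, is hypothetical and exists only to give the assertions something to check.

```python
import unittest


def normalize_name(value):
    """Hypothetical function under test: trim whitespace and lowercase."""
    return value.strip().lower()


class NormalizeNameTest(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_name("  Alice "), "alice")

    def test_empty_string_stays_empty(self):
        self.assertEqual(normalize_name(""), "")


def run_tests():
    """Load the test case and run it, returning the TestResult object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeNameTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

In a project this would usually be run from the command line with `python -m unittest` (or with `pytest`, which also collects `unittest` test cases); `run_tests()` is included here only so the example can be executed programmatically.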
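What is a requirements.txt file and how do you create it? - A requirements.txt file is used in
Python to specify the dependencies (external libraries or packages) required by a project. It lists
the names and versions of packages, making it easier to recreate the project's environment. It can
be written by hand or generated from the current environment with `pip freeze > requirements.txt`,
and installed with `pip install -r requirements.txt`.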
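To make the file format concrete, the sketch below writes a small requirements file and reads it back. The package pins are illustrative only, and the reader handles just the simple `name==version` form; real requirements files allow other specifiers that this example does not cover. In practice the file is usually produced with `pip freeze > requirements.txt` rather than by custom code.

```python
import tempfile
from pathlib import Path

# Hypothetical pins for illustration; real projects list their own packages.
PINS = {"requests": "2.31.0", "boto3": "1.34.0"}


def write_requirements(pins, path):
    """Write one 'name==version' line per package, sorted by name."""
    lines = [f"{name}=={version}" for name, version in sorted(pins.items())]
    path.write_text("\n".join(lines) + "\n")


def read_requirements(path):
    """Parse simple 'name==version' lines, skipping comments and blanks."""
    pins = {}
    for line in path.read_text().splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            name, version = line.split("==", 1)
            pins[name] = version
    return pins


with tempfile.TemporaryDirectory() as tmp:
    req_file = Path(tmp) / "requirements.txt"
    write_requirements(PINS, req_file)
    content = req_file.read_text()
    parsed = read_requirements(req_file)
```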
What is the functionality of an __init__.py file? - In Python, an "__init__.py" file serves two main
purposes. First, it marks the directory that contains it as a Python package, allowing you to
import its contents using Python's import statements. Second, it can contain initialization code
that runs the first time the package is imported, helping set up the package's environment,
variables, or other configurations.
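Both purposes can be seen in a small self-contained experiment. The sketch below builds a throwaway package in a temporary directory; the package name `demo_pkg`, the `helpers` module, and the `DEFAULT_REGION` variable are hypothetical names chosen for the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def demo_package_import():
    """Create a temporary package, import it, and return what __init__.py set up."""
    with tempfile.TemporaryDirectory() as tmp:
        pkg_dir = Path(tmp) / "demo_pkg"  # hypothetical package name
        pkg_dir.mkdir()
        # __init__.py marks demo_pkg as a package and runs on first import.
        (pkg_dir / "__init__.py").write_text(
            "DEFAULT_REGION = 'us-east-1'\n"
            "from .helpers import greet\n"
        )
        (pkg_dir / "helpers.py").write_text(
            "def greet(name):\n"
            "    return f'hello {name}'\n"
        )
        sys.path.insert(0, tmp)
        try:
            importlib.invalidate_caches()
            pkg = importlib.import_module("demo_pkg")
            return pkg.DEFAULT_REGION, pkg.greet("glue")
        finally:
            # Clean up so the temporary package does not leak into later imports.
            sys.path.remove(tmp)
            sys.modules.pop("demo_pkg", None)
            sys.modules.pop("demo_pkg.helpers", None)
```

Calling `demo_package_import()` shows that the variable assigned in `__init__.py` and the function it re-exported are both available directly on the imported package.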
What is the functionality of a Jule.yml file? - A "Jule.yml" file is commonly used in some
programming projects. It typically specifies project settings, dependencies, and configurations.
The file serves as a blueprint for tools like Jule to manage the project, making it easier to set up,
build, and run applications with the desired settings and dependencies.
What is the functionality of a Jenkinsfile? - It defines the entire build and deployment pipeline as
code. By specifying the steps, stages, and configurations in a Jenkinsfile, it automates building,
testing, and deploying software projects, enabling a consistent and reproducible process for
software development and delivery.
What is the functionality of a Makefile? - A Makefile is a configuration file used in software
development. It specifies dependencies and rules for building a project. When you run the