GCP Professional Engineer Exam Questions And
Accurate Answers
HDFS (Hadoop Distributed File System) - Answer Open source Hadoop file system that
partitions data across many machines. 1 master node (NameNode) + multiple data nodes
(DataNodes). Closest GCP equivalent: Cloud Storage.
MapReduce - Answer Hadoop framework for processing large data sets in parallel. 2 step
system: the map step breaks the input down into key/value pairs, then the reduce step
combines the values for each key into the final result.
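The two steps can be illustrated with a word count. This is a minimal single-machine sketch in plain Python, not the Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration, and the shuffle step stands in for the grouping Hadoop performs between map and reduce.

```python
from collections import defaultdict


def map_phase(lines):
    """Map: emit a (word, 1) key/value pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key into a single result."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines):
    """Run the map, shuffle, and reduce steps in sequence."""
    return reduce_phase(shuffle(map_phase(lines)))


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the map and reduce calls run on many machines at once; only the per-key grouping contract is the same.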
YARN - Answer Hadoop resource manager. Coordinates the tasks running on the Hadoop
cluster and reassigns work to other nodes in case of failure. Consists of a
ResourceManager and per-node NodeManagers.
Hive - Answer Hadoop data warehousing software capable of reading/writing large data
sets using SQL-like queries (HiveQL).
Pig - Answer Hadoop ETL tool for pulling, cleaning/transforming, and loading data into
data warehouses using a high-level scripting language (Pig Latin). (GCP: Dataflow)
Spark - Answer Apache framework for data processing/analytics. Supports SQL
queries; processes data using Resilient Distributed Datasets (RDDs).
HBase - Answer Hadoop non-relational, NoSQL, wide-column database. Runs on HDFS.
Good for sparse datasets. (GCP: Bigtable)
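Why a wide-column layout suits sparse data can be sketched with nested dictionaries: only cells that were actually written take up space. This is a toy in-memory model, not the HBase client API; the SparseTable class and its methods are hypothetical, and the "family:qualifier" column names only mimic HBase naming.

```python
class SparseTable:
    """Toy wide-column table: row key -> {column name -> value}."""

    def __init__(self):
        self._rows = {}

    def put(self, row_key, column, value):
        """Store one cell; columns never written for a row are simply absent."""
        self._rows.setdefault(row_key, {})[column] = value

    def get(self, row_key, column, default=None):
        """Return a cell value, or the default if the cell was never written."""
        return self._rows.get(row_key, {}).get(column, default)

    def stored_cells(self):
        """Count the cells that actually occupy storage."""
        return sum(len(columns) for columns in self._rows.values())


if __name__ == "__main__":
    table = SparseTable()
    table.put("user1", "info:email", "a@example.com")
    table.put("user2", "info:phone", "555-0100")
    table.put("user2", "stats:logins", 7)
    print(table.get("user1", "info:phone"))  # missing cell, prints None
    print(table.stored_cells())
```

A fixed-schema relational table would reserve a slot (usually NULL) for every column in every row; here the two rows share no columns and only three cells exist.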
Flink/Kafka - Answer Apache stream processing tools (Flink is the processing engine;
Kafka is a distributed event streaming platform). Windowing options include tumbling,
sliding, or session windows.
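The three window types differ only in how events are assigned to windows. The sketch below is plain Python rather than the Flink or Kafka Streams API; it assumes integer timestamps, windows aligned to time 0, and hypothetical function names.

```python
def tumbling_windows(events, size):
    """Fixed, non-overlapping windows: each event lands in exactly one window."""
    windows = {}
    for ts, value in events:
        start = (ts // size) * size
        windows.setdefault(start, []).append(value)
    return windows


def sliding_windows(events, size, slide):
    """Overlapping windows [start, start + size), one starting every `slide` units."""
    windows = {}
    for ts, value in events:
        # Earliest window start (a multiple of slide, not below 0) that still covers ts.
        start = max(((ts - size) // slide + 1) * slide, 0)
        while start <= ts:
            windows.setdefault(start, []).append(value)
            start += slide
    return windows


def session_windows(timestamps, gap):
    """Sessions: a new window opens when the pause since the last event exceeds `gap`."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= gap:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions


if __name__ == "__main__":
    events = [(1, "a"), (4, "b"), (12, "c"), (13, "d")]
    print(tumbling_windows(events, size=10))
    print(sliding_windows(events, size=10, slide=5))
    print(session_windows([ts for ts, _ in events], gap=3))
```

Tumbling windows are the special case of sliding windows where the slide equals the window size; session windows have no fixed length at all.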
Oozie - Answer Service to execute and schedule workflows for Hadoop jobs.
Sqoop - Answer Hadoop tool for transferring data between relational databases
(e.g. MySQL) and HDFS.
IAM (Identity and Access Management) - Answer Service to enforce members' roles/permissions.
Google App Engine - Answer Fully managed serverless platform for building highly
available applications. Ideal for IoT, websites, etc. (PaaS)
Google Kubernetes Engine - Answer Managed infrastructure running Kubernetes, the open
source container orchestration system. Good for cloud-native applications.
Google Compute Engine - Answer VMs running in Google's data centers. Fully
customizable, including OS. (IaaS) Used for workloads that require a specific OS
configuration.
Preemptible Instances - Answer Instances that run at a much lower price but can be
terminated at any time. Max run time is 24 hrs. Good for cost-cutting on interruptible
workloads.
Snapshots - Answer Used for disk backup
Images - Answer Exact copy of a virtual machine (VM) boot disk, including the operating
system. Used to create new VMs.
Cloud Storage - Answer Object (blob) storage, e.g. images, unstructured text blocks,
videos, voice recordings. Cannot query by content. Objects are contained in "buckets".
Cloud SQL - Answer Relational database, queried with SQL. Not horizontally scalable.
Use for <1 TB of highly structured data, e.g. user credentials, customer orders.
Cloud Spanner - Answer Relational database, but also horizontally scalable. Can define