DP-900 Exam Study Questions with Correct
Verified Solutions
What is an analytical system?
In contrast to systems designed to support OLTP, an analytical system is designed to support
business users who need to query data and gain a big picture view of the information held in a
database.
Analytical systems are concerned with capturing raw data, and using it to generate insights. An
organization can use these insights to make business decisions. For example, detailed insights
for a manufacturing company might reveal trends that help it determine which product
lines to focus on for profitability.
What is a transactional system?
A transactional system is often what most people consider the primary function of business
computing. A transactional system records transactions. A transaction could be financial, such
as the movement of money between accounts in a banking system, or it might be part of a retail
system, tracking payments for goods and services from customers. Think of a transaction as a
small, discrete unit of work.
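The "small, discrete unit of work" idea can be sketched in code. This is a minimal illustration, not a real banking API: the account names, balances, and rollback mechanism are all assumptions chosen to show that a transaction either completes fully or leaves the data unchanged.

```python
# Hypothetical sketch: a bank transfer as one discrete unit of work.
# If any step fails, the whole transaction is rolled back -- the account
# names and balances here are illustrative, not from a real system.

def transfer(accounts, src, dst, amount):
    """Move `amount` from src to dst atomically: all or nothing."""
    snapshot = dict(accounts)          # remember state for rollback
    try:
        if accounts[src] < amount:
            raise ValueError("insufficient funds")
        accounts[src] -= amount
        accounts[dst] += amount
    except Exception:
        accounts.clear()
        accounts.update(snapshot)      # roll back to the saved snapshot
        raise

accounts = {"checking": 100, "savings": 50}
transfer(accounts, "checking", "savings", 30)
# accounts is now {"checking": 70, "savings": 80}
```

A failed transfer (for example, one exceeding the balance) restores the snapshot, so no partial movement of money is ever visible.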
Data Ingestion
Data ingestion is the process of capturing the raw data. This data could be taken from control
devices measuring environmental information such as temperature and pressure, point-of-sale
devices recording the items purchased by a customer in a supermarket, financial data recording
the movement of money between bank accounts, and weather data from weather stations. Some
of this data might come from a separate OLTP system. To process and analyze this data, you
must first store the data in a repository of some sort. The repository could be a file store, a
document database, or even a relational database.
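The ingestion step above can be sketched as follows. This assumes the simplest repository mentioned, a file store: raw sensor readings are appended as JSON lines so they can be processed later. The field names and file path are illustrative.

```python
# Minimal sketch of data ingestion into a file-store repository.
# Raw readings are appended as JSON lines for later processing;
# device names and fields are assumptions for illustration.
import json
import os
import tempfile

def ingest(records, path):
    """Append raw records to a JSON-lines file repository."""
    with open(path, "a", encoding="utf-8") as repo:
        for record in records:
            repo.write(json.dumps(record) + "\n")

readings = [
    {"device": "sensor-1", "temperature": 21.5, "pressure": 1013},
    {"device": "sensor-2", "temperature": 19.8, "pressure": 1009},
]
path = os.path.join(tempfile.gettempdir(), "raw_readings.jsonl")
ingest(readings, path)
```

The same pattern applies whether the repository is a file store, a document database, or a relational database; only the write call changes.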
Data Transformation/Data Processing
Data processing is the conversion of raw data into meaningful information.
The raw data might not be in a format that is suitable for querying. The data might contain
anomalies that should be filtered out, or it may require transforming in some way. For example,
dates or addresses might need to be converted into a standard format. After data is ingested into a
data repository, you may want to do some cleaning operations and remove any questionable or
invalid data, or perform some aggregations such as calculating profit, margin, and other Key
Performance Indicators (KPIs). KPIs are how businesses are measured for growth and
performance.
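The cleaning and aggregation steps described above can be sketched as a small pipeline. The row layout, date formats, and KPI (profit margin) are assumptions chosen for illustration, not part of any Azure service.

```python
# Hedged sketch of data processing: raw sales rows are cleaned
# (dates standardized to ISO format, invalid rows filtered out) and a
# KPI (profit margin) is derived. Column names are assumptions.
from datetime import datetime

RAW = [
    {"date": "03/15/2024", "revenue": 200.0, "cost": 150.0},
    {"date": "2024-03-16", "revenue": 120.0, "cost": 90.0},
    {"date": "not-a-date", "revenue": 80.0,  "cost": 60.0},  # invalid
]

def parse_date(text):
    """Accept a couple of common formats; return an ISO date or None."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None

def process(rows):
    clean = []
    for row in rows:
        date = parse_date(row["date"])
        if date is None:
            continue                      # filter out questionable data
        profit = row["revenue"] - row["cost"]
        clean.append({"date": date,
                      "profit": profit,
                      "margin": profit / row["revenue"]})
    return clean

processed = process(RAW)
# two valid rows survive; the first has margin 50/200 = 0.25
```

Note how the invalid date is dropped during cleaning rather than propagated into the KPI calculation.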
Batch Processing
Batch processing buffers the data into groups; the whole group is then processed at a later time.
Advantages of batch processing include:
- Large volumes of data can be processed at a convenient time.
- It can be scheduled to run at a time when computers or systems might otherwise be idle, such as overnight or during off-peak hours.
Disadvantages of batch processing include:
- The time delay between ingesting the data and getting the results.
- All of a batch job's input data must be ready before the batch can be processed, so the data must be carefully checked. Problems with data, errors, and program crashes that occur during batch jobs bring the whole process to a halt, and the input data must be checked again before the job can be rerun. Even minor data errors, such as typographical errors in dates, can prevent a batch job from running.
Options include running U-SQL jobs in Azure Data Lake Analytics, using Hive, Pig, or custom
Map/Reduce jobs in an HDInsight Hadoop cluster, or using Java, Scala, or Python programs in
an HDInsight Spark cluster.
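The constraint described above, that all input must be ready and a single bad record halts the whole job, can be illustrated with a small sketch. The record shape, date format, and validation rule are assumptions, not the behavior of any specific Azure batch service.

```python
# Illustrative sketch of the batch constraint: the entire batch is
# validated first, and a single bad date aborts the whole job before
# any processing happens. Record fields are assumptions.
from datetime import datetime

def run_batch(records):
    """Validate the entire batch, then process it as one group."""
    for i, rec in enumerate(records):           # check ALL input first
        try:
            datetime.strptime(rec["date"], "%Y-%m-%d")
        except ValueError:
            raise ValueError(f"record {i}: bad date {rec['date']!r}")
    # only reached once the whole batch is clean
    return sum(rec["amount"] for rec in records)

good = [{"date": "2024-01-01", "amount": 10},
        {"date": "2024-01-02", "amount": 5}]
bad = good + [{"date": "01-03-2024", "amount": 7}]  # typo in the date

run_batch(good)   # processes the whole group -> 15
# run_batch(bad) raises ValueError; nothing in the batch is processed
```

This mirrors the point above: even a minor typographical error in one date prevents the entire batch from running.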
Stream data processing
Handles data in real time.
Stream processing is ideal for time-critical operations that require an instant
response. For example, a system that monitors a building for smoke and heat needs to trigger
alarms and unlock doors to allow residents to escape immediately in the event of a fire.
After capturing real-time messages, the solution must process them by filtering, aggregating, and
otherwise preparing the data for analysis. The processed stream data is then written to an output
sink. Azure Stream Analytics provides a managed stream processing service based on
perpetually running SQL queries that operate on unbounded streams.
You can also use open source Apache streaming technologies like Storm and Spark Streaming in
an HDInsight cluster.
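The filter-and-aggregate pattern described above can be sketched over a simulated stream. This is not the Azure Stream Analytics API; the event shape, sensor names, and alarm threshold are all illustrative assumptions showing how each event is processed as it arrives.

```python
# Tiny sketch of stream processing, assuming events arrive one at a
# time: each heat reading is filtered and folded into a running
# aggregate, and an alarm flag is raised as soon as a threshold is
# crossed. The threshold and event shape are illustrative.

def process_stream(events, threshold=60.0):
    """Yield (running_max, alarm) after each heat event, in order."""
    running_max = float("-inf")
    for event in events:
        if event["sensor"] != "heat":      # filter: ignore other sensors
            continue
        running_max = max(running_max, event["value"])
        yield running_max, running_max >= threshold

stream = [
    {"sensor": "heat",  "value": 35.0},
    {"sensor": "smoke", "value": 0.1},     # filtered out
    {"sensor": "heat",  "value": 72.0},    # crosses the threshold
]
results = list(process_stream(stream))
# results == [(35.0, False), (72.0, True)]
```

Unlike the batch sketch, output is produced per event rather than after the whole input is collected, which is what makes an immediate fire-alarm response possible.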
Database Administrators