DP-900: Microsoft Azure Data Fundamentals - 01 - Explore Core Data Concepts
1. What is Data?: a collection of facts, such as numbers, descriptions, and observations,
used to record information.
2. What is an Entity?: a thing about which data is recorded, such as a customer or a product.
3. What is an Attribute?: a characteristic that describes an entity, such as a customer's name.
4. What are the different data types?: structured, semi-structured, and unstructured.
5. What is a Schema?: a concept or framework that organizes and interprets information.
6. What is Structured Data?: data that adheres to a fixed schema, meaning all of the data has
the same fields or properties.
7. What is Tabular?: representing data in one or more tables that consist of rows to represent
each instance of a data entity, and columns to represent attributes of the entity.
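A minimal sketch of the tabular idea, assuming a hypothetical Customer entity (the column names and values are made up for illustration). Each dictionary stands for one row (an entity instance) and each key for one column (an attribute).

```python
# Hypothetical "Customer" table: every row has the same columns (a fixed schema).
columns = ["id", "first_name", "last_name"]
rows = [
    {"id": 1, "first_name": "Joe", "last_name": "Jones"},
    {"id": 2, "first_name": "Samir", "last_name": "Nadoy"},
]

# Structured data: every row exposes exactly the same attributes.
is_structured = all(list(row.keys()) == columns for row in rows)

# Read one attribute (column) across all entity instances (rows).
first_names = [row["first_name"] for row in rows]
```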
8. What is Semi-structured Data?: is information that has some structure, but which allows
for some variation between entity instances.
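To make the "variation between entity instances" concrete, here is a small sketch with two hypothetical customer records. Both carry a name, but their other fields differ; the field names are assumptions chosen for illustration.

```python
import json

# Two hypothetical customer records: both share "name", the other fields differ.
customers = json.loads("""
[
  {"name": "Joe", "contact": [{"type": "home", "number": "555 123-1234"}]},
  {"name": "Samir", "email": "samir@example.com"}
]
""")

# The fields each instance actually carries.
fields_per_record = [sorted(record.keys()) for record in customers]

# Only the fields present in both records form the shared structure.
common_fields = set(customers[0]) & set(customers[1])
```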
9. What is Unstructured Data?: data with no defined structure that does not follow a specified
format, such as documents, images, audio, and video.
10. What are the two common categories of data storage?: file stores and databases.
11. What is Comma-Separated Values (CSV)?: a delimited text format that separates fields with
commas and terminates rows with a carriage return or new line.
12. What is Tab-Separated Values (TSV)?: a delimited text format that separates fields with tabs. The related space-delimited format separates fields with tabs or spaces.
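The delimited formats above can be read with Python's standard `csv` module. This is only a sketch: the sample data is invented, and the same reader handles tab-separated text when given `delimiter="\t"`.

```python
import csv
import io

# Hypothetical CSV content: the first row holds field names, each later row is one record.
csv_text = (
    "FirstName,LastName,Email\n"
    "Joe,Jones,joe@example.com\n"
    "Samir,Nadoy,samir@example.com\n"
)

# DictReader uses the header row as the keys for every following row.
records = list(csv.DictReader(io.StringIO(csv_text)))
emails = [r["Email"] for r in records]

# The same data as TSV: only the delimiter changes.
tsv_text = csv_text.replace(",", "\t")
tsv_records = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))
```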
13. What is Fixed-Width Data?: a text format in which each field is allocated a fixed number
of characters, so no delimiter is needed.
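A rough sketch of reading fixed-width records by character position. The layout (name, city, age and their widths) is a hypothetical one chosen for this example.

```python
# Hypothetical layout: field name -> (start, end) character positions.
layout = {"name": (0, 10), "city": (10, 20), "age": (20, 23)}

def parse_fixed_width(line, layout):
    """Slice each field out by position and strip the padding."""
    return {field: line[start:end].strip() for field, (start, end) in layout.items()}

# Build sample lines with padding so every field has its fixed width.
lines = [
    f"{'Joe':<10}{'Seattle':<10}{42:>3}",
    f"{'Samir':<10}{'Cairo':<10}{37:>3}",
]

parsed = [parse_fixed_width(line, layout) for line in lines]
```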
14. What is the best use case for Delimited Text?: when structured data needs to be accessed by
a wide range of applications and services in a human-readable format.
15. What is JavaScript Object Notation (JSON)?: a ubiquitous format in which a hierarchical
document schema is used to define data entities (objects) that have multiple attributes.
* Best used with structured and semi-structured data.
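As an illustration of the hierarchical shape, the sketch below parses a hypothetical customer object that contains a nested object and an array. The field names are assumptions, not part of any real schema.

```python
import json

# Hypothetical customer document with a nested object and an array of objects.
doc = """
{
  "firstName": "Joe",
  "lastName": "Jones",
  "address": {"city": "Seattle", "country": "USA"},
  "phones": [{"type": "home", "number": "555 123-1234"}]
}
"""

customer = json.loads(doc)

# Attributes can themselves be objects or arrays, which gives the hierarchy.
city = customer["address"]["city"]
phone_types = [phone["type"] for phone in customer["phones"]]

# Serializing and parsing again yields the same data.
round_trip_ok = json.loads(json.dumps(customer)) == customer
```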
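16. What is Extensible Markup Language (XML)?: a human-readable data format that uses tags
enclosed in angle brackets to define elements and attributes. It was popular in the 1990s and
2000s and has largely been superseded by the less verbose JSON format.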
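A short sketch of reading XML with Python's standard `xml.etree.ElementTree` module. The `Customer` and `Contact` element names and their attributes are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML document: elements are tags, attributes sit inside the tags.
xml_text = """
<Customer name="Joe Jones">
  <ContactDetails>
    <Contact type="home" number="555 123-1234"/>
    <Contact type="email" address="joe@example.com"/>
  </ContactDetails>
</Customer>
""".strip()

root = ET.fromstring(xml_text)

# Read an attribute of the root element.
customer_name = root.get("name")

# Walk every <Contact> element anywhere under the root.
contact_types = [contact.get("type") for contact in root.iter("Contact")]
```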
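17. What is Binary Large Object (BLOB)?: data stored as raw binary (bytes) rather than as text, commonly used for images, video, audio, and application-specific documents.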
18. What are Optimized File Formats?: formats that specialize in compression,
indexing, and efficient storage and processing.
* Common types: Avro, ORC, and Parquet.
19. What is Avro?: a row-based format created by Apache. It is a good format for compressing
data and minimizing storage and network bandwidth requirements.
20. What is Optimized Row Columnar format (ORC)?: a format that organizes data into columns
rather than rows. It was developed by Hortonworks to optimize read and write
operations in Apache Hive.
21. What is Parquet?: a columnar data format created by Cloudera and Twitter. This
format supports very efficient compression and encoding schemes.
22. What is a Database?: a central system in which data can be stored and
queried.
23. What is a Relational Database?: a database used to store and query structured data.
* Tables are managed and queried using Structured Query Language (SQL), which is based on
an ANSI standard, making it similar across multiple database systems.
24. What is Structured Query Language (SQL)?: the standard language used to manage and query tables in relational databases.
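As a rough illustration of SQL against a relational table, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `customer` table and its rows are invented for the example, and SQLite is just one of many systems that accept this kind of SQL.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# Define a table with a fixed schema.
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, city TEXT)"
)

# Insert a few hypothetical rows using parameter placeholders.
conn.executemany(
    "INSERT INTO customer (id, name, city) VALUES (?, ?, ?)",
    [(1, "Joe", "Seattle"), (2, "Samir", "Cairo"), (3, "Ana", "Seattle")],
)

# Query the structured data with SQL.
seattle_names = conn.execute(
    "SELECT name FROM customer WHERE city = ? ORDER BY name", ("Seattle",)
).fetchall()

conn.close()
```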
25. What is a Non-relational Database?: a database that does not apply a relational schema to
the data. These are often referred to as NoSQL databases, even though some support a variant of
the SQL language.
26. What are the four common types of non-relational database in use?:
1. Key-value
2. Document
3. Column family
4. Graph
27. What is a Key-value Database?: a non-relational database type in which each record consists
of a unique key and an associated value, which can be in any format.
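The key-value model can be sketched with a plain Python dictionary. This is a toy in-memory stand-in, not a real database client; the `put` and `get` helpers and the key naming scheme are hypothetical.

```python
import json

# Toy in-memory key-value store: the store never inspects the value.
store = {}

def put(key, value):
    """Save a value under a unique key, replacing any previous value."""
    store[key] = value

def get(key):
    """Return the value for a key, or None if the key is absent."""
    return store.get(key)

# Values can be in any format: here a JSON string and raw bytes.
put("user:1", json.dumps({"name": "Joe"}))
put("avatar:1", b"\x89PNG")

user_value = get("user:1")
missing_value = get("user:99")
```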
28. What is a Document Database?: a non-relational database type that is a specific form of
key-value database in which the value is a JSON document.
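A small sketch of the difference from a plain key-value store: because each value is a JSON document, the store can look inside it. The `find` helper and the sample documents are hypothetical, and a real document database would use indexes rather than scanning every document.

```python
import json

# Each key maps to a JSON document; documents need not share the same fields.
documents = {
    "1": json.dumps({"name": "Joe", "city": "Seattle"}),
    "2": json.dumps({"name": "Samir", "city": "Cairo", "loyalty": {"tier": "gold"}}),
}

def find(field, value):
    """Return the keys of documents whose top-level field equals value."""
    return [
        key
        for key, raw in documents.items()
        if json.loads(raw).get(field) == value
    ]

cairo_keys = find("city", "Cairo")
no_match = find("city", "Oslo")
```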
29. What is a Column Family Database?: a non-relational database type that stores tabular
data comprising rows and columns, but lets you divide the columns into groups. Each group,
known as a column family, holds a set of columns that are logically related.
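One way to picture column families is as nested dictionaries: row key, then family, then column. The `identity` and `contact` family names and the `read_family` helper are assumptions made for this sketch.

```python
# Row key -> column family -> column -> value (all names are hypothetical).
rows = {
    "1001": {
        "identity": {"first_name": "Joe", "last_name": "Jones"},
        "contact": {"phone": "555 123-1234", "email": "joe@example.com"},
    },
    "1002": {
        "identity": {"first_name": "Samir", "last_name": "Nadoy"},
        "contact": {"email": "samir@example.com"},
    },
}

def read_family(row_key, family):
    """Fetch only the logically related columns of one family for one row."""
    return rows[row_key][family]

contact_1002 = read_family("1002", "contact")
family_names = sorted(rows["1001"].keys())
```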
30. What is a Graph Database?: a non-relational database that stores entities as nodes with