Modern Data Ecosystem and Role of Data Engineering Exam
Value of data is based on 2 factors: - ANS-Accuracy of data
Accessibility of data when needed
Data Engineering Scope - ANS-Data
Data Repositories
Data Pipelines
Data Integration Platforms
Big Data
Data Platforms
Data Stores
ETL Processes
ELT Processes
Data Security
Data Privacy
Governance and Compliance
Modern Data Ecosystem - ANS-Data Sources.
Data integrated from disparate sources.
Analysis and skills to generate insights.
Stakeholders to collaborate & act on insights.
Tools, applications and infrastructure to store, process and disseminate data.
Data Sources - ANS-Types:
Structured
Unstructured
Factors:
Acquiring the data.
Pulling in from different data formats, sources, interfaces.
Challenges:
Reliability of data
Security of data
Integrity of data
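Pulling in data from different formats usually means mapping each format onto one common schema and checking integrity before the data is used. A minimal Python sketch, assuming hypothetical CSV and JSON inputs that share `id`, `name` and `amount` fields; the integrity rules (unique ids, non-empty names) are illustrative assumptions, not a standard.

```python
import csv
import io
import json

# Hypothetical raw inputs: the same kind of record arriving in two formats.
CSV_SOURCE = "id,name,amount\n1,Alice,10.5\n2,Bob,7\n"
JSON_SOURCE = '[{"id": 3, "name": "Carol", "amount": 12.25}]'


def from_csv(text):
    """Parse CSV text into dicts, converting every field to a consistent type."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {"id": int(row["id"]), "name": row["name"], "amount": float(row["amount"])}
        for row in reader
    ]


def from_json(text):
    """Parse a JSON array of objects into dicts with the same schema as from_csv."""
    return [
        {"id": int(obj["id"]), "name": str(obj["name"]), "amount": float(obj["amount"])}
        for obj in json.loads(text)
    ]


def check_integrity(records):
    """Reject duplicate ids and empty names (assumed rules) before the data is used."""
    ids = [r["id"] for r in records]
    if len(ids) != len(set(ids)):
        raise ValueError("duplicate ids across sources")
    if any(not r["name"] for r in records):
        raise ValueError("record with empty name")
    return records


records = check_integrity(from_csv(CSV_SOURCE) + from_json(JSON_SOURCE))
print(records)
```

Converting types at the point of ingestion means later steps never have to guess whether `amount` is a string or a number.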
Enterprise Data Environment - ANS-Data Sources.
Raw data needs to be organised, cleaned, optimised for access and brought into line with the
compliance requirements and standards enforced in the organisation before being made available
to users.
To ensure standardisation of master data, data should adhere to the organisation's
master data tables.
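Adhering to master data tables can be pictured as mapping raw values onto the organisation's canonical codes and setting aside anything that does not match. A minimal Python sketch, assuming a hypothetical country master table and alias list; a real master data table would live in a database, not in a dict.

```python
# Hypothetical master data table: canonical country codes and names.
MASTER_COUNTRIES = {"IN": "India", "GB": "United Kingdom", "US": "United States"}

# Hypothetical aliases seen in raw source data, mapped to master codes.
ALIASES = {"UK": "GB", "USA": "US", "INDIA": "IN"}


def standardise_country(raw):
    """Return the master-data code for a raw country value, or None if unknown."""
    key = raw.strip().upper()
    code = ALIASES.get(key, key)
    return code if code in MASTER_COUNTRIES else None


def conform(rows):
    """Split rows into those that match master data and those that need review."""
    clean, rejected = [], []
    for row in rows:
        code = standardise_country(row["country"])
        if code is None:
            rejected.append(row)
        else:
            clean.append({**row, "country": code})
    return clean, rejected


raw_rows = [
    {"customer": "A", "country": " uk "},
    {"customer": "B", "country": "US"},
    {"customer": "C", "country": "Atlantis"},
]
clean, rejected = conform(raw_rows)
print(clean)     # [{'customer': 'A', 'country': 'GB'}, {'customer': 'B', 'country': 'US'}]
print(rejected)  # [{'customer': 'C', 'country': 'Atlantis'}]
```

Rejected rows are kept rather than silently dropped so that someone can review them or extend the alias list.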
Users - ANS-Business stakeholders
Applications
Programmers/Analysts
Data Science use cases
User Challenges - ANS-Interfaces
APIs
Applications
Emerging Technologies - ANS-Cloud
Machine Learning
Big Data
Data Professionals - ANS-Data Engineers
Data Analysts
Data Scientists
Business Analysts
Business Intelligence Analysts
Data Engineer - ANS-Develop data architectures.
Extract, integrate and organise data from disparate sources.
Clean, transform and prepare data.
Design, store and manage data in repositories.
Enable data to be accessible to users in a variety of required formats.
Data Analysts - ANS-Inspect and clean data for deriving insights.
Identify correlations, find patterns.
Apply statistical methods to analyse and mine data.
Visualise data to interpret and present the findings of data analysis.
Data Scientists - ANS-Analyse data for actionable insights.
Build Machine Learning/Predictive models.
Business Analysts/BI Analysts - ANS-Leverage the work of Data Analysts and Data
Scientists to look at potential implications for their business and recommend actions.
BI Analysts focus on the market forces and external influences that shape their business.
Summary of Data Roles - ANS-Data Engineers: convert raw data into usable data.
Data Analysts: use this data to generate insights.
Data Scientists: use Data Analytics and Data Engineering to predict the future.
Business Analysts and Business Intelligence Analysts: use insights and predictions to
drive decisions that benefit and grow their business.
Data Engineering - ANS-Involves:
Collecting source data
Processing data
Storing data
Making data available to users securely
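The four activities above can be sketched as four small functions chained together. A minimal Python sketch, assuming hypothetical order records in which `email` stands in for a sensitive field and an in-memory SQLite database stands in for the data store. SQLite has no user accounts or GRANT statements, so the view only illustrates the idea of exposing a restricted shape of the data; a server database would pair it with access permissions.

```python
import sqlite3

# Hypothetical source records; "email" stands in for a sensitive field.
SOURCE = [
    {"order_id": 1, "email": "a@example.com", "amount": "19.99"},
    {"order_id": 2, "email": "b@example.com", "amount": "5.00"},
    {"order_id": 3, "email": "c@example.com", "amount": "not-a-number"},
]


def collect():
    """Collect: read raw records from the (hypothetical) source."""
    return list(SOURCE)


def process(rows):
    """Process: convert amounts to numbers and drop rows that fail conversion."""
    out = []
    for row in rows:
        try:
            out.append((row["order_id"], row["email"], float(row["amount"])))
        except ValueError:
            continue  # a real pipeline would log or quarantine this row
    return out


def store(conn, rows):
    """Store: persist processed rows in a relational table."""
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, email TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)


def make_available(conn):
    """Serve: expose a view that leaves out the sensitive column."""
    conn.execute("CREATE VIEW orders_public AS SELECT order_id, amount FROM orders")


conn = sqlite3.connect(":memory:")
store(conn, process(collect()))
make_available(conn)
print(conn.execute("SELECT * FROM orders_public ORDER BY order_id").fetchall())
```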
Modern Evolution of Data Engineering - ANS-Traditionally focused on:
Database Management
ETL Pipelines
Data Visualisation
Current requirements include an understanding of:
Distributed Computing
DevOps
Implementation of Machine Learning Models
Trivia: Goal of Data Engineering - ANS-Make quality data available for fact finding and
business decision-making.
Trivia: Data extracted from disparate sources can be stored in: - ANS-Databases
Data Warehouses
Data Lakes
Any other type of data repository
Responsibilities of a Data Engineer - ANS-Provide analytics-ready data to consumers
Features of Analytics-Ready Data - ANS-Accurate
Reliable
Complies with Regulations
Accessible to Consumers when needed
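Each feature of analytics-ready data can be turned into an automated check that runs before data is published. A minimal Python sketch in which the rules are illustrative assumptions: "accurate" is a plausible age range, "reliable" is unique keys with no missing values, "complies" is a hypothetical restricted `national_id` field being absent, and "accessible when needed" is approximated as the data having been refreshed within one day.

```python
from datetime import date, timedelta

# Hypothetical dataset; field names and thresholds are illustrative assumptions.
rows = [
    {"customer_id": 1, "age": 34, "national_id": None, "loaded_on": date(2024, 5, 1)},
    {"customer_id": 2, "age": 41, "national_id": None, "loaded_on": date(2024, 5, 1)},
]


def is_accurate(rows):
    """Accurate: values fall inside a plausible range."""
    return all(r["age"] is not None and 0 <= r["age"] <= 120 for r in rows)


def is_reliable(rows):
    """Reliable: no duplicate keys and no missing required fields."""
    ids = [r["customer_id"] for r in rows]
    return len(ids) == len(set(ids)) and all(r["age"] is not None for r in rows)


def is_compliant(rows):
    """Complies with regulations: the restricted field is not exposed (assumed rule)."""
    return all(r["national_id"] is None for r in rows)


def is_accessible(rows, today, max_age_days=1):
    """Accessible when needed: data was refreshed recently enough to be usable."""
    return all(today - r["loaded_on"] <= timedelta(days=max_age_days) for r in rows)


report = {
    "accurate": is_accurate(rows),
    "reliable": is_reliable(rows),
    "compliant": is_compliant(rows),
    "accessible": is_accessible(rows, today=date(2024, 5, 2)),
}
print(report)
```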
What do Data Engineers do? - ANS-Extract, organise and integrate data from disparate
sources.
Prepare data for analysis by transforming and cleansing it.
Design and manage data pipelines that encompass the journey of data from source to
destination systems.
Set up and manage the infrastructure required for the ingestion, processing and storage
of data, including Data Platforms, Data Stores, Distributed Systems and Data Repositories.
Data Engineer Technical Tools - ANS-Operating Systems, System Utilities &
Commands
Virtual Machines
Networking
Application Services, e.g. Load Balancing
Cloud Based Services
Databases and Data Warehouses
Databases and Data Warehouse Examples - ANS-RDBMS:
IBM DB2
MySQL
Oracle Database
PostgreSQL
NoSQL:
Redis
MongoDB
Cassandra
Neo4j
Data Warehouses:
Oracle Exadata
IBM DB2 Warehouse on Cloud
IBM Netezza Performance Server
Amazon Redshift
Data Pipelines - ANS-Apache Beam
Apache Airflow
Google Cloud Dataflow
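Pipeline tools such as Airflow and Beam describe a pipeline as a graph of steps and run each step only after the steps it depends on. A minimal sketch of that ordering idea using Python's standard `graphlib` module (Python 3.9+); the task names are hypothetical, and this is not the API of Airflow, Beam or Dataflow.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
pipeline = {
    "extract_orders": set(),
    "extract_customers": set(),
    "clean_orders": {"extract_orders"},
    "join": {"clean_orders", "extract_customers"},
    "load_warehouse": {"join"},
}


def run(task):
    """Stand-in for real work, such as a SQL job or a Beam transform."""
    return f"ran {task}"


# static_order() yields each task only after all of its dependencies.
order = list(TopologicalSorter(pipeline).static_order())
for task in order:
    print(run(task))
```

An orchestrator adds scheduling, retries and parallel execution on top of this ordering; independent tasks such as the two extracts could run at the same time.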
ETL Tools - ANS-IBM InfoSphere
AWS Glue
Improvado
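ETL and ELT, both listed under Data Engineering Scope, differ in where the transform step runs: ETL transforms data before loading it, while ELT loads raw data first and transforms it inside the target system. A minimal Python sketch contrasting the two, assuming hypothetical raw temperature readings and an in-memory SQLite database as the target; it does not use any of the ETL tools named above.

```python
import sqlite3

# Hypothetical raw extract: city names in lower case, temperatures as Fahrenheit strings.
RAW = [("berlin", "68.0"), ("cairo", "95.0"), ("oslo", "50.0")]


def etl(conn):
    """ETL: transform in application code, then load only the cleaned result."""
    cleaned = [(city.title(), round((float(f) - 32) * 5 / 9, 1)) for city, f in RAW]
    conn.execute("CREATE TABLE temps_etl (city TEXT, celsius REAL)")
    conn.executemany("INSERT INTO temps_etl VALUES (?, ?)", cleaned)


def elt(conn):
    """ELT: load raw data as-is, then transform inside the database with SQL."""
    conn.execute("CREATE TABLE temps_raw (city TEXT, fahrenheit TEXT)")
    conn.executemany("INSERT INTO temps_raw VALUES (?, ?)", RAW)
    conn.execute(
        """CREATE TABLE temps_elt AS
           SELECT upper(substr(city, 1, 1)) || substr(city, 2) AS city,
                  round((CAST(fahrenheit AS REAL) - 32) * 5 / 9, 1) AS celsius
           FROM temps_raw"""
    )


conn = sqlite3.connect(":memory:")
etl(conn)
elt(conn)
etl_rows = conn.execute("SELECT city, celsius FROM temps_etl ORDER BY city").fetchall()
elt_rows = conn.execute("SELECT city, celsius FROM temps_elt ORDER BY city").fetchall()
print(etl_rows == elt_rows)  # both approaches produce the same result here
```

With ELT the untouched `temps_raw` table stays in the target, so the data can be re-transformed later without extracting it again.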
Languages - ANS-Query languages:
SQL for RDBMS
SQL-like query languages for NoSQL databases
Programming languages:
Python
R
Java