Modelling Assignment 2: Data Pipelines
Albert Savill 3505901
Table of Contents
List of Figures
1 Introduction
1.1 Hadoop Ecosystem
3 Data Pipeline
3.1 Luigi
3.2 Airflow
3.3 Comparison
4 Implementation
4.1 Ubuntu
4.1.1 Luigi Installation
4.1.2 Airflow Installation
4.2 Command Prompt
4.2.1 Airflow Installation
4.2.2 Luigi Installation
4.3 Interface
4.3.1 Luigi
4.3.2 Apache-Airflow
Conclusion
References
List of Figures
- Figure 1: Hadoop Ecosystem architecture design [7]
- Figure 2: Ubuntu Luigi package installation
- Figure 3: Installing Luigi toml
- Figure 4: Installing the bleeding edge code
- Figure 5: First steps in Airflow installation
- Figure 6: Install apache-airflow results
- Figure 7: Airflow initdb installation part 1
- Figure 8: Airflow initdb installation part 2
- Figure 9: Webserver command
- Figure 10: Scheduler command
- Figure 11: Installing airflow
- Figure 12: Backfill and instance tasks
- Figure 13: Installing Luigi after airflow
- Figure 14: Luigi task status [2]
- Figure 15: Apache-Airflow user interface 1 [6]
- Figure 16: Showing a DAG in Airflow [6]
1 Introduction
1.1 Hadoop Ecosystem
The Hadoop ecosystem is an open-source platform whose goal is to simplify big data projects and
problems. Hadoop itself is a collection of modules combined into one large framework, which together
form an ecosystem of technology and software used to handle big data [7]. The main tasks of a
Hadoop ecosystem are ingesting, storing, analysing, and maintaining that data.