Essay

Introduction to Data Pipelines

Pages: 16
Grade: A
Uploaded on: 17-05-2021
Written in: 2019/2020

This document is an introduction to data pipelines, focusing mainly on Luigi and Apache Airflow and on their similarities and differences. Please buy this document only if you want to learn Luigi, Apache Airflow, or both.


Statistical Analysis & Modelling Assignment 2: Data Pipelines
Albert Savill 3505901

Table of Contents
List of Figures
1 Introduction
1.1 Hadoop Ecosystem
3 Data Pipeline
3.1 Luigi
3.2 Airflow
3.3 Comparison
4 Implementation
4.1 Ubuntu
4.1.1 Luigi Installation
4.1.2 Airflow Installation
4.2 Command Prompt
4.2.1 Airflow Installation
4.2.2 Luigi Installation
4.3 Interface
4.3.1 Luigi
4.3.2 Apache-Airflow
Conclusion
References

List of Figures
- Figure 1: Hadoop Ecosystem architecture design [7]
- Figure 2: Ubuntu Luigi package installation
- Figure 3: Installing Luigi toml
- Figure 4: Installing the bleeding edge code
- Figure 5: First steps in Airflow installation
- Figure 6: Install apache-airflow results
- Figure 7: Airflow initdb installation part 1
- Figure 8: Airflow initdb installation part 2
- Figure 9: Webserver command
- Figure 10: Scheduler command
- Figure 11: Installing airflow
- Figure 12: Backfill and instance tasks
- Figure 13: Installing Luigi after airflow
- Figure 14: Luigi task status [2]
- Figure 15: Apache-Airflow user interface 1 [6]
- Figure 16: Showing a DAG in Airflow [6]




1 Introduction
1.1 Hadoop Ecosystem
The Hadoop ecosystem is an open-source framework whose goal is to simplify big data projects and
problems. Hadoop itself is a collection of modules combined into one large framework, forming
an ecosystem of technologies and software used to handle big data [7]. The main tasks of a
Hadoop ecosystem are to ingest, store, analyse, and maintain data.
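
The four tasks named above (taking data in, storing it, analysing it, and maintaining it) can be illustrated with a minimal sketch in plain Python. This is not Hadoop code and uses no Hadoop API: the function names, the comma-separated record format, and the in-memory list standing in for a data store are all assumptions made purely for illustration. The order in which the stages run is also a choice made for this example.

```python
from collections import Counter


def ingest(raw_lines):
    """Parse raw text lines into (user, action) records (hypothetical format)."""
    records = []
    for line in raw_lines:
        parts = line.strip().split(",")
        if len(parts) == 2:  # skip lines that do not match the assumed format
            records.append((parts[0], parts[1]))
    return records


def store(records, storage):
    """Append records to an in-memory list that stands in for a data store."""
    storage.extend(records)
    return storage


def maintain(storage):
    """Remove exact duplicate records while preserving their original order."""
    return list(dict.fromkeys(storage))


def analyse(storage):
    """Count how many times each action occurs in the stored records."""
    return Counter(action for _, action in storage)


# Illustrative input only; these records are made up for the example.
raw = ["alice,login", "bob,login", "alice,login", "broken line", "bob,logout"]

storage = store(ingest(raw), [])
storage = maintain(storage)
summary = analyse(storage)
print(summary)
```

In a real Hadoop ecosystem each of these stages would be handled by dedicated distributed components rather than by single functions on one machine; the sketch only shows how the stages hand data to one another.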



