Essay

Introduction to Data Pipelines

Pages
16
Grade
A
Uploaded on
17-05-2021
Written in
2019/2020

This document is an introduction to data pipelines, focusing mainly on Luigi and Apache Airflow and on their similarities and differences. Please only buy this document if you want to learn Luigi, Apache Airflow, or both.


Statistical Analysis & Modelling
Assignment 2: Data Pipelines
Albert Savill 3505901

Table of Contents
List of Figures
1 Introduction
1.1 Hadoop Ecosystem
3 Data Pipeline
3.1 Luigi
3.2 Airflow
3.3 Comparison
4 Implementation
4.1 Ubuntu
4.1.1 Luigi Installation
4.1.2 Airflow Installation
4.2 Command Prompt
4.2.1 Airflow Installation
4.2.2 Luigi Installation
4.3 Interface
4.3.1 Luigi
4.3.2 Apache-Airflow
Conclusion
References

List of Figures
- Figure 1: Hadoop Ecosystem architecture design [7]
- Figure 2: Ubuntu Luigi package installation
- Figure 3: Installing Luigi toml
- Figure 4: Installing the bleeding edge code
- Figure 5: First steps in Airflow installation
- Figure 6: Install apache-airflow results
- Figure 7: Airflow initdb installation part 1
- Figure 8: Airflow initdb installation part 2
- Figure 9: Webserver command
- Figure 10: Scheduler command
- Figure 11: Installing airflow
- Figure 12: Backfill and instance tasks
- Figure 13: Installing Luigi after airflow
- Figure 14: Luigi task status [2]
- Figure 15: Apache-Airflow user interface 1 [6]
- Figure 16: Showing a DAG in Airflow [6]




1 Introduction
1.1 Hadoop Ecosystem
The Hadoop Ecosystem is an open-source platform whose goal is to simplify big data projects and
problems. Hadoop itself is a set of modules combined into one large framework, forming an
ecosystem of technologies and software used to handle big data [7]. The main tasks of a Hadoop
ecosystem are to ingest, store, analyse, and maintain data.



