
Summary Data Engineering

Pages: 167
Published: 9 June 2020
Written in: 2019/2020

Summary by 7 students, sample exam, summary of classes












Data Engineering
Summary




Prof. Dr. Dieter Devlaminck

Assistants: Tom Vermeire, Yanou Ramon, Kevin Milis, Stiene Praet




Table of Contents
I. Week 1
1. Welcome to data engineering
1.1 Overview
1.2 Pop quiz
1.3 Differentiating between data engineering and data science
1.4 About this course
1.5 Exam
2. File formats
2.1 Overview
2.2 Formats
a) Human-readable formats
b) Non-human-readable and compressed file formats
2.5 When to use what?

3. Python concepts
3.1 Overview
3.2 Programming paradigms
3.3 Python functions are first-class objects
3.4 Anonymous lambda function
3.5 Passing functions to other functions
3.6 Sorting elements of an iterable
3.6.1 Sorting (Python indexing starts at 0)
3.7 Partial functions
3.8 Collections
3.8.1 defaultdict
3.8.2 Counter
3.9 Map the elements of an iterable
3.10 Itertools: zip (lets you iterate over multiple lists at the same time)
3.11 Itertools
3.11.1 combinations (finds all possible combinations of the elements in a list, depending on the parameters you pass)
3.11.2 permutations
3.12 One-line dot-product
3.13 Unicode
3.14 Dates and times

II. Week 2
1. Computer architecture and OS
1.1 Basic computer architecture and operating systems
1.1.1 At the end of this lecture
1.1.2 Why do I need to know about computer architecture?
1.1.3 The main components of a computer
1.1.4 The clock frequency (speed of the CPU) and the architecture of the CPU influence the number of instructions that can be executed per second
1.1.4.1 Parallelism will have a bigger impact on your data processing
1.1.4.2 Modern CPU packages contain multiple cores, improving parallelism
1.1.4.3 The speed gap between CPU and memory
1.1.4.4 Caching to optimize your data flow
1.1.5 Disks: high-capacity, very slow storage devices
1.1.6 Hard Disk Drive (HDD)
1.1.6.1 HDD latency
1.1.7 Solid State Disks (=transistors)
1.1.8 Scaling: vertical vs horizontal
1.2 Operating system level
1.3 What is an operating system (OS)?
1.3.1 The operating system hides some of the hardware complexity
1.4 Process management
1.4.1 Process Control Block (PCB)
1.4.2 Threads (exist within the same process) and concurrency
1.4.3 Scheduling
1.5 Memory management and virtual memory
1.6 Inter-process communication
1.7 Input/Output management
1.8 File systems as a way to organize files on (secondary) memory
1.9 Directory structure
1.10 Distributed File Systems (DFS)
1.11 Virtualization: virtual machine
1.12 Containers: lightweight virtualization

2. Regular expressions
2.1 Overview
2.2 Regular expression = regex
2.2.1 Extracting email addresses of all people registered for Data Engineering
2.2.2 Applications
2.2.3 Regular expressions are like a mini-language where certain characters have a special meaning
2.2.4 Searching for a literal
2.2.5 Match any character with .
2.2.5.1 Match a set of characters
2.2.5.2 Match a range of characters
2.2.5.3 Negate a set of characters
2.2.6 Some predefined sets of characters
2.2.7 Repeat a pattern one or zero times (make it optional)
2.2.8 Repeat a pattern one or more times
2.2.9 Repeat a pattern exactly n times
2.2.10 Repeating operators are greedy
2.2.11 Capturing and non-capturing groups
2.2.12 Lookahead
2.2.13 Lookbehind
2.2.14 Regexes gone wrong
2.2.15 Put an ad on all URLs that contain the name of the telecom operator
2.2.16 Why [^t]? It's easy to miss edge cases; might not be perfect
2.3 Concluding remarks
2.4 Extra
3. Computer networks
3.1 Based on Computer Networking: A Top-Down Approach by Kurose and Ross
3.1.1 The internet described in terms of its hardware components
3.1.2 Hosts and the client-server model
3.1.3 Protocol
3.1.4 Packet switching
3.1.5 The internet protocol stack
3.1.6 Layered structure and protocols: the protocol stack
3.2 Network applications
3.2.1 HTTP – HyperText Transfer Protocol
3.2.1.1 The HTTP request message
3.2.1.2 The HTTP headers
3.2.1.3 The general structure of the HTTP request message
3.2.1.4 The HTTP response message
3.2.1.5 The general structure of the HTTP response message
3.2.1.6 If using non-secure HTTP (not HTTPS), one's login and password are sent in clear text
3.2.1.7 HTTP uses the TCP transport layer protocol to send its messages
3.2.1.8 Addressing processes
3.2.2 DNS – Domain Name System
3.2.2.1 Why distributed?
3.2.2.2 DNS client used by a web browser

III. Week 3
1. Cloud services
I.1 Overview
I.2 Defining CS
I.2.1 What is the cloud?
I.2.2 Managing your cloud account
I.2.3 The cloud stack
I.2.4 Examples of SaaS on AWS
I.2.5 Advantages of cloud computing
I.2.6 Geographic organization of AWS
I.2.7 Virtualization
I.2.8 Native and hosted virtualization

I.3 Core AWS services
I.3.1 Virtual server hosting: AWS EC2
I.3.2 EC2 price models
I.3.3 VPC: Virtual Private Cloud
I.3.4 Identity and Access Management (IAM): permissions and roles
I.3.5 EBS: Elastic Block Store
I.3.6 Security groups (comparable to a firewall)
I.3.7 Storage infrastructure
I.3.8 Database services
I.4 Cloud architecture example
2. The Linux Operating System
2.1 Unix: the standard operating system (OS)
2.2 Linux: a Unix-like OS
2.3 Linux command line instructions (file manipulation)
2.4 jq

IV. Week 4
1. Algorithmic Complexity
1.1 Motivation


