
Summary Data Engineering

Pages: 167
Uploaded on: June 9, 2020
Written in: 2019/2020

Summary compiled from the notes of 7 students, with sample exams and class summaries


Data Engineering
Summary




Prof. Dr. Dieter Devlaminck

Assistants: Tom Vermeire, Yanou Ramon, Kevin Milis, Stiene Praet





Table of Contents
I. Week 1 ....................................................................................................................... 7
1. Welcome to data engineering ............................................................................................. 7
1.1 Overview: ......................................................................................................................................... 7
1.2 Pop quiz ............................................................................................................................................ 7
1.3 Differentiating between data engineering and data science? ......................................................... 7
1.4 About this course............................................................................................................................ 10
1.5 Exam ............................................................................................................................................... 12
2. File formats .......................................................................................................................12
2.1 Overview: ....................................................................................................................................... 12
2.2 Formats........................................................................................................................................... 13
a) Human readable formats .................................................................................................................... 13
b) Not human readable and compressed file formats ............................................................................ 18
2.5 When to use what? ........................................................................................................................ 20

3 Python concepts ................................................................................................................21
3.1 Overview......................................................................................................................................... 21
3.2 Programming paradigms ................................................................................................................ 21
3.3 Python functions are first-class objects .......................................................................................... 21
3.4 Anonymous lambda function ......................................................................................................... 21
3.5 Passing functions to other functions .............................................................................................. 22
3.6 Sorting elements of an iterable ...................................................................................................... 22
3.6.1 Sorting (Python indexing starts at 0) ............................................................................. 22
3.7 Partial functions.............................................................................................................................. 23
3.8 Collections ...................................................................................................................................... 23
3.8.1 defaultdict ...................................................................................................................................... 23
3.8.2 Counter ........................................................................................................................................... 24
3.9 Map the elements of an iterable .................................................................................................... 24
3.10 Itertools: zip (lets you iterate over multiple lists at the same time) .................................................... 24
3.11 Itertools .......................................................................................................................................... 24
3.11.1 combinations (finds all possible combinations of the elements in a list, depending on the parameters you pass) ................................................... 24
3.11.2 permutations.............................................................................................................................. 25
3.12 One-line dot-product ...................................................................................................................... 25
3.13 Unicode .......................................................................................................................................... 25
3.14 Dates and times .............................................................................................................................. 27

II. Week 2 ..................................................................................................................... 28
1. Computer architecture and OS ...........................................................................................28
1.1 Basic computer architecture and operating systems ......................................................28
1.1.1 At the end of this lecture ................................................................................................................ 28
1.1.2 Why do I need to know about computer architecture? ................................................................. 28
1.1.3 The main components of a computer ............................................................................................ 28
1.1.4 The clock frequency (speed of CPU) and the architecture of the CPU influence the number of
instructions that can be executed per second............................................................................................. 29
1.1.4.1 Parallelism will have a bigger impact on your data processing.................................................. 29
1.1.4.2 Modern CPU packages contain multiple cores improving parallelism ....................................... 29



1.1.4.3 The speed gap between CPU and memory ................................................................................ 30
1.1.4.4 Caching to optimize your data flow ........................................................................................... 30
1.1.5 Disks: high capacity, very slow storage devices.............................................................................. 31
1.1.6 Hard Disk Drive (HDD) .................................................................................................................... 31
1.1.6.1 HDD latency................................................................................................................................ 32
1.1.7 Solid State Disks (=transistors) ....................................................................................................... 32
1.1.8 Scaling: vertical vs horizontal ......................................................................................................... 33
1.2 Operating system level ..................................................................................................34
1.3 What is an operating system (OS) .................................................................................................. 34
1.3.1 The operating system hides some of the hardware complexity..................................................... 34
1.4 Process management: .................................................................................................................... 35
1.4.1 Process Control Block (PCB)............................................................................................................ 36
1.4.2 Threads (exist within the same process) and concurrency ........................................................... 36
1.4.3 Scheduling ...................................................................................................................................... 37
1.5 Memory management and virtual memory ................................................................................... 37
1.6 Inter-process communication ........................................................................................................ 39
1.7 Input/Output management ............................................................................................................ 39
1.8 File systems as a way to organize files on (secondary) memory .................................................... 39
1.9 Directory structure ......................................................................................................................... 39
1.10 Distributed File Systems (DFS) ........................................................................................................ 40
1.11 Virtualization: virtual machine ....................................................................................................... 40
1.12 Containers: lightweight virtualization ............................................................................................ 41

2. Regular expressions ...........................................................................................................41
2.1 Overview......................................................................................................................................... 41
2.2 Regular expression = regex ............................................................................................................. 41
2.2.1 Extracting email addresses of all people registered for Data Engineering ..................................... 42
2.2.2 Applications .................................................................................................................................... 43
2.2.3 Regular expressions are like a mini-language where certain characters have a special meaning . 43
2.2.4 Searching for a literal...................................................................................................................... 44
2.2.5 Match any character with . ............................................................................................................. 44
2.2.5.1 Match a set of characters........................................................................................................... 44
2.2.5.2 Match a range of characters ...................................................................................................... 45
2.2.5.3 Negate a set of characters ......................................................................................................... 45
2.2.6 Some predefined set of characters ................................................................................................ 45
2.2.7 Repeat a pattern one or zero times (make it optional) .................................................................. 46
2.2.8 Repeat a pattern one or more times .............................................................................................. 46
2.2.9 Repeat a pattern exactly n times .................................................................................................... 46
2.2.10 Repeating operators are greedy ................................................................................................ 46
2.2.11 Capturing and non-capturing groups ......................................................................................... 46
2.2.12 Lookahead .................................................................................................................................. 47
2.2.13 Lookbehind................................................................................................................................. 47
2.2.14 Regexes gone wrong .................................................................................................................. 48
2.2.15 Put an ad on all URLs that contain the name of the telecom operator
2.2.16 Why [^t] ? It’s easy to miss edge cases, might not be perfect ................................................... 48
2.3 Concluding remarks ........................................................................................................................ 48
2.4 Extra................................................................................................................................................ 49
3. Computer networks ...........................................................................................................50
3.1 Based on Computer Networking: A Top-Down Approach by Kurose and Ross................................. 50
3.1.1 The internet described in terms of its hardware components ....................................................... 50
3.1.2 Hosts and the client-server model ................................................................................................. 50




3.1.3 Protocol .......................................................................................................................................... 51
3.1.4 Packet Switching ............................................................................................................................ 51
3.1.5 The internet protocol stack ........................................................................................................... 52
3.1.6 Layered structure and protocols: the protocol stack ..................................................................... 52
3.2 Network applications...................................................................................................................... 53
3.2.1 HTTP – HyperText Transfer Protocol ................................................................................... 54
3.2.1.1 The HTTP request message ........................................................................................................ 54
3.2.1.2 The HTTP headers: ..................................................................................................................... 55
3.2.1.3 The general structure of the HTTP request message................................................................ 55
3.2.1.4 The HTTP response message ...................................................................................................... 55
3.2.1.5 The general structure of the HTTP response message ............................................................. 56
3.2.1.6 If using non-secure HTTP (not HTTPS), one’s login and password are sent in clear text. ............. 56
3.2.1.7 HTTP uses the TCP transport layer protocol to send its messages ........................................... 56
3.2.1.8 Addressing processes ................................................................................................................. 57
3.2.2 DNS – Domain Name System.......................................................................................................... 57
3.2.2.1 Why distributed? ....................................................................................................................... 58
3.2.2.2 DNS client used by a web browser............................................................................................ 58

III. Week 3: ................................................................................................................. 59
1. Cloud services ....................................................................................................................59
I.1 Overview .......................................................................................................................59
I.2 Defining CS ....................................................................................................................59
I.2.1 What is the cloud? .......................................................................................................................... 59
I.2.2 Managing your cloud account ........................................................................................................ 59
I.2.3 The cloud stack ............................................................................................................................... 59
I.2.4 Examples of SaaS on AWS .............................................................................................................. 59
I.2.5 Advantages of cloud computing ..................................................................................................... 60
I.2.6 Geographic organization of AWS .................................................................................................... 60
I.2.7 Virtualization .................................................................................................................................. 60
I.2.8 Native and hosted virtualization .................................................................................................... 61

I.3 Core AWS services .........................................................................................................61
I.3.1 Virtual server hosting: AWS EC2 ..................................................................................................... 61
I.3.2 EC2 price models ............................................................................................................................ 61
I.3.3 VPC: Virtual Private Cloud............................................................................................................... 63
I.3.4 Identity and Access Management (IAM): Permissions and roles .......................................................... 63
I.3.5 EBS: Elastic Block Store ............................................................................................................... 64
I.3.6 Security groups (comparable to firewall) ....................................................................................... 64
I.3.7 Storage Infrastructure .................................................................................................................... 66
I.3.8 Database services ........................................................................................................................... 67
I.4 Cloud architecture example ...........................................................................................68
2. The Linux Operating System ..............................................................................................69
2.1 Unix: the standard operating system (OS)....................................................................................... 69
2.2 Linux: a Unix-like OS ....................................................................................................................... 70
2.3 Linux command line instructions (file manipulation) ..................................................................... 74
2.4 JQ .................................................................................................................................................... 77

IV. Week 4: ................................................................................................................. 78
1. Algorithmic Complexity .......................................................................................................78
1.1 Motivation ...................................................................................................................................... 78


