Summary

Summary Data Engineering

Rating
-
Sold
2
Pages
29
Uploaded on
27-10-2022
Written in
2021/2022

This summary contains the course material covered, organised by the given example questions.


Preview of the content

EXAM QUESTIONS DATA ENGINEERING
Table of Contents
WEEK 1: INTRODUCTION, FILE FORMATS, PYTHON FOR DATA ENGINEERING
1. What are the differences between a data engineer and a data scientist? What are common technical and functional requirements or tasks ensured by a data engineer?
2. How are integers, decimal numbers, text and images stored in a computer? Give an example of binary encoding for each type.
3. We saw three different data models for representing data. Name and provide a short summary of each data model.
4. What are the strengths and weaknesses of the relational model versus the document-oriented model? Which model would you prefer?
5. Which human-readable file formats are used for storing and communicating data? Provide short examples in JSON and XML for storing the social network profile of a user.
6. Protocol Buffers and Apache Parquet are two binary formats used for storing or communicating data. Explain how these formats work. What are the advantages of using a binary format over a text format, such as CSV?
7. List, dictionary and set are the three most commonly used collections in Python. Explain the main properties and operations of these data structures.
8. Suppose you want to store a student object with a name, age and master in Python. Which data structure would you use?
WEEK 2: COMPUTER ARCHITECTURE, OS, NETWORKS AND REGULAR EXPRESSIONS
9. How do a Central Processing Unit and a Graphical Processing Unit work? Explain what an instruction, SIMD instruction, cycle and clock frequency are. What is Moore's law and what can we expect from processors in the future?
10. The operating system has several responsibilities. Explain what the process manager does, and what multi-tasking and the process queue are.
11. The operating system provides two techniques to run programs in parallel: processes and threads. Explain both techniques. What are the advantages and disadvantages of each?
12. What is the internet protocol stack? Which steps occur when an email is transmitted from my iPhone and received on a Windows desktop? Explain the difference between the application, transport, internet and link layer.
13. What is an API? Give an example in the e-commerce domain.
14. What happens when you enter a URL in your browser? Follow-up question: explain DNS.
15. Explain the Hypertext Transfer Protocol. What does each part mean in the URL https://www.sporza.be/sport/voetbal.html? How do you use HTTP to get the content of this webpage? Which information is shared between each client and server using the HTTP headers?
16. Write a regular expression that matches all strings of format where FIRSTNAME and LASTNAME can contain lower-case and upper-case alphabetic characters and UNI is either uantwerpen, ugent, kuleuven or vub.
WEEK 3: CLOUD SERVICES AND LINUX
17. What are the advantages and disadvantages of using cloud services versus running services on-premises? Discuss the different types of services provided by cloud vendors (i.e. the cloud stack) and give an example of each.
18. What is the Hadoop File System? What happens when a disk that is part of a large cluster of disk drives managed with HDFS crashes and is beyond repair?
19. Firewall (security group), encryption, public-private key pair, roles and responsibilities and the virtual private cloud are all techniques used to secure cloud instances. Explain each technique.
20. A system administrator installs and manages multiple servers or infrastructure. Explain briefly which tasks a system administrator must perform to set up a new software service and ensure smooth operations. Give an example of a concrete installation of a website, database or login service.
21. What does the following Linux command do? Explain the following output and give examples of 5 other Linux commands.
22. What does cat students.csv | egrep '[a-zA-Z\.]+@uantwerpen\.be' do? Give examples of 5 other Linux commands.
WEEK 4: ALGORITHMS AND DATA STRUCTURES
23. Give an example in Python of an algorithm with a constant, linear and quadratic time complexity.
24. Given the following algorithm, how many additions are required for n=3 and n=5? What is the worst-case time complexity? Why is this important?
25. Provide the pseudocode of the quicksort algorithm. Explain why the time complexity is O(n.log(n)).
26. Assume you want to search for a user name in a large list containing 10^9 users. Which algorithm would you use, and how much time would it take to finish the computation?
27. Assume you have a large collection of user names and you want to query if a name is present in this collection. What is the complexity of using an ArrayList and a Hash table? Which data structure would you prefer?
28. Assume you have a large collection of user names and you want to query if a name is present in this collection. What is the complexity of using an ArrayList and a Binary tree? Which data structure would you prefer?
29. What is an Abstract Data Type (ADT)? Give examples of an ADT and two implementations.
WEEK 5: RELATIONAL DATABASES
30. A key principle of databases is to store data non-redundantly and consistently. What does this mean?
31. A key principle of databases is to query data efficiently and allow concurrent access to the data. What does this mean?
32. Transactions in a relational database have ACID properties: Atomic, Consistent, Isolated and Durable. Explain why each property is important and provide an example, for instance using the bank transfer use case.
33. SQL is a declarative language, but this is not problematic since an RDBMS by default defines indexes on primary and foreign keys and has a query optimizer. Explain how an index and query optimizer work.
34. What is a primary key and what is a foreign key? What does referential integrity mean? What can happen if you delete a row from a table in this context?
35. What are the advantages and disadvantages of normalisation versus de-normalisation?
36. Give an example of an SQL schema for storing students, courses and student grades.
37. The users table contains the student name, enrollment number and the master they enrolled in. Write an SQL query that returns the total number of students for each master.
38. The employee table contains the employee id, first name and last name; the department table has columns for the name of the department and the employee id. Write an SQL query that returns the first and last name for each employee in each department.
39. The suppliers table contains the name of each supplier and the city. The shipments table contains the product code, name of supplier and date of shipment. Write an SQL query that returns all products shipped from Antwerp.
40. The suppliers table contains the name of each supplier and the city. The shipments table contains the product code, name of supplier and date of shipment. Write an SQL query that returns the total amount of products shipped from each city.
WEEK 6: DATA WAREHOUSING AND NOSQL
41. Give an example of a data cube with 3 dimensions.
42. Typical Online Analytical Operations are pivoting, slicing, drill-down and roll-up. Give an example of each type of operation and explain.
43. The following table contains details on shipments. Give two examples of analytic queries where you investigate the number of shipments using different dimensions.
44. What are the four main categories of NoSQL databases? Briefly explain what type of data each category of database stores.
45. What is a key-value store? For which application is it used?
46. What are the advantages and disadvantages of sharding and replication? Explain why you would combine both techniques.
WEEK 7: VISUALISATION
47. Explain perceptual accuracy and why it's important. What type of visualisation would you prefer when both the dependent and independent variables are quantitative? What type of visualisation would you prefer when both the dependent and independent variables are categorical (or nominal)?
48. Assume you have a dataset with one target attribute and 10 continuous features. Name and explain three techniques you would employ to visualise this dataset.
WEEK 8: PARALLEL AND DISTRIBUTED COMPUTING/MAPREDUCE
49. What are data- and task-parallelism? Give an example of both types.
50. How would you add a large set of numbers in parallel? Or perform an inner product on two large vectors in parallel? Do you expect a speedup linear in the number of processors for these cases? Motivate your answer.
51. What are the main properties of a distributed system? How is it different from parallel processing on a single system? What happens when a node fails?
52. Explain the differences between services, batch-processing and streaming-processing. Give an example of each type of system.
53. Map-Reduce data processing tasks can be decomposed into multiple two-step stages, the first step being Map and the second step being Reduce. Explain and give an example of the evaluation, or distributed execution, of a program for counting words.
54. The following Spark code computes the total amount received for each credit account. Explain how this program is evaluated and give an example of the computation for the example input file assuming two nodes.
Follow-up question: explain the DAG and its disadvantages.
WEEK 9: RECOMMENDER SYSTEMS
55. The primary goal of a recommender system is to predict which item a user is most likely interested in. What are three secondary goals? Explain.
56. What are the main components of a content-based recommender system?
57. How can you compute the similarity between two products? Give an example of content-based and collaborative-filtering based cosine similarity.
53. How does item-based collaborative filtering work?
58. Assume you are responsible for creating a website that includes recommendations. Which software (e.g. micro-services, load balancer), hardware (e.g. servers) and data stores (e.g. JSON, relational database, CSV) would you use? Provide a high-level diagram and the connections between the different components.
Written exam with oral dissemination

Seller: audreyvanlierde, Universiteit Antwerpen