Summary: Digitization & Big Data Analytics

Uploaded on: 25-05-2021
Pages: 51
Written in: 2020/2021
Type: Summary

This is a summary of Digitization & Big Data Analytics. It contains all relevant content for the exam.



Advancing society summary

Lecture 1
No important data

Lecture 2
History
In 1902 the Antikythera mechanism was discovered. This ancient Greek analog computer
(comparable to an astrolabe) was used to predict astronomical positions and eclipses decades
in advance, and to track the cycle of the Olympic Games.

The Turing machine was devised by Alan Turing, who is considered the father of theoretical
computer science and artificial intelligence. He influenced the development of theoretical
computer science and formalized the concepts of algorithm and computation with the Turing
machine, which is a model of a general-purpose computer: a computer that is able to
perform most common computing tasks.

The ENIAC was the first electronic general-purpose computer. It was Turing-complete and
able to solve a large class of numerical problems through programming.

Server - Client




-> The client requests content or a service from the central server, which redirects the request
to the server where that content or service is allocated; the result is then returned to the client.
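The request flow above can be sketched as a toy routing function. All server names and content here are hypothetical, and a real deployment would use network sockets rather than an in-memory dictionary:

```python
# Toy sketch of the client-server flow: the central server looks up which
# content server holds the requested item and relays the content back.
CONTENT_SERVERS = {
    "server-a": {"video.mp4": b"<video bytes>"},
    "server-b": {"page.html": b"<html>hello</html>"},
}

def central_server(request_path):
    # Find the server where the requested content is allocated...
    for name, store in CONTENT_SERVERS.items():
        if request_path in store:
            # ...and bring the content back to the client.
            return store[request_path]
    return None  # content not allocated on any server

print(central_server("page.html"))  # b'<html>hello</html>'
```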

GFS and MapReduce
GFS – Google File System
• Proprietary distributed file system developed by Google to provide efficient, reliable access
to data using large clusters of commodity hardware (easily obtainable hardware).
MapReduce
• Programming model and associated implementation for processing and generating large
datasets with a parallel, distributed algorithm on a cluster. First comes the map task, where
the data is read and processed to produce key-value pairs as intermediate output. The output
of a mapper is the input of the reducer. The reducer receives key-value pairs from multiple
map jobs and aggregates those intermediate data tuples (key-value pairs) into a
smaller set of tuples, which is the final output.
Example: Imagine your dataset as a Lego model, broken down into its pieces and
distributed to different locations, where there are also pieces of other Lego models that you
don’t care about. MapReduce provides a framework for distributed algorithms that enables
you to write code for clusters which puts together the pieces you need for your analysis, by
finding the pieces in the remote locations (Map) and bringing them back together as a Lego
model (Reduce).
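The map → shuffle → reduce pipeline described above can be sketched in plain Python on a single machine, using the classic word-count example. Real MapReduce distributes these phases across a cluster; this is only the logical structure:

```python
# Minimal single-machine MapReduce sketch (word count).
from collections import defaultdict
from itertools import chain

def map_task(document):
    # Map: read the input and emit intermediate key-value pairs.
    return [(word, 1) for word in document.split()]

def shuffle(pairs):
    # Group intermediate pairs by key before reducing.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_task(key, values):
    # Reduce: aggregate all values for one key into a final tuple.
    return key, sum(values)

documents = ["big data", "big clusters big jobs"]
intermediate = chain.from_iterable(map_task(d) for d in documents)
result = dict(reduce_task(k, v) for k, v in shuffle(intermediate).items())
print(result)  # {'big': 3, 'data': 1, 'clusters': 1, 'jobs': 1}
```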

Apache Hadoop
Intro
-Open source project – Solution for Big Data
• Deals with complexities of high Volume, Velocity and Variety of data
- Not a SINGLE Open Source Project
• Ecosystem of Open Source Projects
• Work together to provide set of services
• It uses MapReduce
- Transforms standard commodity hardware into a service:
• Stores Big Data reliably (PB of data)
• Enables distributed computations
• Enormous processing power, able to solve problems involving massive amounts
of data and computation.
-Large Hadoop cluster can have:
• 25 PB of data
• 4500 machines
-Story behind the name
• Doug Cutting, Chief Architect of Cloudera and one of the creators of Hadoop, named it
after the stuffed yellow elephant his 2-year-old called “Hadoop”.

Key attributes of Hadoop
-Redundant and Reliable
• No risk of data loss
- Powerful
• All machines available
- Batch Processing
• Some pieces in real-time
• Submit job get results when done
- Distributed applications
• Write and test on one machine
• Scale to the whole cluster
- Runs on commodity hardware
• No need for expensive servers

Hadoop architecture
-MapReduce
• Processing part of Hadoop
• Managing the processing tasks
• Submit the tasks to MapReduce
-HDFS
• Hadoop Distributed File System
• Stores the data
• Files and Directories
• Scales to many PB
- TaskTracker
• The MapReduce server
• Launching MapReduce tasks on the machine
- DataNode
• The HDFS Server
• Stores blocks of data
• Keeps track of data
• Provides high bandwidth
How does it work?
-Structure
• Multiple machines with Hadoop create a cluster
• Replicate Hadoop installation in multiple machines
• Scale according to needs in a linear way
• Add nodes based on specific needs
• Storage
• Processing
• Bandwidth
-Task coordinator
• Hadoop needs a task coordinator
• JobTracker tracks running jobs
• Divides jobs into tasks
• Assigns tasks to each TaskTracker
• TaskTracker reports to JobTracker
• Running
• Completed
• JobTracker is responsible for TaskTracker status
• Noticing whether it is online or not
• Assigns its tasks to another node
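The JobTracker/TaskTracker coordination above can be sketched as follows. Class and method names here are illustrative, not the real Hadoop API:

```python
# Sketch: a JobTracker divides a job into tasks, assigns them round-robin
# to TaskTrackers, and collects the statuses they report back.
class TaskTracker:
    def __init__(self, name):
        self.name = name
        self.completed = []

    def run(self, task):
        self.completed.append(task)  # report back to the JobTracker
        return "completed"

class JobTracker:
    def __init__(self, trackers):
        self.trackers = trackers

    def submit(self, job, n_tasks):
        # Divide the job into tasks and assign each to a TaskTracker.
        statuses = {}
        for i in range(n_tasks):
            tracker = self.trackers[i % len(self.trackers)]
            task = f"{job}-task{i}"
            statuses[task] = tracker.run(task)
        return statuses

jt = JobTracker([TaskTracker("tt1"), TaskTracker("tt2")])
print(jt.submit("wordcount", 4))  # all four tasks report "completed"
```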
-Data coordinator
• Hadoop needs a data coordinator
• NameNode keeps information of data (al)location
• Talks directly to DataNodes for read and write
• For write permission and network architecture

• Data never flows through the NameNode
• Only information ABOUT the data
• NameNode is responsible for DataNodes status
• Noticing whether it is online or not
• Replicates the data on another node
-Automatic Failover
• When there is software or hardware failure
• Nodes on the cluster reassign the work of the failed node
• NameNode is responsible for DataNode status
• JobTracker is responsible for TaskTracker status
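The NameNode bookkeeping and failover behaviour above can be sketched like this. The names are illustrative and the logic is simplified (real HDFS uses heartbeats and a default replication factor of 3); the key point is that the NameNode holds only information about the data, not the data itself:

```python
# Sketch: the NameNode tracks which DataNodes hold each block, notices a
# DataNode failure, and re-replicates that node's blocks onto a live node.
class NameNode:
    def __init__(self, datanodes):
        self.datanodes = set(datanodes)
        self.block_locations = {}  # block id -> set of DataNode names

    def write(self, block, replicas=2):
        # Place `replicas` copies of the block on distinct DataNodes.
        nodes = sorted(self.datanodes)[:replicas]
        self.block_locations[block] = set(nodes)

    def datanode_failed(self, node):
        # Notice the node is offline; replicate its blocks on another node.
        self.datanodes.discard(node)
        for block, holders in self.block_locations.items():
            if node in holders:
                holders.discard(node)
                spare = next(iter(self.datanodes - holders))
                holders.add(spare)

nn = NameNode(["dn1", "dn2", "dn3"])
nn.write("block-1")                          # stored on dn1 and dn2
nn.datanode_failed("dn1")                    # copy restored on dn3
print(sorted(nn.block_locations["block-1"]))  # ['dn2', 'dn3']
```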
Characteristics
• Reliable and Robust
• Data replication on multiple DataNodes
• Tasks that fail are reassigned / redone
• Scalable
• Same code runs on 1 or 1000 machines
• Scales in a linear way
• Simple APIs available
• For Data
• For apps
• Powerful
• Process in parallel PB of data

In short:
• Hadoop is a layer between software and hardware that enables building computing clusters
on commodity hardware, based on an architecture that provides redundancy. The architecture
includes a NameNode, which is a data coordinator and “talks” to the DataNode of each
machine, which is the HDFS server. Also, it includes a JobTracker that “talks” to the
TaskTracker of each machine, which is the MapReduce server.
