100% satisfaction guarantee Immediately available after payment Both online and in PDF No strings attached 4.6 TrustPilot
logo-home
Exam (elaborations)

EECS 485 Final Exam with complete solution

Rating
-
Sold
-
Pages
32
Grade
A
Uploaded on
12-08-2023
Written in
2023/2024

EECS 485 Final Exam with complete solution Map Reduce System provides - 1 Automatic parallelization & distribution Map Reduce System provides - 2 Fault-tolerance Map Reduce System provides - 3 Status and monitoring tools Map Reduce System provides - 4 Clean abstraction for programmers Map Reduce is Popular because - 1 Easy to write distributed programs Map Reduce is Popular because - 2 Built-in reliability on large clusters Map Reduce is Popular because - 3 Byte streams, not relations Map Reduce is Popular because - 4 Schema-later, or schema-never Map Reduce is Popular because - 5 Your choice of programming languages Map Reduce is Popular because - 6 Hadoop relatively easy to administer Mappers can work... Mappers can work independently Reducers can work... Reducers can work independently Grouper does not have to... Count Map Reduce Optimization 1 the mappers don't need write the words down, just "1" for each word of a certain length Map Reduce Optimization 2 mappers do part of the reducer work: number of 1-letter words, 2-letter words, etc Master work Dividing the work Grouper work Grouping the values by keys Work of the mappers and reducers differs with problem MR Computation Input key/value pairs Output different key/value pairs Map Function Takes an input pair and produces a set of intermediate key/value pairs MapReduce library

Show more Read less
Institution
Course

Content preview

EECS 485 Final Exam with complete solution
Map Reduce System provides - 1
Automatic parallelization & distribution
Map Reduce System provides - 2
Fault-tolerance
Map Reduce System provides - 3
Status and monitoring tools
Map Reduce System provides - 4
Clean abstraction for programmers
Map Reduce is Popular because - 1
Easy to write distributed programs
Map Reduce is Popular because - 2
Built-in reliability on large clusters
Map Reduce is Popular because - 3
Byte streams, not relations
Map Reduce is Popular because - 4
Schema-later, or schema-never
Map Reduce is Popular because - 5
Your choice of programming languages
Map Reduce is Popular because - 6
Hadoop relatively easy to administer
Mappers can work...
Mappers can work independently
Reducers can work...
Reducers can work independently
Grouper does not have to...
Count
Map Reduce Optimization 1
the mappers don't need write the words down, just "1" for each word of a certain length
Map Reduce Optimization 2
mappers do part of the reducer work: number of 1-letter words, 2-letter words, etc
Master work
Dividing the work
Grouper work
Grouping the values by keys
Work of the mappers and reducers
differs with problem
MR Computation
Input key/value pairs

Output different key/value pairs
Map Function
Takes an input pair and produces a set of intermediate key/value pairs
MapReduce library

,Groups together all intermediate values associated with the same intermediate key and
passes them to the Reduce function
Reduce function
Accepts an intermediate key and a list of values for that key.

Merges together these values to form a possibly smaller set of values.
Map and reduce have related types
map (k1, v1) → list(k2, v2)

reduce (k2, list(v2)) → list(v2)
MR output is smaller
computing summary statistics
MR output is larger
computing some kind of data structure for downstream use
Output value produced per reduce invocation
Zero or one
Execution Pipeline
Stage 1, partition input data and run map() on many machines

Then group intermediate data by intermediate key

Stage 2, partition intermediate data by key and run reduce() on many machines

Output is whatever reduce() emits
Shuffle/Sort
Default: hash(key)%R

- Break input into M chunks

- Process each chunk w/ map process

- Group-by map output keys

- Place key-groups into R chunks

- Process each chunk w/ reduce process

- reduce fn's outputs go to disk
What can be a MapReduce program?
- URL counting in logs

- Inverted index construction for search engines, Sorting

- Massive image conversion, others
How do we know if a machine goes down?
Heartbeat messages tell master which machines are online

,Map Worker Dies
- Just restart that task on a different box

- You lose the map() work, but no big deal
Reduce Worker Dies
- Restart the reducer, using output from source mappers
Relational database management system (RDBMS) machine dies
Query is restarted
What about slow, not dead, machines?
- Speculative execution for stragglers

- Kill the 2nd-place finisher
What about data placement?
Use GFS across cluster disks; start tasks where the target data already lies
Isn't the intermediate data size large?
- Use a local reducer called a Combiner at each map

- Compress data between map and reduce
Mapper and Reducer are...
Stateless - no side effects
Scalability and fault-tolerance achieved by optimizing the execution engine once
Use it many times by writing different map and reduce functions for different
applications
Map and reduce functions inspired by functions of the same name in...
Lisp programming language
Functional programming
Computation as the evaluation of mathematical functions
MR is easy to...
Parallelize
Simple Distributed System Failures
Power loss

Process crash
Complex Distributed System Failures
- Slow or unresponsive application

- Network partition
Nightmare Distributed System Failures
Bugs, hackers
Fail-Stop, Synchronous environment
- Component halts immediately

- Other components can tell that it has failed

- Example: process crash
Fail-Stop, Asynchronous environment

, - Components may fail (stop working) and recover

- Components may take arbitrarily long to respond

- Others may not be able to tell when a component fails

- Examples: network partition, hung process
Byzantine
- Components may fail in arbitrary ways

- Examples: hacker compromises a component
Primary-Backup fault tolerance
- Both primary and backup should commit modifications before acknowledging

- Client will never lose data for operations it observes to complete

- Client is uncertain about operations in progress. Must recheck with new primary to see
if it completed.
Asynchronous environment 1
Network can delay messages arbitrarily
Asynchronous environment 2
Network can partition
Asynchronous environment 3
Servers can crash and take a long time to recover
Asynchronous environment 4
Servers can hang and stop responding
Asynchronous environment 5
Servers can run very slow and appear to be crashed
In async environment...
Hard to differentiate slow and failed components
Async means...
primary-backup not great in practice
Paxos
Goal: nodes all agree on one result

Tolerates fail-stop failures in synchronous environ.

Requires 2f+1 nodes to tolerate f faults (Primary-backup only needed f+1)

Example: 5 Paxos nodes can tolerate 2 such faults
Basic (synod) protocol Phase 1
- Leaders propose ballot #.

- Others (acceptors) adopt ballot if highest # so far.

- Leader waits for f+1 adoptions

Written for

Course

Document information

Uploaded on
August 12, 2023
Number of pages
32
Written in
2023/2024
Type
Exam (elaborations)
Contains
Questions & answers

Subjects

Get to know the seller

Seller avatar
Reputation scores are based on the amount of documents a seller has sold for a fee and the reviews they have received for those documents. There are three levels: Bronze, Silver and Gold. The better the reputation, the more your can rely on the quality of the sellers work.
magdamwikash23 Western Governers University
Follow You need to be logged in order to follow users or courses
Sold
112
Member since
3 year
Number of followers
94
Documents
5328
Last sold
1 month ago
Magda

NURSING STUDY GUIDES/EXAMS AND NOTES ALL VERIFIED BY EXPERTS All my uploaded documents, exams and essays are verified by relevant experts.I can assure an A or at least 90% if you use any of my documents.

3.9

14 reviews

5
7
4
2
3
2
2
2
1
1

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their tests and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can instantly pick a different document that better fits what you're looking for.

Pay as you like, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card and download your PDF document instantly.

Student with book image

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions