Exam (elaborations)

Parallel Computer Organization & Design — Solutions Manual (All 8 Chapters, PDF)

Rating: -
Sold: 1
Pages: 114
Grade: A+
Uploaded on: 19-10-2025
Written in: 2025/2026

Complete Solutions Manual for Parallel Computer Organization and Design by Dubois, Annavaram, and Stenström. Step-by-step answers for all 8 chapters covering Amdahl & Gustafson laws, multicore/CMP and NUMA design, memory hierarchy and bandwidth/latency analysis, cache performance and prefetching, cache-coherence protocols (snooping & directory; MESI/MOESI), memory-consistency models (SC, TSO, Release), synchronization (locks, barriers, TM), interconnection networks (mesh/torus/fat-tree), routing & flow control (wormhole, VCs), GPU/SIMD and vector parallelism, scalability/roofline modeling, and programming models (OpenMP/MPI). Ideal for homework verification, exam prep, and self-study.

Institution: Solution Manual
Course: Solution Manual

Content preview

ALL 8 CHAPTERS COVERED

PARALLEL COMPUTER ORGANIZATION AND DESIGN

Michel Dubois, Murali Annavaram and Per Stenström




Exercise Solution Manual

June 2012





Copyright © 2012 Michel Dubois, Murali Annavaram and Per Stenström

CHAPTER 1
Problem 1.1

a. Using Amdahl’s speedup,
$$\frac{1}{1 - F_{fp} + \frac{F_{fp}}{10}} = \frac{1}{1 - F_{ls} + \frac{F_{ls}}{2}}$$
Therefore,
$$\frac{F_{fp}}{F_{ls}} = \frac{5}{9}$$
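
The break-even ratio in part a can be checked numerically. The sketch below is only an illustration: the helper name `amdahl_speedup` and the sample fractions are assumptions, chosen so that the ratio of the two fractions is exactly 5/9. The enhancement factors 10 and 2 are the ones used in the equation above.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the time is sped up by `factor`."""
    return 1.0 / (1.0 - fraction + fraction / factor)


# Hypothetical fractions, picked so that f_fp / f_ls = 5/9.
f_ls = 0.45
f_fp = f_ls * 5 / 9

speedup_fp = amdahl_speedup(f_fp, 10)  # floating-point upgrade (10x)
speedup_ls = amdahl_speedup(f_ls, 2)   # loads/stores upgrade (2x)

# At the 5/9 ratio both upgrades give the same overall speedup.
print(speedup_fp, speedup_ls)
```

Any other pair of fractions with the same 5/9 ratio should give two equal speedups as well, since the condition reduces to 0.9 × F_fp = 0.5 × F_ls.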
b. Can you still find out which improvement is better based on these numbers?

Yes. The reason is that Ffp/Fls is equal to the ratio of the execution time of floating point instructions (ExTimefp) to the execution time of loads and stores (ExTimels). We can get the ratio of these execution times from the given information. If the ratio is larger than 5/9, which is the value obtained in part a, then we can say the floating point upgrade is better than the loads/stores upgrade.
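$$\text{ExTime}_{fp} = \text{IC}_{fp} \times \text{CPI}_{fp} \times T_c$$

$$\text{ExTime}_{ls} = \text{IC}_{ls} \times \text{CPI}_{ls} \times T_c$$

$$\text{ExTime}_{total} = \text{IC}_{total} \times \text{CPI}_{total} \times T_c$$
Therefore,
$$\frac{F_{fp}}{F_{ls}} = \frac{\text{IC}_{fp} \times \text{CPI}_{fp} \times T_c}{\text{IC}_{ls} \times \text{CPI}_{ls} \times T_c} = \frac{\text{ExTime}_{fp}}{\text{ExTime}_{ls}}$$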
Can you still estimate the maximum speedup of each upgrade using Amdahl’s law?

No, because the total execution time or average CPI of all instructions is not given and we cannot
get the fraction of execution time spent in floating point and loads/stores instructions respectively.
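
Both answers in part b can be illustrated with a small sketch. All instruction counts, CPIs and totals below are hypothetical values made up for the illustration; none of them come from the exercise. The point is only that the ratio of the two fractions needs no total, while each individual fraction does.

```python
def class_cycles(instruction_count: int, cpi: float) -> float:
    """Cycles spent in one instruction class (IC x CPI); T_c cancels in ratios."""
    return instruction_count * cpi


# Hypothetical counts and CPIs, for illustration only.
fp_cycles = class_cycles(instruction_count=200, cpi=5.0)  # 1000 cycles
ls_cycles = class_cycles(instruction_count=400, cpi=3.0)  # 1200 cycles

# F_fp / F_ls follows from the two classes alone.
ratio = fp_cycles / ls_cycles
print(ratio > 5 / 9)  # above 5/9 means the floating-point upgrade is better

# The individual fractions change with the (unknown) total cycle count,
# so Amdahl's law cannot be evaluated for either upgrade.
for total_cycles in (3000.0, 6000.0):  # two hypothetical totals
    print(fp_cycles / total_cycles, ls_cycles / total_cycles)
```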

c. Floating-point improvement:
1.5 = 1 / (1 - Ffp + Ffp/10)
Therefore, Ffp = 0.3704

Loads and Stores improvement:
1.5 = 1 / (1 - Fls + Fls/2)
Therefore, Fls = 0.67
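
Part c inverts Amdahl's law: given a target speedup S and an enhancement factor k, the required fraction is F = (1 - 1/S) / (1 - 1/k). A minimal sketch of that inversion follows; the function name `fraction_for_speedup` is an assumption introduced here.

```python
def fraction_for_speedup(target_speedup: float, factor: float) -> float:
    """Solve target_speedup = 1 / (1 - F + F / factor) for F."""
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / factor)


f_fp = fraction_for_speedup(1.5, 10)  # floating-point units 10x faster
f_ls = fraction_for_speedup(1.5, 2)   # loads/stores 2x faster
print(round(f_fp, 2), round(f_ls, 2))
```

Substituting either result back into Amdahl's formula should return the target speedup of 1.5.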

d. After upgrading to the floating point units, the execution time is

$$\text{ExTime}_{fp} = \left(0.7 + \frac{0.3}{10}\right) \times \text{ExTime}_{base} = 0.73 \times \text{ExTime}_{base}$$
and the new fraction of time spent in loads and stores after the floating point upgrade is 0.20/0.73 = 0.274 (27.4%).





Therefore, the maximum speedup of the cache upgrade after the floating point unit upgrade is
$$\text{Speedup} = \frac{1}{1 - 0.274 + \frac{0.274}{2}} = 1.1587$$
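
The two-step calculation in part d can be reproduced as follows. This is a sketch: the variable names are assumptions, while the fractions 0.3 and 0.2 and the factors 10 and 2 are the ones used in the solution.

```python
def amdahl_speedup(fraction: float, factor: float) -> float:
    """Overall speedup when `fraction` of the time is sped up by `factor`."""
    return 1.0 / (1.0 - fraction + fraction / factor)


F_FP, F_LS = 0.3, 0.2  # fractions of the baseline execution time

# Step 1: execution time after the floating-point upgrade, relative to baseline.
time_after_fp = (1 - F_FP) + F_FP / 10

# Step 2: loads/stores now take a larger share of the shorter execution time.
ls_fraction_after_fp = F_LS / time_after_fp

# Step 3: Amdahl's law for the 2x loads/stores upgrade on the upgraded machine.
cache_speedup = amdahl_speedup(ls_fraction_after_fp, 2)

print(round(time_after_fp, 2), round(ls_fraction_after_fp, 3), round(cache_speedup, 4))
```

With these inputs the script prints `0.73 0.274 1.1587`, matching the values in the solution.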

Problem 1.2

We solve this problem assuming that N and the number of available processors P in the
machine goes from 1 to 1024.
a.
$$\text{Speedup} = \frac{T_1}{T_P} = \frac{N T_c}{\frac{N}{P} T_c + N T_b} = \frac{PR}{R + P}$$
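
The closed form in part a can be checked against the raw execution times. The sketch below takes R = T_c/T_b, which is what the closed form implies; the timing values and function names are hypothetical and serve only as an illustration.

```python
def speedup_from_times(n: int, p: int, t_c: float, t_b: float) -> float:
    """T_1 / T_P with T_1 = N*T_c and T_P = (N/P)*T_c + N*T_b."""
    t_1 = n * t_c
    t_p = (n / p) * t_c + n * t_b
    return t_1 / t_p


def speedup_closed_form(p: int, r: float) -> float:
    """Closed form P*R / (R + P), with R = T_c / T_b."""
    return p * r / (r + p)


# Hypothetical timings; only their ratio R matters for the speedup.
N, T_C, T_B = 1024, 8.0, 2.0
R = T_C / T_B

for p in (2, 16, 1024):
    print(p, speedup_from_times(N, p, T_C, T_B), speedup_closed_form(p, R))
```

The two columns should agree for every P, and N drops out of the result.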


b. For a given R, the speedup increases monotonically as P increases. The maximum speedup is thus achieved when P is 1024 and the maximum speedup is
$$\text{Speedup}_{max} = \frac{1024R}{1024 + R}$$

c.
$$P \geq \frac{R}{R - 1}$$
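
Parts b and c can be explored numerically with the speedup PR/(R + P) from part a. The sketch assumes that part c concerns the number of processors needed before the parallel run beats a single processor: the speedup equals 1 at P = R/(R - 1) and exceeds 1 beyond it, which requires R > 1. The function names and the sample values of R are assumptions.

```python
import math


def speedup(p: int, r: float) -> float:
    """Speedup P*R / (R + P) from part a."""
    return p * r / (r + p)


def min_processors_for_gain(r: float) -> int:
    """Smallest integer P with speedup strictly greater than 1 (needs R > 1)."""
    if r <= 1:
        raise ValueError("speedup never exceeds 1 when R <= 1")
    return math.floor(r / (r - 1)) + 1


R = 4.0  # hypothetical ratio, for illustration only

# Part b: the speedup grows with P, so the maximum is reached at P = 1024.
is_monotonic = all(speedup(p + 1, R) > speedup(p, R) for p in range(1, 1024))
print(is_monotonic, speedup(1024, R))

# Part c: first processor count that gives a real gain.
print(min_processors_for_gain(R), min_processors_for_gain(2.0))
```

For R = 2 the boundary value R/(R - 1) is exactly 2, where the speedup is 1, so the first count that gives a gain is 3.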

d. The execution time stays constant at $N \times T_c$ for all P > 1 and given N. As P increases from 1 to 1024, the workload size ($N_1$) increases so that:
$$N T_c = \frac{N_1}{P} T_c + N_1 T_b$$
and:
$$N_1 = \frac{NPR}{R + P}$$
As P grows, $N_1$ tends to an asymptote equal to $N \times R$.
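
The scaled workload in part d can be checked by plugging N1 back into the execution-time expression. The sketch below uses hypothetical values for N, T_c and T_b and takes R = T_c/T_b; the function names are assumptions.

```python
def scaled_workload(n: float, p: int, r: float) -> float:
    """Workload N1 = N*P*R / (R + P) that keeps the execution time at N*T_c."""
    return n * p * r / (r + p)


def parallel_time(n1: float, p: int, t_c: float, t_b: float) -> float:
    """Execution time (N1/P)*T_c + N1*T_b on P processors."""
    return (n1 / p) * t_c + n1 * t_b


# Hypothetical values, for illustration only.
N, T_C, T_B = 1000, 8.0, 2.0
R = T_C / T_B

for p in (2, 16, 1024):
    n1 = scaled_workload(N, p, R)
    # The time stays at N*T_c while N1 approaches N*R from below.
    print(p, n1, parallel_time(n1, p, T_C, T_B), N * T_C, N * R)
```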

e. Reconsidering a-c above in the context of growing workload size.
In this part, for given N, the workload ($N_1$) grows according to d) above, so that the execution time remains constant. In this context $T_1$ is equal to $N_1 T_c$ and $T_P$ remains fixed at $N T_c$.

$$\text{Speedup} = \frac{N_1 T_c}{N T_c} = \frac{PR}{R + P}$$

Surprisingly, the speedup with a growing workload is the same as the speedup in part a with a constant-size workload, and therefore the maximum speedup and minimum P are also the same as in parts b and c. The reason is that the serial part (the bus accesses) grows with P, unlike in Amdahl’s or Gustafson’s laws, where the serial part remains constant.

f. Let $O = T_o/T_b$.
$$\text{Speedup} = \frac{T_1}{T_P} = \frac{N T_c}{\frac{N}{P} T_c + P T_o + N T_b} = \frac{PR}{R + P + \frac{O P^2}{N}}$$
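
The closed form in part f can be verified the same way as in part a. The sketch uses hypothetical timings, takes R = T_c/T_b as implied by the formula, and introduces its own function names. Because the P × T_o term grows with P, the speedup is no longer monotonic in P, so the sketch also searches for the best processor count by brute force.

```python
def speedup_from_times(n: int, p: int, t_c: float, t_b: float, t_o: float) -> float:
    """T_1 / T_P with T_P = (N/P)*T_c + P*T_o + N*T_b."""
    t_1 = n * t_c
    t_p = (n / p) * t_c + p * t_o + n * t_b
    return t_1 / t_p


def speedup_closed_form(n: int, p: int, r: float, o: float) -> float:
    """Closed form P*R / (R + P + O*P^2/N), with R = T_c/T_b and O = T_o/T_b."""
    return p * r / (r + p + o * p ** 2 / n)


# Hypothetical timings, for illustration only.
N, T_C, T_B, T_O = 1024, 8.0, 2.0, 4.0
R, O = T_C / T_B, T_O / T_B

for p in (2, 16, 1024):
    print(p, speedup_from_times(N, p, T_C, T_B, T_O), speedup_closed_form(N, p, R, O))

# Brute-force search for the processor count with the highest speedup.
best_p = max(range(1, 1025), key=lambda p: speedup_closed_form(N, p, R, O))
print(best_p, speedup_closed_form(N, best_p, R, O))
```

Setting the derivative of the denominator term R/P + O × P/N to zero suggests the optimum lies near the square root of N × R/O, which is consistent with what the search returns for these values.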





Document information

Uploaded on: October 19, 2025
Number of pages: 114
Written in: 2025/2026
Type: Exam (elaborations)
Contains: Questions & answers
