,PARALLEL COMPUTER ORGANIZATION AND
DESIGN
Michel Dubois, Murali Annavaram and Per
Stenström
Exercise Solution Manual
June 2012
1
Copyright 2012 Michel Dubois, Murali Annavaram and Per Stenström
, CHAPTER 1
Problem 1.1
a. Using Amdahl’s speedup,
1 1
- -----------------------------
------------------------------ -
F fp F ls
1 – F fp + ------- 1 – F ls + -------
10 2
Therefore,
F fp 5
------- ---
F ls 9
b.Can you still find out which improvement is better based on these numbers?
Yes. The reason is Ffp/Fls is equal to the ratio of the execution time of floating point instructions
(ExTimefp) and the execution time of loads and stores (ExTimels). We can get the ratio of these exe-
cution time with given information. If the ratio is larger than 5/9, which is the value obtained in part
a, then we can say the floating point upgrade is better than the loads/stores upgrade.
ExTime fp = IC fp CPI fp T c
ExTime ls = IC ls CPI ls T c
ExTime total = IC total CPI total T c
Therefore,
F fp IC fp CPI fp T c ExTime fp
------- = ------------------------------------------
- = -----------------------
-
F ls IC ls CPI ls T c ExTime ls
Can you still estimate the maximum speedup of each upgrade using Amdahl’s law?
No, because the total execution time or average CPI of all instructions is not given and we cannot
get the fraction of execution time spent in floating point and loads/stores instructions respectively.
c. Floating-point improvement:
1.5 = 1 / (1-Ffp +Ffp/10)
Therefore, Ffp = 0.3707
Loads and Stores improvement:
1.5 = 1 / (1-Fls + Fls/2)
Therefore, Fls = 0.67
d. After upgrading to the floating point units, the execution time is
ExTimefp = 0.7 + ------- ExTime base = 0.73 ExTimebase
0.3
10
and the new fraction of time spent in loads and stores after the floating point upgrade is
0.20/0.73 = 0.274(27.4%).
2
Copyright 2012 Michel Dubois, Murali Annavaram and Per Stenström
, Therefore, the maximum speedup of the cache upgrade after the floating point unit upgrade is
1
Speedup = ------------------------------------------ = 1.1587
0.274
1 – 0.274 + -------------
2
Problem 1.2
We solve this problem assuming that N and the number of available processors P in the
machine goes from 1 to 1024.
T1 NT c PR
a. Speedup = ----- = -------------------------- = -------------
Tp N R+P
---- T c + NT b
P
b. For a given R, the speedup increases monotonically as P increases. The maximum speedup is
thus achieved when P is 1024 and the maximum speedup is
1024R
Speedup max = ---------------------- :
1024 + R
R
c. P ------------
R–1
d. The execution time stays constant at NxTc for all P’s >1 and given N. As P increases from 1 to
1024, the workload size (N1) increases so that:
N1
NT c = ------ T c + N 1 T b
P
and:
NPR
N 1 = -------------
R+P
As P grows N1 tends to an asymptote equal to NxR.
e. Reconsidering a-c above in the context of growing workload size.
In this part, for given N, the workload (N1) grows according to d) above, so that the execution time
remains constant. In this context T1 is equal to N1Tc and TP remains fixed at NTc
N1 Tc PR
Speedup = ------------ = -------------
NT c R+P
Surprisingly the speedup with a growing workload is the same as the speedup in part a with con-
stant size workload, and therefore the maximum speedup and minimum P are also the same as in
part a. The reason is that the serial part (the bus accesses) grows with P, contrary to Amdahl’s or
Gustafson’s laws where the serial part remains a constant.
f. Let O = To/Tb.
T1 NT c PR
Speedup = ----- = ------------------------------------------- = ------------------------------2-
Tp N OP
---- T c + PT o + NT b R + P + ----------
P N
3
Copyright 2012 Michel Dubois, Murali Annavaram and Per Stenström