Hennessy
Solution Manual all Chapters
Chapter 1 Solutions S-3
1.1 Personal computer: Computer that emphasizes delivery of good performance
to a single user at low cost and usually executes third-party software.
Server: Computer used for large workloads and usually accessed via a network.
Embedded computer: Computer designed to run one application or one set
of related applications and integrated into a single system.
1.2
a. Performance via Pipelining
b. Dependability via Redundancy
c. Performance via Prediction
d. Make the Common Case Fast
e. Hierarchy of Memories
f. Performance via Parallelism
g. Use Abstraction to Simplify Design
1.3 The program is compiled into an assembly language program, which is itself
assembled into a machine language program.
1.4
a. 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/
frame.
b. 3,932,160 bytes × (8 bits/byte) /100E6 bits/second = 0.31 seconds
1.5
Max. Max. SP
Clock Integer DRAM Floating
Desktop Speed IPC/ Bandwidth Point
Processor Year Tech (GHz) core Cores (GB/s) (Gflop/s) MiB
Westmere 2010 32 3.33 4 2 17.1 107 4
i7-620
Ivy Bridge 2013 22 3.90 6 4 25.6 250 8
i7-3770K
Broadwell 2015 14 4.20 8 4 34.1 269 8
i7-6700K
Kaby Lake 2017 14 4.50 8 4 38.4 288 8
i7-7700K
Coffee Lake 2019 14 4.90 8 8 42.7 627 12
i7-9700K
Imp./year 20% 4% 7% 15% 10% 19% 12%
Doubles every 4 years 18 years 10 years 5 years 7 years 4 years 6 years
solution 1123.indd 3 22-09-2021 20:52:22
, S-4 Chapter 1 Solutions
1.6
a. performance of P1 (instructions/sec) = 3 × 109/1.5 = 2 × 109
performance of P2 (instructions/sec) = 2.5 × 109/1.0 = 2.5 × 109
performance of P3 (instructions/sec) = 4 × 109/2.2 = 1.8 × 109
b. cycles(P1) = 10 × 3 × 109 = 30 × 109 s
cycles(P2) = 10 × 2.5 × 109 = 25 × 109 s
cycles(P3) = 10 × 4 × 109 = 40 × 109 s
c. No. instructions(P1) = 30 × 109/1.5 = 20 × 109
No. instructions(P2) = 25 × 109/1 = 25 × 109
No. instructions(P3) = 40 × 109/2.2 = 18.18 × 109
CPInew = CPIold × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.6
f = No. instr. × CPI/time, then
f(P1) = 20 × 109 × 1.8/7 = 5.14 GHz
f(P2) = 25 × 109 × 1.2/7 = 4.28 GHz
f(P3) = 18.18 × 109 × 2.6/7 = 6.75 GHz
1.7
a. Class A: 105 instr. Class B: 2 × 105 instr. Class C: 5 × 105 instr. Class D: 2 ×
105 instr.
Time = No. instr. × CPI/clock rate
Total time P1 = (105 + 2 × 105 × 2 + 5 × 105 × 3 + 2 × 105 × 3)/(2.5 × 109)
= 10.4 × 10−4 s
Total time P2 = (105 × 2 + 2 × 105 × 2 + 5 × 105 × 2 + 2 × 105 × 2)/(3 × 109)
= 6.66 × 10−4 s
CPI(P1) = 10.4 × 10−4 × 2.5 × 109/106 = 2.6
CPI(P2) = 6.66 × 10−4 × 3 × 109/106 = 2.0
b. clock cycles(P1) = 105 × 1+ 2 × 105 × 2 + 5 × 105 × 3 + 2 × 105 × 3 = 26 × 105
clock cycles(P2) = 105 × 2+ 2 × 105 × 2 + 5 × 105 × 2 + 2 × 105 × 2 = 20 × 105
1.8
a. CPI = Texec × f/No. instr.
Compiler A CPI = 1.1
Compiler B CPI = 1.25
solution 1123.indd 4 22-09-2021 20:52:22
, Chapter 1 Solutions S-5
b. fB/fA = (No. instr.(B) × CPI(B))/(No. instr.(A) × CPI(A)) = 1.37
c. TA/Tnew = 1.67
TB/Tnew = 2.27
1.9
1.9.1 C = 2 × DP/(V2 × F)
Pentium 4: C = 3.2E–8F
Core i5 Ivy Bridge: C = 2.9E–8F
1.9.2 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
1.9.3 (Snew + Dnew)/(Sold + Dold) = 0.90
Dnew = C × Vnew 2 × F
Sold = Vold × I
Snew = Vnew × I
Therefore:
Vnew = [Dnew/(C × F)]1/2
Dnew = 0.90 × (Sold + Dold) − Snew
Snew = Vnew × (Sold/Vold)
Pentium 4:
Snew = Vnew × (10/1.25) = Vnew × 8
Dnew = 0.90 × 100 − Vnew × 8 = 90 − Vnew × 8
Vnew = [(90 − Vnew × 8)/(3.2E8 × 3.6E9)]1/2
Vnew = 0.85 V
Core i5:
Snew = Vnew × (30/0.9) = Vnew × 33.3
Dnew = 0.90 × 70 − Vnew × 33.3 = 63 − Vnew × 33.3
Vnew = [(63 − Vnew × 33.3)/(2.9E8 × 3.4E9)]1/2
Vnew = 0.64 V
solution 1123.indd 5 22-09-2021 20:52:22
, S-6 Chapter 1 Solutions
1.10
1.10.1
p # arith inst. # L/S inst. # branch inst. cycles ex. time speedup
1 2.56E9 1.28E9 2.56E8 1.92E10 9.60 1.00
2 1.83E9 9.14E8 2.56E8 1.41E10 7.04 1.36
4 9.14E8 4.57E8 2.56E8 7.68E9 3.84 2.50
8 4.57E8 2.29E8 2.56E8 4.48E9 2.24 4.29
1.10.2
p ex. time
1 41.0
2 29.3
4 14.6
8 7.33
1.10.3 3
1.11
1.11.1 die area15cm = wafer area/dies per wafer = π × 7. = 2.10 cm2
yield15cm = 1/(1 + (0.020 × 2.10/2))2 = 0.9593
die area20cm = wafer area/dies per wafer = π × 102/100 = 3.14 cm2
yield20cm = 1/(1 + (0.031 × 3.14/2))2 = 0.9093
1.11.2 cost/die15cm = 12/(84 × 0.9593) = 0.1489
cost/die20cm = 15/(100 × 0.9093) = 0.1650
1.11.3 die area15cm = wafer area/dies per wafer = π × 7.52/(84 × 1.1) = 1.91 cm2
yield15cm = 1/(1 + (0.020 × 1.15 × 1.91/2))2 = 0.9575
die area20cm = wafer area/dies per wafer = π × 102/(100 × 1.1) = 2.86 cm2
yield20cm = 1/(1 + (0.03 × 1.15 × 2.86/2))2 = 0.9082
1.11.4 defects per area0.92 = (1–y .5)/(y .5 × die_area/2) = (1 − 0.92.5)/
(0.92.5 × 2/2) = 0.043 defects/cm2
defects per area0.95 = (1–y .5)/(y .5 × die_area/2) = (1 − 0.95.5)/
(0.95.5 × 2/2) = 0.026 defects/cm2
1.12
1.12.1 CPI = clock rate × CPU time/instr. count
clock rate = 1/cycle time = 3 GHz
CPI(bzip2) = 3 × 109 × 750/(2389 × 109)= 0.94
solution 1123.indd 6 22-09-2021 20:52:22