Exam (elaborations) TEST BANK FOR Computer Organization and Design - The Hardware Software Interface 2nd Edition By David A. Patterson, John L. Hennessy (Solution Manual)

Pages: 182
Grade: A+
Uploaded on: 15-11-2021
Written in: 2021/2022

Chapter 1 Solutions

1.1 Personal computer (includes workstation and laptop): Personal computers emphasize delivery of good performance to single users at low cost and usually execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated with wireless connectivity to the Internet and typically cost hundreds of dollars, and, like PCs, users can download software (“apps”) to run on them. Unlike PCs, they no longer have a keyboard and mouse, and are more likely to rely on a touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a network.
Warehouse-scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors and terabytes of memory.
Embedded computer: Computer designed to run one application or one set of related applications and integrated into a single system.

1.2
a. Performance via Pipelining
b. Dependability via Redundancy
c. Performance via Prediction
d. Make the Common Case Fast
e. Hierarchy of Memories
f. Performance via Parallelism
g. Design for Moore’s Law
h. Use Abstraction to Simplify Design

1.3 The program is compiled into an assembly language program, which is then assembled into a machine language program.

1.4
a. 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/frame.
b. 3,932,160 bytes × (8 bits/byte)/100E6 bits/second = 0.31 seconds

1.5
a. performance of P1 (instructions/sec) = 3 × 10^9/1.5 = 2 × 10^9
performance of P2 (instructions/sec) = 2.5 × 10^9/1.0 = 2.5 × 10^9
performance of P3 (instructions/sec) = 4 × 10^9/2.2 = 1.8 × 10^9
b. cycles(P1) = 10 × 3 × 10^9 = 30 × 10^9
cycles(P2) = 10 × 2.5 × 10^9 = 25 × 10^9
cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
c. No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.6
f = No. instr. × CPI/time, then
f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
f(P2) = 25 × 10^9 × 1.2/7 = 4.28 GHz
f(P3) = 18.18 × 10^9 × 2.6/7 = 6.75 GHz

1.6
a. Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr. × CPI/clock rate
Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^-4 s
Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^-4 s
CPI(P1) = 10.4 × 10^-4 × 2.5 × 10^9/10^6 = 2.6
CPI(P2) = 6.66 × 10^-4 × 3 × 10^9/10^6 = 2.0
b. clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5

1.7
a. CPI = T_exec × f/No. instr.
Compiler A CPI = 1.1
Compiler B CPI = 1.25
b. f_B/f_A = (No. instr.(B) × CPI(B))/(No. instr.(A) × CPI(A)) = 1.37
c. T_A/T_new = 1.67
T_B/T_new = 2.27

1.8
1.8.1 C = 2 × DP/(V^2 × F)
Pentium 4: C = 3.2E-8 F
Core i5 Ivy Bridge: C = 2.9E-8 F
1.8.2 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
1.8.3 (S_new + D_new)/(S_old + D_old) = 0.90
D_new = C × V_new^2 × F
S_old = V_old × I
S_new = V_new × I
Therefore:
V_new = [D_new/(C × F)]^(1/2)
D_new = 0.90 × (S_old + D_old) − S_new
S_new = V_new × (S_old/V_old)
Pentium 4:
S_new = V_new × (10/1.25) = V_new × 8
D_new = 0.90 × 100 − V_new × 8 = 90 − V_new × 8
V_new = [(90 − V_new × 8)/(3.2E-8 × 3.6E9)]^(1/2)
V_new = 0.85 V
Core i5:
S_new = V_new × (30/0.9) = V_new × 33.3
D_new = 0.90 × 70 − V_new × 33.3 = 63 − V_new × 33.3
V_new = [(63 − V_new × 33.3)/(2.9E-8 × 3.4E9)]^(1/2)
V_new = 0.64 V

1.9
1.9.1
1.9.2
1.9.3 3

1.10
1.10.1 die area_15cm = wafer area/dies per wafer = π × 7.5^2/84 = 2.10 cm^2
yield_15cm = 1/(1 + (0.020 × 2.10/2))^2 = 0.9593
die area_20cm = wafer area/dies per wafer = π × 10^2/100 = 3.14 cm^2
yield_20cm = 1/(1 + (0.031 × 3.14/2))^2 = 0.9093
1.10.2 cost/die_15cm = 12/(84 × 0.9593) = 0.1489
cost/die_20cm = 15/(100 × 0.9093) = 0.1650
1.10.3 die area_15cm = wafer area/dies per wafer = π × 7.5^2/(84 × 1.1) = 1.91 cm^2
yield_15cm = 1/(1 + (0.020 × 1.15 × 1.91/2))^2 = 0.9575
die area_20cm = wafer area/dies per wafer = π × 10^2/(100 × 1.1) = 2.86 cm^2
yield_20cm = 1/(1 + (0.03 × 1.15 × 2.86/2))^2 = 0.9082
1.10.4 defects per area_0.92 = (1 − y^0.5)/(y^0.5 × die_area/2) = (1 − 0.92^0.5)/(0.92^0.5 × 2/2) = 0.043 defects/cm^2
defects per area_0.95 = (1 − y^0.5)/(y^0.5 × die_area/2) = (1 − 0.95^0.5)/(0.95^0.5 × 2/2) = 0.026 defects/cm^2

1.11
1.11.1 CPI = clock rate × CPU time/instr. count
clock rate = 1/cycle time = 3 GHz
CPI(bzip2) = 3 × 10^9 × 750/(2389 × 10^9) = 0.94
1.11.2 SPEC ratio = ref. time/execution time
SPEC ratio(bzip2) = 9650/750 = 12.86
1.11.3 CPU time = No. instr. × CPI/clock rate
If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is 10%.
1.11.4 CPU time(before) = No. instr. × CPI/clock rate
CPU time(after) = 1.1 × No. instr. × 1.05 × CPI/clock rate
CPU time(after)/CPU time(before) = 1.1 × 1.05 = 1.155. Thus, CPU time is increased by 15.5%.
1.11.5 SPECratio = reference time/CPU time
SPECratio(after)/SPECratio(before) = CPU time(before)/CPU time(after) = 1/1.155 = 0.86. The SPECratio is decreased by 14%.
1.11.6 CPI = (CPU time × clock rate)/No. instr.
CPI = 700 × 4 × 10^9/(0.85 × 2389 × 10^9) = 1.37
1.11.7 Clock rate ratio = 4 GHz/3 GHz = 1.33
CPI @ 4 GHz = 1.37, CPI @ 3 GHz = 0.94, ratio = 1.45
They are different because, although the number of instructions has been reduced by 15%, the CPU time has been reduced by a lower percentage.
1.11.8 700/750 = 0.933. CPU time reduction: 6.7%
1.11.9 No. instr. = CPU time × clock rate/CPI
No. instr. = 960 × 0.9 × 4 × 10^9/1.61 = 2146 × 10^9
1.11.10 Clock rate = No. instr. × CPI/CPU time.
Clock rate_new = No. instr. × CPI/(0.9 × CPU time) = (1/0.9) × clock rate_old = 3.33 GHz
1.11.11 Clock rate = No. instr. × CPI/CPU time.
Clock rate_new = No. instr. × 0.85 × CPI/(0.80 × CPU time) = (0.85/0.80) × clock rate_old = 3.18 GHz

1.12
1.12.1 T(P1) = 5 × 10^9 × 0.9/(4 × 10^9) = 1.125 s
T(P2) = 10^9 × 0.75/(3 × 10^9) = 0.25 s
clock rate(P1) > clock rate(P2), performance(P1) < performance(P2)
1.12.2 T(P1) = No. instr. × CPI/clock rate
T(P1) = 2.25 × 10^-1 s
T(P2) = N × 0.75/(3 × 10^9), then N = 9 × 10^8
1.12.3 MIPS = Clock rate × 10^-6/CPI
MIPS(P1) = 4 × 10^9 × 10^-6/0.9 = 4.44 × 10^3
MIPS(P2) = 3 × 10^9 × 10^-6/0.75 = 4.0 × 10^3
MIPS(P1) > MIPS(P2), performance(P1) < performance(P2) (from 1.12.1)
1.12.4 MFLOPS = No. FP operations × 10^-6/T
MFLOPS(P1) = 0.4 × 5E9 × 1E-6/1.125 = 1.78E3
MFLOPS(P2) = 0.4 × 1E9 × 1E-6/0.25 = 1.60E3
MFLOPS(P1) > MFLOPS(P2), performance(P1) < performance(P2) (from 1.12.1)

1.13
1.13.1 T_fp = 70 × 0.8 = 56 s. T_new = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%
1.13.2 T_new = 250 × 0.8 = 200 s, T_fp + T_l/s + T_branch = 165 s, T_int = 35 s. Reduction time INT: 58.8%
1.13.3 T_new = 250 × 0.8 = 200 s, T_fp + T_int + T_l/s = 210 s. NO

1.14
1.14.1 Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_CPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
clock cycles = 512 × 10^6; T_CPU = 0.256 s
To halve the number of clock cycles by improving the CPI of FP instructions:
CPI_improved_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved_fp = (clock cycles/2 − (CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.))/No. FP instr.
CPI_improved_fp = (256 − 462)/50 < 0 ==> not possible
1.14.2 Using the clock cycle data from 1.14.1.
To halve the number of clock cycles by improving the CPI of L/S instructions:
CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_improved_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved_l/s = (clock cycles/2 − (CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_branch × No. branch instr.))/No. L/S instr.
CPI_improved_l/s = (256 − 198)/80 = 0.725
1.14.3 Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_CPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPI_int = 0.6 × 1 = 0.6; CPI_fp = 0.6 × 1 = 0.6; CPI_l/s = 0.7 × 4 = 2.8; CPI_branch = 0.7 × 2 = 1.4
T_CPU (before improv.) = 0.256 s; T_CPU (after improv.) = 0.171 s

1.15

Chapter 2 Solutions

2.1
    addi x5, x7, -5
    add  x5, x5, x6
[addi f, h, -5 (note, no subi); add f, f, g]

2.2 f = g+h+i

2.3
    sub  x30, x28, x29   // compute i-j
    slli x30, x30, 3     // multiply by 8 to convert the doubleword offset to a byte offset
    add  x30, x10, x30   // x30 = &A[i-j]
    ld   x30, 0(x30)     // load A[i-j]
    sd   x30, 64(x11)    // store in B[8]

2.4 B[g] = A[f] + A[f+1]
    slli x30, x5, 3      // x30 = f*8
    add  x30, x10, x30   // x30 = &A[f]
    slli x31, x6, 3      // x31 = g*8
    add  x31, x11, x31   // x31 = &B[g]
    ld   x5, 0(x30)      // f = A[f]
    addi x12, x30, 8     // x12 = &A[f]+8 (i.e. &A[f+1])
    ld   x30, 0(x12)     // x30 = A[f+1]
    add  x30, x30, x5    // x30 = A[f+1] + A[f]
    sd   x30, 0(x31)     // B[g] = x30 (i.e. A[f+1] + A[f])

2.5

2.6 8

2.7
    slli x28, x28, 3     // x28 = i*8
    add  x28, x10, x28   // x28 = &A[i]
    ld   x28, 0(x28)     // x28 = A[i]
    slli x29, x29, 3     // x29 = j*8
    add  x29, x11, x29   // x29 = &B[j]
    ld   x29, 0(x29)     // x29 = B[j]
    add  x29, x28, x29   // Compute x29 = A[i] + B[j]
    sd   x29, 64(x11)    // Store result in B[8]

2.8 f = 2*(&A)
    addi x30, x10, 8     // x30 = &A[1]
    addi x31, x10, 0     // x31 = &A
    sd   x31, 0(x30)     // A[1] = &A
    ld   x30, 0(x30)     // x30 = A[1] = &A
    add  x5, x30, x31    // f = &A + &A = 2*(&A)

2.9
    Instruction         Type     opcode, funct3, funct7   rs1   rs2   rd   imm
    addi x30, x10, 8    I-type   0x13, 0x0, --            10    --    30   8
    addi x31, x10, 0    I-type   0x13, 0x0, --            10    --    31   0
    sd   x31, 0(x30)    S-type   0x23, 0x3, --            30    31    --   0
    ld   x30, 0(x30)    I-type   0x3, 0x3, --             30    --    30   0
    add  x5, x30, x31   R-type   0x33, 0x0, 0x0           30    31    5    --

2.10
2.10.1 0x
2.10.2 overflow
2.10.3 0xB
2.10.4 no overflow
2.10.5 0xD
2.10.6 overflow

2.11
2.11.1 There is an overflow if 128 + x6 > 2^63 − 1. In other words, if x6 > 2^63 − 129. There is also an overflow if 128 + x6 < −2^63. In other words, if x6 < −2^63 − 128 (which is impossible given the range of x6).
2.11.2 There is an overflow if 128 − x6 > 2^63 − 1. In other words, if x6 < −2^63 + 129. There is also an overflow if 128 − x6 < −2^63. In other words, if x6 > 2^63 + 128 (which is impossible given the range of x6).
2.11.3 There is an overflow if x6 − 128 > 2^63 − 1. In other words, if x6 > 2^63 + 127 (which is impossible given the range of x6). There is also an overflow if x6 − 128 < −2^63. In other words, if x6 < −2^63 + 128.

2.12 R-type: add x1, x1, x1

2.13 S-type: 0x25F3023 ( 0011)

2.14 R-type: sub x6, x7, x5 (0x: )

2.15 I-type: ld x3, 4(x27) (0x4DB183: )

2.16
2.16.1 The opcode would expand from 7 bits to 9. The rs1, rs2, and rd fields would increase from 5 bits to 7 bits.
2.16.2 The opcode would expand from 7 bits to 12. The rs1 and rd fields would increase from 5 bits to 7 bits. This change does not affect the imm field per se, but it might force the ISA designer to consider shortening the immediate field to avoid an increase in overall instruction size.
2.16.3
* Increasing the size of each bit field potentially makes each instruction longer, potentially increasing the code size overall.
* However, increasing the number of registers could lead to less register spillage, which would reduce the total number of instructions, possibly reducing the code size overall.

2.17
2.17.1 0xababefef8
2.17.2 0x
2.17.3 0x545

2.18 It can be done in eight RISC-V instructions:
    addi x7, x0, 0x3f    // Create a 6-bit mask
    slli x7, x7, 11      // Shift the mask to cover bits 16 to 11
    and  x28, x5, x7     // Apply the mask to x5
    slli x7, x7, 15      // Shift the mask to cover bits 31 to 26
    xori x7, x7, -1      // This is a NOT operation
    and  x6, x6, x7      // "Zero out" positions 31 to 26 of x6
    slli x28, x28, 15    // Move selection from x5 into positions 31 to 26
    or   x6, x6, x28     // Load bits 31 to 26 from x28

2.19
    xori x5, x6, -1

2.20
    ld   x6, 0(x17)
    slli x6, x6, 4

2.21 x6 = 2

2.22
2.22.1 [0x1ff00000, 0x200FFFFE]
2.22.2 [0x1FFFF000, 0x20000ffe]

2.23
2.23.1 The UJ instruction format would be most appropriate because it would allow the maximum number of bits possible for the “loop” parameter, thereby maximizing the utility of the instruction.
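The address ranges in 2.22 and the point about immediate width in 2.23.1 can be checked numerically. The sketch below is only an illustration: it assumes the current PC is 0x20000000 (inferred from the ranges being centered on that value, not stated in the answers), a 21-bit signed byte offset for jal (UJ format), and a 13-bit signed byte offset for conditional branches (SB format), with the lowest offset bit always zero. The function name `reach` is made up for this example.

```python
def reach(pc, offset_bits):
    """Inclusive address range reachable from pc with a signed, even byte
    offset that is offset_bits wide (lowest bit fixed at 0)."""
    half = 1 << (offset_bits - 1)
    return pc - half, pc + half - 2


PC = 0x20000000          # assumed starting PC
JAL_OFFSET_BITS = 21     # UJ-type: imm[20:1] plus an implicit 0 bit
BRANCH_OFFSET_BITS = 13  # SB-type: imm[12:1] plus an implicit 0 bit

for name, bits in (("jal", JAL_OFFSET_BITS), ("branch", BRANCH_OFFSET_BITS)):
    lo, hi = reach(PC, bits)
    print(f"{name}: [0x{lo:08X}, 0x{hi:08X}]")
```

Under these assumptions the wider range corresponds to 2.22.1 and the narrower one to 2.22.2, which is also why a jump-style format gives a looping instruction the longest reach.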
2.23.2 It can be done in three instructions:
    loop: addi x29, x29, -1    // Subtract 1 from x29
          bge  x29, x0, loop   // Continue if x29 is not negative
          addi x29, x29, 1     // Add back the 1 that shouldn't have been subtracted

2.24
2.24.1 The final value of x5 is 20.
2.24.2
    acc = 0;
    i = 10;
    while (i != 0) {
        acc += 2;
        i--;
    }
2.24.3 4*N + 1 instructions.
2.24.4 (Note: change the condition != to >= in the while loop)
    acc = 0;
    i = 10;
    while (i >= 0) {
        acc += 2;
        i--;
    }

2.25 The C code can be implemented in RISC-V assembly as follows.
           addi x7, x0, 0       // Init i = 0
    LOOPI: bge  x7, x5, ENDI    // While i < a
           addi x30, x10, 0     // x30 = &D
           addi x29, x0, 0      // Init j = 0
    LOOPJ: bge  x29, x6, ENDJ   // While j < b
           add  x31, x7, x29    // x31 = i+j
           sd   x31, 0(x30)     // D[4*j] = x31
           addi x30, x30, 32    // x30 = &D[4*(j+1)]
           addi x29, x29, 1     // j++
           jal  x0, LOOPJ
    ENDJ:  addi x7, x7, 1       // i++
           jal  x0, LOOPI
    ENDI:

2.26 The code requires 13 RISC-V instructions. When a = 10 and b = 1, this results in 123 instructions being executed.

2.27
    // This C code corresponds most directly to the given assembly.
    int i;
    for (i = 0; i < 100; i++) {
        result += *MemArray;
        MemArray++;
    }
    return result;

    // However, many people would write the code this way:
    int i;
    for (i = 0; i < 100; i++) {
        result += MemArray[i];
    }
    return result;

2.28 The address just past the last element of MemArray can be used to terminate the loop:
          addi x29, x10, 800    // x29 = &MemArray[100] (one past the last element)
    LOOP: ld   x7, 0(x10)
          add  x5, x5, x7
          addi x10, x10, 8
          blt  x10, x29, LOOP   // Loop until MemArray points to one past the last element

2.29
    // IMPORTANT! Stack pointer must remain a multiple of 16!
    fib:  beq  x10, x0, done   // If n==0, return 0
          addi x5, x0, 1
          beq  x10, x5, done   // If n==1, return 1
          addi x2, x2, -16     // Allocate two doublewords of stack space
          sd   x1, 0(x2)       // Save the return address
          sd   x10, 8(x2)      // Save the current n
          addi x10, x10, -1    // x10 = n-1
          jal  x1, fib         // fib(n-1)
          ld   x5, 8(x2)       // Load old n from the stack
          sd   x10, 8(x2)      // Push fib(n-1) onto the stack
          addi x10, x5, -2     // x10 = n-2
          jal  x1, fib         // Call fib(n-2)
          ld   x5, 8(x2)       // x5 = fib(n-1)
          add  x10, x10, x5    // x10 = fib(n-1)+fib(n-2)
          // Clean up:
          ld   x1, 0(x2)       // Load saved return address
          addi x2, x2, 16      // Pop two doublewords from the stack
    done: jalr x0, x1

2.30 [answers will vary]

2.31
    // IMPORTANT! Stack pointer must remain a multiple of 16!
    f: addi x2, x2, -16     // Allocate stack space for two doublewords
       sd   x1, 0(x2)       // Save return address
       add  x5, x12, x13    // x5 = c+d
       sd   x5, 8(x2)       // Save c+d on the stack
       jal  x1, g           // Call x10 = g(a,b)
       ld   x11, 8(x2)      // Reload x11 = c+d from the stack
       jal  x1, g           // Call x10 = g(g(a,b), c+d)
       ld   x1, 0(x2)       // Restore return address
       addi x2, x2, 16      // Restore stack pointer
       jalr x0, x1

2.32 We can use the tail-call optimization for the second call to g, saving one instruction:
    // IMPORTANT! Stack pointer must remain a multiple of 16!
    f: addi x2, x2, -16     // Allocate stack space for two doublewords
       sd   x1, 0(x2)       // Save return address
       add  x5, x12, x13    // x5 = c+d
       sd   x5, 8(x2)       // Save c+d on the stack
       jal  x1, g           // Call x10 = g(a,b)
       ld   x11, 8(x2)      // Reload x11 = c+d from the stack
       ld   x1, 0(x2)       // Restore return address
       addi x2, x2, 16      // Restore stack pointer
       jal  x0, g           // Call x10 = g(g(a,b), c+d)

2.33
* We have no idea what the contents of x10-x14 are; g can set them as it pleases.
* We don't know what the precise contents of x8 and sp are, but we do know that they are identical to the contents when f was called.
* Similarly, we don't know what the precise contents of x1 are, but we do know that it is equal to the return address set by the “jal x1, f” instruction that invoked f.

2.34
    a_to_i:
        addi x28, x0, 10       # Just stores the constant 10
        addi x29, x0, 0        # Stores the running total
        addi x5, x0, 1         # Tracks whether input is positive or negative
        # Test for initial '+' or '-'
        lbu  x6, 0(x10)        # Load the first character
        addi x7, x0, 45        # ASCII '-'
        bne  x6, x7, noneg
        addi x5, x0, -1        # Set that input was negative
        addi x10, x10, 1       # str++
        jal  x0, main_atoi_loop
    noneg:
        addi x7, x0, 43        # ASCII '+'
        bne  x6, x7, main_atoi_loop
        addi x10, x10, 1       # str++
    main_atoi_loop:
        lbu  x6, 0(x10)        # Load the next digit
        beq  x6, x0, done
        # Make sure next char is a digit, or fail
        addi x7, x0, 48        # ASCII '0'
        sub  x6, x6, x7
        blt  x6, x0, fail      # *str < '0'
        bge  x6, x28, fail     # *str > '9'
        # Next char is a digit, so accumulate it into x29
        mul  x29, x29, x28     # x29 *= 10
        add  x29, x29, x6      # x29 += *str - '0'
        addi x10, x10, 1       # str++
        jal  x0, main_atoi_loop
    done:
        addi x10, x29, 0       # Use x29 as output value
        mul  x10, x10, x5      # Multiply by sign
        jalr x0, x1            # Return result
    fail:
        addi x10, x0, -1
        jalr x0, x1

2.35
2.35.1 0x11
2.35.2 0x88

2.36
    lui  x10, 0x11223
    addi x10, x10, 0x344
    slli x10, x10, 32
    lui  x5, 0x55667
    addi x5, x5, 0x788
    add  x10, x10, x5

2.37
    setmax:
    try:
        lr.d x5, (x10)         # Load-reserve *shvar
        bge  x5, x11, release  # Skip update if *shvar >= x
        addi x5, x11, 0
    release:
        sc.d x7, x5, (x10)
        bne  x7, x0, try       # If store-conditional failed, try again
        jalr x0, x1

2.38 When two processors A and B begin executing this loop at the same time, at most one of them will execute the store-conditional instruction successfully, while the other will be forced to retry the loop. If processor A's store-conditional succeeds initially, then B will re-enter the try block, and it will see the new value of shvar written by A when it finally succeeds.
The hardware guarantees that both processors will eventually execute the code completely.

2.39
2.39.1 No. The resulting machine would be slower overall. The current CPU requires (num arithmetic * 1 cycle) + (num load/store * 10 cycles) + (num branch/jump * 3 cycles) = 500*1 + 300*10 + 100*3 = 3800 cycles. The new CPU requires (.75*num arithmetic * 1 cycle) + (num load/store * 10 cycles) + (num branch/jump * 3 cycles) = 375*1 + 300*10 + 100*3 = 3675 cycles. However, given that each of the new CPU's cycles is 10% longer than the original CPU's cycles, the new CPU's 3675 cycles will take as long as 4042.5 cycles on the original CPU.
2.39.2 If we double the performance of arithmetic instructions by reducing their CPI to 0.5, then the CPU will run the reference program in (500*.5) + (300*10) + 100*3 = 3550 cycles. This represents a speedup of 1.07. If we improve the performance of arithmetic instructions by a factor of 10 (reducing their CPI to 0.1), then the CPU will run the reference program in (500*.1) + (300*10) + 100*3 = 3350 cycles. This represents a speedup of 1.13.

2.40
2.40.1 Take the weighted average: 0.7*2 + 0.1*6 + 0.2*3 = 2.6
2.40.2 For a 25% improvement, we must reduce the CPI to 2.6*.75 = 1.95. Thus, we want 0.7*x + 0.1*6 + 0.2*3 <= 1.95. Solving for x shows that the arithmetic instructions must have a CPI of at most 1.07.
2.40.3 For a 50% improvement, we must reduce the CPI to 2.6*.5 = 1.3. Thus, we want 0.7*x + 0.1*6 + 0.2*3 <= 1.3. Solving for x shows that the arithmetic instructions must have a CPI of at most 0.14.

2.41
    ldr  x28, x5(x10), 3    // Load x28 = A[f]
    addi x5, x5, 1          // f++
    ldr  x29, x5(x10), 3    // Load x29 = A[f+1]
    add  x29, x29, x28      // Add x29 = A[f] + A[f+1]
    sdr  x29, x6(x11), 3    // Store B[g] = x29

2.42
    ldr  x28, x28, (x10), 3 // Load x28 = A[i]
    ldr  x29, x29, (x11), 3 // Load x29 = B[j]
    add  x29, x28, x29
    sd   x29, 64(x11)       // Store B[8] = x29 (don't need scaled store here)
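The CPI bounds in 2.40 follow from solving the weighted-average equation for the one unknown CPI. A minimal sketch of that arithmetic is shown below; the class names in `MIX` are hypothetical labels (the answers only identify the 0.7 share as arithmetic), and the helper names are made up for this example.

```python
# (fraction of instructions, CPI) per class; only "arithmetic" is named in
# the answer, the other two labels are assumptions for readability.
MIX = {
    "arithmetic": (0.7, 2),
    "load_store": (0.1, 6),
    "branch": (0.2, 3),
}


def average_cpi(mix):
    """Weighted-average CPI over all instruction classes."""
    return sum(frac * cpi for frac, cpi in mix.values())


def max_cpi_for_target(mix, cls, target_cpi):
    """Largest CPI class `cls` may have so the average stays <= target_cpi."""
    frac, _ = mix[cls]
    others = sum(f * c for name, (f, c) in mix.items() if name != cls)
    return (target_cpi - others) / frac


base = average_cpi(MIX)
print(round(base, 2))
print(round(max_cpi_for_target(MIX, "arithmetic", 0.75 * base), 2))
print(round(max_cpi_for_target(MIX, "arithmetic", 0.50 * base), 2))
```

The same two helpers can be reused for other mixes by editing `MIX`; a negative result from `max_cpi_for_target` would mean the target cannot be reached by improving that class alone.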

Connected book

David A. Patterson, John L. Hennessy Computer Organization and Design

Edition:2012
ISBN:9780123747501

Written for

Institution: Harvard University
Course: TEST BANK FOR Computer Organization and Design - The Hardware Software Interface 2nd Edition By David A. Patterson, John L. Hennessy

Document information

Uploaded on: November 15, 2021
Number of pages: 182
Written in: 2021/2022
Type: Exam (elaborations)
Contains: Questions & answers

Subjects

computer organization and design
hardware software interface

Content preview

Chapter 1 Solutions

1.1 Personal computer (includes workstation and laptop): Personal computers
emphasize delivery of good performance to single users at low cost and usually
execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated
with wireless connectivity to the Internet and typically cost hundreds of
dollars, and, like PCs, users can download software (“apps”) to run on them.
Unlike PCs, they no longer have a keyboard and mouse, and are more likely
to rely on a touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a
network.
Warehouse-scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors
and terabytes of memory.
Embedded computer: Computer designed to run one application or one set
of related applications and integrated into a single system.

1.2
a. Performance via Pipelining
b. Dependability via Redundancy
c. Performance via Prediction
d. Make the Common Case Fast
e. Hierarchy of Memories
f. Performance via Parallelism
g. Design for Moore’s Law
h. Use Abstraction to Simplify Design

1.3 The program is compiled into an assembly language program, which is then
assembled into a machine language program.

1.4
a. 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/
frame.
b. 3,932,160 bytes × (8 bits/byte) /100E6 bits/second = 0.31 seconds
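The arithmetic in 1.4 can be reproduced with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the 3 bytes per pixel and the 100E6 bits/second link rate are taken from the calculation above.

```python
def frame_bytes(width, height, bytes_per_pixel=3):
    """Bytes needed to hold one frame."""
    return width * height * bytes_per_pixel


def transfer_seconds(num_bytes, bits_per_second):
    """Time to send num_bytes over a link with the given bit rate."""
    return num_bytes * 8 / bits_per_second


size = frame_bytes(1280, 1024)
print(size)                                      # bytes per frame
print(round(transfer_seconds(size, 100e6), 2))   # seconds per frame
```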

1.5
a. performance of P1 (instructions/sec) = 3 × 10^9/1.5 = 2 × 10^9
performance of P2 (instructions/sec) = 2.5 × 10^9/1.0 = 2.5 × 10^9
performance of P3 (instructions/sec) = 4 × 10^9/2.2 = 1.8 × 10^9
b. cycles(P1) = 10 × 3 × 10^9 = 30 × 10^9
cycles(P2) = 10 × 2.5 × 10^9 = 25 × 10^9
cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
c. No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.6
f = No. instr. × CPI/time, then
f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
f(P2) = 25 × 10^9 × 1.2/7 = 4.28 GHz
f(P3) = 18.18 × 10^9 × 2.6/7 = 6.75 GHz
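The three parts of 1.5 can be recomputed as follows. This is a sketch with made-up helper names; the clock rates, CPIs, the 10 s run time, the 7 s target, and the 1.2 CPI factor all come from the calculation above. Because the code keeps CPI(P3) unrounded (2.64 instead of 2.6), it gives about 6.86 GHz for P3 rather than 6.75 GHz.

```python
# (clock rate in Hz, CPI) for each processor, as used in the answer above.
PROCESSORS = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}


def instructions_per_second(clock_hz, cpi):
    return clock_hz / cpi


def cycles(clock_hz, seconds):
    return clock_hz * seconds


def instruction_count(clock_hz, cpi, seconds):
    return cycles(clock_hz, seconds) / cpi


def required_clock(clock_hz, cpi, old_time=10.0, new_time=7.0, cpi_factor=1.2):
    """Clock rate needed to run the same instructions in new_time seconds
    when the CPI grows by cpi_factor."""
    n = instruction_count(clock_hz, cpi, old_time)
    return n * cpi * cpi_factor / new_time


for name, (clock_hz, cpi) in PROCESSORS.items():
    print(name,
          round(instructions_per_second(clock_hz, cpi) / 1e9, 2),
          round(required_clock(clock_hz, cpi) / 1e9, 2))
```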

1.6
a. Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr. × CPI/clock rate
Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^-4 s
Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^-4 s
CPI(P1) = 10.4 × 10^-4 × 2.5 × 10^9/10^6 = 2.6
CPI(P2) = 6.66 × 10^-4 × 3 × 10^9/10^6 = 2.0
b. clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5
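The per-class sums in 1.6 are easy to get wrong by hand, so here is a small sketch that recomputes them. The dictionary and function names are illustrative; the instruction counts, per-class CPIs, and clock rates are the ones appearing in the sums above.

```python
# Instruction counts per class (10^6 instructions in total).
COUNTS = {"A": 1e5, "B": 2e5, "C": 5e5, "D": 2e5}
CPI_P1 = {"A": 1, "B": 2, "C": 3, "D": 3}
CPI_P2 = {"A": 2, "B": 2, "C": 2, "D": 2}


def total_cycles(counts, cpi):
    """Sum of (instructions in class) x (CPI of class)."""
    return sum(counts[k] * cpi[k] for k in counts)


def exec_time(counts, cpi, clock_hz):
    return total_cycles(counts, cpi) / clock_hz


def average_cpi(counts, cpi):
    return total_cycles(counts, cpi) / sum(counts.values())


for name, cpi, clock_hz in (("P1", CPI_P1, 2.5e9), ("P2", CPI_P2, 3.0e9)):
    print(name, total_cycles(COUNTS, cpi),
          exec_time(COUNTS, cpi, clock_hz), average_cpi(COUNTS, cpi))
```

P2 finishes sooner even though both processors execute the same 10^6 instructions, because its lower average CPI and higher clock rate both work in its favor.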

1.7
a. CPI = T_exec × f/No. instr.
Compiler A CPI = 1.1
Compiler B CPI = 1.25
b. f_B/f_A = (No. instr.(B) × CPI(B))/(No. instr.(A) × CPI(A)) = 1.37
c. T_A/T_new = 1.67
T_B/T_new = 2.27

1.8
1.8.1 C = 2 × DP/(V^2 × F)
Pentium 4: C = 3.2E-8 F
Core i5 Ivy Bridge: C = 2.9E-8 F
1.8.2 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
1.8.3 (S_new + D_new)/(S_old + D_old) = 0.90
D_new = C × V_new^2 × F
S_old = V_old × I
S_new = V_new × I
Therefore:
V_new = [D_new/(C × F)]^(1/2)
D_new = 0.90 × (S_old + D_old) − S_new
S_new = V_new × (S_old/V_old)
Pentium 4:
S_new = V_new × (10/1.25) = V_new × 8
D_new = 0.90 × 100 − V_new × 8 = 90 − V_new × 8
V_new = [(90 − V_new × 8)/(3.2E-8 × 3.6E9)]^(1/2)
V_new = 0.85 V
Core i5:
S_new = V_new × (30/0.9) = V_new × 33.3
D_new = 0.90 × 70 − V_new × 33.3 = 63 − V_new × 33.3
V_new = [(63 − V_new × 33.3)/(2.9E-8 × 3.4E9)]^(1/2)
V_new = 0.64 V
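In 1.8.3 the unknown voltage appears on both sides of the square-root equation, so it has to be solved rather than read off. Squaring and rearranging gives the quadratic C × F × V^2 + (S_old/V_old) × V − 0.90 × (S_old + D_old) = 0, and the positive root is the answer. The sketch below follows that route; the function name is made up, and the power, voltage, and frequency inputs are the values appearing in the calculation above (dynamic power taken as total minus static). Without intermediate rounding it gives about 0.85 V for the Pentium 4 and about 0.65 V for the Core i5.

```python
import math


def new_voltage(static_old, dynamic_old, v_old, freq_hz, target=0.90):
    """Positive root of C*F*V^2 + (S_old/V_old)*V - target*(S_old + D_old) = 0,
    with C = 2*D_old/(V_old^2 * F) as in 1.8.1."""
    c = 2 * dynamic_old / (v_old ** 2 * freq_hz)
    a = c * freq_hz                  # coefficient of V^2
    b = static_old / v_old           # coefficient of V (the leakage current I)
    k = target * (static_old + dynamic_old)
    return (-b + math.sqrt(b * b + 4 * a * k)) / (2 * a)


print(round(new_voltage(10, 90, 1.25, 3.6e9), 2))  # Pentium 4
print(round(new_voltage(30, 40, 0.90, 3.4e9), 2))  # Core i5 Ivy Bridge
```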

1.9
1.9.1
