Exam (elaborations) TEST BANK FOR Computer Organization and Design - The Hardware Software Interface 2nd Edition By David A. Patterson, John L. Hennessy (Solution Manual)

Pages
182
Grade
A+
Uploaded on
15-11-2021
Written in
2021/2022

Chapter 1 Solutions

1.1 Personal computer (includes workstation and laptop): Personal computers emphasize delivery of good performance to single users at low cost and usually execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated with wireless connectivity to the Internet and typically cost hundreds of dollars, and, like PCs, users can download software (“apps”) to run on them. Unlike PCs, they no longer have a keyboard and mouse, and are more likely to rely on a touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a network.
Warehouse-scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors and terabytes of memory.
Embedded computer: Computer designed to run one application or one set of related applications and integrated into a single system.

1.2
a. Performance via Pipelining
b. Dependability via Redundancy
c. Performance via Prediction
d. Make the Common Case Fast
e. Hierarchy of Memories
f. Performance via Parallelism
g. Design for Moore’s Law
h. Use Abstraction to Simplify Design

1.3 The program is compiled into an assembly language program, which is then assembled into a machine language program.

1.4
a. 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/frame.
b. 3,932,160 bytes × (8 bits/byte)/(100E6 bits/second) = 0.31 seconds

1.5
a. performance of P1 (instructions/sec) = 3 × 10^9/1.5 = 2 × 10^9
performance of P2 (instructions/sec) = 2.5 × 10^9/1.0 = 2.5 × 10^9
performance of P3 (instructions/sec) = 4 × 10^9/2.2 = 1.8 × 10^9
b. cycles(P1) = 10 × 3 × 10^9 = 30 × 10^9
cycles(P2) = 10 × 2.5 × 10^9 = 25 × 10^9
cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
c. No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.6
f = No. instr. × CPI/time, then
f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
f(P2) = 25 × 10^9 × 1.2/7 = 4.28 GHz
f(P3) = 18.18 × 10^9 × 2.6/7 = 6.75 GHz

1.6
a. Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr. × CPI/clock rate
Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^-4 s
Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^-4 s
CPI(P1) = 10.4 × 10^-4 × 2.5 × 10^9/10^6 = 2.6
CPI(P2) = 6.66 × 10^-4 × 3 × 10^9/10^6 = 2.0
b. clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5

1.7
a. CPI = T_exec × f/No. instr.
Compiler A CPI = 1.1
Compiler B CPI = 1.25
b. f_B/f_A = (No. instr.(B) × CPI(B))/(No. instr.(A) × CPI(A)) = 1.37
c. T_A/T_new = 1.67
T_B/T_new = 2.27

1.8
1.8.1 C = 2 × DP/(V^2 × F)
Pentium 4: C = 3.2E-8 F
Core i5 Ivy Bridge: C = 2.9E-8 F
1.8.2 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
1.8.3 (S_new + D_new)/(S_old + D_old) = 0.90
D_new = C × V_new^2 × F
S_old = V_old × I
S_new = V_new × I
Therefore:
V_new = [D_new/(C × F)]^(1/2)
D_new = 0.90 × (S_old + D_old) − S_new
S_new = V_new × (S_old/V_old)
Pentium 4:
S_new = V_new × (10/1.25) = V_new × 8
D_new = 0.90 × 100 − V_new × 8 = 90 − V_new × 8
V_new = [(90 − V_new × 8)/(3.2E-8 × 3.6E9)]^(1/2)
V_new = 0.85 V
Core i5:
S_new = V_new × (30/0.9) = V_new × 33.3
D_new = 0.90 × 70 − V_new × 33.3 = 63 − V_new × 33.3
V_new = [(63 − V_new × 33.3)/(2.9E-8 × 3.4E9)]^(1/2)
V_new = 0.64 V

1.9
1.9.1
1.9.2
1.9.3 3

1.10
1.10.1 die area_15cm = wafer area/dies per wafer = π × 7.5^2/84 = 2.10 cm^2
yield_15cm = 1/(1 + (0.020 × 2.10/2))^2 = 0.9593
die area_20cm = wafer area/dies per wafer = π × 10^2/100 = 3.14 cm^2
yield_20cm = 1/(1 + (0.031 × 3.14/2))^2 = 0.9093
1.10.2 cost/die_15cm = 12/(84 × 0.9593) = 0.1489
cost/die_20cm = 15/(100 × 0.9093) = 0.1650
1.10.3 die area_15cm = wafer area/dies per wafer = π × 7.5^2/(84 × 1.1) = 1.91 cm^2
yield_15cm = 1/(1 + (0.020 × 1.15 × 1.91/2))^2 = 0.9575
die area_20cm = wafer area/dies per wafer = π × 10^2/(100 × 1.1) = 2.86 cm^2
yield_20cm = 1/(1 + (0.03 × 1.15 × 2.86/2))^2 = 0.9082
1.10.4 defects per area_0.92 = (1 − y^0.5)/(y^0.5 × die_area/2) = (1 − 0.92^0.5)/(0.92^0.5 × 2/2) = 0.043 defects/cm^2
defects per area_0.95 = (1 − y^0.5)/(y^0.5 × die_area/2) = (1 − 0.95^0.5)/(0.95^0.5 × 2/2) = 0.026 defects/cm^2

1.11
1.11.1 CPI = clock rate × CPU time/instr. count
clock rate = 1/cycle time = 3 GHz
CPI(bzip2) = 3 × 10^9 × 750/(2389 × 10^9) = 0.94
1.11.2 SPEC ratio = ref. time/execution time
SPEC ratio(bzip2) = 9650/750 = 12.86
1.11.3 CPU time = No. instr. × CPI/clock rate
If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is 10%.
1.11.4 CPU time(before) = No. instr. × CPI/clock rate
CPU time(after) = 1.1 × No. instr. × 1.05 × CPI/clock rate
CPU time(after)/CPU time(before) = 1.1 × 1.05 = 1.155. Thus, CPU time is increased by 15.5%.
1.11.5 SPECratio = reference time/CPU time
SPECratio(after)/SPECratio(before) = CPU time(before)/CPU time(after) = 1/1.155 = 0.86. The SPECratio is decreased by 14%.
1.11.6 CPI = (CPU time × clock rate)/No. instr.
CPI = 700 × 4 × 10^9/(0.85 × 2389 × 10^9) = 1.37
1.11.7 Clock rate ratio = 4 GHz/3 GHz = 1.33
CPI @ 4 GHz = 1.37, CPI @ 3 GHz = 0.94, ratio = 1.45
They are different because, although the number of instructions has been reduced by 15%, the CPU time has been reduced by a lower percentage.
1.11.8 700/750 = 0.933. CPU time reduction: 6.7%
1.11.9 No. instr. = CPU time × clock rate/CPI
No. instr. = 960 × 0.9 × 4 × 10^9/1.61 = 2146 × 10^9
1.11.10 Clock rate = No. instr. × CPI/CPU time.
Clock rate_new = No. instr. × CPI/(0.9 × CPU time) = (1/0.9) × clock rate_old = 3.33 GHz
1.11.11 Clock rate = No. instr. × CPI/CPU time.
Clock rate_new = No. instr. × 0.85 × CPI/(0.80 × CPU time) = (0.85/0.80) × clock rate_old = 3.18 GHz

1.12
1.12.1 T(P1) = 5 × 10^9 × 0.9/(4 × 10^9) = 1.125 s
T(P2) = 10^9 × 0.75/(3 × 10^9) = 0.25 s
clock rate(P1) > clock rate(P2), performance(P1) < performance(P2)
1.12.2 T(P1) = No. instr. × CPI/clock rate
T(P1) = 2.25 × 10^-1 s
T(P2) = N × 0.75/(3 × 10^9), then N = 9 × 10^8
1.12.3 MIPS = Clock rate × 10^-6/CPI
MIPS(P1) = 4 × 10^9 × 10^-6/0.9 = 4.44 × 10^3
MIPS(P2) = 3 × 10^9 × 10^-6/0.75 = 4.0 × 10^3
MIPS(P1) > MIPS(P2), performance(P1) < performance(P2) (from 11a)
1.12.4 MFLOPS = No. FP operations × 10^-6/T
MFLOPS(P1) = 0.4 × 5E9 × 1E-6/1.125 = 1.78E3
MFLOPS(P2) = 0.4 × 1E9 × 1E-6/0.25 = 1.60E3
MFLOPS(P1) > MFLOPS(P2), performance(P1) < performance(P2) (from 11a)

1.13
1.13.1 T_fp = 70 × 0.8 = 56 s. T_new = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%
1.13.2 T_new = 250 × 0.8 = 200 s, T_fp + T_l/s + T_branch = 165 s, T_int = 35 s. Reduction time INT: 58.8%
1.13.3 T_new = 250 × 0.8 = 200 s, T_fp + T_int + T_l/s = 210 s. NO

1.14
1.14.1 Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_CPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
clock cycles = 512 × 10^6; T_CPU = 0.256 s
To halve the number of clock cycles by improving the CPI of FP instructions:
CPI_improved fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved fp = (clock cycles/2 − (CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.))/No. FP instr.
CPI_improved fp = (256 − 462)/50 < 0 ==> not possible
1.14.2 Using the clock cycle data from a.
To halve the number of clock cycles by improving the CPI of L/S instructions:
CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_improved l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved l/s = (clock cycles/2 − (CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_branch × No. branch instr.))/No. L/S instr.
CPI_improved l/s = (256 − 198)/80 = 0.725
1.14.3 Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_CPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPI_int = 0.6 × 1 = 0.6; CPI_fp = 0.6 × 1 = 0.6; CPI_l/s = 0.7 × 4 = 2.8; CPI_branch = 0.7 × 2 = 1.4
T_CPU (before improv.) = 0.256 s; T_CPU (after improv.) = 0.171 s

1.15

Chapter 2 Solutions

2.1
addi x5, x7, -5
add x5, x5, x6
[addi f, h, -5 (note, no subi)
add f, f, g]

2.2 f = g+h+i

2.3
sub x30, x28, x29 // compute i-j
slli x30, x30, 3 // multiply by 8 to convert the word offset to a byte offset
add x3, x10, x30 // x3 = &A[i-j]
ld x30, 0(x3) // load A[i-j]
sd x30, 64(x11) // store in B[8]

2.4 B[g] = A[f] + A[f+1]
slli x30, x5, 3 // x30 = f*8
add x30, x10, x30 // x30 = &A[f]
slli x31, x6, 3 // x31 = g*8
add x31, x11, x31 // x31 = &B[g]
ld x5, 0(x30) // f = A[f]
addi x12, x30, 8 // x12 = &A[f]+8 (i.e. &A[f+1])
ld x30, 0(x12) // x30 = A[f+1]
add x30, x30, x5 // x30 = A[f+1] + A[f]
sd x30, 0(x31) // B[g] = x30 (i.e. A[f+1] + A[f])

2.5

2.6 8

2.7
slli x28, x28, 3 // x28 = i*8
add x28, x10, x28 // x28 = &A[i]
ld x28, 0(x28) // x28 = A[i]
slli x29, x29, 3 // x29 = j*8
add x29, x11, x29 // x29 = &B[j]
ld x29, 0(x29) // x29 = B[j]
add x29, x28, x29 // Compute x29 = A[i] + B[j]
sd x29, 64(x11) // Store result in B[8]

2.8 f = 2*(&A)
addi x30, x10, 8 // x30 = &A[1]
addi x31, x10, 0 // x31 = &A
sd x31, 0(x30) // A[1] = &A
ld x30, 0(x30) // x30 = A[1] = &A
add x5, x30, x31 // f = &A + &A = 2*(&A)

2.9
instruction        type     opcode, funct3,7    rs1   rs2   rd    imm
addi x30, x10, 8   I-type   0x13, 0x0, --       10    --    30    8
addi x31, x10, 0   I-type   0x13, 0x0, --       10    --    31    0
sd x31, 0(x30)     S-type   0x23, 0x3, --       30    31    --    0
ld x30, 0(x30)     I-type   0x3, 0x3, --        30    --    30    0
add x5, x30, x31   R-type   0x33, 0x0, 0x0      30    31    5     --

2.10
2.10.1 0x
2.10.2 overflow
2.10.3 0xB
2.10.4 no overflow
2.10.5 0xD
2.10.6 overflow

2.11
2.11.1 There is an overflow if 128 + x6 > 2^63 − 1. In other words, if x6 > 2^63 − 129. There is also an overflow if 128 + x6 < −2^63. In other words, if x6 < −2^63 − 128 (which is impossible given the range of x6).
2.11.2 There is an overflow if 128 − x6 > 2^63 − 1. In other words, if x6 < −2^63 + 129. There is also an overflow if 128 − x6 < −2^63. In other words, if x6 > 2^63 + 128 (which is impossible given the range of x6).
2.11.3 There is an overflow if x6 − 128 > 2^63 − 1. In other words, if x6 > 2^63 + 127 (which is impossible given the range of x6). There is also an overflow if x6 − 128 < −2^63. In other words, if x6 < −2^63 + 128.

2.12 R-type: add x1, x1, x1

2.13 S-type: 0x25F3023 ( 0011)

2.14 R-type: sub x6, x7, x5 (0x: )

2.15 I-type: ld x3, 4(x27) (0x4DB183: )

2.16
2.16.1 The opcode would expand from 7 bits to 9. The rs1, rs2, and rd fields would increase from 5 bits to 7 bits.
2.16.2 The opcode would expand from 7 bits to 12. The rs1 and rd fields would increase from 5 bits to 7 bits. This change does not affect the imm field per se, but it might force the ISA designer to consider shortening the immediate field to avoid an increase in overall instruction size.
2.16.3
* Increasing the size of each bit field potentially makes each instruction longer, potentially increasing the code size overall.
* However, increasing the number of registers could lead to less register spillage, which would reduce the total number of instructions, possibly reducing the code size overall.

2.17
2.17.1 0xababefef8
2.17.2 0x
2.17.3 0x545

2.18 It can be done in eight RISC-V instructions:
addi x7, x0, 0x3f // Create bit mask for bits 16 to 11
slli x7, x7, 11 // Shift the masked bits
and x28, x5, x7 // Apply the mask to x5
slli x7, x7, 15 // Shift the mask to cover bits 31 to 26
xori x7, x7, -1 // This is a NOT operation
and x6, x6, x7 // “Zero out” positions 31 to 26 of x6
slli x28, x28, 15 // Move selection from x5 into positions 31 to 26
or x6, x6, x28 // Load bits 31 to 26 from x28

2.19
xori x5, x6, -1

2.20
ld x6, 0(x17)
slli x6, x6, 4

2.21 x6 = 2

2.22
2.22.1 [0x1ff00000, 0x200FFFFE]
2.22.2 [0x1FFFF000, 0x20000ffe]

2.23
2.23.1 The UJ instruction format would be most appropriate because it would allow the maximum number of bits possible for the “loop” parameter, thereby maximizing the utility of the instruction.
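As a cross-check of 2.18, the mask-and-shift steps can be modelled outside of assembly. The following is a minimal Python sketch, assuming 64-bit registers; the function name `insert_field` and the constant `MASK64` are illustrative and not part of the original solution. It copies bits 16 to 11 of x5 into bits 31 to 26 of x6 and leaves every other bit of x6 unchanged, with each line annotated by the instruction(s) it stands in for.

```python
MASK64 = (1 << 64) - 1  # assumption: registers are 64 bits wide


def insert_field(x5: int, x6: int) -> int:
    """Copy bits 16..11 of x5 into bits 31..26 of x6 (hypothetical helper)."""
    mask = 0x3F << 11                 # addi + slli: six ones at bits 16..11
    field = x5 & mask                 # and: select the field from x5
    clear = ~(mask << 15) & MASK64    # slli + xori -1: zeros at bits 31..26
    kept = x6 & clear                 # and: zero out bits 31..26 of x6
    moved = (field << 15) & MASK64    # slli: move the field up to bits 31..26
    return kept | moved               # or: merge the field into x6


if __name__ == "__main__":
    # x5 holds 0b101010 in bits 16..11; x6 starts with its low 32 bits set.
    print(hex(insert_field(0x15000, 0xFFFFFFFF)))  # prints 0xabffffff
```

Trying a few inputs by hand this way is a quick way to confirm that the second shift has to act on the mask register rather than on x6.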
2.23.2 It can be done in three instructions:
loop:
addi x29, x29, -1 // Subtract 1 from x29
bgt x29, x0, loop // Continue if x29 not negative
addi x29, x29, 1 // Add back 1 that shouldn’t have been subtracted.

2.24
2.24.1 The final value of x5 is 20.
2.24.2
acc = 0;
i = 10;
while (i != 0) {
    acc += 2;
    i--;
}
2.24.3 4*N + 1 instructions.
2.24.4 (Note: change condition != to >= in the while loop)
acc = 0;
i = 10;
while (i >= 0) {
    acc += 2;
    i--;
}

2.25 The C code can be implemented in RISC-V assembly as follows.
addi x7, x0, 0 // Init i = 0
LOOPI:
bge x7, x5, ENDI // While i < a
addi x30, x10, 0 // x30 = &D
addi x29, x0, 0 // Init j = 0
LOOPJ:
bge x29, x6, ENDJ // While j < b
add x31, x7, x29 // x31 = i+j
sd x31, 0(x30) // D[4*j] = x31
addi x30, x30, 32 // x30 = &D[4*(j+1)]
addi x29, x29, 1 // j++
jal x0, LOOPJ
ENDJ:
addi x7, x7, 1 // i++;
jal x0, LOOPI
ENDI:

2.26 The code requires 13 RISC-V instructions. When a = 10 and b = 1, this results in 123 instructions being executed.

2.27
// This C code corresponds most directly to the given assembly.
int i;
for (i = 0; i < 100; i++) {
    result += *MemArray;
    MemArray++;
}
return result;
// However, many people would write the code this way:
int i;
for (i = 0; i < 100; i++) {
    result += MemArray[i];
}
return result;

2.28 The address of the last element of MemArray can be used to terminate the loop:
addi x29, x10, 800 // x29 = &MemArray[100] (one past the last element)
LOOP:
ld x7, 0(x10)
add x5, x5, x7
addi x10, x10, 8
blt x10, x29, LOOP // Loop until MemArray points to one-past the last element

2.29
// IMPORTANT! Stack pointer must remain a multiple of 16!
fib:
beq x10, x0, done // If n==0, return 0
addi x5, x0, 1
beq x10, x5, done // If n==1, return 1
addi x2, x2, -16 // Allocate 2 words of stack space
sd x1, 0(x2) // Save the return address
sd x10, 8(x2) // Save the current n
addi x10, x10, -1 // x10 = n-1
jal x1, fib // fib(n-1)
ld x5, 8(x2) // Load old n from the stack
sd x10, 8(x2) // Push fib(n-1) onto the stack
addi x10, x5, -2 // x10 = n-2
jal x1, fib // Call fib(n-2)
ld x5, 8(x2) // x5 = fib(n-1)
add x10, x10, x5 // x10 = fib(n-1)+fib(n-2)
// Clean up:
ld x1, 0(x2) // Load saved return address
addi x2, x2, 16 // Pop two words from the stack
done:
jalr x0, x1

2.30 [answers will vary]

2.31
// IMPORTANT! Stack pointer must remain a multiple of 16!
f:
addi x2, x2, -16 // Allocate stack space for 2 words
sd x1, 0(x2) // Save return address
add x5, x12, x13 // x5 = c+d
sd x5, 8(x2) // Save c+d on the stack
jal x1, g // Call x10 = g(a,b)
ld x11, 8(x2) // Reload x11 = c+d from the stack
jal x1, g // Call x10 = g(g(a,b), c+d)
ld x1, 0(x2) // Restore return address
addi x2, x2, 16 // Restore stack pointer
jalr x0, x1

2.32 We can use the tail-call optimization for the second call to g, saving one instruction:
// IMPORTANT! Stack pointer must remain a multiple of 16!
f:
addi x2, x2, -16 // Allocate stack space for 2 words
sd x1, 0(x2) // Save return address
add x5, x12, x13 // x5 = c+d
sd x5, 8(x2) // Save c+d on the stack
jal x1, g // Call x10 = g(a,b)
ld x11, 8(x2) // Reload x11 = c+d from the stack
ld x1, 0(x2) // Restore return address
addi x2, x2, 16 // Restore stack pointer
jal x0, g // Call x10 = g(g(a,b), c+d)

2.33
* We have no idea what the contents of x10-x14 are; g can set them as it pleases.
* We don’t know what the precise contents of x8 and sp are, but we do know that they are identical to the contents when f was called.
* Similarly, we don’t know what the precise contents of x1 are, but we do know that it is equal to the return address set by the “jal x1, f” instruction that invoked f.

2.34
a_to_i:
addi x28, x0, 10 # Just stores the constant 10
addi x29, x0, 0 # Stores the running total
addi x5, x0, 1 # Tracks whether input is positive or negative
# Test for initial ‘+’ or ‘-’
lbu x6, 0(x10) # Load the first character
addi x7, x0, 45 # ASCII ‘-’
bne x6, x7, noneg
addi x5, x0, -1 # Set that input was negative
addi x10, x10, 1 # str++
jal x0, main_atoi_loop
noneg:
addi x7, x0, 43 # ASCII ‘+’
bne x6, x7, main_atoi_loop
addi x10, x10, 1 # str++
main_atoi_loop:
lbu x6, 0(x10) # Load the next digit
beq x6, x0, done
# Make sure next char is a digit, or fail
addi x7, x0, 48 # ASCII ‘0’
sub x6, x6, x7
blt x6, x0, fail # *str < ‘0’
bge x6, x28, fail # *str > ‘9’
# Next char is a digit, so accumulate it into x29
mul x29, x29, x28 # x29 *= 10
add x29, x29, x6 # x29 += *str - ‘0’
addi x10, x10, 1 # str++
jal x0, main_atoi_loop
done:
addi x10, x29, 0 # Use x29 as output value
mul x10, x10, x5 # Multiply by sign
jalr x0, x1 # Return result
fail:
addi x10, x0, -1
jalr x0, x1

2.35
2.35.1 0x11
2.35.2 0x88

2.36
lui x10, 0x11223
addi x10, x10, 0x344
slli x10, x10, 32
lui x5, 0x55667
addi x5, x5, 0x788
add x10, x10, x5

2.37
setmax:
try:
lr.d x5, (x10) # Load-reserve *shvar
bge x5, x11, release # Skip update if *shvar >= x
addi x5, x11, 0
release:
sc.d x7, x5, (x10)
bne x7, x0, try # If store-conditional failed, try again
jalr x0, x1

2.38 When two processors A and B begin executing this loop at the same time, at most one of them will execute the store-conditional instruction successfully, while the other will be forced to retry the loop. If processor A’s store-conditional succeeds initially, then B will re-enter the try block, and it will see the new value of shvar written by A when it finally succeeds. The hardware guarantees that both processors will eventually execute the code completely.

2.39
2.39.1 No. The resulting machine would be slower overall. The current CPU requires (num arithmetic * 1 cycle) + (num load/store * 10 cycles) + (num branch/jump * 3 cycles) = 500*1 + 300*10 + 100*3 = 3800 cycles. The new CPU requires (.75*num arithmetic * 1 cycle) + (num load/store * 10 cycles) + (num branch/jump * 3 cycles) = 375*1 + 300*10 + 100*3 = 3675 cycles. However, given that each of the new CPU’s cycles is 10% longer than the original CPU’s cycles, the new CPU’s 3675 cycles will take as long as 4042.5 cycles on the original CPU.
2.39.2 If we double the performance of arithmetic instructions by reducing their CPI to 0.5, then the CPU will run the reference program in (500*.5) + (300*10) + 100*3 = 3550 cycles. This represents a speedup of 1.07. If we improve the performance of arithmetic instructions by a factor of 10 (reducing their CPI to 0.1), then the CPU will run the reference program in (500*.1) + (300*10) + 100*3 = 3350 cycles. This represents a speedup of 1.13.

2.40
2.40.1 Take the weighted average: 0.7*2 + 0.1*6 + 0.2*3 = 2.6
2.40.2 For a 25% improvement, we must reduce the CPI to 2.6*.75 = 1.95. Thus, we want 0.7*x + 0.1*6 + 0.2*3 <= 1.95. Solving for x shows that the arithmetic instructions must have a CPI of at most 1.07.
2.40.3 For a 50% improvement, we must reduce the CPI to 2.6*.5 = 1.3. Thus, we want 0.7*x + 0.1*6 + 0.2*3 <= 1.3. Solving for x shows that the arithmetic instructions must have a CPI of at most 0.14.

2.41
ldr x28, x5(x10), 3 // Load x28=A[f]
addi x5, x5, 1 // f++
ldr x29, x5(x10), 3 // Load x29=A[f+1]
add x29, x29, x28 // Add x29 = A[f] + A[f+1]
sdr x29, x6(x11), 3 // Store B[g] = x29

2.42
ldr x28, x28, (x10), 3 // Load x28=A[i]
ldr x29, x29, (x11), 3 // Load x29=B[j]
add x29, x28, x29
sd x29, 64(x11) // Store B[8]=x29 (don’t need scaled store here)
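The cycle counts in 2.39 and the CPI bounds in 2.40 are all weighted sums over instruction classes, so they are easy to re-derive mechanically. The sketch below is one way to do that in Python; the helper names `total_cycles` and `max_arith_cpi` are made up for illustration, and the instruction counts, CPIs, and mix fractions are the ones quoted in those two solutions.

```python
def total_cycles(counts, cpis):
    """Sum of (instruction count x CPI) over the instruction classes."""
    return sum(n * cpi for n, cpi in zip(counts, cpis))


def max_arith_cpi(target_cpi):
    """Largest arithmetic CPI that keeps the 2.40 mix at or below target_cpi.

    Assumes the 2.40 mix: 70% arithmetic, 10% load/store (CPI 6),
    20% branch (CPI 3).
    """
    other = 0.1 * 6 + 0.2 * 3          # load/store and branch contributions
    return (target_cpi - other) / 0.7


# 2.39.1: class order is arithmetic, load/store, branch/jump.
CPIS = [1, 10, 3]
current = total_cycles([500, 300, 100], CPIS)    # original CPU
proposed = total_cycles([375, 300, 100], CPIS)   # 25% fewer arithmetic instr.
equivalent = proposed * 1.10                     # each cycle is 10% longer

if __name__ == "__main__":
    print(current, proposed, round(equivalent, 1))
    print(round(max_arith_cpi(2.6 * 0.75), 2), round(max_arith_cpi(2.6 * 0.5), 2))
```

Because `equivalent` exceeds `current`, the proposed design comes out slower despite executing fewer cycles, which is the conclusion reached in 2.39.1.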

Document information

Type
Exam (elaborations)
Contains
Questions & answers

