Exam (elaborations) TEST BANK FOR Computer Organization and Design - The Hardware Software Interface 2nd Edition By David A. Patterson, John L. Hennessy (Solution Manual)

Pages: 182
Grade: A+
Published: 15-11-2021
Written in: 2021/2022

Chapter 1 Solutions

1.1 Personal computer (includes workstation and laptop): Personal computers emphasize delivery of good performance to single users at low cost and usually execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated with wireless connectivity to the Internet and typically cost hundreds of dollars, and, like PCs, users can download software ("apps") to run on them. Unlike PCs, they no longer have a keyboard and mouse, and are more likely to rely on a touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a network.
Warehouse-scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors and terabytes of memory.
Embedded computer: Computer designed to run one application or one set of related applications and integrated into a single system.

1.2
a. Performance via Pipelining
b. Dependability via Redundancy
c. Performance via Prediction
d. Make the Common Case Fast
e. Hierarchy of Memories
f. Performance via Parallelism
g. Design for Moore's Law
h. Use Abstraction to Simplify Design

1.3 The program is compiled into an assembly language program, which is then assembled into a machine language program.

1.4
a. 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/frame.
b. 3,932,160 bytes × (8 bits/byte) / 100E6 bits/second = 0.31 seconds

1.5
a. performance of P1 (instructions/sec) = 3 × 10^9/1.5 = 2 × 10^9
performance of P2 (instructions/sec) = 2.5 × 10^9/1.0 = 2.5 × 10^9
performance of P3 (instructions/sec) = 4 × 10^9/2.2 = 1.8 × 10^9
b. cycles(P1) = 10 × 3 × 10^9 = 30 × 10^9
cycles(P2) = 10 × 2.5 × 10^9 = 25 × 10^9
cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
c. No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.64
f = No. instr. × CPI/time, then
f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
f(P2) = 25 × 10^9 × 1.2/7 = 4.29 GHz
f(P3) = 18.18 × 10^9 × 2.64/7 = 6.86 GHz

1.6
a. Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr. × CPI/clock rate
Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^-4 s
Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^-4 s
CPI(P1) = 10.4 × 10^-4 × 2.5 × 10^9/10^6 = 2.6
CPI(P2) = 6.66 × 10^-4 × 3 × 10^9/10^6 = 2.0
b. clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5

1.7
a. CPI = T_exec × f/No. instr.
Compiler A CPI = 1.1
Compiler B CPI = 1.25
b. f_B/f_A = (No. instr.(B) × CPI(B))/(No. instr.(A) × CPI(A)) = 1.37
c. T_A/T_new = 1.67
T_B/T_new = 2.27

1.8
1.8.1 C = 2 × DP/(V^2 × F)
Pentium 4: C = 3.2E-8 F
Core i5 Ivy Bridge: C = 2.9E-8 F
1.8.2 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
1.8.3 (S_new + D_new)/(S_old + D_old) = 0.90
D_new = C × V_new^2 × F
S_old = V_old × I
S_new = V_new × I
Therefore:
V_new = [D_new/(C × F)]^(1/2)
D_new = 0.90 × (S_old + D_old) - S_new
S_new = V_new × (S_old/V_old)
Pentium 4:
S_new = V_new × (10/1.25) = V_new × 8
D_new = 0.90 × 100 - V_new × 8 = 90 - V_new × 8
V_new = [(90 - V_new × 8)/(3.2E-8 × 3.6E9)]^(1/2)
V_new = 0.85 V
Core i5:
S_new = V_new × (30/0.9) = V_new × 33.3
D_new = 0.90 × 70 - V_new × 33.3 = 63 - V_new × 33.3
V_new = [(63 - V_new × 33.3)/(2.9E-8 × 3.4E9)]^(1/2)
V_new = 0.64 V

1.9
1.9.1
1.9.2
1.9.3 3

1.10
1.10.1 die area_15cm = wafer area/dies per wafer = π × 7.5^2/84 = 2.10 cm^2
yield_15cm = 1/(1 + (0.020 × 2.10/2))^2 = 0.9593
die area_20cm = wafer area/dies per wafer = π × 10^2/100 = 3.14 cm^2
yield_20cm = 1/(1 + (0.031 × 3.14/2))^2 = 0.9093
1.10.2 cost/die_15cm = 12/(84 × 0.9593) = 0.1489
cost/die_20cm = 15/(100 × 0.9093) = 0.1650
1.10.3 die area_15cm = wafer area/dies per wafer = π × 7.5^2/(84 × 1.1) = 1.91 cm^2
yield_15cm = 1/(1 + (0.020 × 1.15 × 1.91/2))^2 = 0.9575
die area_20cm = wafer area/dies per wafer = π × 10^2/(100 × 1.1) = 2.86 cm^2
yield_20cm = 1/(1 + (0.03 × 1.15 × 2.86/2))^2 = 0.9082
1.10.4 defects per area_0.92 = (1 - y^0.5)/(y^0.5 × die_area/2) = (1 - 0.92^0.5)/(0.92^0.5 × 2/2) = 0.043 defects/cm^2
defects per area_0.95 = (1 - y^0.5)/(y^0.5 × die_area/2) = (1 - 0.95^0.5)/(0.95^0.5 × 2/2) = 0.026 defects/cm^2

1.11
1.11.1 CPI = clock rate × CPU time/instr. count
clock rate = 1/cycle time = 3 GHz
CPI(bzip2) = 3 × 10^9 × 750/(2389 × 10^9) = 0.94
1.11.2 SPEC ratio = ref. time/execution time
SPEC ratio(bzip2) = 9650/750 = 12.86
1.11.3 CPU time = No. instr. × CPI/clock rate
If CPI and clock rate do not change, the CPU time increase is equal to the increase in the number of instructions, that is 10%.
1.11.4 CPU time(before) = No. instr. × CPI/clock rate
CPU time(after) = 1.1 × No. instr. × 1.05 × CPI/clock rate
CPU time(after)/CPU time(before) = 1.1 × 1.05 = 1.155. Thus, CPU time is increased by 15.5%.
1.11.5 SPECratio = reference time/CPU time
SPECratio(after)/SPECratio(before) = CPU time(before)/CPU time(after) = 1/1.155 = 0.86. The SPECratio is decreased by 14%.
1.11.6 CPI = (CPU time × clock rate)/No. instr.
CPI = 700 × 4 × 10^9/(0.85 × 2389 × 10^9) = 1.37
1.11.7 Clock rate ratio = 4 GHz/3 GHz = 1.33
CPI @ 4 GHz = 1.37, CPI @ 3 GHz = 0.94, ratio = 1.45
They are different because, although the number of instructions has been reduced by 15%, the CPU time has been reduced by a lower percentage.
1.11.8 700/750 = 0.933. CPU time reduction: 6.7%
1.11.9 No. instr. = CPU time × clock rate/CPI
No. instr. = 960 × 0.9 × 4 × 10^9/1.61 = 2146 × 10^9
1.11.10 Clock rate = No. instr. × CPI/CPU time.
Clock rate_new = No. instr. × CPI/(0.9 × CPU time) = (1/0.9) × clock rate_old = 3.33 GHz
1.11.11 Clock rate = No. instr. × CPI/CPU time.
Clock rate_new = No. instr. × 0.85 × CPI/(0.80 × CPU time) = (0.85/0.80) × clock rate_old = 3.18 GHz

1.12
1.12.1 T(P1) = 5 × 10^9 × 0.9/(4 × 10^9) = 1.125 s
T(P2) = 10^9 × 0.75/(3 × 10^9) = 0.25 s
clock rate(P1) > clock rate(P2), performance(P1) < performance(P2)
1.12.2 T(P1) = No. instr. × CPI/clock rate
T(P1) = 2.25 × 10^-1 s
T(P2) = N × 0.75/(3 × 10^9), then N = 9 × 10^8
1.12.3 MIPS = Clock rate × 10^-6/CPI
MIPS(P1) = 4 × 10^9 × 10^-6/0.9 = 4.44 × 10^3
MIPS(P2) = 3 × 10^9 × 10^-6/0.75 = 4.0 × 10^3
MIPS(P1) > MIPS(P2), performance(P1) < performance(P2) (from 11a)
1.12.4 MFLOPS = No. FP operations × 10^-6/T
MFLOPS(P1) = 0.4 × 5E9 × 1E-6/1.125 = 1.78E3
MFLOPS(P2) = 0.4 × 1E9 × 1E-6/0.25 = 1.60E3
MFLOPS(P1) > MFLOPS(P2), performance(P1) < performance(P2) (from 11a)

1.13
1.13.1 T_fp = 70 × 0.8 = 56 s. T_new = 56 + 85 + 55 + 40 = 236 s. Reduction: 5.6%
1.13.2 T_new = 250 × 0.8 = 200 s, T_fp + T_l/s + T_branch = 165 s, T_int = 35 s. Reduction time INT: 58.8%
1.13.3 T_new = 250 × 0.8 = 200 s, T_fp + T_int + T_l/s = 210 s. NO

1.14
1.14.1 Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_CPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
clock cycles = 512 × 10^6; T_CPU = 0.256 s
To halve the number of clock cycles by improving the CPI of FP instructions:
CPI_improved fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved fp = (clock cycles/2 - (CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.))/No. FP instr.
CPI_improved fp = (256 - 462)/50 < 0 ==> not possible
1.14.2 Using the clock cycle data from a.
To halve the number of clock cycles by improving the CPI of L/S instructions:
CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_improved l/s × No. L/S instr. + CPI_branch × No. branch instr. = clock cycles/2
CPI_improved l/s = (clock cycles/2 - (CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_branch × No. branch instr.))/No. L/S instr.
The non-L/S instructions account for 512 - 462 + 462 - 320 = 192 million cycles (total minus the 80 × 4 L/S cycles), so
CPI_improved l/s = (256 - 192)/80 = 0.8
1.14.3 Clock cycles = CPI_fp × No. FP instr. + CPI_int × No. INT instr. + CPI_l/s × No. L/S instr. + CPI_branch × No. branch instr.
T_CPU = clock cycles/clock rate = clock cycles/(2 × 10^9)
CPI_int = 0.6 × 1 = 0.6; CPI_fp = 0.6 × 1 = 0.6; CPI_l/s = 0.7 × 4 = 2.8; CPI_branch = 0.7 × 2 = 1.4
T_CPU (before improv.) = 0.256 s; T_CPU (after improv.) = 0.171 s

1.15

Chapter 2 Solutions

2.1
    addi x5, x7, -5
    add  x5, x5, x6
[addi f,h,-5 (note, no subi) add f,f,g]

2.2 f = g+h+i

2.3
    sub  x30, x28, x29   // compute i-j
    slli x30, x30, 3     // multiply by 8 to convert the word offset to a byte offset
    add  x30, x10, x30   // x30 = &A[i-j] (x10 holds the base address of A)
    ld   x30, 0(x30)     // load A[i-j]
    sd   x30, 64(x11)    // store in B[8]

2.4 B[g] = A[f] + A[f+1]
    slli x30, x5, 3      // x30 = f*8
    add  x30, x10, x30   // x30 = &A[f]
    slli x31, x6, 3      // x31 = g*8
    add  x31, x11, x31   // x31 = &B[g]
    ld   x5, 0(x30)      // f = A[f]
    addi x12, x30, 8     // x12 = &A[f]+8 (i.e. &A[f+1])
    ld   x30, 0(x12)     // x30 = A[f+1]
    add  x30, x30, x5    // x30 = A[f+1] + A[f]
    sd   x30, 0(x31)     // B[g] = x30 (i.e. A[f+1] + A[f])

2.5

2.6 8

2.7
    slli x28, x28, 3     // x28 = i*8
    add  x28, x10, x28   // x28 = &A[i]
    ld   x28, 0(x28)     // x28 = A[i]
    slli x29, x29, 3     // x29 = j*8
    add  x29, x11, x29   // x29 = &B[j]
    ld   x29, 0(x29)     // x29 = B[j]
    add  x29, x28, x29   // Compute x29 = A[i] + B[j]
    sd   x29, 64(x11)    // Store result in B[8]

2.8 f = 2*(&A)
    addi x30, x10, 8     // x30 = &A[1]
    addi x31, x10, 0     // x31 = &A
    sd   x31, 0(x30)     // A[1] = &A
    ld   x30, 0(x30)     // x30 = A[1] = &A
    add  x5, x30, x31    // f = &A + &A = 2*(&A)

2.9
    Instruction        Type    opcode  funct3  funct7  rs1  rs2  rd  imm
    addi x30, x10, 8   I-type  0x13    0x0     --      10   --   30  8
    addi x31, x10, 0   I-type  0x13    0x0     --      10   --   31  0
    sd   x31, 0(x30)   S-type  0x23    0x3     --      30   31   --  0
    ld   x30, 0(x30)   I-type  0x3     0x3     --      30   --   30  0
    add  x5, x30, x31  R-type  0x33    0x0     0x0     30   31   5   --

2.10
2.10.1 0x
2.10.2 overflow
2.10.3 0xB
2.10.4 no overflow
2.10.5 0xD
2.10.6 overflow

2.11
2.11.1 There is an overflow if 128 + x6 > 2^63 - 1. In other words, if x6 > 2^63 - 129. There is also an overflow if 128 + x6 < -2^63. In other words, if x6 < -2^63 - 128 (which is impossible given the range of x6).
2.11.2 There is an overflow if 128 - x6 > 2^63 - 1. In other words, if x6 < -2^63 + 129. There is also an overflow if 128 - x6 < -2^63. In other words, if x6 > 2^63 + 128 (which is impossible given the range of x6).
2.11.3 There is an overflow if x6 - 128 > 2^63 - 1. In other words, if x6 > 2^63 + 127 (which is impossible given the range of x6). There is also an overflow if x6 - 128 < -2^63. In other words, if x6 < -2^63 + 128.

2.12 R-type: add x1, x1, x1

2.13 S-type: 0x25F3023 ( 0011)

2.14 R-type: sub x6, x7, x5 (0x: )

2.15 I-type: ld x3, 4(x27) (0x4DB183: )

2.16
2.16.1 The opcode would expand from 7 bits to 9. The rs1, rs2, and rd fields would increase from 5 bits to 7 bits.
2.16.2 The opcode would expand from 7 bits to 12. The rs1 and rd fields would increase from 5 bits to 7 bits. This change does not affect the imm field per se, but it might force the ISA designer to consider shortening the immediate field to avoid an increase in overall instruction size.
2.16.3
* Increasing the size of each bit field potentially makes each instruction longer, potentially increasing the code size overall.
* However, increasing the number of registers could lead to less register spillage, which would reduce the total number of instructions, possibly reducing the code size overall.

2.17
2.17.1 0xababefef8
2.17.2 0x
2.17.3 0x545

2.18 It can be done in eight RISC-V instructions:
    addi x7, x0, 0x3f    // Create bit mask for bits 16 to 11
    slli x7, x7, 11      // Shift the mask into positions 16 to 11
    and  x28, x5, x7     // Apply the mask to x5
    slli x7, x7, 15      // Shift the mask to cover bits 31 to 26
    xori x7, x7, -1      // This is a NOT operation
    and  x6, x6, x7      // "Zero out" positions 31 to 26 of x6
    slli x28, x28, 15    // Move selection from x5 into positions 31 to 26
    or   x6, x6, x28     // Load bits 31 to 26 from x28

2.19
    xori x5, x6, -1

2.20
    ld   x6, 0(x17)
    slli x6, x6, 4

2.21 x6 = 2

2.22
2.22.1 [0x1ff00000, 0x200FFFFE]
2.22.2 [0x1FFFF000, 0x20000ffe]

2.23
2.23.1 The UJ instruction format would be most appropriate because it would allow the maximum number of bits possible for the "loop" parameter, thereby maximizing the utility of the instruction.
2.23.2 It can be done in three instructions:
    loop:
        addi x29, x29, -1    // Subtract 1 from x29
        bgt  x29, x0, loop   // Continue if x29 not negative
        addi x29, x29, 1     // Add back 1 that shouldn't have been subtracted.

2.24
2.24.1 The final value of x5 is 20.
2.24.2
    acc = 0;
    i = 10;
    while (i != 0) {
        acc += 2;
        i--;
    }
2.24.3 4*N + 1 instructions.
2.24.4 (Note: change condition != to >= in the while loop)
    acc = 0;
    i = 10;
    while (i >= 0) {
        acc += 2;
        i--;
    }

2.25 The C code can be implemented in RISC-V assembly as follows.
           addi x7, x0, 0       // Init i = 0
    LOOPI: bge  x7, x5, ENDI    // While i < a
           addi x30, x10, 0     // x30 = &D
           addi x29, x0, 0      // Init j = 0
    LOOPJ: bge  x29, x6, ENDJ   // While j < b
           add  x31, x7, x29    // x31 = i+j
           sd   x31, 0(x30)     // D[4*j] = x31
           addi x30, x30, 32    // x30 = &D[4*(j+1)]
           addi x29, x29, 1     // j++
           jal  x0, LOOPJ
    ENDJ:  addi x7, x7, 1       // i++
           jal  x0, LOOPI
    ENDI:

2.26 The code requires 13 RISC-V instructions. When a = 10 and b = 1, this results in 123 instructions being executed.

2.27
    // This C code corresponds most directly to the given assembly.
    int i;
    for (i = 0; i < 100; i++) {
        result += *MemArray;
        MemArray++;
    }
    return result;

    // However, many people would write the code this way:
    int i;
    for (i = 0; i < 100; i++) {
        result += MemArray[i];
    }
    return result;

2.28 The address one past the last element of MemArray can be used to terminate the loop:
          addi x29, x10, 800   // x29 = &MemArray[100] (one past the last element)
    LOOP: ld   x7, 0(x10)
          add  x5, x5, x7
          addi x10, x10, 8
          blt  x10, x29, LOOP  // Loop until MemArray points to one-past the last element

2.29
    // IMPORTANT! Stack pointer must remain a multiple of 16!
    fib:  beq  x10, x0, done   // If n==0, return 0
          addi x5, x0, 1
          beq  x10, x5, done   // If n==1, return 1
          addi x2, x2, -16     // Allocate 2 words of stack space
          sd   x1, 0(x2)       // Save the return address
          sd   x10, 8(x2)      // Save the current n
          addi x10, x10, -1    // x10 = n-1
          jal  x1, fib         // fib(n-1)
          ld   x5, 8(x2)       // Load old n from the stack
          sd   x10, 8(x2)      // Push fib(n-1) onto the stack
          addi x10, x5, -2     // x10 = n-2
          jal  x1, fib         // Call fib(n-2)
          ld   x5, 8(x2)       // x5 = fib(n-1)
          add  x10, x10, x5    // x10 = fib(n-1)+fib(n-2)
          // Clean up:
          ld   x1, 0(x2)       // Load saved return address
          addi x2, x2, 16      // Pop two words from the stack
    done: jalr x0, x1

2.30 [answers will vary]

2.31
    // IMPORTANT! Stack pointer must remain a multiple of 16!
    f: addi x2, x2, -16    // Allocate stack space for 2 words
       sd   x1, 0(x2)      // Save return address
       add  x5, x12, x13   // x5 = c+d
       sd   x5, 8(x2)      // Save c+d on the stack
       jal  x1, g          // Call x10 = g(a,b)
       ld   x11, 8(x2)     // Reload x11 = c+d from the stack
       jal  x1, g          // Call x10 = g(g(a,b), c+d)
       ld   x1, 0(x2)      // Restore return address
       addi x2, x2, 16     // Restore stack pointer
       jalr x0, x1

2.32 We can use the tail-call optimization for the second call to g, saving one instruction:
    // IMPORTANT! Stack pointer must remain a multiple of 16!
    f: addi x2, x2, -16    // Allocate stack space for 2 words
       sd   x1, 0(x2)      // Save return address
       add  x5, x12, x13   // x5 = c+d
       sd   x5, 8(x2)      // Save c+d on the stack
       jal  x1, g          // Call x10 = g(a,b)
       ld   x11, 8(x2)     // Reload x11 = c+d from the stack
       ld   x1, 0(x2)      // Restore return address
       addi x2, x2, 16     // Restore stack pointer
       jal  x0, g          // Call x10 = g(g(a,b), c+d)

2.33
* We have no idea what the contents of x10-x14 are; g can set them as it pleases.
* We don't know what the precise contents of x8 and sp are; but we do know that they are identical to the contents when f was called.
* Similarly, we don't know what the precise contents of x1 are; but we do know that it is equal to the return address set by the "jal x1, f" instruction that invoked f.

2.34
    a_to_i:
        addi x28, x0, 10      # Just stores the constant 10
        addi x29, x0, 0       # Stores the running total
        addi x5, x0, 1        # Tracks whether input is positive or negative
        # Test for initial '+' or '-'
        lbu  x6, 0(x10)       # Load the first character
        addi x7, x0, 45       # ASCII '-'
        bne  x6, x7, noneg
        addi x5, x0, -1       # Set that input was negative
        addi x10, x10, 1      # str++
        jal  x0, main_atoi_loop
    noneg:
        addi x7, x0, 43       # ASCII '+'
        bne  x6, x7, main_atoi_loop
        addi x10, x10, 1      # str++
    main_atoi_loop:
        lbu  x6, 0(x10)       # Load the next digit
        beq  x6, x0, done
        # Make sure next char is a digit, or fail
        addi x7, x0, 48       # ASCII '0'
        sub  x6, x6, x7
        blt  x6, x0, fail     # *str < '0'
        bge  x6, x28, fail    # *str > '9'
        # Next char is a digit, so accumulate it into x29
        mul  x29, x29, x28    # x29 *= 10
        add  x29, x29, x6     # x29 += *str - '0'
        addi x10, x10, 1      # str++
        jal  x0, main_atoi_loop
    done:
        addi x10, x29, 0      # Use x29 as output value
        mul  x10, x10, x5     # Multiply by sign
        jalr x0, x1           # Return result
    fail:
        addi x10, x0, -1
        jalr x0, x1

2.35
2.35.1 0x11
2.35.2 0x88

2.36
    lui  x10, 0x11223
    addi x10, x10, 0x344
    slli x10, x10, 32
    lui  x5, 0x55667
    addi x5, x5, 0x788
    add  x10, x10, x5

2.37
    setmax:
    try:
        lr.d x5, (x10)          # Load-reserve *shvar
        bge  x5, x11, release   # Skip update if *shvar >= x
        addi x5, x11, 0
    release:
        sc.d x7, x5, (x10)
        bne  x7, x0, try        # If store-conditional failed, try again
        jalr x0, x1

2.38 When two processors A and B begin executing this loop at the same time, at most one of them will execute the store-conditional instruction successfully, while the other will be forced to retry the loop. If processor A's store-conditional succeeds initially, then B will re-enter the try block, and it will see the new value of shvar written by A when it finally succeeds. The hardware guarantees that both processors will eventually execute the code completely.

2.39
2.39.1 No. The resulting machine would be slower overall. Current CPU requires (num arithmetic * 1 cycle) + (num load/store * 10 cycles) + (num branch/jump * 3 cycles) = 500*1 + 300*10 + 100*3 = 3800 cycles. The new CPU requires (.75*num arithmetic * 1 cycle) + (num load/store * 10 cycles) + (num branch/jump * 3 cycles) = 375*1 + 300*10 + 100*3 = 3675 cycles. However, given that each of the new CPU's cycles is 10% longer than the original CPU's cycles, the new CPU's 3675 cycles will take as long as 4042.5 cycles on the original CPU.
2.39.2 If we double the performance of arithmetic instructions by reducing their CPI to 0.5, then the CPU will run the reference program in (500*.5) + (300*10) + 100*3 = 3550 cycles. This represents a speedup of 1.07. If we improve the performance of arithmetic instructions by a factor of 10 (reducing their CPI to 0.1), then the CPU will run the reference program in (500*.1) + (300*10) + 100*3 = 3350 cycles. This represents a speedup of 1.13.

2.40
2.40.1 Take the weighted average: 0.7*2 + 0.1*6 + 0.2*3 = 2.6
2.40.2 For a 25% improvement, we must reduce the CPI to 2.6*.75 = 1.95. Thus, we want 0.7*x + 0.1*6 + 0.2*3 <= 1.95. Solving for x shows that the arithmetic instructions must have a CPI of at most 1.07.
2.40.3 For a 50% improvement, we must reduce the CPI to 2.6*.5 = 1.3. Thus, we want 0.7*x + 0.1*6 + 0.2*3 <= 1.3. Solving for x shows that the arithmetic instructions must have a CPI of at most 0.14.

2.41
    ldr  x28, x5(x10), 3    // Load x28 = A[f]
    addi x5, x5, 1          // f++
    ldr  x29, x5(x10), 3    // Load x29 = A[f+1]
    add  x29, x29, x28      // Add x29 = A[f] + A[f+1]
    sdr  x29, x6(x11), 3    // Store B[g] = x29

2.42
    ldr  x28, x28, (x10), 3  // Load x28 = A[i]
    ldr  x29, x29, (x11), 3  // Load x29 = B[j]
    add  x29, x28, x29
    sd   x29, 64(x11)        // Store B[8] = x29 (don't need scaled store here)

Content preview

Chapter 1 Solutions



1.1 Personal computer (includes workstation and laptop): Personal computers
emphasize delivery of good performance to single users at low cost and usually
execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated
with wireless connectivity to the Internet and typically cost hundreds of
dollars, and, like PCs, users can download software (“apps”) to run on them.
Unlike PCs, they no longer have a keyboard and mouse, and are more likely
to rely on a touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a
network.
Warehouse-scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors
and terabytes of memory.
Embedded computer: Computer designed to run one application or one set
of related applications and integrated into a single system.

1.2
a. Performance via Pipelining
b. Dependability via Redundancy
c. Performance via Prediction
d. Make the Common Case Fast
e. Hierarchy of Memories
f. Performance via Parallelism
g. Design for Moore’s Law
h. Use Abstraction to Simplify Design

1.3 The program is compiled into an assembly language program, which is then
assembled into a machine language program.

1.4
a. 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/
frame.
b. 3,932,160 bytes × (8 bits/byte) /100E6 bits/second = 0.31 seconds
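The arithmetic in 1.4 can be reproduced with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and the 3 bytes per pixel and the 100 Mbit/s link are the values the solution itself uses.

```python
def frame_bytes(width, height, bytes_per_pixel=3):
    """Size of one frame, assuming one byte per primary colour per pixel."""
    return width * height * bytes_per_pixel


def transfer_seconds(num_bytes, bits_per_second):
    """Time to send num_bytes over a link of the given bit rate."""
    return num_bytes * 8 / bits_per_second


size = frame_bytes(1280, 1024)
seconds = transfer_seconds(size, 100e6)
print(size, round(seconds, 2))
```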

1.5
a. performance of P1 (instructions/sec) = 3 × 10^9/1.5 = 2 × 10^9
performance of P2 (instructions/sec) = 2.5 × 10^9/1.0 = 2.5 × 10^9
performance of P3 (instructions/sec) = 4 × 10^9/2.2 = 1.8 × 10^9
b. cycles(P1) = 10 × 3 × 10^9 = 30 × 10^9
cycles(P2) = 10 × 2.5 × 10^9 = 25 × 10^9
cycles(P3) = 10 × 4 × 10^9 = 40 × 10^9
c. No. instructions(P1) = 30 × 10^9/1.5 = 20 × 10^9
No. instructions(P2) = 25 × 10^9/1 = 25 × 10^9
No. instructions(P3) = 40 × 10^9/2.2 = 18.18 × 10^9
CPI_new = CPI_old × 1.2, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.64
f = No. instr. × CPI/time, then
f(P1) = 20 × 10^9 × 1.8/7 = 5.14 GHz
f(P2) = 25 × 10^9 × 1.2/7 = 4.29 GHz
f(P3) = 18.18 × 10^9 × 2.64/7 = 6.86 GHz
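The three parts of 1.5 chain together: instruction throughput, cycles in 10 s, and the clock rate needed to run the same instructions in 7 s with a 20% higher CPI. The Python sketch below is an illustration under those assumptions (the 10 s window and the 7 s target are read off the solution's own arithmetic); the names `PROCESSORS` and `retuned_clock` are hypothetical. Carrying unrounded intermediate values gives about 6.86 GHz for P3.

```python
# (clock rate in Hz, CPI) for each processor, as used in solution 1.5
PROCESSORS = {"P1": (3.0e9, 1.5), "P2": (2.5e9, 1.0), "P3": (4.0e9, 2.2)}


def instructions_per_second(clock_hz, cpi):
    return clock_hz / cpi


def cycles_in(clock_hz, seconds):
    return clock_hz * seconds


def retuned_clock(clock_hz, cpi, old_seconds=10.0, time_factor=0.7, cpi_factor=1.2):
    """Clock rate needed to run the same instructions in time_factor * old_seconds
    when the CPI grows by cpi_factor."""
    instr = cycles_in(clock_hz, old_seconds) / cpi
    return instr * (cpi * cpi_factor) / (old_seconds * time_factor)


for name, (clock_hz, cpi) in PROCESSORS.items():
    ips = instructions_per_second(clock_hz, cpi)
    ghz = retuned_clock(clock_hz, cpi) / 1e9
    print(f"{name}: {ips:.3g} instr/s, new clock {ghz:.2f} GHz")
```

Note that the instruction count cancels out: the new clock is simply the old clock times 1.2/0.7, which is a quick sanity check on the per-processor numbers.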

1.6
a. Class A: 10^5 instr. Class B: 2 × 10^5 instr. Class C: 5 × 10^5 instr. Class D: 2 × 10^5 instr.
Time = No. instr. × CPI/clock rate
Total time P1 = (10^5 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3)/(2.5 × 10^9) = 10.4 × 10^-4 s
Total time P2 = (10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2)/(3 × 10^9) = 6.66 × 10^-4 s
CPI(P1) = 10.4 × 10^-4 × 2.5 × 10^9/10^6 = 2.6
CPI(P2) = 6.66 × 10^-4 × 3 × 10^9/10^6 = 2.0
b. clock cycles(P1) = 10^5 × 1 + 2 × 10^5 × 2 + 5 × 10^5 × 3 + 2 × 10^5 × 3 = 26 × 10^5
clock cycles(P2) = 10^5 × 2 + 2 × 10^5 × 2 + 5 × 10^5 × 2 + 2 × 10^5 × 2 = 20 × 10^5
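The per-class sums in 1.6 follow one pattern: multiply each class's instruction count by its CPI, add, then divide by the clock rate. A small Python sketch of that pattern, assuming the class counts, CPIs, and clock rates that appear in the solution (the variable and function names are hypothetical):

```python
# Instruction counts per class (10^6 instructions in total)
COUNTS = {"A": 1e5, "B": 2e5, "C": 5e5, "D": 2e5}

# Per-class CPI and clock rate for each implementation
CPI_P1 = {"A": 1, "B": 2, "C": 3, "D": 3}
CPI_P2 = {"A": 2, "B": 2, "C": 2, "D": 2}
CLOCK_P1 = 2.5e9
CLOCK_P2 = 3.0e9


def clock_cycles(counts, cpi):
    """Total cycles: sum over classes of count x CPI."""
    return sum(counts[c] * cpi[c] for c in counts)


def exec_time(counts, cpi, clock_hz):
    return clock_cycles(counts, cpi) / clock_hz


def average_cpi(counts, cpi):
    return clock_cycles(counts, cpi) / sum(counts.values())


for name, cpi, clock in (("P1", CPI_P1, CLOCK_P1), ("P2", CPI_P2, CLOCK_P2)):
    print(name, clock_cycles(COUNTS, cpi), exec_time(COUNTS, cpi, clock),
          average_cpi(COUNTS, cpi))
```

Computing the average CPI directly from cycles and instruction count avoids the rounding that creeps in when it is derived from the already-rounded execution time.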

1.7
a. CPI = T_exec × f/No. instr.
Compiler A CPI = 1.1
Compiler B CPI = 1.25
b. f_B/f_A = (No. instr.(B) × CPI(B))/(No. instr.(A) × CPI(A)) = 1.37
c. T_A/T_new = 1.67
T_B/T_new = 2.27

1.8
1.8.1 C = 2 × DP/(V^2 × F)
Pentium 4: C = 3.2E-8 F
Core i5 Ivy Bridge: C = 2.9E-8 F
1.8.2 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
1.8.3 (S_new + D_new)/(S_old + D_old) = 0.90
D_new = C × V_new^2 × F
S_old = V_old × I
S_new = V_new × I
Therefore:
V_new = [D_new/(C × F)]^(1/2)
D_new = 0.90 × (S_old + D_old) - S_new
S_new = V_new × (S_old/V_old)
Pentium 4:
S_new = V_new × (10/1.25) = V_new × 8
D_new = 0.90 × 100 - V_new × 8 = 90 - V_new × 8
V_new = [(90 - V_new × 8)/(3.2E-8 × 3.6E9)]^(1/2)
V_new = 0.85 V
Core i5:
S_new = V_new × (30/0.9) = V_new × 33.3
D_new = 0.90 × 70 - V_new × 33.3 = 63 - V_new × 33.3
V_new = [(63 - V_new × 33.3)/(2.9E-8 × 3.4E9)]^(1/2)
V_new = 0.64 V
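The equation for V_new in 1.8.3 has V_new on both sides, so it is really a quadratic: C × F × V^2 + (S_old/V_old) × V - 0.90 × (S_old + D_old) = 0. The Python sketch below solves it with the quadratic formula. It is an illustration rather than the manual's method; it follows the solution's own formulation (D_new = C × V^2 × F with C taken from 1.8.1), the function names are hypothetical, and the static/dynamic power, voltage, and frequency values are those implied by the figures in 1.8.2 and 1.8.3.

```python
import math


def capacitive_load(dynamic_w, volts, freq_hz):
    """C from 1.8.1: C = 2 x DP / (V^2 x F)."""
    return 2 * dynamic_w / (volts ** 2 * freq_hz)


def reduced_voltage(static_w, dynamic_w, volts, freq_hz, target=0.90):
    """Positive root of a*V^2 + b*V - k = 0, where
    a = C*F (dynamic term as written in 1.8.3),
    b = S_old/V_old (leakage current I),
    k = target * (S_old + D_old)."""
    a = capacitive_load(dynamic_w, volts, freq_hz) * freq_hz
    b = static_w / volts
    k = target * (static_w + dynamic_w)
    return (-b + math.sqrt(b * b + 4 * a * k)) / (2 * a)


# Pentium 4: 10 W static, 90 W dynamic, 1.25 V, 3.6 GHz
v_p4 = reduced_voltage(10, 90, 1.25, 3.6e9)
# Core i5 Ivy Bridge: 30 W static, 40 W dynamic, 0.9 V, 3.4 GHz
v_i5 = reduced_voltage(30, 40, 0.9, 3.4e9)
print(round(v_p4, 3), round(v_i5, 3))
```

Solving the quadratic directly avoids iterating the square-root form by hand and makes it clear that only the positive root is physically meaningful.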

1.9
1.9.1