Exam

Exam (elaborations) TEST BANK FOR Computer Organization and Design - The Hardware Software Interface 2nd Edition By David A. Patterson, John L. Hennessy (Solution Manual)

Pages

182

Grade

A+

Published on

15-11-2021

Written in

2021/2022

1.9
1.9.1
1.9.2
1.9.3 3

1.10
1.10.1 die area (15 cm) = wafer area/dies per wafer = $\pi \times 7.5^2/84 = 2.10$ cm²
yield (15 cm) = $1/(1 + 0.020 \times 2.10/2)^2 = 0.9593$
die area (20 cm) = wafer area/dies per wafer = $\pi \times 10^2/100 = 3.14$ cm²
yield (20 cm) = $1/(1 + 0.031 \times 3.14/2)^2 = 0.9093$
1.10.2 cost/die (15 cm) = 12/(84 × 0.9593) = 0.1489
cost/die (20 cm) = 15/(100 × 0.9093) = 0.1650
1.10.3 die area (15 cm) = wafer area/dies per wafer = $\pi \times 7.5^2/(84 \times 1.1) = 1.91$ cm²
yield (15 cm) = $1/(1 + 0.020 \times 1.15 \times 1.91/2)^2 = 0.9575$
die area (20 cm) = wafer area/dies per wafer = $\pi \times 10^2/(100 \times 1.1) = 2.86$ cm²
yield (20 cm) = $1/(1 + 0.031 \times 1.15 \times 2.86/2)^2 = 0.9053$
1.10.4 defects per area (yield 0.92) = $(1 - y^{0.5})/(y^{0.5} \times \text{die area}/2) = (1 - 0.92^{0.5})/(0.92^{0.5} \times 2/2) = 0.043$ defects/cm²
defects per area (yield 0.95) = $(1 - y^{0.5})/(y^{0.5} \times \text{die area}/2) = (1 - 0.95^{0.5})/(0.95^{0.5} \times 2/2) = 0.026$ defects/cm²

1.11
1.11.1 CPI = clock rate × CPU time/instr. count
clock rate = 1/cycle time = 3 GHz
CPI(bzip2) = $3 \times 10^9 \times 750/(2389 \times 10^9) = 0.94$
1.11.2 SPECratio = ref. time/execution time
SPECratio(bzip2) = 9650/750 = 12.86
1.11.3 CPU time = No. instr. × CPI/clock rate
If CPI and clock rate do not change, the CPU time increases by the same percentage as the number of instructions, that is, 10%.
1.11.4 CPU time(before) = No. instr. × CPI/clock rate
CPU time(after) = 1.1 × No. instr. × 1.05 × CPI/clock rate
CPU time(after)/CPU time(before) = 1.1 × 1.05 = 1.155. Thus, CPU time increases by 15.5%.
1.11.5 SPECratio = reference time/CPU time
SPECratio(after)/SPECratio(before) = CPU time(before)/CPU time(after) = 1/1.155 = 0.866. The SPECratio decreases by about 13.4%.
1.11.6 CPI = (CPU time × clock rate)/No. instr.
CPI = $700 \times 4 \times 10^9/(0.85 \times 2389 \times 10^9) = 1.37$
1.11.7 Clock rate ratio = 4 GHz/3 GHz = 1.33
CPI @ 4 GHz = 1.37, CPI @ 3 GHz = 0.94, ratio = 1.45
They differ because, although the number of instructions has been reduced by 15%, the CPU time has been reduced by a smaller percentage.
1.11.8 700/750 = 0.933. CPU time reduction: 6.7%
1.11.9 No. instr. = CPU time × clock rate/CPI
No. instr. = $960 \times 0.9 \times 4 \times 10^9/1.61 = 2146 \times 10^9$
1.11.10 Clock rate = No. instr. × CPI/CPU time.
$\text{Clock rate}_\text{new} = \text{No. instr.} \times \text{CPI}/(0.9 \times \text{CPU time}) = (1/0.9) \times \text{clock rate}_\text{old} = 3.33$ GHz
1.11.11 Clock rate = No. instr. × CPI/CPU time.
$\text{Clock rate}_\text{new} = \text{No. instr.} \times 0.85 \times \text{CPI}/(0.80 \times \text{CPU time}) = (0.85/0.80) \times \text{clock rate}_\text{old} = 3.18$ GHz

1.12
1.12.1 T(P1) = $5 \times 10^9 \times 0.9/(4 \times 10^9) = 1.125$ s
T(P2) = $10^9 \times 0.75/(3 \times 10^9) = 0.25$ s
clock rate(P1) > clock rate(P2), but performance(P1) < performance(P2)
1.12.2 T(P1) = No. instr. × CPI/clock rate
T(P1) = $2.25 \times 10^{-1}$ s
T(P2) = $N \times 0.75/(3 \times 10^9) = 2.25 \times 10^{-1}$ s, then $N = 9 \times 10^8$
1.12.3 MIPS = clock rate × $10^{-6}$/CPI
MIPS(P1) = $4 \times 10^9 \times 10^{-6}/0.9 = 4.44 \times 10^3$
MIPS(P2) = $3 \times 10^9 \times 10^{-6}/0.75 = 4.0 \times 10^3$
MIPS(P1) > MIPS(P2), but performance(P1) < performance(P2) (from 1.12.1)
1.12.4 MFLOPS = No. FP operations × $10^{-6}$/T
MFLOPS(P1) = $0.4 \times 5 \times 10^9 \times 10^{-6}/1.125 = 1.78 \times 10^3$
MFLOPS(P2) = $0.4 \times 10^9 \times 10^{-6}/0.25 = 1.60 \times 10^3$
MFLOPS(P1) > MFLOPS(P2), but performance(P1) < performance(P2) (from 1.12.1)

1.13
1.13.1 $T_\text{fp} = 70 \times 0.8 = 56$ s. $T_\text{new} = 56 + 85 + 55 + 40 = 236$ s. Reduction: 5.6%
1.13.2 $T_\text{new} = 250 \times 0.8 = 200$ s, $T_\text{fp} + T_\text{l/s} + T_\text{branch} = 165$ s, $T_\text{int} = 35$ s. Reduction in INT time: 58.8%
1.13.3 $T_\text{new} = 250 \times 0.8 = 200$ s, $T_\text{fp} + T_\text{int} + T_\text{l/s} = 210$ s. NO: these three classes alone already take more than 200 s, so no reduction in branch time can reach the target.

1.14
1.14.1 $\text{Clock cycles} = \text{CPI}_\text{fp} \times \text{No. FP instr.} + \text{CPI}_\text{int} \times \text{No. INT instr.} + \text{CPI}_\text{l/s} \times \text{No. L/S instr.} + \text{CPI}_\text{branch} \times \text{No. branch instr.}$
$T_\text{CPU} = \text{clock cycles}/\text{clock rate} = \text{clock cycles}/(2 \times 10^9)$
clock cycles = $512 \times 10^6$; $T_\text{CPU} = 0.256$ s
To halve the number of clock cycles by improving the CPI of FP instructions:
$\text{CPI}_\text{improved fp} \times \text{No. FP instr.} + \text{CPI}_\text{int} \times \text{No. INT instr.} + \text{CPI}_\text{l/s} \times \text{No. L/S instr.} + \text{CPI}_\text{branch} \times \text{No. branch instr.} = \text{clock cycles}/2$
$\text{CPI}_\text{improved fp} = (\text{clock cycles}/2 - (\text{CPI}_\text{int} \times \text{No. INT instr.} + \text{CPI}_\text{l/s} \times \text{No. L/S instr.} + \text{CPI}_\text{branch} \times \text{No. branch instr.}))/\text{No. FP instr.}$
$\text{CPI}_\text{improved fp} = (256 - 462)/50 < 0$ ==> not possible
1.14.2 Using the clock cycle data from 1.14.1.
To halve the number of clock cycles by improving the CPI of L/S instructions:
$\text{CPI}_\text{fp} \times \text{No. FP instr.} + \text{CPI}_\text{int} \times \text{No. INT instr.} + \text{CPI}_\text{improved l/s} \times \text{No. L/S instr.} + \text{CPI}_\text{branch} \times \text{No. branch instr.} = \text{clock cycles}/2$
$\text{CPI}_\text{improved l/s} = (\text{clock cycles}/2 - (\text{CPI}_\text{fp} \times \text{No. FP instr.} + \text{CPI}_\text{int} \times \text{No. INT instr.} + \text{CPI}_\text{branch} \times \text{No. branch instr.}))/\text{No. L/S instr.}$
$\text{CPI}_\text{improved l/s} = (256 - 198)/80 = 0.725$
1.14.3 $\text{Clock cycles} = \text{CPI}_\text{fp} \times \text{No. FP instr.} + \text{CPI}_\text{int} \times \text{No. INT instr.} + \text{CPI}_\text{l/s} \times \text{No. L/S instr.} + \text{CPI}_\text{branch} \times \text{No. branch instr.}$
$T_\text{CPU} = \text{clock cycles}/\text{clock rate} = \text{clock cycles}/(2 \times 10^9)$
$\text{CPI}_\text{int} = 0.6 \times 1 = 0.6$; $\text{CPI}_\text{fp} = 0.6 \times 1 = 0.6$; $\text{CPI}_\text{l/s} = 0.7 \times 4 = 2.8$; $\text{CPI}_\text{branch} = 0.7 \times 2 = 1.4$
$T_\text{CPU}$ (before improv.) = 0.256 s; $T_\text{CPU}$ (after improv.) = 0.171 s

1.15

Chapter 2 Solutions S-3

2.1 addi x5, x7, -5
add x5, x5, x6
[addi f, h, -5 (note: there is no subi); add f, f, g]

2.2 f = g + h + i

2.3 sub x30, x28, x29 // compute i-j
slli x30, x30, 3 // multiply by 8 to convert the doubleword offset to a byte offset
add x30, x10, x30 // x30 = &A[i-j]
ld x30, 0(x30) // load A[i-j]
sd x30, 64(x11) // store in B[8]

2.4 B[g] = A[f] + A[f+1]
slli x30, x5, 3 // x30 = f*8
add x30, x10, x30 // x30 = &A[f]
slli x31, x6, 3 // x31 = g*8
add x31, x11, x31 // x31 = &B[g]
ld x5, 0(x30) // f = A[f]
addi x12, x30, 8 // x12 = &A[f]+8 (i.e. &A[f+1])
ld x30, 0(x12) // x30 = A[f+1]
add x30, x30, x5 // x30 = A[f+1] + A[f]
sd x30, 0(x31) // B[g] = x30 (i.e. A[f+1] + A[f])

2.5

2.6 8

2.7 slli x28, x28, 3 // x28 = i*8
add x28, x10, x28 // x28 = &A[i]
ld x28, 0(x28) // x28 = A[i]
slli x29, x29, 3 // x29 = j*8
add x29, x11, x29 // x29 = &B[j]
ld x29, 0(x29) // x29 = B[j]
add x29, x28, x29 // Compute x29 = A[i] + B[j]
sd x29, 64(x11) // Store result in B[8]

2.8 f = 2*(&A)
addi x30, x10, 8 // x30 = &A[1]
addi x31, x10, 0 // x31 = &A
sd x31, 0(x30) // A[1] = &A
ld x30, 0(x30) // x30 = A[1] = &A
add x5, x30, x31 // f = &A + &A = 2*(&A)

2.9
Instruction        Type    opcode, funct3, funct7   rs1   rs2   rd   imm
addi x30, x10, 8   I-type  0x13, 0x0, --            10    --    30   8
addi x31, x10, 0   I-type  0x13, 0x0, --            10    --    31   0
sd x31, 0(x30)     S-type  0x23, 0x3, --            30    31    --   0
ld x30, 0(x30)     I-type  0x3, 0x3, --             30    --    30   0
add x5, x30, x31   R-type  0x33, 0x0, 0x0           30    31    5    --

2.10
2.10.1 0x
2.10.2 overflow
2.10.3 0xB
2.10.4 no overflow
2.10.5 0xD
2.10.6 overflow

2.11
2.11.1 There is an overflow if $128 + x6 > 2^{63} - 1$. In other words, if $x6 > 2^{63} - 129$. There is also an overflow if $128 + x6 < -2^{63}$. In other words, if $x6 < -2^{63} - 128$ (which is impossible given the range of x6).
2.11.2 There is an overflow if $128 - x6 > 2^{63} - 1$. In other words, if $x6 < -2^{63} + 129$. There is also an overflow if $128 - x6 < -2^{63}$. In other words, if $x6 > 2^{63} + 128$ (which is impossible given the range of x6).
2.11.3 There is an overflow if $x6 - 128 > 2^{63} - 1$.
In other words, if $x6 > 2^{63} + 127$ (which is impossible given the range of x6). There is also an overflow if $x6 - 128 < -2^{63}$. In other words, if $x6 < -2^{63} + 128$.

2.12 R-type: add x1, x1, x1

2.13 S-type: 0x25F3023 (0011)

2.14 R-type: sub x6, x7, x5 (0x: )

2.15 I-type: ld x3, 4(x27) (0x4DB183: )

2.16
2.16.1 The opcode would expand from 7 bits to 9. The rs1, rs2, and rd fields would increase from 5 bits to 7 bits.
2.16.2 The opcode would expand from 7 bits to 12. The rs1 and rd fields would increase from 5 bits to 7 bits. This change does not affect the imm field per se, but it might force the ISA designer to consider shortening the immediate field to avoid an increase in overall instruction size.
2.16.3
* Increasing the size of each bit field makes each instruction longer, potentially increasing the code size overall.
* However, increasing the number of registers could lead to less register spilling, which would reduce the total number of instructions, possibly reducing the code size overall.

2.17
2.17.1 0xababefef8
2.17.2 0x
2.17.3 0x545

2.18 It can be done in eight RISC-V instructions:
addi x7, x0, 0x3f // Create a 6-bit mask (0x3f)
slli x7, x7, 11 // Shift the mask to cover bits 16 to 11
and x28, x5, x7 // Apply the mask to x5
slli x7, x7, 15 // Shift the mask to cover bits 31 to 26
xori x7, x7, -1 // This is a NOT operation
and x6, x6, x7 // "Zero out" positions 31 to 26 of x6
slli x28, x28, 15 // Move selection from x5 into positions 31 to 26
or x6, x6, x28 // Load bits 31 to 26 from x28

2.19 xori x5, x6, -1

2.20 ld x6, 0(x17)
slli x6, x6, 4

2.21 x6 = 2

2.22
2.22.1 [0x1FF00000, 0x200FFFFE]
2.22.2 [0x1FFFF000, 0x20000FFE]

2.23
2.23.1 The UJ instruction format would be most appropriate because it would allow the maximum number of bits possible for the "loop" parameter, thereby maximizing the utility of the instruction.
2.23.2 It can be done in three instructions:
loop: addi x29, x29, -1 // Subtract 1 from x29
bge x29, x0, loop // Continue if x29 is not negative
addi x29, x29, 1 // Add back 1 that shouldn't have been subtracted

2.24
2.24.1 The final value of x5 is 20.
2.24.2
acc = 0;
i = 10;
while (i != 0) {
    acc += 2;
    i--;
}
2.24.3 4*N + 1 instructions.
2.24.4 (Note: change condition != to >= in the while loop)
acc = 0;
i = 10;
while (i >= 0) {
    acc += 2;
    i--;
}

2.25 The C code can be implemented in RISC-V assembly as follows.
       addi x7, x0, 0      // Init i = 0
LOOPI: bge x7, x5, ENDI    // While i < a
       addi x30, x10, 0    // x30 = &D
       addi x29, x0, 0     // Init j = 0
LOOPJ: bge x29, x6, ENDJ   // While j < b
       add x31, x7, x29    // x31 = i+j
       sd x31, 0(x30)      // D[4*j] = x31
       addi x30, x30, 32   // x30 = &D[4*(j+1)]
       addi x29, x29, 1    // j++
       jal x0, LOOPJ
ENDJ:  addi x7, x7, 1      // i++;
       jal x0, LOOPI
ENDI:

2.26 The code requires 13 RISC-V instructions. When a = 10 and b = 1, this results in 123 instructions being executed.

2.27
// This C code corresponds most directly to the given assembly.
int i;
for (i = 0; i < 100; i++) {
    result += *MemArray;
    MemArray++;
}
return result;
// However, many people would write the code this way:
int i;
for (i = 0; i < 100; i++) {
    result += MemArray[i];
}
return result;

2.28 The address one past the last element of MemArray can be used to terminate the loop:
       addi x29, x10, 800  // x29 = &MemArray[100], one past the last element
LOOP:  ld x7, 0(x10)
       add x5, x5, x7
       addi x10, x10, 8
       blt x10, x29, LOOP  // Loop until MemArray points to one-past the last element

2.29
// IMPORTANT! Stack pointer must remain a multiple of 16!!!!
fib:   beq x10, x0, done   // If n==0, return 0
       addi x5, x0, 1
       beq x10, x5, done   // If n==1, return 1
       addi x2, x2, -16    // Allocate 2 words of stack space
       sd x1, 0(x2)        // Save the return address
       sd x10, 8(x2)       // Save the current n
       addi x10, x10, -1   // x10 = n-1
       jal x1, fib         // fib(n-1)
       ld x5, 8(x2)        // Load old n from the stack
       sd x10, 8(x2)       // Push fib(n-1) onto the stack
       addi x10, x5, -2    // x10 = n-2
       jal x1, fib         // Call fib(n-2)
       ld x5, 8(x2)        // x5 = fib(n-1)
       add x10, x10, x5    // x10 = fib(n-1)+fib(n-2)
       // Clean up:
       ld x1, 0(x2)        // Load saved return address
       addi x2, x2, 16     // Pop two words from the stack
done:  jalr x0, x1

2.30 [answers will vary]

2.31
// IMPORTANT! Stack pointer must remain a multiple of 16!!!
f:     addi x2, x2, -16    // Allocate stack space for 2 words
       sd x1, 0(x2)        // Save return address
       add x5, x12, x13    // x5 = c+d
       sd x5, 8(x2)        // Save c+d on the stack
       jal x1, g           // Call x10 = g(a,b)
       ld x11, 8(x2)       // Reload x11 = c+d from the stack
       jal x1, g           // Call x10 = g(g(a,b), c+d)
       ld x1, 0(x2)        // Restore return address
       addi x2, x2, 16     // Restore stack pointer
       jalr x0, x1

2.32 We can use the tail-call optimization for the second call to g, saving one instruction:
// IMPORTANT! Stack pointer must remain a multiple of 16!!!
f:     addi x2, x2, -16    // Allocate stack space for 2 words
       sd x1, 0(x2)        // Save return address
       add x5, x12, x13    // x5 = c+d
       sd x5, 8(x2)        // Save c+d on the stack
       jal x1, g           // Call x10 = g(a,b)
       ld x11, 8(x2)       // Reload x11 = c+d from the stack
       ld x1, 0(x2)        // Restore return address
       addi x2, x2, 16     // Restore stack pointer
       jal x0, g           // Call x10 = g(g(a,b), c+d)

2.33
* We have no idea what the contents of x10-x14 are; g can set them as it pleases.
* We don't know the precise contents of x8 and sp, but we do know that they are identical to their contents when f was called.
* Similarly, we don't know the precise contents of x1, but we do know that it is equal to the return address set by the "jal x1, f" instruction that invoked f.

2.34
a_to_i:
       addi x28, x0, 10    # Just stores the constant 10
       addi x29, x0, 0     # Stores the running total
       addi x5, x0, 1      # Tracks whether input is positive or negative
       # Test for initial '+' or '-'
       lbu x6, 0(x10)      # Load the first character
       addi x7, x0, 45     # ASCII '-'
       bne x6, x7, noneg
       addi x5, x0, -1     # Set that input was negative
       addi x10, x10, 1    # str++
       jal x0, main_atoi_loop
noneg:
       addi x7, x0, 43     # ASCII '+'
       bne x6, x7, main_atoi_loop
       addi x10, x10, 1    # str++
main_atoi_loop:
       lbu x6, 0(x10)      # Load the next digit
       beq x6, x0, done    # Stop at the terminating null character
       # Make sure next char is a digit, or fail
       addi x7, x0, 48     # ASCII '0'
       sub x6, x6, x7
       blt x6, x0, fail    # *str < '0'
       bge x6, x28, fail   # *str > '9'
       # Next char is a digit, so accumulate it into x29
       mul x29, x29, x28   # x29 *= 10
       add x29, x29, x6    # x29 += *str - '0'
       addi x10, x10, 1    # str++
       jal x0, main_atoi_loop
done:
       addi x10, x29, 0    # Use x29 as output value
       mul x10, x10, x5    # Multiply by sign
       jalr x0, x1         # Return result
fail:
       addi x10, x0, -1
       jalr x0, x1

2.35
2.35.1 0x11
2.35.2 0x88

2.36
lui x10, 0x11223
addi x10, x10, 0x344
slli x10, x10, 32
lui x5, 0x55667
addi x5, x5, 0x788
add x10, x10, x5

2.37
setmax:
try:   lr.d x5, (x10)        # Load-reserve *shvar
       bge x5, x11, release  # Skip update if *shvar >= x
       addi x5, x11, 0
release:
       sc.d x7, x5, (x10)
       bne x7, x0, try       # If store-conditional failed, try again
       jalr x0, x1

2.38 When two processors A and B begin executing this loop at the same time, at most one of them will execute the store-conditional instruction successfully, while the other will be forced to retry the loop. If processor A's store-conditional succeeds initially, then B will re-enter the try block, and it will see the new value of shvar written by A when it finally succeeds.
The hardware guarantees that both processors will eventually execute the code completely.

2.39
2.39.1 No. The resulting machine would be slower overall. The current CPU requires (num arithmetic * 1 cycle) + (num load/store * 10 cycles) + (num branch/jump * 3 cycles) = 500*1 + 300*10 + 100*3 = 3800 cycles. The new CPU requires (0.75*num arithmetic * 1 cycle) + (num load/store * 10 cycles) + (num branch/jump * 3 cycles) = 375*1 + 300*10 + 100*3 = 3675 cycles. However, because each of the new CPU's cycles is 10% longer than the original CPU's cycles, the new CPU's 3675 cycles take as long as 4042.5 cycles on the original CPU.
2.39.2 If we double the performance of arithmetic instructions by reducing their CPI to 0.5, then the CPU will run the reference program in (500*0.5) + (300*10) + (100*3) = 3550 cycles. This represents a speedup of 1.07. If we improve the performance of arithmetic instructions by a factor of 10 (reducing their CPI to 0.1), then the CPU will run the reference program in (500*0.1) + (300*10) + (100*3) = 3350 cycles. This represents a speedup of 1.13.

2.40
2.40.1 Take the weighted average: 0.7*2 + 0.1*6 + 0.2*3 = 2.6
2.40.2 For a 25% improvement, we must reduce the CPI to 2.6*0.75 = 1.95. Thus, we want 0.7*x + 0.1*6 + 0.2*3 <= 1.95. Solving for x shows that the arithmetic instructions must have a CPI of at most 1.07.
2.40.3 For a 50% improvement, we must reduce the CPI to 2.6*0.5 = 1.3. Thus, we want 0.7*x + 0.1*6 + 0.2*3 <= 1.3. Solving for x shows that the arithmetic instructions must have a CPI of at most 0.14.

2.41
ldr x28, x5(x10), 3   // Load x28 = A[f]
addi x5, x5, 1        // f++
ldr x29, x5(x10), 3   // Load x29 = A[f+1]
add x29, x29, x28     // Add x29 = A[f] + A[f+1]
sdr x29, x6(x11), 3   // Store B[g] = x29

2.42
ldr x28, x28(x10), 3  // Load x28 = A[i]
ldr x29, x29(x11), 3  // Load x29 = B[j]
add x29, x28, x29
sd x29, 64(x11)       // Store B[8] = x29 (don't need scaled store here)
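The cycle counts in 2.39 and the CPI bound in 2.40 are both weighted sums. The C sketch below reproduces them; the instruction counts (500 arithmetic, 300 load/store, 100 branch/jump) and class fractions (0.7, 0.1, 0.2) are the ones used in the answers above, while `program_cycles` and `max_arith_cpi` are hypothetical helper names.

```c
#include <assert.h>

/* Cycles for the 2.39 reference program: each class's instruction
   count multiplied by its CPI, summed over the three classes. */
static double program_cycles(double n_arith, double cpi_arith,
                             double n_ls, double cpi_ls,
                             double n_br, double cpi_br) {
    return n_arith * cpi_arith + n_ls * cpi_ls + n_br * cpi_br;
}

/* Largest arithmetic CPI that keeps the weighted CPI of 2.40 at or
   below target_cpi, given the fixed contribution of the other classes
   (0.1*6 + 0.2*3 in the answer). */
static double max_arith_cpi(double target_cpi, double frac_arith,
                            double other_cpi_contribution) {
    assert(frac_arith > 0.0);
    return (target_cpi - other_cpi_contribution) / frac_arith;
}
```

For example, program_cycles(500, 1, 300, 10, 100, 3) returns 3800; with 375 arithmetic instructions it returns 3675, which at 10% longer cycles corresponds to 4042.5 original cycles. max_arith_cpi(1.95, 0.7, 1.2) returns about 1.07 and max_arith_cpi(1.3, 0.7, 1.2) about 0.14, matching 2.40.2 and 2.40.3.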

Linked book

David A. Patterson, John L. Hennessy Computer Organization and Design

Edition: 2012
ISBN: 9780123747501
Edition: Unknown

School, study and subject

Institution: Harvard University
Course: TEST BANK FOR Computer Organization and Design - The Hardware Software Interface 2nd Edition By David A. Patterson, John L. Hennessy

Document information

Type: Exam
Contains: Questions and answers

Topics

computer organization and design
hardware software interface

Content preview

Chapter 1 Solutions S-3

1.1 Personal computer (includes workstation and laptop): Personal computers
emphasize delivery of good performance to single users at low cost and usually
execute third-party software.
Personal mobile device (PMD, includes tablets): PMDs are battery operated
with wireless connectivity to the Internet and typically cost hundreds of
dollars, and, as with PCs, users can download software (“apps”) to run on them.
Unlike PCs, they no longer have a keyboard and mouse, and are more likely
to rely on a touch-sensitive screen or even speech input.
Server: Computer used to run large problems and usually accessed via a
network.
Warehouse-scale computer: Thousands of processors forming a large cluster.
Supercomputer: Computer composed of hundreds to thousands of processors
and terabytes of memory.
Embedded computer: Computer designed to run one application or one set
of related applications and integrated into a single system.

1.2
a. Performance via Pipelining
b. Dependability via Redundancy
c. Performance via Prediction
d. Make the Common Case Fast
e. Hierarchy of Memories
f. Performance via Parallelism
g. Design for Moore’s Law
h. Use Abstraction to Simplify Design

1.3 The program is compiled into an assembly language program, which is then
assembled into a machine language program.

1.4
a. 1280 × 1024 pixels = 1,310,720 pixels => 1,310,720 × 3 = 3,932,160 bytes/
frame.
b. 3,932,160 bytes × (8 bits/byte) /100E6 bits/second = 0.31 seconds
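A minimal C sketch of the 1.4 arithmetic, assuming 3 bytes per pixel and a 100E6 bits/second link as in the answer above; `frame_buffer_bytes` and `transfer_seconds` are hypothetical helper names, not part of the source.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes needed to store one frame: width x height pixels, each using
   bytes_per_pixel bytes (3 for 24-bit color, as assumed in 1.4a). */
static uint64_t frame_buffer_bytes(uint64_t width, uint64_t height,
                                   uint64_t bytes_per_pixel) {
    return width * height * bytes_per_pixel;
}

/* Seconds needed to move `bytes` over a link of `bits_per_second`. */
static double transfer_seconds(uint64_t bytes, double bits_per_second) {
    assert(bits_per_second > 0.0);
    return (double)bytes * 8.0 / bits_per_second;
}
```

With these inputs, frame_buffer_bytes(1280, 1024, 3) returns 3,932,160 and transfer_seconds(3932160, 100e6) returns about 0.3146, which rounds to the 0.31 seconds given in 1.4b.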

1.5
a. performance of P1 (instructions/sec) = $3 \times 10^9/1.5 = 2 \times 10^9$
performance of P2 (instructions/sec) = $2.5 \times 10^9/1.0 = 2.5 \times 10^9$
performance of P3 (instructions/sec) = $4 \times 10^9/2.2 = 1.8 \times 10^9$

b. cycles(P1) = $10 \times 3 \times 10^9 = 30 \times 10^9$
cycles(P2) = $10 \times 2.5 \times 10^9 = 25 \times 10^9$
cycles(P3) = $10 \times 4 \times 10^9 = 40 \times 10^9$
c. No. instructions(P1) = $30 \times 10^9/1.5 = 20 \times 10^9$
No. instructions(P2) = $25 \times 10^9/1 = 25 \times 10^9$
No. instructions(P3) = $40 \times 10^9/2.2 = 18.18 \times 10^9$
$\text{CPI}_\text{new} = \text{CPI}_\text{old} \times 1.2$, then CPI(P1) = 1.8, CPI(P2) = 1.2, CPI(P3) = 2.6
The new execution time is 7 s (the original 10 s reduced by 30%).
f = No. instr. × CPI/time, then
f(P1) = $20 \times 10^9 \times 1.8/7 = 5.14$ GHz
f(P2) = $25 \times 10^9 \times 1.2/7 = 4.28$ GHz
f(P3) = $18.18 \times 10^9 \times 2.6/7 = 6.75$ GHz
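The three relations used in 1.5 (performance = clock rate/CPI, cycles = time × clock rate, and f = No. instr. × CPI/time) can be written as small C helpers. This is a sketch with hypothetical function names; the example values in the usage note are P1's figures from the answer above.

```c
#include <assert.h>

/* Instructions executed per second: clock rate divided by average CPI. */
static double instructions_per_second(double clock_hz, double cpi) {
    assert(cpi > 0.0);
    return clock_hz / cpi;
}

/* Clock cycles consumed in `seconds` at `clock_hz`. */
static double cycles_in(double seconds, double clock_hz) {
    return seconds * clock_hz;
}

/* Clock rate needed to run `instructions` at `cpi` in `seconds`:
   f = No. instr. x CPI / time. */
static double required_clock_hz(double instructions, double cpi,
                                double seconds) {
    assert(seconds > 0.0);
    return instructions * cpi / seconds;
}
```

For P1, instructions_per_second(3e9, 1.5) gives 2e9, cycles_in(10.0, 3e9) gives 30e9 cycles, and required_clock_hz(20e9, 1.8, 7.0) gives about 5.14e9, i.e., 5.14 GHz.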

1.6
a. Class A: $10^5$ instr. Class B: $2 \times 10^5$ instr. Class C: $5 \times 10^5$ instr. Class D: $2 \times 10^5$ instr.
Time = No. instr. × CPI/clock rate
Total time P1 = $(10^5 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 3 + 2 \times 10^5 \times 3)/(2.5 \times 10^9) = 10.4 \times 10^{-4}$ s
Total time P2 = $(10^5 \times 2 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 2 + 2 \times 10^5 \times 2)/(3 \times 10^9) = 6.66 \times 10^{-4}$ s
CPI(P1) = $10.4 \times 10^{-4} \times 2.5 \times 10^9/10^6 = 2.6$
CPI(P2) = $6.66 \times 10^{-4} \times 3 \times 10^9/10^6 = 2.0$
b. clock cycles(P1) = $10^5 \times 1 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 3 + 2 \times 10^5 \times 3 = 26 \times 10^5$
clock cycles(P2) = $10^5 \times 2 + 2 \times 10^5 \times 2 + 5 \times 10^5 \times 2 + 2 \times 10^5 \times 2 = 20 \times 10^5$
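The class-by-class sums in 1.6 are a weighted total of count × CPI. A C sketch follows; `total_cycles` and `average_cpi` are hypothetical names, and the counts and CPIs in the usage note are the ones used in the answer above.

```c
#include <assert.h>
#include <stddef.h>

/* Total clock cycles for an instruction mix: sum of count[i] * cpi[i]. */
static double total_cycles(const double *count, const double *cpi, size_t n) {
    double cycles = 0.0;
    for (size_t i = 0; i < n; i++) {
        cycles += count[i] * cpi[i];
    }
    return cycles;
}

/* Average CPI of the mix: total cycles divided by total instructions. */
static double average_cpi(const double *count, const double *cpi, size_t n) {
    double instructions = 0.0;
    for (size_t i = 0; i < n; i++) {
        instructions += count[i];
    }
    assert(instructions > 0.0);
    return total_cycles(count, cpi, n) / instructions;
}
```

With counts {1e5, 2e5, 5e5, 2e5} and P1's CPIs {1, 2, 3, 3}, total_cycles returns 2.6e6 and average_cpi returns 2.6; dividing the cycles by 2.5e9 Hz gives the 10.4E-4 s of 1.6a.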

1.7
a. CPI = $T_\text{exec} \times f/\text{No. instr.}$
Compiler A CPI = 1.1
Compiler B CPI = 1.25
b. $f_B/f_A = (\text{No. instr.}(B) \times \text{CPI}(B))/(\text{No. instr.}(A) \times \text{CPI}(A)) = 1.36$
c. $T_A/T_\text{new} = 1.67$
$T_B/T_\text{new} = 2.27$
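1.7 applies CPI = T_exec × f/No. instr. and then compares instruction count × CPI across the two compilers. In the hedged C sketch below, the inputs of the usage note (1.0E9 instructions in 1.1 s for compiler A, 1.2E9 instructions in 1.5 s for compiler B, 1 GHz clock) are assumed exercise data that the answer does not restate, chosen because they reproduce the CPIs of 1.1 and 1.25 above; the function names are hypothetical.

```c
#include <assert.h>

/* CPI recovered from a measured run: CPI = T_exec x f / No. instr. */
static double cpi_from_run(double seconds, double clock_hz,
                           double instructions) {
    assert(instructions > 0.0);
    return seconds * clock_hz / instructions;
}

/* Clock ratio f_B / f_A at which program B (instr_b, cpi_b) and
   program A (instr_a, cpi_a) take the same time. */
static double clock_ratio_for_equal_time(double instr_b, double cpi_b,
                                         double instr_a, double cpi_a) {
    assert(instr_a > 0.0 && cpi_a > 0.0);
    return (instr_b * cpi_b) / (instr_a * cpi_a);
}
```

Under those assumptions, cpi_from_run(1.1, 1e9, 1.0e9) gives 1.1, cpi_from_run(1.5, 1e9, 1.2e9) gives 1.25, and clock_ratio_for_equal_time(1.2e9, 1.25, 1.0e9, 1.1) gives 1.5/1.1, about 1.36.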

1.8
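1.8.1 $C = 2 \times DP/(V^2 \times F)$
Pentium 4: $C = 3.2 \times 10^{-8}$ F
Core i5 Ivy Bridge: $C = 2.9 \times 10^{-8}$ F
1.8.2 Pentium 4: 10/100 = 10%
Core i5 Ivy Bridge: 30/70 = 42.9%
1.8.3 $(S_\text{new} + D_\text{new})/(S_\text{old} + D_\text{old}) = 0.90$
$D_\text{new} = C \times V_\text{new}^2 \times F$
$S_\text{old} = V_\text{old} \times I$
$S_\text{new} = V_\text{new} \times I$
Therefore:
$V_\text{new} = [D_\text{new}/(C \times F)]^{1/2}$
$D_\text{new} = 0.90 \times (S_\text{old} + D_\text{old}) - S_\text{new}$
$S_\text{new} = V_\text{new} \times (S_\text{old}/V_\text{old})$
Pentium 4:
$S_\text{new} = V_\text{new} \times (10/1.25) = V_\text{new} \times 8$
$D_\text{new} = 0.90 \times 100 - V_\text{new} \times 8 = 90 - V_\text{new} \times 8$
$V_\text{new} = [(90 - V_\text{new} \times 8)/(3.2 \times 10^{-8} \times 3.6 \times 10^9)]^{1/2}$
$V_\text{new} = 0.85$ V
Core i5:
$S_\text{new} = V_\text{new} \times (30/0.9) = V_\text{new} \times 33.3$
$D_\text{new} = 0.90 \times 70 - V_\text{new} \times 33.3 = 63 - V_\text{new} \times 33.3$
$V_\text{new} = [(63 - V_\text{new} \times 33.3)/(2.9 \times 10^{-8} \times 3.4 \times 10^9)]^{1/2}$
$V_\text{new} = 0.64$ V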
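In 1.8.3, V_new appears on both sides of the equation, so it has to be solved for rather than read off. Substituting D_new = C × V_new² × F and S_new = V_new × (S_old/V_old) into the 0.90 power constraint gives C × F × V² + (S_old/V_old) × V - 0.90 × (S_old + D_old) = 0. The sketch below finds the positive root by bisection rather than the closed-form quadratic formula (so it needs no math library). It follows the answer's power model exactly; `scaled_voltage` and `power_residual` are hypothetical names.

```c
#include <assert.h>

/* New total power minus target power for a candidate voltage v,
   with D_new = a * v^2 and S_new = b * v. */
static double power_residual(double v, double a, double b, double target) {
    return a * v * v + b * v - target;
}

/* Positive V satisfying C*F*V^2 + (s_old/v_old)*V = ratio*(s_old + d_old). */
static double scaled_voltage(double capacitance, double freq_hz,
                             double s_old, double d_old, double v_old,
                             double power_ratio) {
    double a = capacitance * freq_hz;   /* D_new = a * V^2            */
    double b = s_old / v_old;           /* S_new = b * V (current I)  */
    double target = power_ratio * (s_old + d_old);
    assert(a > 0.0 && b >= 0.0 && target > 0.0);

    /* The residual is negative at V = 0 and increasing for V > 0, so
       bisection on [0, hi] converges to the single positive root. */
    double lo = 0.0, hi = 1.0;
    while (power_residual(hi, a, b, target) < 0.0) {
        hi *= 2.0;                      /* widen until the root is bracketed */
    }
    for (int i = 0; i < 100; i++) {
        double mid = 0.5 * (lo + hi);
        if (power_residual(mid, a, b, target) < 0.0) {
            lo = mid;
        } else {
            hi = mid;
        }
    }
    return 0.5 * (lo + hi);
}
```

For the Pentium 4, scaled_voltage(3.2e-8, 3.6e9, 10, 90, 1.25, 0.90) returns about 0.85 V; for the Core i5, scaled_voltage(2.9e-8, 3.4e9, 30, 40, 0.9, 0.90) returns about 0.648 V, which the answer truncates to 0.64 V. The dynamic powers of 90 W and 40 W follow from 1.8.2 (100 - 10 and 70 - 30).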

1.9
1.9.1

$14.49

Meet the seller

Expert001
