Exam (elaborations)

Microprocessor Architecture (1st Edition, 2009) – Answers to Selected Exercises – Baer

Rating

Sold

Pages

114

Grade

A+

Uploaded on

05-11-2025

Written in

2025/2026

INSTANT PDF DOWNLOAD — Answers to Selected Exercises for Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors (1st Edition, 2009) by Jean-Loup Baer. Includes all 8 chapters and figures covered, with detailed solutions on instruction-level parallelism, pipeline design, cache memory, and multicore processor performance — perfect for computer engineering and architecture students. microprocessor architecture solutions, Jean-Loup Baer textbook, pipeline processor exercises, chip multiprocessors answers, computer architecture manual, CPU design and analysis, instruction-level parallelism problems, cache memory examples, computer systems engineering, microprocessor design workbook, multicore processor theory, processor pipeline design, advanced computer architecture guide, memory hierarchy solutions, computer hardware engineering, parallel computing architecture, processor optimization techniques, digital design textbook, performance evaluation exercises, architecture solutions pdf

Show more Read less

Institution

Solution Manual

Course

Solution manual

Whoops! We can’t load your doc right now. Try again or contact support.

Report Copyright Violation

Written for

Institution: Solution manual
Course: Solution manual

Document information

Uploaded on: November 5, 2025
Number of pages: 114
Written in: 2025/2026
Type: Exam (elaborations)
Contains: Questions & answers

Subjects

digital design textbook
microprocessor architecture solutions
chip multiprocessors answers
instruction level parallelism problems
advanced computer architecture guide

Content preview

ALL 8 CHAPTERS & FIGURES COVERED

, Answers to Selected Exercises
Chapter I

1. Use, for example, the Wikipedia site. Yes, the Chart will follow Moore’s law even more closely
than Figure I.3 since the number of transistors for the Pentium Series is smaller than for the
Itanium.
2. Most influenced by:
Technology: cycle time
Compiler: number of instructions
Computer architect: CPI
3. CPI = 1.03 + 0.12c where c is the cost of a branch misprediction. When:
c = 3 then CPI = 1.39
c = 10 then CPI = 2.23
c = 20 then CPI = 3.43
Clearly a sophisticated branch predictor is necessary for deep pipelines.
4. CPI = 1.39 + 0.05 l where l is the cost of a cache miss. When
l = 20 then CPI =2.39
l = 100 then CPI =6.39
l = 500 then CPI =26.39
Of course a cache hierarchy where the latency to the second level cache will be of the order of 10
cycles or less is mandatory.
5. CPI = 1.26 and IPC = 0.79
6. MIPS = IPC/(c.106). If c is halved, MIPS is doubled without change to IPC. However, with
smaller c there is a good chance that the pipelines will be deeper and the relative penalty for a
cache miss will be greater thus requiring adjustments to keep the same IPC.
7. (a) 2 billion (2.109) (b) 0.13µsec (insignificant)
8. (a) 1.6 GHz (b) M1 (c) 1 P1 and 4 P2.
9. (a) M-7 (b) M-5 (c) M-7
10. Both computers have same cycle time and same CPI hence same MIPS. However one executes
fewer instructions than the other.
11. For s = 0.01, speedup = 7.47
For s = 0.1, speedup = 4.71
For s = 0.25, speedup = 2.91

,12. (a) 6.67 (b) 23 (c) 278
13. (a) and (b) weighted arithmetic mean (c) arithmetic mean (d) harmonic mean
14. (a), (b), (d) Direct applications of formulas.
(c) WAM weighed with cycles; WHM weighed with instruction counts
(e) WAM weighed with B’s; WHM weighed with A’s

, Chapter II

1. The basic idea is to stall the pipeline once a “branch taken” has been recognized. The only ALU
is in Stage EX. A possible solution is as follows. A new control line “branch in progress” is set at
decode time when a branch is recognized and will be passed to ID/EX and EX/MEM. The
address computation does not take place in stage ID but the signed displacement is transferred
into ID/EX and EX/MEM. If the after stage EX the “branch in progress” is set in EX/MEM and
“branch taken” is set as the output of the ALU, the result of that AND operation is sent to the
control unit. If it is “1”, fetching of a new instruction is disallowed and the instructions in
progress in IF/ID and ID/EX are no-oped. The displacement and the PC found in EX/MEM are
“forwarded” back to the inputs of the ALU giving, in the next cycle, the new PC. Then we
proceed as in the original pipeline for a branch taken. Notice that we can still apply a “branch not
taken” prediction strategy.
2. (a) 3 cycles (b) 0 cycles if the branch is not taken and 3 cycles if it is taken.
3. (a) Forwarding to inputs of the ALU comparator in stage ID must be set from both the ID/EX and
EX/MEM pipeline registers. Stalling must occur if one of the operand registers used for the
comparisons was generated by a load in one of the two previous instructions (in fact a stall of 2
cycles if the previous instruction was a load). (b) 1 cycle in the case of “branch taken”, 0 if the
branch is not taken assuming a “branch not taken” strategy.
4. One possible solution is to “no-op” the instruction generating an exception and the instructions
following it in the pipeline. The PC of the saved state should point to the instruction with the
exception.
5. We need a new ALU on the MEM stage. At decode time, we need a control line of 2 bits
indicating whether this extra ALU should be activated and setting also the control of a
multiplexer for one input of the ALU yielding the values of c (we assume only 3 possible values
of c; if more make the control line 3 bits wide). Stalling and forwarding must be set for Rj as it is
done for a RAW dependency after a load. Finally, the data path to the register file must be
doubled in width and two registers must be able to be stored in the same cycle.
6. 1 right and 23 left.
7. 0 right and 24 left
8. (a) always true (b) not necessarily (c) true in general

,9. (a) 256 (b) Lines 0 to 63 will contain A[0] to A[127]; lines 64 to 255 will contain B[128] to
B[1023] (c) 0 read misses and 512 write misses; no difference for 2-way set-associative (d) 768
bytes were written back
10. Same answers as in Ex. 9 for (a) (b) and (c). (d) 2048 bytes
11. (a) The CPIs are respectively: 1.245, 1.238, and 1.227. Therefore choose cache C (b) the average
instruction times are respectively 24.9, 27.2, and 29.4 therefore choose cache A (c) For cache A
no extra bit; one extra bit for caches B and C.
12. (a) Extract_TLB_fields (in vaddress; out tagtlb,indextlb);
tagtlb = vaddress shifted right by 16;
indextlb = (vaddress shifted left by 16) shifted right by 28);
end;
(b) TLB is defined as:
2-dimensional array TLBtag(0:3,0:15) where each element is a 16-bit tag
2-dimensional array PTE(0:3,0:15) where each element is a 20 bit physical frame number
Check_TLB(in tagtlb, indextlb; out TLBhit, physical_frame_number)
TLBhit = false;
For (i = 0; i<4; i++)
If TLB[i,indextlb] = tagtlb then
begin
TLBhit = true;
physical_frame_number = PTE[i,indextlb];
exit;
end
end;
In the worst case, there are 4 comparisons. The theoretical minimum is 0 and the
theoretical maximum is 220 -1. In the case where the first 16KB are reserved for the O.S., the
minimum is 4. If there are 256 MB of main memory, the maximum is 218- 1.
(c) Get_physical_address (in vaddress, physical_frame_number; out physical_address);
offset = (vaddress left shift 20) right shift 20;
physical_address = concatenate(physical_frame_number,offset);
end;
(d) Cache is defined as:
2-dimensional array Cachetag(0:1,0:127) where each element is a 20-bit tag
2-dimensional array Cachedata(0:1,0:127) where each element is a 32-byte line

, Extract_cache_fields (in physical_address; out tagcache, indexcache)
tagcache= physical_ address right shift 12;
indexcache = (physical_address left shift 20) right shift 25;
end;
(e) 3.965

13. (a) false (b) false
14. (a) Data_TLB hit followed by Data_Cache miss is what happens generally on a cache miss when
the page has already been referenced. (b)Data_TLB miss followed by Data_Cache miss could
happen after a page fault. It could also happen if the page is sill resident in memory and has not
been accessed for a long time. (c) Data_TLB miss followed by a Data_Cache hit is possible but
unlikely. It could happen if the page has not been referenced for a long time, is still resident in
main memory, and the cache line has not been evicted.
15. (a) 7.445 (b) 5.745

, Chapter III
1. (a) A floating-point add followed by an unrelated integer add (b) A load followed by an
unrelated integer add
2. Instructions leave the front-end in program order only when their operands (registers) are ready or
will become ready through forwarding. A subsequent instruction cannot therefore overwrite a
register operand before its contents have been delivered to the functional unit for which it is an
operand.
3. This could happen if the two writes were not separated by an instruction using the result of the
first write. The first write is therefore not useful. However since it might complete after the
second one, a check is needed.
4. The scoreboard is used to: (a) Allow instructions to pass front the front-end to the back-end by
checking RAW and WAW dependencies (b) Schedule forwarding (c) Recover from branch
misprediction (d) Replay L1 D-cache misses (e) Allow precise exceptions for integer and
memory units.
The first two uses are slight extensions of what the forwarding and stalling units do in simple
pipelines. Because the functional units are pipelined and because instructions are issued in strict
program order, this part of the scoreboard is simpler than that of the CDC 6600 although
forwarding implies some added complexity. The last 3 uses require that the scoreboard monitor
all in flight instructions but this was true also of the CDC.
5. No it is not possible (unless much complexity is added) because a mispredicted branch might be
in the back-end and the instruction in S3 might be on the wrong path.
6. PAL code accesses hardware registers that are not part of process state. Saving and recovering
these registers is not part of the hardware definition.
7. A Boolean shift register is used to model the occupancy of the resulting bus. Assume we are at
time t. The leftmost bit of the shift register shows the occupancy of the bus at time t+1 (true if the
bus will be occupied, false otherwise). Similalrly, bits2 (from the left), 3,..., n show the
occupancies at time t+2, t+3, …, t+n. The shift register will shift left at every cycle. When at
time t an instruction is ready to be issued, the occupancy of the bus at time t +lat (where lat is the
latency of the operation) is checked. If the resulting bit in the shift register is false, it is set to true
and the instruction can issue. If not, the operation is repeated on the next cycle. (Such a schme
was present in the Cray-1 supercomputer).
8. Dispatch Start exec. Finish exec. Write back
i1 t0 + 1 t0 + 2 t0 + 22 t0 + 23
i2 t0 + 2 t0 + 23 t0 + 27 t0 + 28

, i3 t0 + 3 t0 + 4 t0 + 14 t0 + 24

9. No change for i1 and i2
i3 t0 + 3 t0 + 4 t0 + 14 t0 + 15

10. The issue step now has to check for whether buffers associated with the functional units are free.
The dispatch and execution steps are similar. The write-result is more complex since a functional
unit might have several instructions occupying it. To that effect, the Pi field must be of the form
Ua-n where n is the latency of the unit (number of stages) and an instruction carries that tag n
during its execution. Note that WAW are still forbidden (no register renaming).
11. From [SBP00] in Chapter 5:

12. They grow linearly with m.
13. Instructions must be processed in order in the dispatch step since we access the tail of the ROB.
14. No major difference. Example sequence must use a renaming scheme avoiding WAW, for
example:
i1: R2 R3 / R5
i2: R1 R2 + R4
i3: R2 R6 * R6
i4: R7 R2 * R6

15. In the decode-rename scheme, we must implement the check and priority encoding of Exercise 12
and of course check for the availability of 2 free entries in the ROB. We must double the number

, of ports in the reservation stations and have two common buses for results etc. We should also be
able to commit 2 instructions per cycle to balance the design.
16. The primordial function of the ROB is to assure commit of instructions in program order. This
requirement was not need in the floating-point unit of the IBM System 360/91.
17. One advantage would be to save some space on individual (integer) entries in the ROB. This is
overwhelmingly defeated by the drawbacks: (i) splitting a shared resource into two private ones is
rarely a good allocation strategy (ii) branches would have to be entered in both ROBs (iii)
precise exceptions and interrupts would be harder to handle.
18. Saving of an ROB entry. More decoding bandwidth.
19. Well it’s not RISC but see answer to previous exercise.
20. (a) pred = a< 0;
b.pred =1;
c.pred =2;
d.(not pred) =3;
e.(not pred) =4
(b) 7 (c) 6 (d) 4.6 and 4.4 (e) 6 in both cases. Yes but only if the sequence of code is short.

, Chapter IV
1. (a) 0% for 1-bit and 50% for 2-bit (b) 11001100…1100; no evident programming pattern

taken
^ not taken
strong taken weak taken
TT TN
taken

not taken
taken

not taken strong not taken
weak not taken
NT NN

taken

^ not taken

Generally, this is the initial
state

(c) 101010... programming pattern as in (b).
2. It costs more in storage (50%) and the logic is about the same cost. There is no difference in
performance.

NT
T
NT NT NT
TTT TTN TNN NNN

NT T
T T NT
T NT T

NTT TNT NTN NNT
T NT

T

3. (a) 8 (b) Answers in the form (gpr)(pred) (new gpr) (act) (PHT)
(000)(0)(000)(1)(01,01,01,01,01,01,01,11); misprediction restore GPR to (001)

$20.49

Get access to the full document:

100% satisfaction guarantee

Immediately available after payment

Both online and in PDF

No strings attached

Get to know the seller

TestBanksStuvia

3.9

(285)

Get to know the seller

TestBanksStuvia Chamberlain College Of Nursng

View profile

Sold

2672

Member since

2 year

Number of followers

1194

Documents

1925

Last sold

10 hours ago

TESTBANKS & SOLUTION MANUALS

if in any need of a Test bank and Solution Manual, fell free to Message me or Email donc8246@ gmail . All the best in your Studies

3.9

285 reviews

157

Why students choose Stuvia

Created by fellow students, verified by reviews

Quality you can trust: written by students who passed their tests and reviewed by others who've used these notes.

Didn't get what you expected? Choose another document

No worries! You can instantly pick a different document that better fits what you're looking for.

Pay as you like, start learning right away

No subscription, no commitments. Pay the way you're used to via credit card and download your PDF document instantly.

“Bought, downloaded, and aced it. It really can be that simple.”

Alisha Student

Frequently asked questions

What do I get when I buy this document?

You get a PDF, available immediately after your purchase. The purchased document is accessible anytime, anywhere and indefinitely through your profile.

Satisfaction guarantee: how does it work?

Our satisfaction guarantee ensures that you always find a study document that suits you well. You fill out a form, and our customer service team takes care of the rest.

Who am I buying these notes from?

Stuvia is a marketplace, so you are not buying this document from us, but from seller TestBanksStuvia. Stuvia facilitates payment to the seller.

Will I be stuck with a subscription?

No, you only buy these notes for $20.49. You're not tied to anything after your purchase.

Can Stuvia be trusted?

4.6 stars on Google & Trustpilot (+1000 reviews) 46231 documents were sold in the last 30 days Founded in 2010, the go-to place to buy study notes for 15 years now

Microprocessor Architecture (1st Edition, 2009) – Answers to Selected Exercises – Baer

Written for

Document information

Subjects

Content preview

Get to know the seller

Recently viewed by you

Why students choose Stuvia

Created by fellow students, verified by reviews

Didn't get what you expected? Choose another document

Pay as you like, start learning right away

Frequently asked questions

What do I get when I buy this document?

Satisfaction guarantee: how does it work?

Who am I buying these notes from?

Will I be stuck with a subscription?

Can Stuvia be trusted?