Exam (elaborations)

Multicore and GPU Programming: An Integrated Approach, Second Edition (Suppl. 1 of 3, Instructor Solution Manual, Solutions) Complete A+ Solutions.

Pages
215
Grade
A+
Uploaded on
09-01-2025
Written in
2024/2025

Institution
Multicore And GPU Programming 2nd Edition
Course
Multicore And GPU Programming 2nd Edition











Contains
Questions & answers

Subjects

  • solution manual

Content preview

SOLUTION MANUAL

Multicore & GPU Programming: An Integrated Approach, 2e

Gerassimos Barlas

Contents

1 Introduction
2 Multicore and Parallel Program Design
3 Threads and Concurrency in standard C++
4 Parallel data structures
5 Distributed memory programming
6 GPU Programming
7 GPU and Accelerator Programming: OpenCL
8 Shared-memory programming: OpenMP
9 The Thrust Template Library
10 High-level multi-threaded programming with the Qt library
11 Load Balancing


Chapter 1

Introduction

Exercises
1. Study one of the top 10 most powerful supercomputers in the world. Discover:

What kind of operating system does it run?
How many CPUs/GPUs is it made of?
What is its total memory capacity?
What kind of software tools can be used to program it?

Answer
Students should research the answer by visiting the TOP500 site and, if
available, the site of one of the listed systems.

2. How many cores are inside the top GPU offerings from NVidia and AMD?
What is the GFlop rating of these chips?
Answer N/A.

3. The performance of the most powerful supercomputers in the world is
usually reported as two numbers Rpeak and Rmax, both in TFlops (tera
floating point operations per second) units. Why is this done? What
are the factors reducing performance from Rpeak to Rmax? Would it be
possible to ever achieve Rpeak?
Answer
This is done because the peak performance is unattainable in practice.
Sustained performance, measured on specific benchmarks, is a better
indicator of the true potential of the machine.
The two figures differ because of communication overhead.
Rpeak and Rmax could never be equal. Extremely compute-heavy applications
that have no inter-node communication could asymptotically approach Rpeak
if they were to run for a very long time. A very long execution time is
required to diminish the influence of the start-up costs.


4. A sequential application with a 20% part that must be executed
sequentially is required to be accelerated five-fold. How many CPUs are
required for this task?
Answer
This requires the application of Amdahl’s law. The part that can be
parallelized is α = 1 − 20% = 80%. The speedup predicted by Amdahl’s
law is

$$\text{speedup} = \frac{1}{1 - \alpha + \frac{\alpha}{N}}$$

Achieving a three-fold speedup requires that:

$$\frac{1}{1 - \alpha + \frac{\alpha}{N}} = 3 \Rightarrow \frac{1}{0.2 + \frac{0.8}{N}} = 3 \Rightarrow \frac{0.8}{N} = \frac{1}{3} - 0.2 \Rightarrow N = \frac{0.8}{\frac{1}{3} - 0.2} = 6 \tag{1.1}$$

Achieving a 5-fold speedup requires that:

$$\frac{1}{0.2 + \frac{0.8}{N}} = 5 \Rightarrow N = \frac{0.8}{\frac{1}{5} - 0.2} = \frac{0.8}{0} = \infty \tag{1.2}$$

So, it is impossible to achieve a 5-fold speedup, according to Amdahl’s
law.
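As a quick numerical check of the derivation above, here is a minimal Python sketch of Amdahl's law. The function names `amdahl_speedup` and `cpus_for_speedup` and the small rounding tolerance are illustrative choices, not part of the manual, and the sketch assumes a sequential fraction strictly between 0 and 1.

```python
import math


def amdahl_speedup(seq_fraction, n):
    """Speedup predicted by Amdahl's law on n CPUs.

    seq_fraction is the sequential share (1 - alpha in the answer above).
    """
    return 1.0 / (seq_fraction + (1.0 - seq_fraction) / n)


def cpus_for_speedup(seq_fraction, target):
    """Smallest integer CPU count reaching `target`, or math.inf if unreachable.

    Assumes 0 < seq_fraction < 1 and target > 1.
    """
    limit = 1.0 / seq_fraction  # speedup bound as n grows without limit
    if target > limit or math.isclose(target, limit):
        return math.inf
    n = (1.0 - seq_fraction) / (1.0 / target - seq_fraction)
    # Subtract a small tolerance so floating-point noise does not push
    # an exact answer such as 6 up to 7.
    return math.ceil(n - 1e-9)


print(cpus_for_speedup(0.2, 3))  # three-fold speedup
print(cpus_for_speedup(0.2, 5))  # five-fold speedup
```

With a 20% sequential part, the first call gives 6 CPUs and the second gives infinity, in agreement with equations (1.1) and (1.2).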
5. A parallel application running on 5 identical machines has a 10%
sequential part. What is the speedup relative to a sequential execution on
one of the machines? If we would like to double that speedup, how many
CPUs would be required?
Answer
This requires the application of Gustafson-Barsis’ law, as the information
relates to a parallel application. The parallel part is α = 1 − 10% = 90%.
The speedup over a single machine is

$$\text{speedup} = 1 - \alpha + N \cdot \alpha = 0.1 + 5 \cdot 0.9 = 4.6$$

Doubling the speedup would require

$$0.1 + N \cdot 0.9 = 9.2 \Rightarrow N = \frac{9.1}{0.9} \approx 10.1$$

machines. As N has to be an integer, we have to round up to the closest
integer, i.e. N = 11.
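The same arithmetic can be reproduced with a short Python sketch of Gustafson-Barsis' law. The names `gustafson_speedup` and `machines_for_speedup` are hypothetical helpers introduced here for illustration, and the rounding tolerance is an assumption made to guard against floating-point noise.

```python
import math


def gustafson_speedup(par_fraction, n):
    """Scaled speedup on n machines; par_fraction is alpha in the answer above."""
    return (1.0 - par_fraction) + n * par_fraction


def machines_for_speedup(par_fraction, target):
    """Smallest integer machine count whose scaled speedup reaches `target`."""
    n = (target - (1.0 - par_fraction)) / par_fraction
    # Small tolerance so an exact integer result is not rounded up by noise.
    return math.ceil(n - 1e-9)


current = gustafson_speedup(0.9, 5)
print(current)
print(machines_for_speedup(0.9, 2 * current))
```

For α = 0.9 the first value is approximately 4.6, and doubling it calls for 11 machines once the fractional result is rounded up.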
6. An application with a 5% non-parallelizable part is to be modified for
parallel execution. Currently on the market there are two parallel
machines available: machine X with 4 CPUs, each CPU capable of executing
the application in 1 hr on its own, and machine Y with 16 CPUs, with
each CPU capable of executing the application in 2 hr on its own. Which
is the machine you should buy, if the minimum execution time is required?
Answer
As the information provided relates to a sequential application, we have
to apply Amdahl’s law. The execution time for machine X is:

$$t_X = (1 - \alpha)T + \frac{\alpha T}{N} = 0.05 \cdot 1\,\text{hr} + \frac{0.95 \cdot 1\,\text{hr}}{4} = 0.2875\,\text{hr} \tag{1.3}$$

The execution time for machine Y is:

$$t_Y = (1 - \alpha)T + \frac{\alpha T}{N} = 0.05 \cdot 2\,\text{hr} + \frac{0.95 \cdot 2\,\text{hr}}{16} = 0.21875\,\text{hr} \tag{1.4}$$
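The two execution times can be checked with a minimal Python sketch of the formula used in (1.3) and (1.4). The function name `parallel_time` and its parameter names are illustrative assumptions, not taken from the manual.

```python
def parallel_time(seq_time, par_fraction, n):
    """Execution time on n CPUs under Amdahl's law.

    seq_time is the single-CPU time T (in hours here), and par_fraction
    is the parallelizable share alpha.
    """
    return (1.0 - par_fraction) * seq_time + par_fraction * seq_time / n


t_x = parallel_time(1.0, 0.95, 4)   # machine X: 4 CPUs, 1 hr per CPU
t_y = parallel_time(2.0, 0.95, 16)  # machine Y: 16 CPUs, 2 hr per CPU
print(t_x, t_y)
print("X" if t_x < t_y else "Y")
```

Under these figures the sketch reports the shorter time for machine Y (about 0.219 hr against about 0.288 hr for machine X).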
