LLM Quiz Questions with Correct Answers 100% Pass



Written for

Institution
LLM
Course
LLM

Document information

Uploaded on
April 25, 2025
Number of pages
20
Written in
2024/2025
Type
Exam (elaborations)
Contains
Questions & answers

Content preview

LLM Quiz Questions with Correct
Answers 100% Pass


Which in-context learning method involves creating an initial prompt that states the
task to be completed and includes a single example question with answer followed
by a second question to be answered by the LLM?




a. Hot shot

b. Zero shot

c. Few shot

d. One shot - ✔✔d. One Shot




One shot inference involves providing an example question with answer followed
by a second question to be answered by the LLM. Few shot inference provides
multiple example prompts and answers, while zero shot provides only the task
prompt itself, with no worked examples.
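
To make the distinction concrete, here is a small illustrative sketch (not part of the quiz itself) that assembles zero-, one-, and few-shot prompts in Python; the helper name, task description, and example strings are hypothetical.

# Hypothetical sketch: build a prompt with zero, one, or several worked examples.
def build_prompt(task, examples, question):
    # 0 examples -> zero shot, 1 -> one shot, 2+ -> few shot
    parts = [task]
    for q, a in examples:
        parts.append(f"Q: {q}\nA: {a}")
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)

# One-shot: a single example question/answer pair, then the question for the LLM.
prompt = build_prompt(
    task="Classify the sentiment of each review as positive or negative.",
    examples=[("The plot was gripping from start to finish.", "positive")],
    question="The acting felt wooden and the pacing dragged.",
)
print(prompt)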

Which configuration parameter for inference can be adjusted to either increase or
decrease randomness within the model output layer?




a. Max new tokens

b. Top-k sampling





c. Temperature

d. Number of beams & beam search - ✔✔c. Temperature




Temperature is used to affect the randomness of the output of the softmax layer. A
lower temperature results in reduced variability while a higher temperature results
in increased randomness of the output.
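
To see the effect numerically, the following sketch (illustrative, not from the quiz) applies temperature scaling to a made-up set of logits before the softmax; lower temperatures sharpen the distribution, higher temperatures flatten it.

import numpy as np

def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature before exponentiating.
    scaled = logits / temperature
    scaled = scaled - scaled.max()            # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.1])            # made-up logits for three tokens
print(softmax_with_temperature(logits, 0.5))  # low temperature: more peaked, less random
print(softmax_with_temperature(logits, 1.5))  # high temperature: flatter, more random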

Which of the following best describes the role of data parallelism in the context of
training Large Language Models (LLMs) with GPUs?




a. Data parallelism is used to increase the size of the training data by duplicating it
across multiple GPUs.




b. Data parallelism refers to a type of storage mechanism where data is stored across
multiple GPUs.




c. Data parallelism is a technique to reduce the model size so that it can fit into the
memory of a single GPU.




d. Data parallelism allows for the use of multiple GPUs to process different parts of
the same data simultaneously, speeding up training time.

- ✔✔d. Data parallelism allows for the use of multiple GPUs to process
different parts of the same data simultaneously, speeding up training time.





Data parallelism is a strategy that splits the training data across multiple GPUs. Each
GPU processes a different subset of the data simultaneously, which can greatly
speed up the overall training time.
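
As a minimal, CPU-only sketch of the idea (the toy linear model and synthetic data are assumptions for illustration): each simulated "device" computes a gradient on its own shard of the batch, the gradients are averaged, and every model replica applies the same update, which mirrors what data-parallel training does across real GPUs.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 8))                  # synthetic training batch
y = X @ rng.normal(size=8)                    # synthetic targets
w = np.zeros(8)                               # shared weights (toy linear model)

num_devices = 4                               # stand-ins for GPUs
X_shards = np.array_split(X, num_devices)     # each device sees a different data shard
y_shards = np.array_split(y, num_devices)

for step in range(100):
    # Each "device" computes a gradient on its own shard (in parallel on real hardware).
    grads = [Xs.T @ (Xs @ w - ys) / len(ys) for Xs, ys in zip(X_shards, y_shards)]
    # All-reduce: average the per-device gradients, then apply one shared update
    # so every model replica stays identical.
    w = w - 0.1 * np.mean(grads, axis=0)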

Which of the following statements about pretraining scaling laws are correct? Select
all that apply:




a. To scale our model, we need to jointly increase dataset size and model size, or they
can become a bottleneck for each other.




b. There is a relationship between model size (in number of parameters) and the
optimal number of tokens to train the model with.




c. When measuring compute budget, we can use "PetaFlops per second-Day" as a
metric.




d. You should always follow the recommended number of tokens, based on the
Chinchilla laws, to train your model. - ✔✔a, b & c




a. To scale our model, we need to jointly increase dataset size and model size, or they
can become a bottleneck for each other.

Correct

For instance, while increasing dataset size is helpful, if we do not jointly increase the
model size, the model may not be able to capture value from the larger dataset.




b. There is a relationship between model size (in number of parameters) and the
optimal number of tokens to train the model with.

