complete solutions 2024/2025
A Data Scientist observes oscillations in training accuracy while doing mini-batch
training on a neural network for a classification task. Which of the following is the
MOST LIKELY CAUSE of this problem?
A. The class distribution in the dataset is imbalanced.
B. Dataset shuffling is disabled.
C. The batch size is too big.
D. The learning rate is very high. - ANSWER-D. The learning rate is very high.
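As a rough illustration of answer D, the sketch below runs plain gradient descent on the toy loss f(w) = w**2 and counts how often the iterate jumps across the minimum. The toy loss, the function names, and the two learning rates are assumptions chosen for simplicity; they are not the network from the question.

```python
def gradient_descent(lr, steps=10, w0=1.0):
    """Minimise f(w) = w**2 with plain gradient descent; return the visited weights."""
    w = w0
    path = [w]
    for _ in range(steps):
        grad = 2.0 * w  # derivative of w**2
        w = w - lr * grad
        path.append(w)
    return path


def sign_flips(path):
    """Count how often consecutive iterates land on opposite sides of the minimum at 0."""
    return sum(1 for a, b in zip(path, path[1:]) if a * b < 0)


low = gradient_descent(lr=0.1)    # hypothetical small step size
high = gradient_descent(lr=0.95)  # hypothetical large step size
print(sign_flips(low), sign_flips(high))  # prints: 0 10
```

With the small step the iterates approach the minimum from one side. With the large step every update overshoots it, which is the kind of behaviour that can show up as oscillating accuracy during mini-batch training.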
Which common parameters MUST be specified when submitting Amazon SageMaker
training jobs that use one of the built-in algorithms? (Select three.)
A. The training channel identifying the location of training data on an Amazon S3
bucket.
B. The validation channel identifying the location of validation data on an Amazon
S3 bucket.
C. The IAM role that Amazon SageMaker can assume to perform tasks on behalf
of the users.
D. Hyperparameters in a JSON array as documented for the algorithm used.
E. The Amazon EC2 instance class specifying whether training will be run using
CPU or GPU.
F. The output path specifying where on an Amazon S3 bucket the trained model
will persist. - ANSWER-C. The IAM role that Amazon SageMaker can assume to
perform tasks on behalf of the users.
E. The Amazon EC2 instance class specifying whether training will be run using
CPU or GPU.
F. The output path specifying where on an Amazon S3 bucket the trained model
will persist.
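A minimal sketch of how options C, E, and F might map onto a training job request, assuming the request shape of the CreateTrainingJob API as exposed through boto3. Every value below (job name, image URI placeholder, role ARN, bucket, instance type) is a hypothetical placeholder, and the dictionary is only built, not submitted.

```python
# All values are hypothetical placeholders, not taken from the question.
training_job_request = {
    "TrainingJobName": "example-job",
    "AlgorithmSpecification": {
        "TrainingImage": "<built-in-algorithm-image-uri>",
        "TrainingInputMode": "File",
    },
    # Option C: the IAM role SageMaker assumes on the user's behalf.
    "RoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    # Option E: the instance type determines CPU or GPU training.
    "ResourceConfig": {
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,
    },
    # Option F: where the trained model artifacts are written.
    "OutputDataConfig": {"S3OutputPath": "s3://example-bucket/output/"},
    "StoppingCondition": {"MaxRuntimeInSeconds": 3600},
}

# Map each answer option to the request field assumed to carry it.
ANSWER_FIELDS = {"C": "RoleArn", "E": "ResourceConfig", "F": "OutputDataConfig"}
missing = [opt for opt, field in ANSWER_FIELDS.items() if field not in training_job_request]
print(missing)  # prints: []
```

The sketch deliberately leaves out input channels and hyperparameters to mirror the answer key; whether a particular built-in algorithm also needs them in practice depends on that algorithm.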
What is the true class frequency for Romance and the predicted class frequency
for Adventure given the following confusion matrix for a movie classification
model?
A. The true class frequency for Romance is 77.56% and the predicted class
frequency for Adventure is 20.85%
B. The true class frequency for Romance is 57.92% and the predicted class
frequency for Adventure is 13.12%
C. The true class frequency for Romance is 0.78 and the predicted class
frequency for Adventure is (0.47-0.32)
D. The true class frequency for Romance is 77.56% * 0.78 and the predicted class
frequency for Adventure is 20.85%*0.32 - ANSWER-B. The true class frequency
for Romance is 57.92% and the predicted class frequency for Adventure is 13.12%
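The two quantities in answer B can be read off a confusion matrix as a row total and a column total, each divided by the grand total. The sketch below shows that arithmetic on a small hypothetical matrix; the counts, the third label, and the function names are assumptions, since the matrix from the question is not reproduced here.

```python
labels = ["Romance", "Adventure", "Comedy"]

# Hypothetical counts: rows are the true class, columns are the predicted class.
matrix = [
    [50, 5, 5],
    [4, 20, 6],
    [6, 5, 99],
]


def true_class_frequency(matrix, labels, name):
    """Share of all samples whose true class is `name` (row sum / grand total)."""
    i = labels.index(name)
    total = sum(sum(row) for row in matrix)
    return sum(matrix[i]) / total


def predicted_class_frequency(matrix, labels, name):
    """Share of all samples predicted as `name` (column sum / grand total)."""
    j = labels.index(name)
    total = sum(sum(row) for row in matrix)
    return sum(row[j] for row in matrix) / total


print(true_class_frequency(matrix, labels, "Romance"))         # prints: 0.3
print(predicted_class_frequency(matrix, labels, "Adventure"))  # prints: 0.15
```

The per-cell accuracies in a normalized matrix play no part in either figure, which is why the options that multiply by 0.78 or 0.32 can be ruled out.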
A retail chain has been utilizing Amazon Kinesis Data Firehose to ingest
purchase details from its network of 20,000 outlets into Amazon S3. To facilitate
the training of a more advanced machine learning model, training data will need
additional but straightforward transformations, and certain characteristics will be
merged. Daily retraining of the model is required.
Which update will take the LEAST amount of development work, given the vast
number of stores and historical data ingestion?
A. Require the stores to switch to capturing their data locally on AWS
Storage Gateway for loading into Amazon S3, then use AWS Glue to do the
transformation.
B. Deploy an Amazon EMR cluster running Apache Spark with the transformation
logic, and have the cluster run each day on the accumulating records in Amazon
S3, outputting new/transformed records to Amazon S3.
C. Spin up a fleet of Amazon EC2 instances with the transformat - ANSWER-D.
Insert an Amazon Kinesis Data Analytics stream downstream of the Kinesis Data
Firehose stream that transforms raw record attributes into simple transformed
values using SQL.
A data scientist conducts data exploration and analysis using an Amazon
SageMaker notebook instance. This involves installing some Python packages on
the notebook instance that are not natively accessible on Amazon SageMaker.
How can a machine learning professional guarantee that the data scientist's
essential packages are automatically accessible on the notebook instance?
A. Install AWS Systems Manager Agent on the underlying Amazon EC2 instance
and use Systems Manager Automation to execute the package installation
commands.
B. Create a Jupyter notebook file (.ipynb) with cells containing the package
installation commands to execute and place the file under the /etc/init directory of
each Amazon SageMaker notebook instance.
C. Use the conda package manager from within the Jupyter notebook console to
apply the necessary conda packages to the default kernel of the notebook.
D. Create an Amazon SageMaker lifecycle conf - ANSWER-D. Create an Amazon
SageMaker lifecycle configuration with package installation commands and
assign the lifecycle configuration to the notebook instance.
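A rough sketch of what answer D could look like when prepared from Python, assuming the boto3 SageMaker client's create_notebook_instance_lifecycle_config call, which takes base64-encoded script content. The package names and configuration name are hypothetical, the request is only built and never sent, and the script omits activating a specific conda environment, which a real setup would probably need.

```python
import base64

# Hypothetical package names; the question does not say which packages are needed.
PACKAGES = ["example-package-a", "example-package-b"]

# Shell script that would run each time the notebook instance starts.
on_start_script = "\n".join([
    "#!/bin/bash",
    "set -e",
    "sudo -u ec2-user -i <<'EOF'",
    "pip install " + " ".join(PACKAGES),
    "EOF",
]) + "\n"

# Keyword arguments one would pass to
# boto3.client("sagemaker").create_notebook_instance_lifecycle_config(**lifecycle_request).
lifecycle_request = {
    "NotebookInstanceLifecycleConfigName": "example-install-packages",  # hypothetical name
    "OnStart": [
        {"Content": base64.b64encode(on_start_script.encode("utf-8")).decode("ascii")}
    ],
}

print(sorted(lifecycle_request))
```

The configuration is then attached to the notebook instance by name, so the installation commands run without the data scientist repeating them in a notebook cell.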
A web-based business wishes to increase conversions on its landing page. The
business regularly trains a multi-class deep learning model in Amazon SageMaker
on a large historical dataset of client visits. However, there
is an overfitting issue: training data indicates a prediction accuracy of 90%,
whereas test data indicates only a prediction accuracy of 70%.
The organization has to increase the generalizability of its model prior to putting
it in production in order to optimize visit-to-purchase conversions.
Which action is recommended to achieve the HIGHEST possible accuracy on the
company's test and validation data?
A. Increase the randomization of training data in the mini-batches used in training
B. Allocate a higher proportion of the overall data to the training dataset
C. Apply L1 or L2 regularization and dropouts to the training
D. Reduce the number of layers and u - ANSWER-C. Apply L1 or L2 regularization
and dropouts to the training
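To make answer C concrete, the sketch below shows the two ideas in plain Python: an L1 or L2 penalty added to the data loss, and inverted dropout applied to a layer's activations. The weights, the loss value, the penalty strength, and the function names are hypothetical; a real model would rely on its framework's built-in regularizers.

```python
import random


def l1_penalty(weights, lam):
    """L1 term: lam times the sum of absolute weights (pushes weights toward zero)."""
    return lam * sum(abs(w) for w in weights)


def l2_penalty(weights, lam):
    """L2 term: lam times the sum of squared weights (discourages large weights)."""
    return lam * sum(w * w for w in weights)


def dropout(activations, rate, rng):
    """Inverted dropout: zero each unit with probability `rate`, rescale the survivors."""
    keep = 1.0 - rate
    return [a / keep if rng.random() < keep else 0.0 for a in activations]


weights = [0.5, -1.5, 2.0]  # hypothetical layer weights
data_loss = 0.40            # hypothetical loss on a mini-batch
total_loss = data_loss + l2_penalty(weights, lam=0.01)

rng = random.Random(0)
dropped = dropout([1.0] * 8, rate=0.5, rng=rng)
print(round(total_loss, 3), dropped)
```

Dropout is applied only during training; at inference time the activations pass through unchanged, which is why the survivors are rescaled here.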
A business evaluates the risk variables associated with a specific energy sector
using a long short-term memory (LSTM) model. The program analyzes multi-page
text documents and categorizes each phrase as either posing a risk or posing
no risk. The model is underperforming, despite the Data Scientist's extensive
experimentation with several network architectures and tuning of the associated
hyperparameters. Which technique will result in the MAXIMUM increase in
performance?
A. Initialize the words by term frequency-inverse document frequency (TF-IDF)
vectors pretrained on a large collection of news articles related to the energy
sector.
B. Use gated recurrent units (GRUs) instead of LSTM and run the training process
until the validation loss stops decreasing.
C. Reduce the learning rate and run the training process until the training loss
stops decreasing.
D. Initialize the words by word2vec embeddings pretrained on - ANSWER-D.
Initialize the words by word2vec embeddings pretrained on a large collection of
news articles related to the energy sector.
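A minimal sketch of the initialization step in answer D: each vocabulary word receives its pretrained vector when one exists, and a small random vector otherwise. The three-dimensional vectors, the vocabulary, and the function name are hypothetical stand-ins for real word2vec output, and loading an actual pretrained model is outside this sketch.

```python
import random


def build_embedding_matrix(vocab, pretrained, dim, rng):
    """Return one row per vocabulary word: the pretrained vector when available,
    otherwise a small random vector for out-of-vocabulary words."""
    matrix = []
    for word in vocab:
        if word in pretrained:
            matrix.append(list(pretrained[word]))
        else:
            matrix.append([rng.uniform(-0.05, 0.05) for _ in range(dim)])
    return matrix


# Hypothetical vectors standing in for embeddings trained on energy-sector news.
pretrained = {"pipeline": [0.1, 0.3, -0.2], "outage": [0.4, -0.1, 0.0]}
vocab = ["pipeline", "outage", "turbine"]

embeddings = build_embedding_matrix(vocab, pretrained, dim=3, rng=random.Random(0))
print(embeddings[0])  # prints: [0.1, 0.3, -0.2]
```

The resulting matrix would seed the embedding layer in front of the LSTM, so the model starts from domain-relevant word similarities rather than from random values.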