MLA-C01 Questions and Answers

Question # 6

A company has deployed a model to predict the churn rate for its games by using Amazon SageMaker Studio. After the model is deployed, the company must monitor the model performance for data drift and inspect the report. Select and order the correct steps from the following list to set up the model monitoring actions. Select each step one time. (Select and order THREE.)

• Check the analysis results on the SageMaker Studio console.

• Create a Shapley Additive Explanations (SHAP) baseline for the model by using Amazon SageMaker Clarify.

• Schedule an hourly model explainability monitor.

Question # 7

An ML engineer is training an XGBoost regression model in Amazon SageMaker AI. The ML engineer conducts several rounds of hyperparameter tuning with random grid search. After these rounds of tuning, the error rate on the test hold-out dataset is much larger than the error rate on the training dataset.

The ML engineer needs to make changes before running the hyperparameter grid search again.

Which changes will improve the model's performance? (Select TWO.)

A.

Increase the model complexity by increasing the number of features in the dataset.

B.

Decrease the model complexity by reducing the number of features in the dataset.

C.

Decrease the model complexity by reducing the number of samples in the dataset.

D.

Increase the value of the L2 regularization parameter.

E.

Decrease the value of the L2 regularization parameter.
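A much larger error on the hold-out data than on the training data is the usual sign of overfitting. As a minimal sketch of what an L2 penalty does (this is not the XGBoost implementation; the one-feature linear model and the helper name `ridge_weight` are assumptions made only for illustration), the closed-form ridge solution shows the fitted weight shrinking toward zero as the penalty grows:

```python
def ridge_weight(xs, ys, l2):
    """Weight w that minimizes sum((y - w*x)^2) + l2 * w^2 for a one-feature model."""
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + l2)


# Hypothetical toy data where the unpenalized fit is exactly w = 2.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

# A larger L2 value pulls the weight further toward zero.
weights = [ridge_weight(xs, ys, l2) for l2 in (0.0, 1.0, 10.0)]
```

In XGBoost the L2 term is controlled by the `lambda` hyperparameter; the toy model above only illustrates the direction of the effect.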

Question # 8

A company must install a custom script on any newly created Amazon SageMaker AI notebook instances.

Which solution will meet this requirement with the LEAST operational overhead?

A.

Create a lifecycle configuration script to install the custom script when a new SageMaker AI notebook is created. Attach the lifecycle configuration to every new SageMaker AI notebook as part of the creation steps.

B.

Create a custom Amazon Elastic Container Registry (Amazon ECR) image that contains the custom script. Push the ECR image to a Docker registry. Attach the Docker image to a SageMaker Studio domain. Select the kernel to run as part of the SageMaker AI notebook.

C.

Create a custom package index repository. Use AWS CodeArtifact to manage the installation of the custom script. Set up AWS PrivateLink endpoints to connect CodeArtifact to the SageMaker AI instance. Install the script.

D.

Store the custom script in Amazon S3. Create an AWS Lambda function to install the custom script on new SageMaker AI notebooks. Configure Amazon EventBridge to invoke the Lambda function when a new SageMaker AI notebook is initialized.

Question # 9

A company needs to analyze a large dataset that is stored in Amazon S3 in Apache Parquet format. The company wants to use one-hot encoding for some of the columns.

The company needs a no-code solution to transform the data. The solution must store the transformed data back to the same S3 bucket for model training.

Which solution will meet these requirements?

A.

Configure an AWS Glue DataBrew project that connects to the data. Use the DataBrew interactive interface to create a recipe that performs the one-hot encoding transformation. Create a job to apply the transformation and write the output back to an S3 bucket.

B.

Use Amazon Athena SQL queries to perform the one-hot encoding transformation.

C.

Use an AWS Glue ETL interactive notebook to perform the transformation.

D.

Use Amazon Redshift Spectrum to perform the transformation.
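For readers who want to see what the one-hot encoding transformation in this question produces, here is a minimal pure-Python sketch. The function name `one_hot_encode` and the sample values are hypothetical and are not tied to any AWS service:

```python
def one_hot_encode(values):
    """Return the sorted category list and one 0/1 indicator row per input value."""
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical categorical column.
categories, rows = one_hot_encode(["red", "green", "red", "blue"])
```

Each input value becomes a row with exactly one 1, in the column that matches its category.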

Question # 10

A company is developing an ML model by using Amazon SageMaker AI. The company must monitor bias in the model and display the results on a dashboard. An ML engineer creates a bias monitoring job.

How should the ML engineer capture bias metrics to display on the dashboard?

A.

Capture AWS CloudTrail metrics from SageMaker Clarify.

B.

Capture Amazon CloudWatch metrics from SageMaker Clarify.

C.

Capture SageMaker Model Monitor metrics from Amazon EventBridge.

D.

Capture SageMaker Model Monitor metrics from Amazon SNS.

Question # 11

A company is training a deep learning model to detect abnormalities in images. The company has limited GPU resources and a large hyperparameter space to explore. The company needs to test different configurations and avoid wasting computation time on poorly performing models that show weak validation accuracy in early epochs.

Which hyperparameter optimization strategy should the company use?

A.

Grid search across all possible combinations

B.

Bayesian optimization with early stopping

C.

Manual tuning of each parameter individually

D.

Exhaustive search without early stopping
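The idea of not wasting compute on configurations with weak early validation accuracy can be sketched with one common heuristic, the median stopping rule. This is a simplified illustration under stated assumptions, not the exact logic of any managed tuning service; the function name `should_stop` and the accuracy curves are hypothetical:

```python
from statistics import median


def should_stop(current_curve, completed_curves):
    """Median stopping rule sketch.

    Stop the current trial if its best validation accuracy so far is below
    the median of the running-average accuracy that earlier trials had
    reached by the same epoch.
    """
    epoch = len(current_curve)
    if epoch == 0:
        return False
    comparable = [curve for curve in completed_curves if len(curve) >= epoch]
    if not comparable:
        return False
    running_averages = [sum(curve[:epoch]) / epoch for curve in comparable]
    return max(current_curve) < median(running_averages)


# Hypothetical per-epoch validation accuracies from earlier trials.
completed = [[0.5, 0.6, 0.7], [0.6, 0.7, 0.8], [0.4, 0.5, 0.6]]

weak_trial_stopped = should_stop([0.20, 0.25], completed)
strong_trial_stopped = should_stop([0.60, 0.70], completed)
```

A trial that lags well behind its predecessors after two epochs is cut off, while a competitive one keeps its GPU time.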

Question # 12

A company is using Amazon SageMaker AI to develop a credit risk assessment model. During model validation, the company finds that the model achieves 82% accuracy on the validation data. However, the model achieved 99% accuracy on the training data. The company needs to address the model accuracy issue before deployment.

Which solution will meet this requirement?

A.

Add more dense layers to increase model complexity. Implement batch normalization. Use early stopping during training.

B.

Implement dropout layers. Use L1 or L2 regularization. Perform k-fold cross-validation.

C.

Use principal component analysis (PCA) to reduce the feature dimensionality. Decrease model layers. Implement cross-entropy loss functions.

D.

Augment the training dataset. Remove duplicate records from the training dataset. Implement stratified sampling.

Question # 13

A hospital is using an ML model to validate x-ray results. The hospital runs a nightly batch inference job. The hospital needs to produce a daily report about model data quality and model performance.

Which solution will meet these requirements?

A.

Schedule a monitoring job in Amazon SageMaker Model Monitor. Generate the monitoring results for the model and data.

B.

Create an Amazon CloudWatch dashboard that includes the metrics for processing steps in the nightly batch inference job. Compare the baseline resource metrics. Share the dashboard link.

C.

Use AWS Glue DataBrew to create a custom recipe job that uses the Numerical Statistics data quality check for the model file. Generate the results.

D.

Create a SageMaker AI pipeline that includes a QualityCheck step to run monitoring jobs. Generate the monitoring results for the model and the data.

Question # 14

A company uses Amazon SageMaker AI to create ML models. The data scientists need fine-grained control of ML workflows, DAG visualization, experiment history, and model governance for auditing and compliance.

Which solution will meet these requirements?

A.

Use AWS CodePipeline with SageMaker Studio and SageMaker ML Lineage Tracking.

B.

Use AWS CodePipeline with SageMaker Experiments.

C.

Use SageMaker Pipelines with SageMaker Studio and SageMaker ML Lineage Tracking.

D.

Use SageMaker Pipelines with SageMaker Experiments.

Question # 15

A company has a conversational AI assistant that sends requests through Amazon Bedrock to an Anthropic Claude large language model (LLM). Users report that when they ask similar questions multiple times, they sometimes receive different answers. An ML engineer needs to improve the responses to be more consistent and less random.

Which solution will meet these requirements?

A.

Increase the temperature parameter and the top_k parameter.

B.

Increase the temperature parameter. Decrease the top_k parameter.

C.

Decrease the temperature parameter. Increase the top_k parameter.

D.

Decrease the temperature parameter and the top_k parameter.
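To make the effect of the two parameters concrete, the following sketch applies temperature scaling and top-k filtering to a small set of logits. It is a generic illustration of the sampling math, not Amazon Bedrock or Anthropic code; the function name `sampling_distribution` and the logit values are hypothetical:

```python
import math


def sampling_distribution(logits, temperature=1.0, top_k=None):
    """Return token probabilities after temperature scaling and top-k filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if top_k is not None and top_k < 1:
        raise ValueError("top_k must be at least 1")
    k = len(logits) if top_k is None else min(top_k, len(logits))
    # Keep only the indices of the k largest logits.
    keep = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = {i: logits[i] / temperature for i in keep}
    peak = max(scaled.values())  # subtract the max for numerical stability
    exps = {i: math.exp(value - peak) for i, value in scaled.items()}
    total = sum(exps.values())
    return [exps[i] / total if i in exps else 0.0 for i in range(len(logits))]


# Hypothetical logits for four candidate tokens.
logits = [2.0, 1.0, 0.5, 0.1]
default = sampling_distribution(logits)
focused = sampling_distribution(logits, temperature=0.5, top_k=2)
```

With a lower temperature and a smaller `top_k`, more probability mass lands on the most likely token and the tail tokens are excluded, so repeated sampling varies less.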

Question # 16

A company is using ML to predict the presence of a specific weed in a farmer's field. The company is using the Amazon SageMaker linear learner built-in algorithm with a value of multiclass_classifier for the predictor_type hyperparameter.

What should the company do to MINIMIZE false positives?

A.

Set the value of the weight decay hyperparameter to zero.

B.

Increase the number of training epochs.

C.

Increase the value of the target_precision hyperparameter.

D.

Change the value of the predictor_type hyperparameter to regressor.

Question # 17

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a retraining job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

A.

Use an AWS Glue crawler and an AWS Glue ETL job to detect data drift. Use AWS Glue triggers to automate the retraining job.

B.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the retraining job.

C.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the retraining job.

D.

Use Amazon QuickSight anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the retraining job.

Question # 18

A company has deployed an ML model that detects fraudulent credit card transactions in real time in a banking application. The model uses Amazon SageMaker Asynchronous Inference. Consumers are reporting delays in receiving the inference results.

An ML engineer needs to implement a solution to improve the inference performance. The solution also must provide a notification when a deviation in model quality occurs.

Which solution will meet these requirements?

A.

Use SageMaker real-time inference for inference. Use SageMaker Model Monitor for notifications about model quality.

B.

Use SageMaker batch transform for inference. Use SageMaker Model Monitor for notifications about model quality.

C.

Use SageMaker Serverless Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

D.

Keep using SageMaker Asynchronous Inference for inference. Use SageMaker Inference Recommender for notifications about model quality.

Question # 19

A company uses an NFS-based data store to store data for ML training. Linux-based systems access the data store.

The company needs a hybrid system to make the shared data store accessible to on-premises servers and Amazon SageMaker AI notebooks that will consume the data. File locking is required for the data producers.

Which AWS storage solution will meet these requirements?

A.

Use an Amazon S3 bucket to store the data. Use Mountpoint for Amazon S3 to mount the S3 bucket to the on-premises servers and the SageMaker AI notebooks.

B.

Use an Amazon Elastic File System (Amazon EFS) file system to store the data. Mount the file system to the on-premises servers and the SageMaker AI notebooks.

C.

Use an Amazon FSx for Lustre file system to store the data. Mount the file system to the on-premises servers and the SageMaker AI notebooks.

D.

Use an Amazon Elastic Block Store (Amazon EBS) volume to store the data. Mount the volume to the on-premises servers and the SageMaker AI notebooks.

Question # 20

A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model.

Which solution will meet these requirements?

A.

Use Amazon Macie to categorize the sensitive data.

B.

Prepare the data by using AWS Glue DataBrew.

C.

Run an AWS Batch job to change the sensitive data to random values.

D.

Run an Amazon EMR job to change the sensitive data to random values.

Question # 21

An ML engineer is working on an ML model to predict the prices of similarly sized homes. The model will base predictions on several features. The ML engineer will use the following feature engineering techniques to estimate the prices of the homes:

• Feature splitting

• Logarithmic transformation

• One-hot encoding

• Standardized distribution

Select the correct feature engineering techniques for the following list of features. Each feature engineering technique should be selected one time or not at all. (Select THREE.)

Question # 22

A company uses AWS CodePipeline to orchestrate a continuous integration and continuous delivery (CI/CD) pipeline for ML models and applications.

Select and order the steps from the following list to describe a CI/CD process for a successful deployment. Select each step one time. (Select and order FIVE.)

• CodePipeline deploys ML models and applications to production.

• CodePipeline detects code changes and starts to build automatically.

• Human approval is provided after testing is successful.

• The company builds and deploys ML models and applications to staging servers for testing.

• The company commits code changes or new training datasets to a Git repository.

Question # 23

An ML engineer needs to use AWS CloudFormation to create an ML model that an Amazon SageMaker endpoint will host.

Which resource should the ML engineer declare in the CloudFormation template to meet this requirement?

A.

AWS::SageMaker::Model

B.

AWS::SageMaker::Endpoint

C.

AWS::SageMaker::NotebookInstance

D.

AWS::SageMaker::Pipeline

Question # 24

An ML engineer is setting up a continuous integration and continuous delivery (CI/CD) pipeline for an ML workflow in Amazon SageMaker AI. The pipeline needs to automate model re-training, testing, and deployment whenever new data is uploaded to an Amazon S3 bucket. New data files are approximately 10 GB in size. The ML engineer wants to track model versions for auditing.

Which solution will meet these requirements?

A.

Use AWS CodePipeline, Amazon S3, and AWS CodeBuild to retrain and deploy the model automatically and to track model versions.

B.

Use SageMaker Pipelines with the SageMaker Model Registry to orchestrate model training and version tracking.

C.

Create an AWS Lambda function to re-train and deploy the model. Use Amazon EventBridge to invoke the Lambda function. Reference the Lambda logs to track model versions.

D.

Use SageMaker AI notebook instances to manually re-train and deploy the model when needed. Reference AWS CloudTrail logs to track model versions.

Question # 25

A company is developing an ML model for a customer. The training data is stored in an Amazon S3 bucket in the customer's AWS account (Account A). The company runs Amazon SageMaker AI training jobs in a separate AWS account (Account B).

The company defines an S3 bucket policy and an IAM policy to allow reads to the S3 bucket.

Which additional steps will meet the cross-account access requirement?

A.

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

B.

Create the S3 bucket policy in Account A. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.

C.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account A.

D.

Create the S3 bucket policy in Account B. Attach the IAM policy to an IAM role that SageMaker AI uses in Account B.
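The two policy documents mentioned in this question can be sketched as follows. This is an illustration of the general shape of a resource-based bucket policy and an identity-based IAM policy for read access; the account ID, role name, and bucket name are hypothetical placeholders, and a real setup may need additional statements (for example, for encryption keys):

```python
import json

# Hypothetical identifiers, for illustration only.
ROLE_ARN = "arn:aws:iam::222222222222:role/ExampleSageMakerRole"
BUCKET = "example-training-data"

READ_ACTIONS = ["s3:GetObject", "s3:ListBucket"]
RESOURCES = [f"arn:aws:s3:::{BUCKET}", f"arn:aws:s3:::{BUCKET}/*"]

# Resource-based policy: attached to the bucket and naming the external role.
bucket_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": ROLE_ARN},
        "Action": READ_ACTIONS,
        "Resource": RESOURCES,
    }],
}

# Identity-based policy: attached to the role that runs the training jobs.
identity_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": READ_ACTIONS,
        "Resource": RESOURCES,
    }],
}

bucket_policy_json = json.dumps(bucket_policy, indent=2)
identity_policy_json = json.dumps(identity_policy, indent=2)
```

Cross-account reads succeed only when both sides allow the access: the bucket owner's policy and the policy on the role that makes the request.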

Question # 26

An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company's ML engineers are assigned to specific advertisement campaigns.

The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns.

Which solution will meet these requirements in the MOST operationally efficient way?

A.

Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers' campaigns.

B.

Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.

C.

Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.

D.

Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers' campaigns.

Question # 27

An ML engineer is training an ML model to identify medical patients for disease screening. The tabular dataset for training contains 50,000 patient records: 1,000 with the disease and 49,000 without the disease.

The ML engineer splits the dataset into a training dataset, a validation dataset, and a test dataset.

What should the ML engineer do to transform the data and make the data suitable for training?

A.

Apply principal component analysis (PCA) to oversample the minority class in the training dataset.

B.

Apply Synthetic Minority Oversampling Technique (SMOTE) to generate new synthetic samples of the minority class in the training dataset.

C.

Randomly oversample the majority class in the validation dataset.

D.

Apply k-means clustering to undersample the minority class in the test dataset.
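For reference, the core idea behind SMOTE (interpolating between a minority sample and a nearby minority sample) can be sketched in a few lines. This is a deliberately simplified version that uses only the single nearest neighbor; the function name `smote_like` and the sample points are hypothetical, and production work would normally rely on an established library:

```python
import random


def smote_like(minority, n_new, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and its nearest minority neighbor (simplified SMOTE with k=1)."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbor = min(
            (point for point in minority if point is not base),
            key=lambda point: sum((a - b) ** 2 for a, b in zip(base, point)),
        )
        gap = rng.random()  # position along the segment from base to neighbor
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbor)))
    return synthetic


# Hypothetical minority-class feature vectors from the training split only.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1)]
new_points = smote_like(minority, n_new=5)
```

Synthetic samples are generated from the training split only, so the validation and test splits keep the real class distribution.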

Question # 28

A company has AWS Glue data processing jobs that are orchestrated by an AWS Glue workflow. The AWS Glue jobs can run on a schedule or can be launched manually.

The company is developing pipelines in Amazon SageMaker Pipelines for ML model development. The pipelines will use the output of the AWS Glue jobs during the data processing phase of model development. An ML engineer needs to implement a solution that integrates the AWS Glue jobs with the pipelines.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Use AWS Step Functions for orchestration of the pipelines and the AWS Glue jobs.

B.

Use processing steps in SageMaker Pipelines. Configure inputs that point to the Amazon Resource Names (ARNs) of the AWS Glue jobs.

C.

Use Callback steps in SageMaker Pipelines to start the AWS Glue workflow and to stop the pipelines until the AWS Glue jobs finish running.

D.

Use Amazon EventBridge to invoke the pipelines and the AWS Glue jobs in the desired order.

Question # 29

A company's dataset for prediction analytics contains duplicate records, missing data, and unusually extreme high or low values. The company needs a solution to resolve the data quality issues quickly. The solution must maintain data integrity and have the LEAST operational overhead.

Which solution will meet these requirements?

A.

Use AWS Glue DataBrew to delete duplicate records, fill missing values with medians, and replace extreme values with values in a normal range.

B.

Configure an AWS Glue job to identify records with missing values and extreme measurements and delete them.

C.

Create an Amazon EMR Spark job to replace missing values with zeros and merge duplicate records.

D.

Use Amazon SageMaker Data Wrangler to delete duplicates, apply statistical modeling for missing values, and apply outlier detection algorithms.

Question # 30

An ML engineer is building a model to predict house and apartment prices. The model uses three features: Square Meters, Price, and Age of Building. The dataset has 10,000 data rows. The data includes data points for one large mansion and one extremely small apartment.

The ML engineer must perform preprocessing on the dataset to ensure that the model produces accurate predictions for the typical house or apartment.

Which solution will meet these requirements?

A.

Remove the outliers and perform a log transformation on the Square Meters variable.

B.

Keep the outliers and perform normalization on the Square Meters variable.

C.

Remove the outliers and perform one-hot encoding on the Square Meters variable.

D.

Keep the outliers and perform one-hot encoding on the Square Meters variable.
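The two preprocessing operations named in the options, outlier removal and a log transformation, can be sketched as follows. The interquartile-range rule with a factor of 1.5 is one common convention and is an assumption here, as are the function name `remove_outliers_iqr` and the sample values:

```python
import math
from statistics import quantiles


def remove_outliers_iqr(values, k=1.5):
    """Keep values within [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [value for value in values if low <= value <= high]


# Hypothetical Square Meters column with one tiny and one huge property.
square_meters = [12, 55, 60, 62, 70, 75, 80, 85, 90, 95, 1500]

kept = remove_outliers_iqr(square_meters)
log_kept = [math.log(value) for value in kept]
```

The log transformation compresses the remaining right tail so that typical properties dominate the fit.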

Question # 31

A company uses an Amazon SageMaker AI model for real-time inference with auto scaling enabled. During peak usage, new instances launch before existing instances are fully ready, causing inefficiencies and delays.

Which solution will optimize the scaling process without affecting response times?

A.

Change to a multi-model endpoint configuration.

B.

Integrate Amazon API Gateway and AWS Lambda to manage invocations.

C.

Decrease the scale-in cooldown period and increase the maximum instance count.

D.

Increase the cooldown period after scale-out activities.

Question # 32

An ML engineer is analyzing a classification dataset before training a model in Amazon SageMaker AI. The ML engineer suspects that the dataset has a significant imbalance between class labels that could lead to biased model predictions. To confirm class imbalance, the ML engineer needs to select an appropriate pre-training bias metric.

Which metric will meet this requirement?

A.

Mean squared error (MSE)

B.

Difference in proportions of labels (DPL)

C.

Silhouette score

D.

Structural similarity index measure (SSIM)
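As a worked illustration of difference in proportions of labels (DPL), the sketch below computes the proportion of positive labels in one facet minus the proportion in the other. The function name `dpl`, the facet names, and the data are hypothetical:

```python
def dpl(labels, facets, facet_a):
    """Difference in proportions of labels: q_a - q_d.

    q_a is the share of positive labels (1) among rows in facet_a;
    q_d is the share of positive labels among all other rows.
    """
    in_a = [label for label, facet in zip(labels, facets) if facet == facet_a]
    in_d = [label for label, facet in zip(labels, facets) if facet != facet_a]
    if not in_a or not in_d:
        raise ValueError("both facets need at least one row")
    return sum(in_a) / len(in_a) - sum(in_d) / len(in_d)


# Hypothetical binary labels and facet membership.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
facets = ["a", "a", "a", "a", "d", "d", "d", "d"]

value = dpl(labels, facets, "a")
```

A value near zero indicates that positive labels are spread evenly across the two facets; values near +1 or -1 indicate a strong imbalance.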

Question # 33

An ML engineer needs to deploy a trained model based on a genetic algorithm. Predictions can take several minutes, and requests can include up to 100 MB of data.

Which deployment solution will meet these requirements with the LEAST operational overhead?

A.

Deploy on EC2 Auto Scaling behind an ALB.

B.

Deploy to a SageMaker AI real-time endpoint.

C.

Deploy to a SageMaker AI Asynchronous Inference endpoint.

D.

Deploy to Amazon ECS on EC2.

Question # 34

An ML engineer trained an ML model on Amazon SageMaker to detect automobile accidents from closed-circuit TV footage. The ML engineer used SageMaker Data Wrangler to create a training dataset of images of accidents and non-accidents.

The model performed well during training and validation. However, the model is underperforming in production because of variations in the quality of the images from various cameras.

Which solution will improve the model's accuracy in the LEAST amount of time?

A.

Collect more images from all the cameras. Use Data Wrangler to prepare a new training dataset.

B.

Recreate the training dataset by using the Data Wrangler corrupt image transform. Specify the impulse noise option.

C.

Recreate the training dataset by using the Data Wrangler enhance image contrast transform. Specify the Gamma contrast option.

D.

Recreate the training dataset by using the Data Wrangler resize image transform. Crop all images to the same size.

Question # 35

A company has built more than 50 models and deployed the models on Amazon SageMaker AI as real-time inference endpoints. The company needs to reduce the costs of the SageMaker AI inference endpoints. The company used the same ML framework to build the models. The company's customers require low-latency access to the models.

Select and order the correct steps from the following list to reduce the cost of inference and keep latency low. Select each step one time or not at all. (Select and order FIVE.)

• Create an endpoint configuration that references a multi-model container.

• Create a SageMaker AI model with multi-model endpoints enabled.

• Deploy a real-time inference endpoint by using the endpoint configuration.

• Deploy a serverless inference endpoint configuration by using the endpoint configuration.

• Spread the existing models to multiple different Amazon S3 bucket paths.

• Upload the existing models to the same Amazon S3 bucket path.

• Update the models to use the new endpoint ID. Pass the model IDs to the new endpoint.

Question # 36

An ML company wants to monitor and analyze the API calls that its AWS resources make. The company has created an AWS CloudTrail log file that logs to an Amazon S3 bucket. The company has also created an organization in AWS Organizations to manage permissions across accounts.

The company needs to enable log file validation to ensure the integrity of its log files.

Which solution will meet these requirements?

A.

Enable CloudTrail log file integrity validation.

B.

Create a multi-Region trail in CloudTrail.

C.

Create a trail in CloudTrail for the organization.

D.

Enable Amazon CloudWatch Logs delivery.

Question # 37

A hospital wants to predict patient outcomes for the coming year. An ML engineer must improve several existing ML models that currently perform poorly.

Select the correct regularization method from the following list to improve each model. Select each regularization method one time, more than one time, or not at all. (Select THREE.)

• L1 regularization

• L2 regularization

• Early stopping

Question # 38

A company needs to ingest data from data sources into Amazon SageMaker Data Wrangler. The data sources are Amazon S3, Amazon Redshift, and Snowflake. The ingested data must always be up to date with the latest changes in the source systems.

Which solution will meet these requirements?

A.

Use direct connections to import data from the data sources into Data Wrangler.

B.

Use cataloged connections to import data from the data sources into Data Wrangler.

C.

Use AWS Glue to extract data from the data sources. Use AWS Glue also to import the data directly into Data Wrangler.

D.

Use AWS Lambda to extract data from the data sources. Use Lambda also to import the data directly into Data Wrangler.

Question # 39

A company needs an AWS solution that will automatically create versions of ML models as the models are created. Which solution will meet this requirement?

A.

Amazon Elastic Container Registry (Amazon ECR)

B.

Model packages from Amazon SageMaker Marketplace

C.

Amazon SageMaker ML Lineage Tracking

D.

Amazon SageMaker Model Registry

Question # 40

A company has significantly increased the amount of data stored as .csv files in an Amazon S3 bucket. Data transformation scripts and queries are now taking much longer than before.

An ML engineer must implement a solution to optimize the data for query performance with the LEAST operational overhead.

Which solution will meet this requirement?

A.

Configure an AWS Lambda function to split the .csv files into smaller objects.

B.

Configure an AWS Glue job to drop string-type columns and save the results to S3.

C.

Configure an AWS Glue ETL job to convert the .csv files to Apache Parquet format.

D.

Configure an Amazon EMR cluster to process the data in S3.

Question # 41

A company wants to use Amazon SageMaker AI to host an ML model that runs on CPU for real-time predictions. The model has intermittent traffic during business hours and periods of no traffic after business hours.

Which hosting option will serve inference requests in the MOST cost-effective manner?

A.

Deploy the model to a real-time endpoint with scheduled auto scaling.

B.

Deploy the model to a SageMaker AI Serverless Inference endpoint with provisioned concurrency during business hours.

C.

Deploy the model to an asynchronous inference endpoint with auto scaling to zero.

D.

Deploy the model to a real-time endpoint and activate it only during business hours using AWS Lambda.

Question # 42

An ML engineer needs to use Amazon SageMaker Feature Store to create and manage features to train a model.

Select and order the steps from the following list to create and use the features in Feature Store. Each step should be selected one time. (Select and order three.)

• Access the store to build datasets for training.

• Create a feature group.

• Ingest the records.

Question # 43

A company is using an Amazon Redshift database as its single data source. Some of the data is sensitive.

A data scientist needs to use some of the sensitive data from the database. An ML engineer must give the data scientist access to the data without transforming the source data and without storing anonymized data in the database.

Which solution will meet these requirements with the LEAST implementation effort?

A.

Configure dynamic data masking policies to control how sensitive data is shared with the data scientist at query time.

B.

Create a materialized view with masking logic on top of the database. Grant the necessary read permissions to the data scientist.

C.

Unload the Amazon Redshift data to Amazon S3. Use Amazon Athena to create schema-on-read with masking logic. Share the view with the data scientist.

D.

Unload the Amazon Redshift data to Amazon S3. Create an AWS Glue job to anonymize the data. Share the dataset with the data scientist.

Question # 44

An ML engineer needs to organize a large set of text documents into topics. The ML engineer will not know what the topics are in advance. The ML engineer wants to use built-in algorithms or pre-trained models available through Amazon SageMaker AI to process the documents.

Which solution will meet these requirements?

A.

Use the BlazingText algorithm to identify the relevant text and to create a set of topics based on the documents.

B.

Use the Sequence-to-Sequence algorithm to summarize the text and to create a set of topics based on the documents.

C.

Use the Object2Vec algorithm to create embeddings and to create a set of topics based on the embeddings.

D.

Use the Latent Dirichlet Allocation (LDA) algorithm to process the documents and to create a set of topics based on the documents.

Question # 45

A company is building a conversational AI assistant on Amazon Bedrock. The company is using Retrieval Augmented Generation (RAG) to reference the company's internal knowledge base. The AI assistant uses the Anthropic Claude 4 foundation model (FM).

The company needs a solution that uses a vector embedding model, a vector store, and a vector search algorithm.

Which solution will develop the AI assistant with the LEAST development effort?

A.

Use Amazon Kendra Experience Builder.

B.

Use Amazon Aurora PostgreSQL with the pgvector extension.

C.

Use Amazon RDS for PostgreSQL with the pgvector extension.

D.

Use the AWS Glue Data Catalog metadata repository.

Question # 46

A company has a Retrieval Augmented Generation (RAG) application that uses a vector database to store embeddings of documents. The company must migrate the application to AWS and must implement a solution that provides semantic search of text files. The company has already migrated the text repository to an Amazon S3 bucket.

Which solution will meet these requirements?

A.

Use an AWS Batch job to process the files and generate embeddings. Use AWS Glue to store the embeddings. Use SQL queries to perform the semantic searches.

B.

Use a custom Amazon SageMaker notebook to run a custom script to generate embeddings. Use SageMaker Feature Store to store the embeddings. Use SQL queries to perform the semantic searches.

C.

Use the Amazon Kendra S3 connector to ingest the documents from the S3 bucket into Amazon Kendra. Query Amazon Kendra to perform the semantic searches.

D.

Use an Amazon Textract asynchronous job to ingest the documents from the S3 bucket. Query Amazon Textract to perform the semantic searches.

Question # 47

A company runs an Amazon SageMaker AI domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker AI domain.

Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.

Which update to the network configuration will meet this requirement?

A.

Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

B.

Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network ACL for the subnet where the domain is located.

C.

Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

D.

Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Question # 48

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company must implement a manual approval-based workflow to ensure that only approved models can be deployed to production endpoints.

Which solution will meet this requirement?

A.

Use SageMaker Experiments to facilitate the approval process during model registration.

B.

Use SageMaker ML Lineage Tracking on the central model registry. Create tracking entities for the approval process.

C.

Use SageMaker Model Monitor to evaluate the performance of the model and to manage the approval.

D.

Use SageMaker Pipelines. When a model version is registered, use the AWS SDK to change the approval status to "Approved."
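
Option D mentions changing the approval status through the AWS SDK. As a rough illustration only, the sketch below shows what such a call might look like with the boto3 `update_model_package` operation. The helper name `approve_model_version`, the `RecordingClient` stand-in, and the example ARN are assumptions made for this sketch; in practice the client would come from `boto3.client("sagemaker")`.

```python
def approve_model_version(sagemaker_client, model_package_arn,
                          note="Approved after manual review"):
    """Mark one registered model version as approved.

    `sagemaker_client` is expected to behave like boto3.client("sagemaker").
    """
    return sagemaker_client.update_model_package(
        ModelPackageArn=model_package_arn,
        ModelApprovalStatus="Approved",
        ApprovalDescription=note,
    )


class RecordingClient:
    """Stand-in for the real client so the sketch runs without AWS access."""

    def __init__(self):
        self.calls = []

    def update_model_package(self, **kwargs):
        # Record the request instead of sending it to AWS.
        self.calls.append(kwargs)
        return {"ModelPackageArn": kwargs["ModelPackageArn"]}


# Hypothetical ARN, used only for illustration.
EXAMPLE_ARN = (
    "arn:aws:sagemaker:us-east-1:111122223333:model-package/example-group/1"
)

client = RecordingClient()
approve_model_version(client, EXAMPLE_ARN)
print(client.calls[0]["ModelApprovalStatus"])
```

A deployment step can then filter on the approval status so that only versions marked as approved are eligible for production endpoints.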

Question # 49

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a re-training job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

A.

Use an AWS Glue crawler and an AWS Glue extract, transform, and load (ETL) job to detect data drift. Use AWS Glue triggers to automate the re-training job.

B.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the re-training job.

C.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the re-training job.

D.

Use Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the re-training job.
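
Several options pair a drift detector with an AWS Lambda function that starts re-training. A minimal sketch of such a function is shown below, assuming the pipeline is started with the boto3 `start_pipeline_execution` operation. The event fields `drift_detected` and `pipeline_name` are hypothetical, since the question does not specify how the drift signal reaches the function, and the optional `sagemaker_client` argument exists only so the sketch can be exercised without AWS access.

```python
def handler(event, context, sagemaker_client=None):
    """Start a re-training pipeline run when the event reports drift.

    Assumed (hypothetical) event shape:
        {"drift_detected": bool, "pipeline_name": str}
    """
    if not event.get("drift_detected"):
        # Nothing to do when no drift was reported.
        return {"started": False}

    if sagemaker_client is None:
        # Imported lazily so the sketch can run without boto3 installed
        # when a client is supplied by the caller.
        import boto3

        sagemaker_client = boto3.client("sagemaker")

    response = sagemaker_client.start_pipeline_execution(
        PipelineName=event["pipeline_name"],
    )
    return {"started": True, "execution_arn": response["PipelineExecutionArn"]}
```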

Question # 50

An ML engineer is tuning an image classification model that performs poorly on one of two classes. The poorly performing class represents an extremely small fraction of the training dataset.

Which solution will improve the model’s performance?

A.

Optimize for accuracy. Use image augmentation on the less common images.

B.

Optimize for F1 score. Use image augmentation on the less common images.

C.

Optimize for accuracy. Use SMOTE to generate synthetic images.

D.

Optimize for F1 score. Use SMOTE to generate synthetic images.
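
The options contrast accuracy with F1 score as the optimization target. The small sketch below, using made-up labels with a 5% minority class, illustrates how the two metrics can disagree on imbalanced data. The datasets and function names are illustrative assumptions, not part of the question.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative data: 5 minority-class samples followed by 95 majority-class samples.
y_true = [1] * 5 + [0] * 95

# Predictor 1 ignores the minority class entirely.
always_majority = [0] * 100

# Predictor 2 finds 4 of the 5 minority samples at the cost of 4 false alarms.
finds_minority = [1, 1, 1, 1, 0] + [1] * 4 + [0] * 91

for name, y_pred in [("always_majority", always_majority),
                     ("finds_minority", finds_minority)]:
    print(name, accuracy(y_true, y_pred), round(f1_score(y_true, y_pred), 3))
```

Both predictors reach the same accuracy here, while only the F1 score separates the one that never detects the rare class from the one that usually does.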

Question # 51

A recommendation model uses ML and calls an Amazon SageMaker AI endpoint to get recommendations. An ML engineer must ensure that the model stays available during an expected increase in user traffic.

Which solution will meet these requirements?

A.

Configure auto scaling on the SageMaker AI endpoint.

B.

Create a new SageMaker AI endpoint. Deploy the model to the new endpoint.

C.

Use SageMaker Neo to optimize the model for inference.

D.

Attach an Auto Scaling group to the SageMaker AI endpoint.
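
For context on what configuring auto scaling on an endpoint involves, the sketch below builds the two request payloads that Application Auto Scaling expects for a SageMaker endpoint variant: one for `register_scalable_target` and one for `put_scaling_policy`. The endpoint name, variant name, capacities, and target value are assumptions chosen for illustration, and no AWS call is made.

```python
def autoscaling_requests(endpoint_name, variant_name, min_instances,
                         max_instances, invocations_per_instance):
    """Build request payloads for Application Auto Scaling (no AWS call)."""
    resource_id = f"endpoint/{endpoint_name}/variant/{variant_name}"
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    # Payload for register_scalable_target: bounds on the instance count.
    target = dict(common, MinCapacity=min_instances, MaxCapacity=max_instances)
    # Payload for put_scaling_policy: track invocations per instance.
    policy = dict(
        common,
        PolicyName=f"{endpoint_name}-target-tracking",  # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(invocations_per_instance),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType": "SageMakerVariantInvocationsPerInstance",
            },
        },
    )
    return target, policy


# Hypothetical endpoint and variant names.
target, policy = autoscaling_requests("recs-endpoint", "AllTraffic", 1, 4, 200)
print(target["ResourceId"])
```

With boto3, these payloads would be passed as keyword arguments to the `application-autoscaling` client, for example `client.register_scalable_target(**target)`.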

Question # 52

An ML model is deployed in production. The model has performed well and has met its metric thresholds for months.

An ML engineer who is monitoring the model observes a sudden degradation. The performance metrics of the model are now below the thresholds.

What could be the cause of the performance degradation?

A.

Lack of training data

B.

Drift in production data distribution

C.

Compute resource constraints

D.

Model overfitting

Question # 53

An ML engineer wants to run a training job on Amazon SageMaker AI. The training job will train a neural network by using multiple GPUs. The training dataset is stored in Parquet format.

The ML engineer discovered that the Parquet dataset contains files too large to fit into the memory of the SageMaker AI training instances.

Which solution will fix the memory problem?

A.

Attach an Amazon Elastic Block Store (Amazon EBS) Provisioned IOPS SSD volume to the instance. Store the files in the EBS volume.

B.

Repartition the Parquet files by using Apache Spark on Amazon EMR. Use the repartitioned files for the training job.

C.

Change the instance type to Memory Optimized instances with sufficient memory for the training job.

D.

Use the SageMaker AI distributed data parallelism (SMDDP) library with multiple instances to split the memory usage.

Question # 54

An ML engineer is using an Amazon SageMaker AI shadow test to evaluate a new model that is hosted on a SageMaker AI endpoint. The shadow test requires significant GPU resources for high performance. The production variant currently runs on a less powerful instance type.

The ML engineer needs to configure the shadow test to use a higher performance instance type for a shadow variant. The solution must not affect the instance type of the production variant.

Which solution will meet these requirements?

A.

Modify the existing ProductionVariant configuration in the endpoint to include a ShadowProductionVariants list. Specify the larger instance type for the shadow variant.

B.

Create a new endpoint configuration with two ProductionVariant definitions. Configure one definition for the existing production variant and one definition for the shadow variant with the larger instance type. Use the UpdateEndpoint action to apply the new configuration.

C.

Create a separate SageMaker AI endpoint for the shadow variant that uses the larger instance type. Create an AWS Lambda function that routes a portion of the traffic to the shadow endpoint. Assign the Lambda function to the original endpoint.

D.

Use the CreateEndpointConfig action to define a new configuration. Specify the existing production variant in the configuration and add a separate ShadowProductionVariants list. Specify the larger instance type for the shadow variant. Use the CreateEndpoint action and pass the new configuration to the endpoint.
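
The options refer to the `ProductionVariants` and `ShadowProductionVariants` lists of an endpoint configuration. The sketch below shows, under assumed names and instance types, how a `CreateEndpointConfig` request could give each list its own instance type. It only builds the request dictionary and does not call AWS.

```python
def shadow_endpoint_config(config_name, prod_model, shadow_model,
                           prod_instance_type, shadow_instance_type):
    """Build a CreateEndpointConfig payload with one shadow variant."""

    def variant(name, model, instance_type):
        return {
            "VariantName": name,
            "ModelName": model,
            "InstanceType": instance_type,
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        }

    return {
        "EndpointConfigName": config_name,
        # The production variant keeps its existing instance type.
        "ProductionVariants": [
            variant("production", prod_model, prod_instance_type),
        ],
        # The shadow variant is sized independently of production.
        "ShadowProductionVariants": [
            variant("shadow", shadow_model, shadow_instance_type),
        ],
    }


# Hypothetical model names and instance types, for illustration only.
config = shadow_endpoint_config(
    "shadow-test-config", "current-model", "candidate-model",
    "ml.m5.xlarge", "ml.g5.2xlarge",
)
print(config["ShadowProductionVariants"][0]["InstanceType"])
```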

Question # 55

A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account.

An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses.

Which solution will meet these requirements?

A.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create a VPC peering connection between the accounts. Update the VPC route tables to remove the route to 0.0.0.0/0.

B.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create an AWS Direct Connect connection and a transit gateway. Associate the VPCs from both accounts with the transit gateway. Update the VPC route tables to remove the route to 0.0.0.0/0.

C.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an AWS Site-to-Site VPN connection with two encrypted IPsec tunnels between the accounts. Set up interface VPC endpoints for Amazon S3.

D.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift.

Question # 56

An ML engineer wants to use Amazon SageMaker Data Wrangler to perform preprocessing on a dataset. The ML engineer wants to use the processed dataset to train a classification model. During preprocessing, the ML engineer notices that a text feature has a range of thousands of values that differ only by spelling errors. The ML engineer needs to apply an encoding method so that after preprocessing is complete, the text feature can be used to train the model.

Which solution will meet these requirements?

A.

Perform ordinal encoding to represent categories of the feature.

B.

Perform similarity encoding to represent categories of the feature.

C.

Perform one-hot encoding to represent categories of the feature.

D.

Perform target encoding to represent categories of the feature.
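
To make the idea of similarity encoding more concrete, the sketch below encodes a string by its character n-gram overlap with a few reference categories, so that misspelled variants land close to the intended value. This is an illustrative approximation written for this page, not the Data Wrangler implementation; the function names and the trigram Jaccard measure are assumptions.

```python
def ngrams(text, n=3):
    """Set of character n-grams of the lowercased, space-padded text."""
    padded = f" {text.lower()} "
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def similarity(a, b, n=3):
    """Jaccard overlap between the n-gram sets of two strings."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    union = grams_a | grams_b
    if not union:
        return 0.0
    return len(grams_a & grams_b) / len(union)


def similarity_encode(value, prototypes):
    """Encode a value as its similarity to each reference category."""
    return [similarity(value, p) for p in prototypes]


# Hypothetical reference categories and a misspelled input.
prototypes = ["california", "texas"]
print(similarity_encode("californa", prototypes))
```

Unlike one-hot or ordinal encoding, two spellings of the same value receive similar vectors instead of unrelated ones.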

Question # 57

A company uses Amazon SageMaker AI to support ML workflows such as model training and deployment.

Select the correct registry from the following list to meet the requirements for each use case with the LEAST operational overhead. Each registry should be selected one or more times. (Select FOUR.)

• Amazon Elastic Container Registry (Amazon ECR)

• SageMaker Model Registry

Question # 58

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of the data quality of the models. The ML engineer must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

A.

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and to send alerts.

B.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and to send alerts.

C.

Deploy the models by using Amazon Elastic Container Service (Amazon ECS) on AWS Fargate. Use Amazon EventBridge to monitor the data quality and to send alerts.

D.

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and to send alerts.

Question # 59

A construction company is using Amazon SageMaker AI to train specialized custom object detection models to identify road damage. The company uses images from multiple cameras. The images are stored as JPEG objects in an Amazon S3 bucket.

The images need to be pre-processed by using computationally intensive computer vision techniques before the images can be used in the training job. The company needs to optimize data loading and pre-processing in the training job. The solution cannot affect model performance or increase compute or storage resources.

Which solution will meet these requirements?

A.

Use SageMaker AI file mode to load and process the images in batches.

B.

Reduce the batch size of the model and increase the number of pre-processing threads.

C.

Reduce the quality of the training images in the S3 bucket.

D.

Convert the images into RecordIO format and use the lazy loading pattern.

Question # 60

A government agency is conducting a national census to assess program needs by area and city. The census form collects approximately 500 responses from each citizen. The agency needs to analyze the data to extract meaningful insights. The agency wants to reduce the dimensions of the high-dimensional data to uncover hidden patterns.

Which solution will meet these requirements?

A.

Use the principal component analysis (PCA) algorithm in Amazon SageMaker AI.

B.

Use the t-Distributed Stochastic Neighbor Embedding (t-SNE) algorithm in Amazon SageMaker AI.

C.

Use the k-means algorithm in Amazon SageMaker AI.

D.

Use the Random Cut Forest (RCF) algorithm in Amazon SageMaker AI.

Question # 61

A company wants to reduce the cost of its containerized ML applications. The applications use ML models that run on Amazon EC2 instances, AWS Lambda functions, and an Amazon Elastic Container Service (Amazon ECS) cluster. The EC2 workloads and ECS workloads use Amazon Elastic Block Store (Amazon EBS) volumes to save predictions and artifacts.

An ML engineer must identify resources that are being used inefficiently. The ML engineer also must generate recommendations to reduce the cost of these resources.

Which solution will meet these requirements with the LEAST development effort?

A.

Create code to evaluate each instance's memory and compute usage.

B.

Add cost allocation tags to the resources. Activate the tags in AWS Billing and Cost Management.

C.

Check AWS CloudTrail event history for the creation of the resources.

D.

Run AWS Compute Optimizer.

Question # 62

A company wants to improve its customer retention ML model. The current model has 85% accuracy and a new model shows 87% accuracy in testing. The company wants to validate the new model’s performance in production.

Which solution will meet these requirements?

A.

Deploy the new model for 4 weeks across all production traffic. Monitor performance metrics and validate improvements.

B.

Run A/B testing on both models for 4 weeks. Route 20% of traffic to the new model. Monitor customer retention rates across both variants.

C.

Run both models in parallel for 4 weeks. Analyze offline predictions weekly by using historical customer data analysis.

D.

Implement alternating deployments for 4 weeks between the current model and the new model. Track performance metrics for comparison.
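
Option B describes routing 20% of traffic to the new model. The sketch below simulates weighted routing between two variants to show how such a split behaves over many requests. The variant names, weights, request count, and seed are assumptions for illustration; a real endpoint performs this routing itself based on variant weights.

```python
import random
from collections import Counter


def route_requests(weights, n_requests, seed=0):
    """Assign each request to a variant in proportion to its weight."""
    rng = random.Random(seed)  # fixed seed keeps the simulation repeatable
    names = list(weights)
    picks = rng.choices(names, weights=[weights[v] for v in names],
                        k=n_requests)
    return Counter(picks)


# Hypothetical variant names with an 80/20 split.
counts = route_requests({"current": 0.8, "candidate": 0.2}, 10_000)
share = counts["candidate"] / sum(counts.values())
print(f"candidate share: {share:.3f}")
```

Because both variants serve live traffic during the same period, business metrics such as retention can be compared per variant rather than across different time windows.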

Question # 63

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.

The company needs to implement a scalable solution on AWS to identify anomalous data points.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Ingest real-time data into Amazon Kinesis data streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

B.

Ingest real-time data into Amazon Kinesis data streams. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

C.

Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

D.

Send real-time data to an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Create an AWS Lambda function to consume the queue messages. Program the Lambda function to start an AWS Glue extract, transform, and load (ETL) job for batch processing and anomaly detection.

Question # 64

An ML engineer is preparing a dataset that contains medical records to train an ML model to predict the likelihood of patients developing diseases.

The dataset contains columns for patient ID, age, medical conditions, test results, and a "Disease" target column.

How should the ML engineer configure the data to train the model?

A.

Remove the patient ID column.

B.

Remove the age column.

C.

Remove the medical conditions and test results columns.

D.

Remove the "Disease" target column.

Question # 65

A company's ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions.

Which solution will provide an explanation for the model's predictions?

A.

Use SageMaker Model Monitor on the deployed model.

B.

Use SageMaker Clarify on the deployed model.

C.

Show the distribution of inferences from A/B testing in Amazon CloudWatch.

D.

Add a shadow endpoint. Analyze prediction differences on samples.

Question # 66

An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML.

Which solution will meet these requirements?

A.

Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.

B.

Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.

C.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.

D.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.

Question # 67

A company uses a training job on Amazon SageMaker AI to train a neural network. The job first trains a model and then evaluates the model's performance against a test dataset. The company uses the results from the evaluation phase to decide if the trained model will go to production.

The training phase takes too long. The company needs solutions that can shorten training time without decreasing the model's final performance.

Select the correct solutions from the following list to meet the requirements for each description. Select each solution one time or not at all. (Select THREE.)

• Change the epoch count.

• Choose an Amazon EC2 Spot Fleet.

• Change the batch size.

• Use early stopping on the training job.

• Use the SageMaker AI distributed data parallelism (SMDDP) library.

• Stop the training job.

Question # 68

An ML engineer needs to deploy ML models to get inferences from large datasets in an asynchronous manner. The ML engineer also needs to implement scheduled monitoring of data quality for the models and must receive alerts when changes in data quality occur.

Which solution will meet these requirements?

A.

Deploy the models by using scheduled AWS Glue jobs. Use Amazon CloudWatch alarms to monitor the data quality and send alerts.

B.

Deploy the models by using scheduled AWS Batch jobs. Use AWS CloudTrail to monitor the data quality and send alerts.

C.

Deploy the models by using Amazon ECS on AWS Fargate. Use Amazon EventBridge to monitor the data quality and send alerts.

D.

Deploy the models by using Amazon SageMaker AI batch transform. Use SageMaker Model Monitor to monitor the data quality and send alerts.

Question # 69

A company has an ML model that is deployed to an Amazon SageMaker AI endpoint for real-time inference. The company needs to deploy a new model. The company must compare the new model's performance to the currently deployed model's performance before shifting all traffic to the new model.

Which solution will meet these requirements with the LEAST operational effort?

A.

Deploy the new model to a separate endpoint. Manually split traffic between the two endpoints.

B.

Deploy the new model to a separate endpoint. Use Amazon CloudFront to distribute traffic between the two endpoints.

C.

Deploy the new model as a shadow variant on the same endpoint as the current model. Route a portion of live traffic to the shadow model for evaluation.

D.

Use AWS Lambda functions with custom logic to route traffic between the current model and the new model.

Question # 70

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a re-training job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

A.

Use an AWS Glue crawler and an AWS Glue extract, transform, and load (ETL) job to detect data drift. Use AWS Glue triggers to automate the re-training job.

B.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the re-training job.

C.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the re-training job.

D.

Use Amazon Quick Suite (previously known as Amazon QuickSight) anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the re-training job.

Question # 71

An ML engineer is using AWS CodeDeploy to deploy new container versions for inference on Amazon ECS.

The deployment must shift 10% of traffic initially, and the remaining 90% must shift within 10–15 minutes.

Which deployment configuration meets these requirements?

A.

CodeDeployDefault.LambdaLinear10PercentEvery10Minutes

B.

CodeDeployDefault.ECSAllAtOnce

C.

CodeDeployDefault.ECSCanary10Percent15Minutes

D.

CodeDeployDefault.LambdaCanary10Percent15Minutes
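
The configuration names above encode a traffic-shifting pattern: canary configurations shift a first slice and then the remainder in one step, while linear configurations shift equal increments at a fixed interval. The sketch below models both patterns as lists of (minute, cumulative percent) pairs. The function names are assumptions, and the timing is a simplified reading of the naming scheme rather than a description of CodeDeploy internals.

```python
def canary_schedule(first_percent, interval_minutes):
    """Shift a first slice immediately, then everything else after the interval."""
    return [(0, first_percent), (interval_minutes, 100)]


def linear_schedule(step_percent, interval_minutes):
    """Shift equal increments at a fixed interval until 100% is reached."""
    schedule = []
    shifted = 0
    minute = 0
    while shifted < 100:
        shifted = min(100, shifted + step_percent)
        schedule.append((minute, shifted))
        minute += interval_minutes
    return schedule


# Canary pattern: 10% first, remainder after 15 minutes.
print(canary_schedule(10, 15))
# Linear pattern: 10% every 10 minutes; print the last step only.
print(linear_schedule(10, 10)[-1])
```

Under this model, a 10%-then-rest canary completes in one interval, whereas a linear 10% step needs ten steps to reach full traffic.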

Question # 72

A retail company is analyzing customer purchase data to develop personalized product recommendations. The company wants to use Amazon SageMaker Clarify to assess fairness metrics across different customer groups to avoid potential bias in the recommendation system.

The recommendation system needs to identify if certain customer segments are underrepresented in the training data. The company needs to choose a pre-training bias metric in SageMaker Clarify.

Which metric meets these requirements?

A.

Prediction distribution skew

B.

Feature attribution bias

C.

Class imbalance ratio

D.

Model performance gap
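
As background for the pre-training metrics, SageMaker Clarify defines class imbalance (CI) as (n_a - n_d) / (n_a + n_d), where n_a and n_d are the number of training samples in the advantaged and disadvantaged facet groups. The sketch below computes that quantity for a made-up customer segment column; the segment labels and counts are assumptions for illustration.

```python
from collections import Counter


def class_imbalance(facet_values, disadvantaged):
    """CI = (n_a - n_d) / (n_a + n_d); ranges from -1 to 1."""
    counts = Counter(facet_values)
    n_d = counts[disadvantaged]
    n_a = sum(counts.values()) - n_d
    if n_a + n_d == 0:
        raise ValueError("facet_values must not be empty")
    return (n_a - n_d) / (n_a + n_d)


# Hypothetical training data: 900 "urban" and 100 "rural" customers.
segments = ["urban"] * 900 + ["rural"] * 100
print(class_imbalance(segments, disadvantaged="rural"))
```

Values near 0 indicate balanced representation, while values near 1 indicate that the chosen group is strongly underrepresented in the training data.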
