
MLA-C01 Questions and Answers

Question # 6

A company uses Amazon SageMaker for its ML workloads. The company's ML engineer receives a 50 MB Apache Parquet data file to build a fraud detection model. The file includes several correlated columns that are not required.

What should the ML engineer do to drop the unnecessary columns in the file with the LEAST effort?

A.

Download the file to a local workstation. Perform one-hot encoding by using a custom Python script.

B.

Create an Apache Spark job that uses a custom processing script on Amazon EMR.

C.

Create a SageMaker processing job by calling the SageMaker Python SDK.

D.

Create a data flow in SageMaker Data Wrangler. Configure a transform step.

Question # 7

A company is building an Amazon SageMaker AI pipeline for an ML model. The pipeline uses distributed processing and training.

An ML engineer needs to encrypt network communication between instances that run distributed jobs. The ML engineer configures the distributed jobs to run in a private VPC.

What should the ML engineer do to meet the encryption requirement?

A.

Enable network isolation.

B.

Configure traffic encryption by using security groups.

C.

Enable inter-container traffic encryption.

D.

Enable VPC flow logs.

Question # 8

A company uses Amazon SageMaker Studio to develop an ML model. The company has a single SageMaker Studio domain. An ML engineer needs to implement a solution that provides an automated alert when SageMaker AI compute costs reach a specific threshold.

Which solution will meet these requirements?

A.

Add resource tagging by editing the SageMaker AI user profile in the SageMaker AI domain. Configure AWS Cost Explorer to send an alert when the threshold is reached.

B.

Add resource tagging by editing the SageMaker AI user profile in the SageMaker AI domain. Configure AWS Budgets to send an alert when the threshold is reached.

C.

Add resource tagging by editing each user's IAM profile. Configure AWS Cost Explorer to send an alert when the threshold is reached.

D.

Add resource tagging by editing each user's IAM profile. Configure AWS Budgets to send an alert when the threshold is reached.

Question # 9

A company wants to improve its customer retention ML model. The current model has 85% accuracy and a new model shows 87% accuracy in testing. The company wants to validate the new model’s performance in production.

Which solution will meet these requirements?

A.

Deploy the new model for 4 weeks across all production traffic. Monitor performance metrics and validate improvements.

B.

Run A/B testing on both models for 4 weeks. Route 20% of traffic to the new model. Monitor customer retention rates across both variants.

C.

Run both models in parallel for 4 weeks. Analyze offline predictions weekly by using historical customer data analysis.

D.

Implement alternating deployments for 4 weeks between the current model and the new model. Track performance metrics for comparison.
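
The A/B split in option B can be sketched with a deterministic hash-based router. This is only an illustrative stand-in (the function and variant names are hypothetical); a real SageMaker deployment would split traffic through production-variant weights on a single endpoint rather than application code.

```python
import hashlib

def route_variant(customer_id: str, new_model_share: float = 0.20) -> str:
    """Deterministically assign a customer to a model variant.

    Hashing the customer ID keeps each customer pinned to the same
    variant for the whole test, which avoids mixing treatments when
    retention is measured over weeks.
    """
    digest = hashlib.sha256(customer_id.encode()).digest()
    bucket = digest[0] / 255.0  # map the first hash byte to [0, 1]
    return "new-model" if bucket < new_model_share else "current-model"

# Roughly 20% of a large customer population lands on the new model.
assignments = [route_variant(f"customer-{i}") for i in range(10_000)]
share = assignments.count("new-model") / len(assignments)
```

Because the assignment is a pure function of the customer ID, the same customer always sees the same model, which keeps the retention comparison clean.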

Question # 10

A company collects customer data daily and stores it as compressed files in an Amazon S3 bucket partitioned by date. Each month, analysts process the data, check data quality, and upload results to Amazon QuickSight dashboards.

An ML engineer needs to automatically check data quality before the data is sent to QuickSight, with the LEAST operational overhead.

Which solution will meet these requirements?

A.

Run an AWS Glue crawler monthly and use AWS Glue Data Quality rules to check data quality.

B.

Run an AWS Glue crawler and create a custom AWS Glue job with PySpark to evaluate data quality.

C.

Use AWS Lambda with Python scripts triggered by S3 uploads to evaluate data quality.

D.

Send S3 events to Amazon SQS and use Amazon CloudWatch Insights to evaluate data quality.

Question # 11

A company is developing an internal cost-estimation tool that uses an ML model in Amazon SageMaker AI. Users upload high-resolution images to the tool.

The model must process each image and predict the cost of the object in the image. The model also must notify the user when processing is complete.

Which solution will meet these requirements?

A.

Store the images in an Amazon S3 bucket. Deploy the model on SageMaker AI. Use batch transform jobs for model inference. Use an Amazon Simple Queue Service (Amazon SQS) queue to notify users.

B.

Store the images in an Amazon S3 bucket. Deploy the model on SageMaker AI. Use an asynchronous inference strategy for model inference. Use an Amazon Simple Notification Service (Amazon SNS) topic to notify users.

C.

Store the images in an Amazon Elastic File System (Amazon EFS) file system. Deploy the model on SageMaker AI. Use batch transform jobs for model inference. Use an Amazon Simple Queue Service (Amazon SQS) queue to notify users.

D.

Store the images in an Amazon Elastic File System (Amazon EFS) file system. Deploy the model on SageMaker AI. Use an asynchronous inference strategy for model inference. Use an Amazon Simple Notification Service (Amazon SNS) topic to notify users.

Question # 12

An advertising company uses AWS Lake Formation to manage a data lake. The data lake contains structured data and unstructured data. The company's ML engineers are assigned to specific advertisement campaigns.

The ML engineers must interact with the data through Amazon Athena and by browsing the data directly in an Amazon S3 bucket. The ML engineers must have access to only the resources that are specific to their assigned advertisement campaigns.

Which solution will meet these requirements in the MOST operationally efficient way?

A.

Configure IAM policies on an AWS Glue Data Catalog to restrict access to Athena based on the ML engineers' campaigns.

B.

Store users and campaign information in an Amazon DynamoDB table. Configure DynamoDB Streams to invoke an AWS Lambda function to update S3 bucket policies.

C.

Use Lake Formation to authorize AWS Glue to access the S3 bucket. Configure Lake Formation tags to map ML engineers to their campaigns.

D.

Configure S3 bucket policies to restrict access to the S3 bucket based on the ML engineers' campaigns.

Question # 13

A company uses a batching solution to process daily analytics. The company wants to provide near real-time updates, use open-source technology, and avoid managing or scaling infrastructure.

Which solution will meet these requirements?

A.

Create Amazon Managed Streaming for Apache Kafka (Amazon MSK) Serverless clusters.

B.

Create Amazon MSK Provisioned clusters.

C.

Create Amazon Kinesis Data Streams with Application Auto Scaling.

D.

Create self-hosted Apache Flink applications on Amazon EC2.

Question # 14

A company regularly receives new training data from the vendor of an ML model. The vendor delivers cleaned and prepared data to the company's Amazon S3 bucket every 3-4 days.

The company has an Amazon SageMaker pipeline to retrain the model. An ML engineer needs to implement a solution to run the pipeline when new data is uploaded to the S3 bucket.

Which solution will meet these requirements with the LEAST operational effort?

A.

Create an S3 Lifecycle rule to transfer the data to the SageMaker training instance and to initiate training.

B.

Create an AWS Lambda function that scans the S3 bucket. Program the Lambda function to initiate the pipeline when new data is uploaded.

C.

Create an Amazon EventBridge rule that has an event pattern that matches the S3 upload. Configure the pipeline as the target of the rule.

D.

Use Amazon Managed Workflows for Apache Airflow (Amazon MWAA) to orchestrate the pipeline when new data is uploaded.
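
Option C's EventBridge rule hinges on an event pattern that matches S3 object-creation events. A sketch of what that pattern could look like, assuming EventBridge notifications are enabled on the bucket; the bucket name is a placeholder:

```python
import json

# EventBridge event pattern matching uploads to the vendor's bucket.
# "training-data-bucket" is a placeholder; S3 must be configured to
# send event notifications to EventBridge for this to match.
event_pattern = {
    "source": ["aws.s3"],
    "detail-type": ["Object Created"],
    "detail": {
        "bucket": {"name": ["training-data-bucket"]},
    },
}

# The pattern is passed as a JSON string when creating the rule; the
# rule's target would be the SageMaker pipeline's ARN.
pattern_json = json.dumps(event_pattern)
```

With the pipeline set as the rule target, each vendor upload starts retraining with no polling code to maintain.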

Question # 15

A company has an ML model that is deployed to an Amazon SageMaker AI endpoint for real-time inference. The company needs to deploy a new model. The company must compare the new model’s performance to the currently deployed model's performance before shifting all traffic to the new model.

Which solution will meet these requirements with the LEAST operational effort?

A.

Deploy the new model to a separate endpoint. Manually split traffic between the two endpoints.

B.

Deploy the new model to a separate endpoint. Use Amazon CloudFront to distribute traffic between the two endpoints.

C.

Deploy the new model as a shadow variant on the same endpoint as the current model. Route a portion of live traffic to the shadow model for evaluation.

D.

Use AWS Lambda functions with custom logic to route traffic between the current model and the new model.
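
The shadow-variant pattern in option C can be illustrated in miniature: the production response is returned to callers, while the same request is mirrored to the candidate model and logged for offline comparison. The model callables here are hypothetical stand-ins for endpoint invocations:

```python
shadow_log = []

def invoke_with_shadow(request, production_model, shadow_model):
    """Serve from production; mirror the request to the shadow model.

    Only the production response reaches the caller. The shadow
    prediction is logged for later comparison, so a bad candidate
    model cannot affect users.
    """
    prod_pred = production_model(request)
    try:
        shadow_log.append((request, shadow_model(request), prod_pred))
    except Exception:
        pass  # shadow failures must never break the live path
    return prod_pred

# Toy models standing in for the two endpoint variants.
def current(x):
    return x * 2

def candidate(x):
    return x * 2 + 1

result = invoke_with_shadow(10, current, candidate)  # caller sees 20
```

SageMaker shadow tests implement this mirroring on a single endpoint, so no custom routing code is needed.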

Question # 16

An ML engineer needs to use an ML model to predict the price of apartments in a specific location.

Which metric should the ML engineer use to evaluate the model's performance?

A.

Accuracy

B.

Area Under the ROC Curve (AUC)

C.

F1 score

D.

Mean absolute error (MAE)
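
Predicting apartment prices is a regression task, so a regression error metric such as MAE applies rather than classification metrics like accuracy or F1. A quick worked example with illustrative prices:

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute gap between actual and predicted prices."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

# Actual vs. predicted apartment prices (illustrative values).
actual = [250_000, 310_000, 180_000, 420_000]
predicted = [260_000, 300_000, 200_000, 400_000]

mae = mean_absolute_error(actual, predicted)  # → 15000.0
```

MAE is read in the target's own units: here the model misses the true price by $15,000 on average.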

Question # 17

An ML engineer is setting up a CI/CD pipeline for an ML workflow in Amazon SageMaker AI. The pipeline must automatically retrain, test, and deploy a model whenever new data is uploaded to an Amazon S3 bucket. New data files are approximately 10 GB in size. The ML engineer also needs to track model versions for auditing.

Which solution will meet these requirements?

A.

Use AWS CodePipeline, Amazon S3, and AWS CodeBuild to retrain and deploy the model automatically and track model versions.

B.

Use SageMaker Pipelines with the SageMaker Model Registry to orchestrate model training and version tracking.

C.

Use AWS Lambda and Amazon EventBridge to retrain and deploy the model and track versions via logs.

D.

Manually retrain and deploy the model using SageMaker notebook instances and track versions with AWS CloudTrail.

Question # 18

An ML engineer receives datasets that contain missing values, duplicates, and extreme outliers. The ML engineer must consolidate these datasets into a single data frame and must prepare the data for ML.

Which solution will meet these requirements?

A.

Use Amazon SageMaker Data Wrangler to import the datasets and to consolidate them into a single data frame. Use the cleansing and enrichment functionalities to prepare the data.

B.

Use Amazon SageMaker Ground Truth to import the datasets and to consolidate them into a single data frame. Use the human-in-the-loop capability to prepare the data.

C.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon Q Developer to generate code snippets that will prepare the data.

D.

Manually import and merge the datasets. Consolidate the datasets into a single data frame. Use Amazon SageMaker data labeling to prepare the data.

Question # 19

A company is building a conversational AI assistant on Amazon Bedrock. The company is using Retrieval Augmented Generation (RAG) to reference the company's internal knowledge base. The AI assistant uses the Anthropic Claude 4 foundation model (FM).

The company needs a solution that uses a vector embedding model, a vector store, and a vector search algorithm.

Which solution will develop the AI assistant with the LEAST development effort?

A.

Use Amazon Kendra Experience Builder.

B.

Use Amazon Aurora PostgreSQL with the pgvector extension.

C.

Use Amazon RDS for PostgreSQL with the pgvector extension.

D.

Use the AWS Glue Data Catalog metadata repository.

Question # 20

A company is building a deep learning model on Amazon SageMaker. The company uses a large amount of data as the training dataset. The company needs to optimize the model's hyperparameters to minimize the loss function on the validation dataset.

Which hyperparameter tuning strategy will accomplish this goal with the LEAST computation time?

A.

Hyperband

B.

Grid search

C.

Bayesian optimization

D.

Random search
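
Hyperband builds on successive halving: many candidate configurations receive a small training budget, and only the best performers survive to train longer, which is why it typically needs far less total computation than grid or random search. A toy sketch of one successive-halving bracket; the loss function is a synthetic stand-in for real validation loss:

```python
import random

def validation_loss(config, budget):
    """Stand-in for training with `config` for `budget` epochs."""
    # Synthetic loss: best near lr = 0.1, improves with more budget.
    return abs(config["lr"] - 0.1) + 1.0 / budget

def successive_halving(configs, min_budget=1, rounds=3):
    """Keep the best half of the candidates each round, doubling budget."""
    budget = min_budget
    survivors = configs
    for _ in range(rounds):
        scored = sorted(survivors, key=lambda c: validation_loss(c, budget))
        survivors = scored[: max(1, len(scored) // 2)]  # keep best half
        budget *= 2  # survivors earn a larger training budget
    return survivors[0]

random.seed(0)
candidates = [{"lr": random.uniform(0.001, 0.5)} for _ in range(8)]
best = successive_halving(candidates)
```

Most of the compute budget goes to the few configurations that already look promising, instead of training every point on a grid to completion.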

Question # 21

A company has deployed an XGBoost prediction model in production to predict if a customer is likely to cancel a subscription. The company uses Amazon SageMaker Model Monitor to detect deviations in the F1 score.

During a baseline analysis of model quality, the company recorded a threshold for the F1 score. After several months of no change, the model's F1 score decreases significantly.

What could be the reason for the reduced F1 score?

A.

Concept drift occurred in the underlying customer data that was used for predictions.

B.

The model was not sufficiently complex to capture all the patterns in the original baseline data.

C.

The original baseline data had a data quality issue of missing values.

D.

Incorrect ground truth labels were provided to Model Monitor during the calculation of the baseline.

Question # 22

A company needs to deploy a custom-trained classification ML model on AWS. The model must make near real-time predictions with low latency and must handle variable request volumes.

Which solution will meet these requirements?

A.

Create an Amazon SageMaker AI batch transform job to process inference requests in batches.

B.

Use Amazon API Gateway to receive prediction requests. Use an Amazon S3 bucket to host and serve the model.

C.

Deploy an Amazon SageMaker AI endpoint. Configure auto scaling for the endpoint.

D.

Launch AWS Deep Learning AMIs (DLAMI) on two Amazon EC2 instances. Run the instances behind an Application Load Balancer.

Question # 23

A company needs to host a custom ML model to perform forecast analysis. The forecast analysis will occur with predictable and sustained load during the same 2-hour period every day.

Multiple invocations during the analysis period will require quick responses. The company needs AWS to manage the underlying infrastructure and any auto scaling activities.

Which solution will meet these requirements?

A.

Schedule an Amazon SageMaker batch transform job by using AWS Lambda.

B.

Configure an Auto Scaling group of Amazon EC2 instances to use scheduled scaling.

C.

Use Amazon SageMaker Serverless Inference with provisioned concurrency.

D.

Run the model on an Amazon Elastic Kubernetes Service (Amazon EKS) cluster on Amazon EC2 with pod auto scaling.

Question # 24

An ML engineer normalized training data by using min-max normalization in AWS Glue DataBrew. The ML engineer must normalize the production inference data in the same way as the training data before passing the production inference data to the model for predictions.

Which solution will meet this requirement?

A.

Apply statistics from a well-known dataset to normalize the production samples.

B.

Keep the min-max normalization statistics from the training set. Use these values to normalize the production samples.

C.

Calculate a new set of min-max normalization statistics from a batch of production samples. Use these values to normalize all the production samples.

D.

Calculate a new set of min-max normalization statistics from each production sample. Use these values to normalize all the production samples.
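
The principle behind option B is that inference data must be scaled with the statistics computed on the training set, never with statistics recomputed in production. A minimal sketch:

```python
def fit_min_max(values):
    """Compute normalization statistics once, on the training set."""
    return {"min": min(values), "max": max(values)}

def apply_min_max(value, stats):
    """Scale a value using the *training* statistics.

    Production samples outside the training range fall outside
    [0, 1]; that is expected and keeps the feature scale identical
    to what the model saw during training.
    """
    span = stats["max"] - stats["min"]
    return (value - stats["min"]) / span

training_ages = [18, 25, 40, 60]
stats = fit_min_max(training_ages)  # saved alongside the model

# Each production sample reuses the saved training statistics.
normalized = apply_min_max(39, stats)  # (39 - 18) / 42 = 0.5
```

Recomputing min and max from production batches (options C and D) would shift the feature scale away from the one the model was trained on.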

Question # 25

A company has built more than 50 models and deployed the models on Amazon SageMaker AI as real-time inference endpoints. The company needs to reduce the costs of the SageMaker AI inference endpoints. The company used the same ML framework to build the models. The company's customers require low-latency access to the models.

Select and order the correct steps from the following list to reduce the cost of inference and keep latency low. Select each step one time or not at all. (Select and order FIVE.)

- Create an endpoint configuration that references a multi-model container.

- Create a SageMaker AI model with multi-model endpoints enabled.

- Deploy a real-time inference endpoint by using the endpoint configuration.

- Deploy a serverless inference endpoint configuration by using the endpoint configuration.

- Spread the existing models to multiple different Amazon S3 bucket paths.

- Upload the existing models to the same Amazon S3 bucket path.

- Update the models to use the new endpoint ID. Pass the model IDs to the new endpoint.

Question # 26

A company wants to deploy an Amazon SageMaker AI model that can queue requests. The model needs to handle payloads of up to 1 GB that take up to 1 hour to process. The model must return an inference for each request. The model also must scale down when no requests are available to process.

Which inference option will meet these requirements?

A.

Asynchronous inference

B.

Batch transform

C.

Serverless inference

D.

Real-time inference

Question # 27

A company is building an enterprise AI platform. The company must catalog models for production, manage model versions, and associate metadata such as training metrics with models. The company needs to eliminate the burden of managing different versions of models.

Which solution will meet these requirements?

A.

Use the Amazon SageMaker Model Registry to catalog the models. Create unique tags for each model version. Create key-value pairs to maintain associated metadata.

B.

Use the Amazon SageMaker Model Registry to catalog the models. Create model groups for each model to manage the model versions and to maintain associated metadata.

C.

Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model. Use the repositories to catalog the models and to manage model versions and associated metadata.

D.

Create a separate Amazon Elastic Container Registry (Amazon ECR) repository for each model. Create unique tags for each model version. Create key-value pairs to maintain associated metadata.

Question # 28

A company is creating an application that will recommend products for customers to purchase. The application will make API calls to Amazon Q Business. The company must ensure that responses from Amazon Q Business do not include the name of the company's main competitor.

Which solution will meet this requirement?

A.

Configure the competitor's name as a blocked phrase in Amazon Q Business.

B.

Configure an Amazon Q Business retriever to exclude the competitor’s name.

C.

Configure an Amazon Kendra retriever for Amazon Q Business to build indexes that exclude the competitor's name.

D.

Configure document attribute boosting in Amazon Q Business to deprioritize the competitor's name.

Question # 29

An ML engineer is using an Amazon SageMaker Studio notebook to train a neural network by creating an estimator. The estimator runs a Python training script that uses Distributed Data Parallel (DDP) on a single instance that has more than one GPU.

The ML engineer discovers that the training script is underutilizing GPU resources. The ML engineer must identify the point in the training script where resource utilization can be optimized.

Which solution will meet this requirement?

A.

Use Amazon CloudWatch metrics to create a report that describes GPU utilization over time.

B.

Add SageMaker Profiler annotations to the training script. Run the script and generate a report from the results.

C.

Use AWS CloudTrail to create a report that describes GPU utilization and GPU memory utilization over time.

D.

Create a default monitor in Amazon SageMaker Model Monitor and suggest a baseline. Generate a report based on the constraints and statistics the monitor generates.

Question # 30

A company is running ML models on premises by using custom Python scripts and proprietary datasets. The company is using PyTorch. The model building requires unique domain knowledge. The company needs to move the models to AWS.

Which solution will meet these requirements with the LEAST development effort?

A.

Use SageMaker AI built-in algorithms to train the proprietary datasets.

B.

Use SageMaker AI script mode and premade images for ML frameworks.

C.

Build a container on AWS that includes custom packages and a choice of ML frameworks.

D.

Purchase similar production models through AWS Marketplace.

Question # 31

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

The ML engineer needs to use an Amazon SageMaker built-in algorithm to train the model.

Which algorithm should the ML engineer use to meet this requirement?

A.

LightGBM

B.

Linear learner

C.

K-means clustering

D.

Neural Topic Model (NTM)

Question # 32

Case study

An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3.

The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data.

The training dataset includes categorical data and numerical data. The ML engineer must prepare the training dataset to maximize the accuracy of the model.

Which action will meet this requirement with the LEAST operational overhead?

A.

Use AWS Glue to transform the categorical data into numerical data.

B.

Use AWS Glue to transform the numerical data into categorical data.

C.

Use Amazon SageMaker Data Wrangler to transform the categorical data into numerical data.

D.

Use Amazon SageMaker Data Wrangler to transform the numerical data into categorical data.

Question # 33

A company runs its ML workflows on an on-premises Kubernetes cluster. The ML workflows include ML services that perform training and inferences for ML models. Each ML service runs from its own standalone Docker image.

The company needs to perform a lift and shift from the on-premises Kubernetes cluster to an Amazon Elastic Kubernetes Service (Amazon EKS) cluster.

Which solution will meet this requirement with the LEAST operational overhead?

A.

Redesign the ML services to be configured in Kubeflow. Deploy the new Kubeflow managed ML services to the EKS cluster.

B.

Upload the Docker images to an Amazon Elastic Container Registry (Amazon ECR) repository. Configure a deployment pipeline to deploy the images to the EKS cluster.

C.

Migrate the training data to an Amazon Redshift cluster. Retrain the models from the migrated training data by using Amazon Redshift ML. Deploy the retrained models to the EKS cluster.

D.

Configure an Amazon SageMaker AI notebook. Retrain the models with the same code. Deploy the retrained models to the EKS cluster.

Question # 34

A company is developing a customer support AI assistant by using an Amazon Bedrock Retrieval Augmented Generation (RAG) pipeline. The AI assistant retrieves articles from a knowledge base stored in Amazon S3. The company uses Amazon OpenSearch Service to index the knowledge base. The AI assistant uses an Amazon Bedrock Titan Embeddings model for vector search.

The company wants to improve the relevance of the retrieved articles to improve the quality of the AI assistant's answers.

Which solution will meet these requirements?

A.

Use auto-summarization on the retrieved articles by using Amazon SageMaker JumpStart.

B.

Use a reranker model before passing the articles to the foundation model (FM).

C.

Use Amazon Athena to pre-filter the articles based on metadata before retrieval.

D.

Use Amazon Bedrock Provisioned Throughput to process queries more efficiently.

Question # 35

A company wants to share data with a vendor in real time to improve the performance of the vendor's ML models. The vendor needs to ingest the data in a stream. The vendor will use only some of the columns from the streamed data.

Which solution will meet these requirements?

A.

Use AWS Data Exchange to stream the data to an Amazon S3 bucket. Use an Amazon Athena CREATE TABLE AS SELECT (CTAS) query to define relevant columns.

B.

Use Amazon Kinesis Data Streams to ingest the data. Use Amazon Managed Service for Apache Flink as a consumer to extract relevant columns.

C.

Create an Amazon S3 bucket. Configure the S3 bucket policy to allow the vendor to upload data to the S3 bucket. Configure the S3 bucket policy to control which columns are shared.

D.

Use AWS Lake Formation to ingest the data. Use the column-level filtering feature in Lake Formation to extract relevant columns.

Question # 36

An ML engineer is using AWS CodeDeploy to deploy new container versions for inference on Amazon ECS.

The deployment must shift 10% of traffic initially, and the remaining 90% must shift within 10–15 minutes.

Which deployment configuration meets these requirements?

A.

CodeDeployDefault.LambdaLinear10PercentEvery10Minutes

B.

CodeDeployDefault.ECSAllAtOnce

C.

CodeDeployDefault.ECSCanary10Percent15Minutes

D.

CodeDeployDefault.LambdaCanary10Percent15Minutes

Question # 37

A company wants to reduce the cost of its containerized ML applications. The applications use ML models that run on Amazon EC2 instances, AWS Lambda functions, and an Amazon Elastic Container Service (Amazon ECS) cluster. The EC2 workloads and ECS workloads use Amazon Elastic Block Store (Amazon EBS) volumes to save predictions and artifacts.

An ML engineer must identify resources that are being used inefficiently. The ML engineer also must generate recommendations to reduce the cost of these resources.

Which solution will meet these requirements with the LEAST development effort?

A.

Create code to evaluate each instance's memory and compute usage.

B.

Add cost allocation tags to the resources. Activate the tags in AWS Billing and Cost Management.

C.

Check AWS CloudTrail event history for the creation of the resources.

D.

Run AWS Compute Optimizer.

Question # 38

An ML model is deployed in production. The model has performed well and has met its metric thresholds for months.

An ML engineer who is monitoring the model observes a sudden degradation. The performance metrics of the model are now below the thresholds.

What could be the cause of the performance degradation?

A.

Lack of training data

B.

Drift in production data distribution

C.

Compute resource constraints

D.

Model overfitting

Question # 39

A financial company receives a high volume of real-time market data streams from an external provider. The streams consist of thousands of JSON records every second.

The company needs to implement a scalable solution on AWS to identify anomalous data points.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Ingest real-time data into Amazon Kinesis Data Streams. Use the built-in RANDOM_CUT_FOREST function in Amazon Managed Service for Apache Flink to process the data streams and to detect data anomalies.

B.

Ingest real-time data into Amazon Kinesis Data Streams. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

C.

Ingest real-time data into Apache Kafka on Amazon EC2 instances. Deploy an Amazon SageMaker AI endpoint for real-time outlier detection. Create an AWS Lambda function to detect anomalies. Use the data streams to invoke the Lambda function.

D.

Send real-time data to an Amazon Simple Queue Service (Amazon SQS) FIFO queue. Create an AWS Lambda function to consume the queue messages. Program the Lambda function to start an AWS Glue extract, transform, and load (ETL) job for batch processing and anomaly detection.
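
The RANDOM_CUT_FOREST function in option A scores each incoming point against recent stream history. As a much simpler stand-in for the same idea, a rolling z-score detector flags points far from the recent mean; the window size and threshold here are illustrative, and a real RCF model handles multi-dimensional, drifting data far better:

```python
from collections import deque
from statistics import mean, stdev

class RollingAnomalyDetector:
    """Flag points far from the recent stream mean.

    A simplified stand-in for Random Cut Forest: both score each
    incoming point against recent history, but RCF copes with
    multi-dimensional data and changing distributions.
    """

    def __init__(self, window=50, threshold=3.0):
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def is_anomaly(self, value):
        flagged = False
        if len(self.window) >= 10:  # wait for enough history
            mu, sigma = mean(self.window), stdev(self.window)
            if sigma > 0 and abs(value - mu) / sigma > self.threshold:
                flagged = True
        self.window.append(value)
        return flagged

detector = RollingAnomalyDetector()
# A steady stream of market values followed by one obvious outlier.
stream = [100.0 + (i % 5) for i in range(50)] + [500.0]
flags = [detector.is_anomaly(v) for v in stream]
```

The managed Flink + RCF route in option A gives this behavior without hosting a model, a Lambda fleet, or a Kafka cluster.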

Question # 40

A company has a conversational AI assistant that sends requests through Amazon Bedrock to an Anthropic Claude large language model (LLM). Users report that when they ask similar questions multiple times, they sometimes receive different answers. An ML engineer needs to improve the responses to be more consistent and less random.

Which solution will meet these requirements?

A.

Increase the temperature parameter and the top_k parameter.

B.

Increase the temperature parameter. Decrease the top_k parameter.

C.

Decrease the temperature parameter. Increase the top_k parameter.

D.

Decrease the temperature parameter and the top_k parameter.
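
Temperature rescales token logits before sampling: lowering it concentrates probability mass on the most likely tokens, and a smaller top_k further restricts the candidate pool. A worked softmax-with-temperature sketch with illustrative logits:

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; lower temperature sharpens them."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]

# High temperature: probabilities flatten, sampling is more random.
p_hot = softmax_with_temperature(logits, temperature=2.0)
# Low temperature: the top token dominates, answers become consistent.
p_cold = softmax_with_temperature(logits, temperature=0.2)
```

At temperature 0.2 the top token carries over 99% of the probability, so repeated runs of the same prompt converge on the same answer.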

Question # 41

A company wants to develop an ML model by using tabular data from its customers. The data contains meaningful ordered features with sensitive information that should not be discarded. An ML engineer must ensure that the sensitive data is masked before another team starts to build the model.

Which solution will meet these requirements?

A.

Use Amazon Macie to categorize the sensitive data.

B.

Prepare the data by using AWS Glue DataBrew.

C.

Run an AWS Batch job to change the sensitive data to random values.

D.

Run an Amazon EMR job to change the sensitive data to random values.

Question # 42

A company uses a training job on Amazon SageMaker AI to train a neural network. The job first trains a model and then evaluates the model's performance against a test dataset. The company uses the results from the evaluation phase to decide if the trained model will go to production.

The training phase takes too long. The company needs solutions that can shorten training time without decreasing the model's final performance.

Select the correct solutions from the following list to meet the requirements for each description. Select each solution one time or not at all. (Select THREE.)

- Change the epoch count.

- Choose an Amazon EC2 Spot Fleet.

- Change the batch size.

- Use early stopping on the training job.

- Use the SageMaker AI distributed data parallelism (SMDDP) library.

- Stop the training job.
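
Early stopping, one of the listed solutions, halts training once the validation loss stops improving, which shortens training time without hurting final performance. A minimal sketch; the patience value and the loss sequence are illustrative:

```python
def train_with_early_stopping(val_losses, patience=2):
    """Stop once validation loss fails to improve `patience` epochs in a row.

    `val_losses` stands in for the loss observed after each epoch of a
    real training loop; returns the number of epochs actually run.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch  # stop early; later epochs are skipped
    return len(val_losses)

# Loss plateaus after epoch 3, so training stops at epoch 5 of 8.
epochs_run = train_with_early_stopping(
    [0.9, 0.7, 0.6, 0.61, 0.62, 0.60, 0.59, 0.58]
)
```

The epochs after the plateau are never paid for, which is exactly the compute saving the question asks about.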

Question # 43

A company has a team of data scientists who use Amazon SageMaker notebook instances to test ML models. When the data scientists need new permissions, the company attaches the permissions to each individual role that was created during the creation of the SageMaker notebook instance.

The company needs to centralize management of the team's permissions.

Which solution will meet this requirement?

A.

Create a single IAM role that has the necessary permissions. Attach the role to each notebook instance that the team uses.

B.

Create a single IAM group. Add the data scientists to the group. Associate the group with each notebook instance that the team uses.

C.

Create a single IAM user. Attach the AdministratorAccess AWS managed IAM policy to the user. Configure each notebook instance to use the IAM user.

D.

Create a single IAM group. Add the data scientists to the group. Create an IAM role. Attach the AdministratorAccess AWS managed IAM policy to the role. Associate the role with the group. Associate the group with each notebook instance that the team uses.

Question # 44

A company is planning to use Amazon SageMaker to make classification ratings that are based on images. The company has 6 TB of training data that is stored on an Amazon FSx for NetApp ONTAP system virtual machine (SVM). The SVM is in the same VPC as SageMaker.

An ML engineer must make the training data accessible for ML models that are in the SageMaker environment.

Which solution will meet these requirements?

A.

Mount the FSx for ONTAP file system as a volume to the SageMaker instance.

B.

Create an Amazon S3 bucket. Use Mountpoint for Amazon S3 to link the S3 bucket to the FSx for ONTAP file system.

C.

Create a catalog connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

D.

Create a direct connection from SageMaker Data Wrangler to the FSx for ONTAP file system.

Question # 45

A company's ML engineer has deployed an ML model for sentiment analysis to an Amazon SageMaker AI endpoint. The ML engineer needs to explain to company stakeholders how the model makes predictions.

Which solution will provide an explanation for the model's predictions?

A.

Use SageMaker Model Monitor on the deployed model.

B.

Use SageMaker Clarify on the deployed model.

C.

Show the distribution of inferences from A/B testing in Amazon CloudWatch.

D.

Add a shadow endpoint. Analyze prediction differences on samples.

Question # 46

A company is using an Amazon S3 bucket to collect data that will be used for ML workflows. The company needs to use AWS Glue DataBrew to clean and normalize the data.

Which solution will meet these requirements?

A.

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew profile job.

B.

Create a DataBrew dataset by using the S3 path. Clean and normalize the data by using a DataBrew recipe job.

C.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a profile job.

D.

Create a DataBrew dataset by using a JDBC driver to connect to the S3 bucket. Use a recipe job.

Question # 47

A company is planning to use Amazon Redshift ML in its primary AWS account. The source data is in an Amazon S3 bucket in a secondary account.

An ML engineer needs to set up an ML pipeline in the primary account to access the S3 bucket in the secondary account. The solution must not require public IPv4 addresses.

Which solution will meet these requirements?

A.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create a VPC peering connection between the accounts. Update the VPC route tables to remove the route to 0.0.0.0/0.

B.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC with no public access enabled in the primary account. Create an AWS Direct Connect connection and a transit gateway. Associate the VPCs from both accounts with the transit gateway. Update the VPC route tables to remove the route to 0.0.0.0/0.

C.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an AWS Site-to-Site VPN connection with two encrypted IPsec tunnels between the accounts. Set up interface VPC endpoints for Amazon S3.

D.

Provision a Redshift cluster and Amazon SageMaker Studio in a VPC in the primary account. Create an S3 gateway endpoint. Update the S3 bucket policy to allow IAM principals from the primary account. Set up interface VPC endpoints for SageMaker and Amazon Redshift.

Question # 48

A company runs an Amazon SageMaker domain in a public subnet of a newly created VPC. The network is configured properly, and ML engineers can access the SageMaker domain.

Recently, the company discovered suspicious traffic to the domain from a specific IP address. The company needs to block traffic from the specific IP address.

Which update to the network configuration will meet this requirement?

A.

Create a security group inbound rule to deny traffic from the specific IP address. Assign the security group to the domain.

B.

Create a network ACL inbound rule to deny traffic from the specific IP address. Assign the rule to the default network ACL for the subnet where the domain is located.

C.

Create a shadow variant for the domain. Configure SageMaker Inference Recommender to send traffic from the specific IP address to the shadow endpoint.

D.

Create a VPC route table to deny inbound traffic from the specific IP address. Assign the route table to the domain.

Question # 49

A company launches a feature that predicts home prices. An ML engineer trained a regression model using the SageMaker AI XGBoost algorithm. The model performs well on training data but underperforms on real-world validation data.

Which solution will improve the validation score with the LEAST implementation effort?

A.

Create a larger training dataset with more real-world data and retrain.

B.

Increase the num_round hyperparameter.

C.

Change the eval_metric from RMSE to Error.

D.

Increase the lambda hyperparameter.
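For illustration, the lambda hyperparameter in XGBoost controls L2 regularization, which shrinks model weights and combats overfitting. A minimal NumPy sketch of the same effect, using closed-form ridge regression rather than XGBoost itself (the function name and data are hypothetical):

```python
import numpy as np

# Closed-form ridge regression: w = (X^T X + lam * I)^-1 X^T y.
# A larger lam (the L2 regularization strength) shrinks the weights,
# trading a little training fit for lower variance on unseen data.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([3.0, -2.0, 0.5, 0.0, 1.0]) + rng.normal(scale=0.1, size=50)

def ridge_weights(X, y, lam):
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

w_small = ridge_weights(X, y, lam=0.01)
w_large = ridge_weights(X, y, lam=100.0)

# Stronger regularization yields a smaller weight vector.
print(np.linalg.norm(w_large) < np.linalg.norm(w_small))  # True
```

This is why increasing lambda can lift the validation score of a model that already fits the training data well.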

Question # 50

A construction company is using Amazon SageMaker AI to train specialized custom object detection models to identify road damage. The company uses images from multiple cameras. The images are stored as JPEG objects in an Amazon S3 bucket.

The images need to be pre-processed by using computationally intensive computer vision techniques before the images can be used in the training job. The company needs to optimize data loading and pre-processing in the training job. The solution cannot affect model performance or increase compute or storage resources.

Which solution will meet these requirements?

A.

Use SageMaker AI file mode to load and process the images in batches.

B.

Reduce the batch size of the model and increase the number of pre-processing threads.

C.

Reduce the quality of the training images in the S3 bucket.

D.

Convert the images into RecordIO format and use the lazy loading pattern.

Question # 51

An ML engineer has developed a binary classification model outside of Amazon SageMaker. The ML engineer needs to make the model accessible to a SageMaker Canvas user for additional tuning.

The model artifacts are stored in an Amazon S3 bucket. The ML engineer and the Canvas user are part of the same SageMaker domain.

Which combination of requirements must be met so that the ML engineer can share the model with the Canvas user? (Choose two.)

A.

The ML engineer and the Canvas user must be in separate SageMaker domains.

B.

The Canvas user must have permissions to access the S3 bucket where the model artifacts are stored.

C.

The model must be registered in the SageMaker Model Registry.

D.

The ML engineer must host the model on AWS Marketplace.

E.

The ML engineer must deploy the model to a SageMaker endpoint.

Question # 52

A company is building a near real-time data analytics application to detect anomalies and failures for industrial equipment. The company has thousands of IoT sensors that send data every 60 seconds. When new versions of the application are released, the company wants to ensure that application code bugs do not prevent the application from running.

Which solution will meet these requirements?

A.

Use Amazon Managed Service for Apache Flink with the system rollback capability enabled to build the data analytics application.

B.

Use Amazon Managed Service for Apache Flink with manual rollback when an error occurs to build the data analytics application.

C.

Use Amazon Data Firehose to deliver real-time streaming data programmatically for the data analytics application. Pause the stream when a new version of the application is released and resume the stream after the application is deployed.

D.

Use Amazon Data Firehose to deliver data to Amazon EC2 instances across two Availability Zones for the data analytics application.

Question # 53

An ML engineer has trained a neural network by using stochastic gradient descent (SGD). The neural network performs poorly on the test set. The values for training loss and validation loss remain high and show an oscillating pattern. The values decrease for a few epochs and then increase for a few epochs before repeating the same cycle.

What should the ML engineer do to improve the training process?

A.

Introduce early stopping.

B.

Increase the size of the test set.

C.

Increase the learning rate.

D.

Decrease the learning rate.
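The oscillating-loss symptom can be reproduced with plain gradient descent on f(x) = x² (gradient 2x): a learning rate that is too large makes each update overshoot the minimum, so the iterates flip sign and the loss cycles or grows, while a smaller rate converges smoothly. A minimal sketch (function and values are illustrative only):

```python
def gradient_descent(lr, steps=20, x0=1.0):
    """Minimize f(x) = x**2 (gradient 2x) with a fixed learning rate."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        x = x - lr * 2 * x  # standard gradient-descent update
        trajectory.append(x)
    return trajectory

big = gradient_descent(lr=1.1)    # overshoots: sign flips, |x| grows
small = gradient_descent(lr=0.4)  # converges smoothly toward 0

print(abs(big[-1]) > abs(big[0]))  # True: diverging oscillation
print(abs(small[-1]) < 1e-3)       # True: converged
```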

Question # 54

A company is uploading thousands of PDF policy documents into Amazon S3 and Amazon Bedrock Knowledge Bases. Each document contains structured sections. Users often search for a small section but need the full section context. The company wants accurate section-level search with automatic context retrieval and minimal custom coding.

Which chunking strategy meets these requirements?

A.

Hierarchical

B.

Maximum tokens

C.

Semantic

D.

Fixed-size

Question # 55

A company uses the Amazon SageMaker AI Object2Vec algorithm to train an ML model. The model performs well on training data but underperforms after deployment. The company wants to avoid overfitting the model and maintain the model's ability to generalize.

Which solution will meet these requirements?

A.

Decrease the early_stopping_patience hyperparameter.

B.

Increase the mini_batch_size hyperparameter.

C.

Decrease the dropout rate.

D.

Increase the number of epochs.
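The early_stopping_patience idea in option A can be sketched in a few lines: training halts once the validation loss has failed to improve for a set number of consecutive epochs, which stops the model before it overfits. A minimal sketch, not SageMaker AI's actual implementation (the function name and loss values are hypothetical):

```python
def train_with_early_stopping(val_losses, patience):
    """Return the epoch at which training stops.

    Stops when validation loss has not improved for `patience`
    consecutive epochs (analogous to early_stopping_patience).
    """
    best = float("inf")
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            waited = 0
        else:
            waited += 1
            if waited >= patience:
                return epoch
    return len(val_losses) - 1

# Validation loss improves, then rises as the model starts overfitting.
losses = [0.9, 0.7, 0.6, 0.62, 0.65, 0.7, 0.8]
print(train_with_early_stopping(losses, patience=2))  # stops at epoch 4
```

Decreasing the patience value makes this check trigger sooner, ending training closer to the point where generalization peaks.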

Question # 56

A company is developing an ML model to predict customer satisfaction. The company needs to use survey feedback and the past satisfaction level of customers to predict the future satisfaction level of customers.

The dataset includes a column named Feedback that contains long text responses. The dataset also includes a column named Satisfaction Level that contains three distinct values for past customer satisfaction: High, Medium, and Low. The company must apply encoding methods to transform the data in each column.

Which solution will meet these requirements?

A.

Apply one-hot encoding to the Feedback column and the Satisfaction Level column.

B.

Apply one-hot encoding to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.

C.

Apply label encoding to the Feedback column. Apply binary encoding to the Satisfaction Level column.

D.

Apply tokenization to the Feedback column. Apply ordinal encoding to the Satisfaction Level column.
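The two transforms in option D can be sketched in plain Python: ordinal encoding preserves the natural order Low < Medium < High, and tokenization splits free-text feedback into tokens for downstream text processing. A minimal illustration (the helper names and whitespace tokenizer are assumptions, not a production pipeline):

```python
# Ordinal encoding keeps the ordering information in Satisfaction Level.
satisfaction_order = {"Low": 0, "Medium": 1, "High": 2}

def ordinal_encode(levels):
    return [satisfaction_order[level] for level in levels]

def tokenize(text):
    # Minimal whitespace tokenizer; real pipelines would use a
    # library tokenizer suited to the model.
    return text.lower().split()

print(ordinal_encode(["High", "Low", "Medium"]))  # [2, 0, 1]
print(tokenize("Service was fast and helpful"))
```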

Question # 57

Case Study

A company is building a web-based AI application by using Amazon SageMaker. The application will provide the following capabilities and features: ML experimentation, training, a central model registry, model deployment, and model monitoring.

The application must ensure secure and isolated use of training data during the ML lifecycle. The training data is stored in Amazon S3.

The company must implement a manual approval-based workflow to ensure that only approved models can be deployed to production endpoints.

Which solution will meet this requirement?

A.

Use SageMaker Experiments to facilitate the approval process during model registration.

B.

Use SageMaker ML Lineage Tracking on the central model registry. Create tracking entities for the approval process.

C.

Use SageMaker Model Monitor to evaluate the performance of the model and to manage the approval.

D.

Use SageMaker Pipelines. When a model version is registered, use the AWS SDK to change the approval status to "Approved."

Question # 58

An ML engineer wants to deploy an Amazon SageMaker AI model for inference. The payload sizes are less than 3 MB. Processing time does not exceed 45 seconds. The traffic patterns will be irregular or unpredictable.

Which inference option will meet these requirements MOST cost-effectively?

A.

Asynchronous inference

B.

Real-time inference

C.

Serverless inference

D.

Batch transform

Question # 59

An ML engineer is setting up an Amazon SageMaker AI pipeline for an ML model. The pipeline must automatically initiate a retraining job if any data drift is detected.

How should the ML engineer set up the pipeline to meet this requirement?

A.

Use an AWS Glue crawler and an AWS Glue ETL job to detect data drift. Use AWS Glue triggers to automate the retraining job.

B.

Use Amazon Managed Service for Apache Flink to detect data drift. Use an AWS Lambda function to automate the retraining job.

C.

Use SageMaker Model Monitor to detect data drift. Use an AWS Lambda function to automate the retraining job.

D.

Use Amazon QuickSight anomaly detection to detect data drift. Use an AWS Step Functions workflow to automate the retraining job.

Question # 60

An ML engineer needs to use an Amazon EMR cluster to process large volumes of data in batches. Any data loss is unacceptable.

Which instance purchasing option will meet these requirements MOST cost-effectively?

A.

Run the primary node, core nodes, and task nodes on On-Demand Instances.

B.

Run the primary node, core nodes, and task nodes on Spot Instances.

C.

Run the primary node on an On-Demand Instance. Run the core nodes and task nodes on Spot Instances.

D.

Run the primary node and core nodes on On-Demand Instances. Run the task nodes on Spot Instances.

Question # 61

A company's ML engineer is creating a classification model. The ML engineer explores the dataset and notices a column named day_of_week. The column contains the following values: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, and Sunday.

Which technique should the ML engineer use to convert this column’s data to binary values?

A.

Binary encoding

B.

Label encoding

C.

One-hot encoding

D.

Tokenization
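For illustration, one-hot encoding maps each of the seven day_of_week values to a binary vector containing a single 1, which is the conversion to binary values the question asks about. A minimal sketch (the helper name is hypothetical):

```python
days = ["Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday"]

def one_hot(value, categories):
    """Return a binary vector with a single 1 marking the category."""
    return [1 if c == value else 0 for c in categories]

print(one_hot("Wednesday", days))  # [0, 0, 1, 0, 0, 0, 0]
```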

Question # 62

A company wants to predict the success of advertising campaigns by considering the color scheme of each advertisement. An ML engineer is preparing data for a neural network model. The dataset includes color information as categorical data.

Which technique for feature engineering should the ML engineer use for the model?

A.

Apply label encoding to the color categories. Automatically assign each color a unique integer.

B.

Implement padding to ensure that all color feature vectors have the same length.

C.

Perform dimensionality reduction on the color categories.

D.

One-hot encode the color categories to transform the color scheme feature into a binary matrix.
