

AWS Certified Machine Learning - Specialty - (MLS-C01) Exam Questions
Question 1 Single Choice
A data science team at your company is planning to utilize Amazon SageMaker to train an XGBoost model to predict customer churn. The dataset comprises millions of rows, necessitating significant pre-processing to ensure model accuracy. To handle this task efficiently, the team has decided to leverage Apache Spark due to its capability for large-scale data processing. As the lead architect, you are tasked with designing a solution that integrates Apache Spark for data pre-processing while optimizing for simplicity and scalability.
What is the simplest architecture that allows the team to pre-process the data at scale using Apache Spark before training the model with XGBoost on SageMaker?
Explanation
Consider the integration points between EMR Spark and SageMaker, and choose based on where your processing and model training will primarily occur. The simplest architecture is the one that minimizes maintenance and makes the fullest use of Amazon SageMaker's built-in features.
Be aware that you can utilize the SageMaker Spark library to invoke SageMaker from an EMR Spark cluster, or alternatively, use Sparkmagic or Livy to access EMR Spark from a SageMaker notebook. The decision on which approach to use hinges on whether your workflow involves an EMR batch pipeline requiring integration with SageMaker, or vice versa.
Regarding model selection, several options are available:
SageMaker Spark offers an XGBoostEstimator
SageMaker features the SageMaker XGBoost algorithm
XGBoost PySpark Estimator
Correct Choice: Use SageMaker Spark to preprocess data, train with XGBoostSageMakerEstimator, and host on SageMaker.
The SageMaker Spark library facilitates the execution of Spark jobs as part of the machine learning pipeline within SageMaker, without the user needing to set up and manage an EMR cluster or deal with the intricacies of Spark cluster configuration and scaling.
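A minimal sketch of this pattern using the sagemaker_pyspark library; the bucket path, IAM role ARN, hyperparameters, and the assumption that the pre-processed DataFrame already has "label" and "features" columns are illustrative, not part of the question:

```python
from pyspark.sql import SparkSession
from sagemaker_pyspark import IAMRole, classpath_jars
from sagemaker_pyspark.algorithms import XGBoostSageMakerEstimator

# Spark session with the SageMaker Spark JARs on the classpath.
spark = (SparkSession.builder
         .config("spark.driver.extraClassPath", ":".join(classpath_jars()))
         .getOrCreate())

# Pre-process churn data with ordinary Spark; the estimator expects a DataFrame
# with a "label" column and a Vector-typed "features" column.
churn_df = spark.read.parquet("s3://example-bucket/churn/preprocessed/")  # hypothetical path

estimator = XGBoostSageMakerEstimator(
    sagemakerRole=IAMRole("arn:aws:iam::123456789012:role/SageMakerRole"),  # placeholder role
    trainingInstanceType="ml.m5.xlarge",
    trainingInstanceCount=2,
    endpointInstanceType="ml.m5.xlarge",
    endpointInitialInstanceCount=1,
)
estimator.setObjective("binary:logistic")
estimator.setNumRound(100)

# fit() launches a SageMaker training job and deploys a hosted endpoint;
# transform() then calls that endpoint to score a DataFrame.
model = estimator.fit(churn_df)
predictions = model.transform(churn_df)
```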
Incorrect Choice: Preprocess data on EMR Spark, save in S3, use SageMaker to train XGBoost, and host for inference.
A workable solution; however, it ignores the native integration between EMR and SageMaker and adds an unnecessary intermediate hop through S3.
Incorrect Choice: Configure Sparkmagic in SageMaker, preprocess data on EMR Spark via SageMaker notebook, train with SageMaker XGBoost, and host on SageMaker for inference.
Valid solution; however, it necessitates provisioning an EMR Spark cluster.
Incorrect Choice: Configure Livy in SageMaker, preprocess data on EMR Spark via SageMaker notebook, train XGBoost PySpark Estimator, and host on SageMaker for inference.
Valid solution; however, it necessitates provisioning an EMR Spark cluster.
Question 2 Single Choice
Considering that a company uses the built-in PCA algorithm in Amazon SageMaker and stores its training data on Amazon S3, it has observed significant expenses linked to the use of Amazon Elastic Block Store (EBS) volumes with their SageMaker training instances.
Which parameter setting should they adjust in the AlgorithmSpecification to effectively reduce these EBS costs?
Explanation
Review SageMaker's data input modes (File, Pipe, FastFile) and understand their impact on EBS usage to minimize costs.
Correct Choice: Set TrainingInputMode to Pipe
Using Pipe mode streams data directly from S3 to the algorithm, reducing the need to use and store data on EBS volumes, hence lowering costs associated with EBS. This mode is ideal for large datasets.
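For reference, a hedged boto3 sketch of where `TrainingInputMode` lives in the request; the job name, role ARN, bucket paths, and instance sizing are placeholders for illustration:

```python
import boto3
import sagemaker

sm = boto3.client("sagemaker")

sm.create_training_job(
    TrainingJobName="pca-pipe-mode-demo",  # hypothetical job name
    AlgorithmSpecification={
        # Region-specific built-in PCA image, resolved rather than hard-coded.
        "TrainingImage": sagemaker.image_uris.retrieve("pca", region="us-east-1"),
        "TrainingInputMode": "Pipe",  # stream from S3 instead of copying to EBS
    },
    RoleArn="arn:aws:iam::123456789012:role/SageMakerRole",  # placeholder
    InputDataConfig=[{
        "ChannelName": "train",
        "DataSource": {"S3DataSource": {
            "S3DataType": "S3Prefix",
            "S3Uri": "s3://example-bucket/pca/train/",  # hypothetical prefix
            "S3DataDistributionType": "FullyReplicated",
        }},
        "ContentType": "application/x-recordio-protobuf",  # required for PCA in Pipe mode
    }],
    OutputDataConfig={"S3OutputPath": "s3://example-bucket/pca/output/"},
    ResourceConfig={
        "InstanceType": "ml.m5.xlarge",
        "InstanceCount": 1,
        "VolumeSizeInGB": 10,  # a small volume suffices because data is not staged on EBS
    },
    StoppingCondition={"MaxRuntimeInSeconds": 3600},
)
```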
Incorrect Choice: Set TrainingInputMode to File
This setting downloads the entire dataset from S3 to the EBS volume attached to the training instance, which would increase the EBS usage and costs, contrary to the objective. Valid when data needs pre-processing locally.
Incorrect Choice: Set TrainingInputMode to FastFile
FastFile mode streams data from S3 on demand while presenting it to the algorithm as local files, so it can also avoid staging the full dataset on EBS; it suits workloads that need file-system access without waiting for a full download. However, the built-in PCA algorithm supports only File and Pipe input modes, so FastFile is not an option here.
Incorrect Choice: Set EnableSageMakerMetricsTimeSeries to false
Disabling time series metrics reduces monitoring detail but doesn't impact EBS costs directly. Relevant if optimizing monitoring costs, not storage costs. Since EnableSageMakerMetricsTimeSeries is set to false by default, changing this setting does not affect the solution.
Question 3 Single Choice
In Amazon Elastic File System (EFS), when monitoring performance metrics indicates that the IOPS usage is nearing 100%, which of the following actions should be taken to effectively manage the file system's performance?
Explanation
Review EFS performance metrics (IOPS, throughput, connections) and relate them to the question.
Correct Choice: Increase the provisioned throughput of the EFS file system if it is in the provisioned mode.
1. For Bursting Throughput Mode: If `PercentIOLimit` is approaching 100%, increasing the total storage size will automatically raise the baseline performance and burstable IOPS capacity. This option leverages the natural scaling feature of Bursting Throughput mode.
2. For Provisioned Throughput Mode: Alternatively, if the file system is already in Provisioned Throughput mode or if a more immediate and predictable performance enhancement is needed, manually adjust the `ProvisionedThroughput` setting. This direct intervention ensures performance does not degrade as the `PercentIOLimit` approaches its maximum.
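A hedged boto3 sketch of the remediation: read the `PercentIOLimit` metric from CloudWatch, then raise provisioned throughput if the file system is close to its limit. The file system ID, 90% threshold, and 256 MiB/s target are assumptions for illustration:

```python
import boto3
from datetime import datetime, timedelta

FILE_SYSTEM_ID = "fs-0123456789abcdef0"  # placeholder

cloudwatch = boto3.client("cloudwatch")
efs = boto3.client("efs")

# PercentIOLimit over the last hour (published for General Purpose mode file systems).
stats = cloudwatch.get_metric_statistics(
    Namespace="AWS/EFS",
    MetricName="PercentIOLimit",
    Dimensions=[{"Name": "FileSystemId", "Value": FILE_SYSTEM_ID}],
    StartTime=datetime.utcnow() - timedelta(hours=1),
    EndTime=datetime.utcnow(),
    Period=300,
    Statistics=["Average"],
)
recent_peak = max((d["Average"] for d in stats["Datapoints"]), default=0.0)

# If the file system is nearing its I/O limit, bump provisioned throughput.
if recent_peak > 90.0:
    efs.update_file_system(
        FileSystemId=FILE_SYSTEM_ID,
        ThroughputMode="provisioned",
        ProvisionedThroughputInMibps=256.0,  # example target; size to the workload
    )
```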
Incorrect Choice: Decrease the number of files stored in the EFS file system to reduce IOPS usage.
Reducing the number of files in EFS to decrease IOPS usage is not typically effective because IOPS limits are more directly influenced by the nature of the file operations and the throughput mode, rather than simply the quantity of files. This action might not significantly impact IOPS utilization if the remaining operations continue to be read/write intensive.
Incorrect Choice: Convert the EFS file system from General Purpose to Max I/O performance mode.
Switching from General Purpose to Max I/O performance mode in EFS is intended for file systems that require high levels of aggregate throughput and IOPS across multiple connections, but it does not directly address the issue of nearing the IOPS capacity limit of a system under heavy load. This switch might improve performance under certain circumstances but doesn't directly manage or alleviate reaching the IOPS capacity. Additionally, Max I/O performance mode may introduce higher latencies for file operations.
Incorrect Choice: Reconfigure attached EC2 instances to use Elastic Block Store (EBS) instead of EFS.
Moving from EFS to EBS involves significant architectural changes and is not a direct remediation for high IOPS usage in EFS. EBS provides block-level storage and is used for different types of workloads compared to EFS, which offers file-level storage.
Question 4 Multiple Choice
A machine learning team is building a recommendation system using user clickstream data collected from a popular e-commerce website. The raw data is semi-structured JSON and includes nested fields for session activity, product views, and user metadata. The team wants to process this data daily for feature engineering and store the transformed data in a format that is:
Efficient for analytical queries
Compatible with Amazon SageMaker training jobs
Cost-effective to store at scale
Which of the following solutions would best meet these requirements? (Select TWO)
Explanation
✅ AWS Glue with PySpark to write to Parquet in S3
This is a best practice for transforming semi-structured data into an efficient, columnar format.
Apache Parquet is optimized for analytics, supports schema evolution, and works well with SageMaker, which can train directly on data stored in S3.
PySpark allows custom transformations including flattening nested JSON.
✅ AWS Glue with DynamicFrames to flatten and write Parquet
Similar to the PySpark option above, but uses DynamicFrames, which are specifically designed for semi-structured data like JSON.
Writing to S3 in Parquet format with partitioning (e.g., by event date) improves query performance and reduces storage costs.
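A minimal Glue PySpark job illustrating this approach; the catalog database, table, S3 paths, and `event_date` partition key are hypothetical:

```python
import sys
from awsglue.transforms import Relationalize
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glue_context = GlueContext(SparkContext.getOrCreate())
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read the raw clickstream JSON as a DynamicFrame (tolerates ragged, nested schemas).
clicks = glue_context.create_dynamic_frame.from_catalog(
    database="clickstream_db",   # hypothetical Glue Data Catalog database
    table_name="raw_events",     # hypothetical crawled table
)

# Flatten the nested session/product/user structures into relational frames.
flattened = Relationalize.apply(
    frame=clicks,
    staging_path="s3://example-bucket/tmp/relationalize/",  # scratch space
    name="root",
)
root = flattened.select("root")

# Write partitioned Parquet for analytical queries and SageMaker training input.
glue_context.write_dynamic_frame.from_options(
    frame=root,
    connection_type="s3",
    connection_options={
        "path": "s3://example-bucket/clickstream/features/",
        "partitionKeys": ["event_date"],
    },
    format="parquet",
)

job.commit()
```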
❌ Store in Redshift as normalized tables
While Redshift is good for analytics, normalized relational tables can complicate feature extraction due to joins.
Also, not ideal for SageMaker training jobs, which often work better with flat files in S3.
❌ Stream to Elasticsearch
Amazon Elasticsearch (now OpenSearch) is good for text search and real-time dashboards, not cost-effective for large-scale historical analytics or model training.
Storage costs and JSON query limitations make it a poor fit here.
❌ Athena to CSV in RDS
Athena is suitable for querying, but writing output to CSV and storing in RDS introduces inefficiencies.
CSV is not optimal for analytics or ML training.
RDS is expensive for storing large analytical datasets.
Question 5 Multiple Choice
In an effort to optimize a machine learning model on Amazon SageMaker, you find that the automatic hyperparameter tuning job is excessively resource-intensive and costly. Which TWO of the following strategies could effectively reduce these costs? (Select TWO)
Explanation
Review best practices for hyperparameter tuning, especially regarding resource optimization and search technique effectiveness.
Correct choice: Decrease the number of concurrent hyperparameter tuning jobs
Reducing concurrency minimizes resource usage and costs, allowing for more focused and potentially insightful individual job analyses without overwhelming your compute resources.
Correct choice: Use logarithmic scales on your parameter ranges
Logarithmic scales are effective for parameters whose useful ranges span several orders of magnitude, letting the tuner cover a broad search space with fewer training jobs and converge on good values sooner.
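A sketch with the SageMaker Python SDK showing both levers together; the XGBoost container version, role ARN, S3 paths, metric, and parameter ranges are illustrative assumptions:

```python
import sagemaker
from sagemaker.estimator import Estimator
from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder

xgb_estimator = Estimator(
    image_uri=sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1"),
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-bucket/tuning/output/",
    sagemaker_session=session,
)
xgb_estimator.set_hyperparameters(objective="binary:logistic", eval_metric="auc", num_round=200)

hyperparameter_ranges = {
    # Logarithmic scaling explores 0.001-0.3 evenly across orders of magnitude.
    "eta": ContinuousParameter(0.001, 0.3, scaling_type="Logarithmic"),
    "alpha": ContinuousParameter(0.01, 10, scaling_type="Logarithmic"),
    "max_depth": IntegerParameter(3, 10),
}

tuner = HyperparameterTuner(
    estimator=xgb_estimator,
    objective_metric_name="validation:auc",
    hyperparameter_ranges=hyperparameter_ranges,
    max_jobs=20,
    max_parallel_jobs=2,   # lower concurrency -> fewer simultaneous instances, lower cost
    strategy="Bayesian",   # avoids exhaustive grid search
)

tuner.fit({"train": "s3://example-bucket/train/",
           "validation": "s3://example-bucket/validation/"})
```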
Incorrect choice: Disable reverse logarithmic scales on your parameter ranges
Reverse logarithmic scaling is intended for parameters in the range 0 ≤ x < 1.0 that are highly sensitive to values near 1; disabling it would slow the search for an optimal value rather than reduce cost.
Incorrect choice: Switch to grid search tuning strategy
Grid search is exhaustive and may increase costs significantly compared to other strategies like random or Bayesian optimization, especially with a large parameter space.
Incorrect choice: Disable the use of a parent job for the warm start configuration.
Removing the parent job disregards potentially valuable insights from previous tunings, missing out on accelerating the convergence to optimal hyperparameters based on past learnings.
Question 6 Single Choice
A healthcare company is planning to develop a machine learning model to predict patient readmission rates based on historical patient data. The data science team needs to create a data repository that integrates various types of patient data such as demographics, previous medical history, medication records, and lab test results.
Which strategy should the data engineering team use to identify and organize the primary data sources effectively, ensuring the data is accessible and formatted suitably for training the machine learning model?
Explanation
Assess each storage option's capacity to integrate diverse datasets efficiently, ensuring data integrity and accessibility for ML model training.
Correct Choice: Store data in a centralized data lake
Centralized data lakes support diverse data formats and large volumes, essential for aggregating disparate health records. This architecture simplifies data management, enhances accessibility for analysis, and is scalable. Implementation involves setting up storage in Amazon S3, structuring data by date and patient, and applying metadata tags to objects.
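A small hedged example of what that landing structure might look like with boto3; the bucket name, key layout, metadata fields, and tag keys are illustrative, not a prescribed schema:

```python
import boto3
import json

s3 = boto3.client("s3")
BUCKET = "example-health-data-lake"  # placeholder bucket

record = {"patient_id": "p-0042", "lab": "hba1c", "value": 6.1}  # toy record

# Organize objects by source, ingest date, and patient so downstream jobs can prune prefixes.
key = "raw/lab_results/ingest_date=2024-06-01/patient_id=p-0042/result-001.json"

s3.put_object(
    Bucket=BUCKET,
    Key=key,
    Body=json.dumps(record).encode("utf-8"),
    ServerSideEncryption="aws:kms",  # patient data should be encrypted at rest
    Metadata={"source-system": "lims", "schema-version": "1"},
)

# Object tags support lifecycle rules and access policies (e.g., tiering cold data).
s3.put_object_tagging(
    Bucket=BUCKET,
    Key=key,
    Tagging={"TagSet": [
        {"Key": "data-domain", "Value": "lab-results"},
        {"Key": "sensitivity", "Value": "phi"},
    ]},
)
```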
Incorrect Choice: Store data in an on-premise database
On-premise databases might struggle with the scale and variety of data typical in healthcare settings. They can be suitable for localized, smaller-scale applications where real-time access to a centralized cloud solution is not critical. Implementation would involve setting up and maintaining hardware, which can be costly and less scalable.
Incorrect choice: Store data on FTP servers
FTP servers are outdated for handling sensitive, large-scale datasets like patient information due to security and efficiency concerns. They might be used for transferring smaller, less sensitive files within an internal network. Implementation involves setting up FTP access points, which is not recommended for sensitive data.
Incorrect Choice: Store encrypted data in isolated AWS RDS instances
Using isolated RDS instances can introduce unnecessary complexity and hinder data integration across multiple sources. While secure, it's less efficient for tasks requiring holistic data access and analysis. This approach is more suited for structured data requiring high transaction rates. Implementation would involve setting up multiple RDS instances, managing data encryption, and ensuring proper access controls.
Question 7 Single Choice
A data analyst is tasked with performing exploratory data analysis on a dataset of tweets to understand user sentiment towards various topics. The goal is to label tweets accurately for further sentiment analysis. Which AWS service or feature should the analyst use to efficiently categorize and label the dataset, ensuring a solid foundation for subsequent detailed analysis?
Explanation
Focus on the primary function of each AWS service mentioned and match it to the task of labeling data for training a sentiment analysis model.
Correct choice: Employ Amazon SageMaker Ground Truth to annotate historical tweets with positive or negative sentiments, utilizing the labeled data to train a sentiment analysis model on SageMaker.
Ground Truth is specifically designed for data labeling, providing a direct path to creating accurately labeled datasets for training machine learning models, like sentiment analysis, in SageMaker.
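For reference, a Ground Truth text-labeling job is created roughly as below with boto3; every ARN, URI, and the built-in task-type Lambda functions are placeholders that would need to be looked up for your account and region:

```python
import boto3

sm = boto3.client("sagemaker")

sm.create_labeling_job(
    LabelingJobName="tweet-sentiment-labeling",   # hypothetical name
    LabelAttributeName="sentiment",
    InputConfig={
        "DataSource": {"S3DataSource": {
            "ManifestS3Uri": "s3://example-bucket/manifests/tweets.manifest"
        }},
    },
    OutputConfig={"S3OutputPath": "s3://example-bucket/labels/"},
    RoleArn="arn:aws:iam::123456789012:role/GroundTruthRole",  # placeholder
    LabelCategoryConfigS3Uri="s3://example-bucket/config/sentiment-labels.json",
    HumanTaskConfig={
        "WorkteamArn": "arn:aws:sagemaker:us-east-1:123456789012:workteam/private-crowd/labelers",
        "UiConfig": {"UiTemplateS3Uri": "s3://example-bucket/templates/sentiment.liquid.html"},
        # Region-specific built-in Lambdas for the text classification task type (placeholders):
        "PreHumanTaskLambdaArn": "arn:aws:lambda:us-east-1:000000000000:function:PRE-TextMultiClass",
        "AnnotationConsolidationConfig": {
            "AnnotationConsolidationLambdaArn":
                "arn:aws:lambda:us-east-1:000000000000:function:ACS-TextMultiClass"
        },
        "TaskTitle": "Tweet sentiment",
        "TaskDescription": "Label each tweet as positive or negative",
        "NumberOfHumanWorkersPerDataObject": 3,
        "TaskTimeLimitInSeconds": 300,
    },
)
```

The output manifest written to `S3OutputPath` can then be fed directly to a SageMaker training job for the sentiment model.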
Incorrect choice: Employ Amazon SageMaker Blazing Text to annotate historical tweets with positive or negative sentiments, utilizing the labeled data to train a sentiment analysis model on SageMaker.
BlazingText is used for natural language processing tasks such as word embeddings, not for annotating or labeling data, making it unsuitable for initial data annotation.
Incorrect choice: Employ Amazon Mechanical Turk to annotate historical tweets with positive or negative sentiments, utilizing the labeled data to train a sentiment analysis model on SageMaker.
While Mechanical Turk can be used for manual labeling, it lacks the integrated machine learning-based annotation features of Ground Truth, potentially affecting efficiency and scalability.
Incorrect choice: Employ Amazon SageMaker Random Cut Forest algorithm to annotate historical tweets with positive or negative sentiments, utilizing the labeled data to train a sentiment analysis model on SageMaker.
Random Cut Forest is designed for anomaly detection, not for labeling data with sentiments. It does not perform data annotation but identifies outliers within data, making it inappropriate for sentiment labeling.
Question 8 Single Choice
A leading news portal seeks to deliver personalized article recommendations by daily training a machine learning model using historical clickstream data. The volume of incoming data is consistent but experiences substantial spikes during major elections, leading to increased site traffic. Which architecture would ensure the most cost-effective and reliable framework for accommodating these conditions?
Explanation
Identify the workflow: data ingestion, processing, model training, and result storage. Focus on services that offer scalable, cost-effective solutions for each step, especially considering traffic variability and real-time recommendation requirements.
Correct choice: Capture clickstream data using Amazon Kinesis Data Firehose to Amazon S3. Process the data with Amazon SageMaker for model training using Managed Spot Training. Publish results to Amazon DynamoDB for instant recommendation serving.
This choice efficiently manages high-volume data ingestion (Kinesis Firehose to S3), cost-effective processing and model training (SageMaker with Spot Training), and real-time recommendation serving (DynamoDB), aligning with requirements for scalability and cost efficiency.
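The cost lever in this pipeline is Managed Spot Training, which in the SageMaker Python SDK is a pair of estimator flags plus a checkpoint location; the algorithm choice, role ARN, and S3 paths below are illustrative:

```python
import sagemaker
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder

recommender = Estimator(
    image_uri=sagemaker.image_uris.retrieve(
        "factorization-machines", session.boto_region_name),  # example built-in algorithm
    role=role,
    instance_count=1,
    instance_type="ml.m5.2xlarge",
    output_path="s3://example-bucket/recommender/output/",
    use_spot_instances=True,   # Managed Spot Training
    max_run=3600,              # training time budget in seconds
    max_wait=7200,             # must be >= max_run; budget for waiting on Spot capacity
    checkpoint_s3_uri="s3://example-bucket/recommender/checkpoints/",  # resume after interruption
    sagemaker_session=session,
)

# The daily schedule (e.g., EventBridge + Lambda or a SageMaker Pipeline) would invoke:
recommender.fit({"train": "s3://example-bucket/clickstream/features/"})
```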
Incorrect choice: Stream clickstream data into Amazon S3 via Amazon Kinesis Data Streams, then use AWS Glue for real-time ETL processing. Utilize Amazon SageMaker for model training, adjusting capacity with Spot Instances as needed. Store outcomes in DynamoDB for live recommendations.
While this setup uses AWS services effectively, real-time processing with AWS Glue is less suited for the detailed model training scenario described, which benefits more from batch processing and analysis.
Incorrect choice: Direct clickstream data to Amazon S3 using Amazon Kinesis Data Firehose, conducting nightly analysis with AWS Glue DataBrew and Amazon SageMaker using On-Demand Instances for model training. Deploy results to DynamoDB for real-time recommendations.
This approach is valid but less cost-effective due to the use of On-Demand Instances for model training. Spot Instances or Managed Spot Training offer similar capabilities with better cost management.
Incorrect choice: Route clickstream data to Amazon Managed Streaming for Apache Kafka (Amazon MSK), then process in real-time with Amazon SageMaker for predictive modeling. Persist model insights in Amazon Aurora for delivering real-time content recommendations.
Amazon MSK and Aurora introduce complexity and potential over-provisioning for this use case. The initial question suggests a need for simplicity and cost efficiency, which is better served by the direct S3 to SageMaker to DynamoDB pipeline.
Question 9 Single Choice
A data engineering team is tasked with optimizing the storage of large-scale satellite imagery data, which will be used to train an Amazon SageMaker MXNet image classification algorithm.
Which data format should they use to ensure optimal training performance?
Explanation
Evaluate each data format based on compatibility with MXNet, efficiency in handling large image files, and the potential to reduce training time and I/O overhead.
Correct Choice: RecordIO
RecordIO format is specifically optimized for high throughput and efficient data serialization. It is ideal for large-scale image datasets in MXNet, reducing read times and enhancing overall training efficiency.
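A hedged sketch of packing images into RecordIO with MXNet's recordio module; the directory layout (one subfolder per class) and file names are assumptions, and in practice MXNet's bundled im2rec tooling does the same job at scale:

```python
import os
import cv2          # OpenCV; mx.recordio.pack_img uses it to encode images
import mxnet as mx

IMAGE_DIR = "street_signs/"        # hypothetical folder of class-labelled subfolders
OUTPUT_PREFIX = "street_signs_train"

record = mx.recordio.MXIndexedRecordIO(
    OUTPUT_PREFIX + ".idx", OUTPUT_PREFIX + ".rec", "w")

idx = 0
for label, class_name in enumerate(sorted(os.listdir(IMAGE_DIR))):
    class_dir = os.path.join(IMAGE_DIR, class_name)
    for fname in sorted(os.listdir(class_dir)):
        img = cv2.imread(os.path.join(class_dir, fname))
        if img is None:
            continue  # skip unreadable files
        header = mx.recordio.IRHeader(flag=0, label=float(label), id=idx, id2=0)
        packed = mx.recordio.pack_img(header, img, quality=95, img_fmt=".jpg")
        record.write_idx(idx, packed)
        idx += 1

record.close()
# Upload the .rec/.idx pair to S3 and point the training channel at it
# with ContentType "application/x-recordio".
```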
Incorrect Choice: ORC
ORC is a columnar format optimized for tabular data and is not suitable for image datasets.
Incorrect Choice: Parquet
Parquet is a columnar format designed for structured, tabular data, not binary image data.
Incorrect Choice: TFRecord
TFRecord is tailored for TensorFlow applications, not for MXNet-based workflows.
Question 10 Single Choice
An autonomous vehicle technology company is seeking an AWS solution capable of classifying street sign images with minimal latency, handling thousands of images each second. Which AWS services would most effectively fulfill this requirement?
Explanation
Assess if the architecture saves network round trips and whether a pre-trained street sign classifier is available.
Correct choice: Amazon SageMaker, Neo, Greengrass
SageMaker trains the model, Neo compiles and optimizes it for the target edge hardware, and AWS IoT Greengrass runs the compiled model locally on the vehicle, eliminating network round trips and their latency.
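A hedged sketch of the Neo compilation step for an MXNet classifier trained in SageMaker; the role ARN, S3 paths, entry-point script, input shape, and the edge target are assumptions, and deploying the compiled artifact to the vehicle would then be handled as a Greengrass component:

```python
import sagemaker
from sagemaker.mxnet import MXNetModel

role = "arn:aws:iam::123456789012:role/SageMakerRole"  # placeholder

model = MXNetModel(
    model_data="s3://example-bucket/street-signs/model.tar.gz",  # training job output
    role=role,
    entry_point="inference.py",      # hypothetical inference script
    framework_version="1.8.0",
    py_version="py37",
)

# Compile with SageMaker Neo for the vehicle's edge hardware.
compiled_model = model.compile(
    target_instance_family="jetson_xavier",   # example Neo edge target
    input_shape={"data": [1, 3, 224, 224]},
    output_path="s3://example-bucket/street-signs/compiled/",
    role=role,
    framework="mxnet",
    framework_version="1.8",
    job_name="street-sign-neo-compile",
)
# The compiled artifact written to output_path is what Greengrass deploys and runs locally.
```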
Incorrect choice: Amazon Rekognition, Lambda, IoT Core
Rekognition's pre-trained models are not specialized for street signs, and routing every image through Lambda and IoT Core introduces network latency that is ill-suited to high-volume, real-time processing.
Incorrect choice: Amazon SageMaker, ECS, EC2 Spot Instances
While feasible, this setup mainly suits cloud-based processing, potentially increasing latency due to network trips and not optimized for edge computing.
Incorrect choice: Amazon SageMaker Ground Truth, Rekognition Custom Labels, Lambda, S3, DeepLens
Ground Truth and Rekognition Custom Labels could tailor models for street signs, but this setup might not handle thousands of images per second efficiently due to hardware limitations and potential network latency in coordinating these services for real-time inference.



