Amazon AWS Certified Machine Learning - Specialty
#61 (Accuracy: 100% / 4 votes)
An insurance company developed a new experimental machine learning (ML) model to replace an existing model that is in production. The company must validate the quality of predictions from the new experimental model in a production environment before the company uses the new experimental model to serve general user requests.

Only one model can serve user requests at a time.
The company must measure the performance of the new experimental model without affecting the current live traffic.

Which solution will meet these requirements?
  • A. A/B testing
  • B. Canary release
  • C. Shadow deployment
  • D. Blue/green deployment
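To illustrate option C, a minimal boto3 sketch of a shadow deployment on Amazon SageMaker: the existing model keeps serving all user requests, while a shadow variant receives a mirrored copy of the traffic so the experimental model can be evaluated without affecting live responses. All names and instance types below are placeholders.

    import boto3

    sm = boto3.client("sagemaker")

    # Placeholder names: the production variant serves users; the shadow variant
    # only receives mirrored traffic, and its responses are logged, not returned.
    sm.create_endpoint_config(
        EndpointConfigName="insurance-model-shadow-test",
        ProductionVariants=[{
            "VariantName": "production",
            "ModelName": "existing-model",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,
        }],
        ShadowProductionVariants=[{
            "VariantName": "shadow",
            "ModelName": "experimental-model",
            "InstanceType": "ml.m5.large",
            "InitialInstanceCount": 1,
            "InitialVariantWeight": 1.0,  # fraction of production traffic to mirror
        }],
    )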
#62 (Accuracy: 100% / 4 votes)
A machine learning (ML) specialist is using the Amazon SageMaker DeepAR forecasting algorithm to train a model on CPU-based Amazon EC2 On-Demand instances. The model currently takes multiple hours to train. The ML specialist wants to decrease the training time of the model.

Which approaches will meet this requirement? (Choose two.)
  • A. Replace On-Demand Instances with Spot Instances.
  • B. Configure model auto scaling dynamically to adjust the number of instances automatically.
  • C. Replace CPU-based EC2 instances with GPU-based EC2 instances.
  • D. Use multiple training instances.
  • E. Use a pre-trained version of the model. Run incremental training.
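A hedged SageMaker Python SDK sketch combining options C and D: GPU-based instances and more than one training instance for the built-in DeepAR algorithm. The role, bucket paths, and hyperparameter values are placeholders.

    import sagemaker
    from sagemaker.estimator import Estimator

    session = sagemaker.Session()
    role = "arn:aws:iam::111122223333:role/SageMakerRole"  # placeholder

    estimator = Estimator(
        image_uri=sagemaker.image_uris.retrieve("forecasting-deepar", session.boto_region_name),
        role=role,
        instance_count=2,               # option D: distribute training across instances
        instance_type="ml.p3.2xlarge",  # option C: GPU-based instead of CPU-based
        output_path="s3://example-bucket/deepar/output",  # placeholder
        sagemaker_session=session,
    )
    estimator.set_hyperparameters(time_freq="D", context_length=28, prediction_length=28, epochs=100)
    estimator.fit({"train": "s3://example-bucket/deepar/train/"})  # placeholder channel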
#63 (Accuracy: 100% / 3 votes)
An online delivery company wants to choose the fastest courier for each delivery at the moment an order is placed. The company wants to implement this feature for existing users and new users of its application. Data scientists have trained separate models with XGBoost for this purpose, and the models are stored in Amazon S3. There is one model for each city where the company operates.

Operations engineers host these models on Amazon EC2 instances to respond to web client requests, with one instance for each model, but the instances show only 5% CPU and memory utilization.
The operations engineers want to avoid managing unnecessary resources.

Which solution will enable the company to achieve its goal with the LEAST operational overhead?
  • A. Create an Amazon SageMaker notebook instance for pulling all the models from Amazon S3 using the boto3 library. Remove the existing instances and use the notebook to perform a SageMaker batch transform for performing inferences offline for all the possible users in all the cities. Store the results in different files in Amazon S3. Point the web client to the files.
  • B. Prepare an Amazon SageMaker Docker container based on the open-source multi-model server. Remove the existing instances and create a multi-model endpoint in SageMaker instead, pointing to the S3 bucket containing all the models. Invoke the endpoint from the web client at runtime, specifying the TargetModel parameter according to the city of each request.
  • C. Keep only a single EC2 instance for hosting all the models. Install a model server in the instance and load each model by pulling it from Amazon S3. Integrate the instance with the web client using Amazon API Gateway for responding to the requests in real time, specifying the target resource according to the city of each request.
  • D. Prepare a Docker container based on the prebuilt images in Amazon SageMaker. Replace the existing instances with separate SageMaker endpoints, one for each city where the company operates. Invoke the endpoints from the web client, specifying the URL and EndpointName parameter according to the city of each request.
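For option B, a short boto3 sketch of how the web client's backend could invoke a SageMaker multi-model endpoint, selecting the per-city XGBoost model at runtime with the TargetModel parameter. The endpoint name, payload, and model artifact naming are assumptions.

    import boto3

    runtime = boto3.client("sagemaker-runtime")

    def score_delivery(city, csv_payload):
        # Assumes one "<city>.tar.gz" model artifact per city under the endpoint's S3 prefix
        response = runtime.invoke_endpoint(
            EndpointName="courier-xgboost-mme",  # placeholder
            ContentType="text/csv",
            TargetModel=f"{city}.tar.gz",
            Body=csv_payload,
        )
        return response["Body"].read().decode("utf-8")

    # Example call for a hypothetical request from Madrid
    print(score_delivery("madrid", "3.5,12,0,1,27"))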
#64 (Accuracy: 100% / 3 votes)
A machine learning engineer is building a bird classification model. The engineer randomly splits a dataset into a training dataset and a validation dataset. During the training phase, the model achieves very high accuracy. However, the model does not generalize well on the validation dataset. The engineer realizes that the original dataset was imbalanced.

What should the engineer do to improve the validation accuracy of the model?
  • A. Perform stratified sampling on the original dataset.
  • B. Acquire additional data about the majority classes in the original dataset.
  • C. Use a smaller, randomly sampled version of the training dataset.
  • D. Perform systematic sampling on the original dataset.
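To illustrate option A, a small scikit-learn sketch: stratified sampling preserves each class's proportion in both the training and validation splits, which a purely random split does not guarantee on an imbalanced dataset. The data here is synthetic.

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Synthetic stand-in for the imbalanced bird dataset
    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 4))
    y = rng.choice(["sparrow", "eagle", "kiwi"], size=1000, p=[0.90, 0.08, 0.02])

    X_train, X_val, y_train, y_val = train_test_split(
        X, y,
        test_size=0.2,
        stratify=y,       # keep class proportions identical in both splits
        random_state=42,
    )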
#65 (Accuracy: 100% / 4 votes)
A machine learning (ML) specialist needs to solve a binary classification problem for a marketing dataset. The ML specialist must maximize the Area Under the ROC Curve (AUC) of the algorithm by training an XGBoost algorithm. The ML specialist must find values for the eta, alpha, min_child_weight, and max_depth hyperparameters that will generate the most accurate model.

Which approach will meet these requirements with the LEAST operational overhead?
  • A. Use a bootstrap script to install scikit-learn on an Amazon EMR cluster. Deploy the EMR cluster. Apply k-fold cross-validation methods to the algorithm.
  • B. Deploy Amazon SageMaker prebuilt Docker images that have scikit-learn installed. Apply k-fold cross-validation methods to the algorithm.
  • C. Use Amazon SageMaker automatic model tuning (AMT). Specify a range of values for each hyperparameter.
  • D. Subscribe to an AUC algorithm that is on AWS Marketplace. Specify a range of values for each hyperparameter.
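A hedged SageMaker Python SDK sketch of option C: automatic model tuning over the four hyperparameters, maximizing the validation AUC reported by the built-in XGBoost algorithm. The ranges, paths, role, and container version are placeholders.

    import sagemaker
    from sagemaker.estimator import Estimator
    from sagemaker.tuner import HyperparameterTuner, ContinuousParameter, IntegerParameter

    session = sagemaker.Session()
    role = "arn:aws:iam::111122223333:role/SageMakerRole"  # placeholder

    xgb = Estimator(
        image_uri=sagemaker.image_uris.retrieve("xgboost", session.boto_region_name, version="1.7-1"),
        role=role,
        instance_count=1,
        instance_type="ml.m5.xlarge",
        output_path="s3://example-bucket/xgb/output",  # placeholder
        sagemaker_session=session,
    )
    xgb.set_hyperparameters(objective="binary:logistic", eval_metric="auc", num_round=200)

    tuner = HyperparameterTuner(
        estimator=xgb,
        objective_metric_name="validation:auc",
        objective_type="Maximize",
        hyperparameter_ranges={
            "eta": ContinuousParameter(0.01, 0.3),
            "alpha": ContinuousParameter(0.0, 10.0),
            "min_child_weight": ContinuousParameter(1.0, 10.0),
            "max_depth": IntegerParameter(3, 10),
        },
        max_jobs=20,
        max_parallel_jobs=4,
    )
    tuner.fit({"train": "s3://example-bucket/xgb/train/",
               "validation": "s3://example-bucket/xgb/validation/"})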
#66 (Accuracy: 100% / 2 votes)
A machine learning (ML) developer for an online retailer recently uploaded a sales dataset into Amazon SageMaker Studio. The ML developer wants to obtain importance scores for each feature of the dataset. The ML developer will use the importance scores to perform feature engineering on the dataset.

Which solution will meet this requirement with the LEAST development effort?
  • A. Use SageMaker Data Wrangler to perform a Gini importance score analysis.
  • B. Use a SageMaker notebook instance to perform principal component analysis (PCA).
  • C. Use a SageMaker notebook instance to perform a singular value decomposition analysis.
  • D. Use the multicollinearity feature to perform lasso feature selection and an importance score analysis.
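Option A requires no code (Data Wrangler's Quick Model analysis reports Gini importance from the UI). For context only, a small scikit-learn sketch of what a Gini importance score is: the impurity-based feature_importances_ of a tree ensemble, shown here on synthetic data.

    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier

    # Synthetic stand-in for the sales dataset
    X, y = make_classification(n_samples=500, n_features=6, n_informative=3, random_state=0)

    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    for idx, score in enumerate(model.feature_importances_):  # Gini (impurity-based) importance
        print(f"feature_{idx}: {score:.3f}")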
#67 (Accuracy: 100% / 2 votes)
A company is setting up a mechanism for data scientists and engineers from different departments to access an Amazon SageMaker Studio domain. Each department has a unique SageMaker Studio domain.

The company wants to build a central proxy application that data scientists and engineers can log in to by using their corporate credentials.
The proxy application will authenticate users by using the company's existing identity provider (IdP). The application will then route users to the appropriate SageMaker Studio domain.

The company plans to maintain a table in Amazon DynamoDB that contains SageMaker domains for each department.


How should the company meet these requirements?
  • A. Use the SageMaker CreatePresignedDomainUrl API to generate a presigned URL for each domain according to the DynamoDB table. Pass the presigned URL to the proxy application.
  • B. Use the SageMaker CreateHumanTaskUi API to generate a UI URL. Pass the URL to the proxy application.
  • C. Use the Amazon SageMaker ListHumanTaskUis API to list all UI URLs. Pass the appropriate URL to the DynamoDB table so that the proxy application can use the URL.
  • D. Use the SageMaker CreatePresignedNotebookInstanceUrl API to generate a presigned URL. Pass the presigned URL to the proxy application.
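For option A, a boto3 sketch of the proxy application's routing step: look up the department's Studio domain in the DynamoDB table, then generate a presigned domain URL and redirect the authenticated user to it. The table name, key schema, and user-profile naming are assumptions.

    import boto3

    dynamodb = boto3.resource("dynamodb")
    sagemaker = boto3.client("sagemaker")

    # Assumed table layout: partition key "department" -> attribute "domain_id"
    domains_table = dynamodb.Table("studio-domains-by-department")

    def presigned_studio_url(department, user_profile_name):
        item = domains_table.get_item(Key={"department": department})["Item"]
        response = sagemaker.create_presigned_domain_url(
            DomainId=item["domain_id"],
            UserProfileName=user_profile_name,
            ExpiresInSeconds=300,  # keep the URL short-lived before the proxy redirects
        )
        return response["AuthorizedUrl"]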
#68 (Accuracy: 100% / 3 votes)
An insurance company is creating an application to automate car insurance claims. A machine learning (ML) specialist used an Amazon SageMaker Object Detection - TensorFlow built-in algorithm to train a model to detect scratches and dents in images of cars. After the model was trained, the ML specialist noticed that the model performed better on the training dataset than on the testing dataset.

Which approach should the ML specialist use to improve the performance of the model on the testing data?
  • A. Increase the value of the momentum hyperparameter.
  • B. Reduce the value of the dropout_rate hyperparameter.
  • C. Reduce the value of the learning_rate hyperparameter.
  • D. Increase the value of the L2 hyperparameter.
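The symptom described (much better performance on training data than on testing data) is overfitting, which option D addresses through stronger regularization. For context only, and not the built-in algorithm's API, a tiny Keras sketch of what a larger L2 penalty does: it constrains the layer weights, which reduces overfitting.

    import tensorflow as tf

    # Toy layer illustrating the L2 idea: raising the penalty (e.g., 1e-2 instead of 1e-4)
    # shrinks the learned weights and narrows the gap between training and test accuracy.
    regularized_layer = tf.keras.layers.Dense(
        64,
        activation="relu",
        kernel_regularizer=tf.keras.regularizers.l2(1e-2),
    )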
#69 (Accuracy: 100% / 2 votes)
A law firm handles thousands of contracts every day. Every contract must be signed. Currently, a lawyer manually checks all contracts for signatures.

The law firm is developing a machine learning (ML) solution to automate signature detection for each contract.
The ML solution must also provide a confidence score for each contract page.

Which Amazon Textract API action can the law firm use to generate a confidence score for each page of each contract?
  • A. Use the AnalyzeDocument API action. Set the FeatureTypes parameter to SIGNATURES. Return the confidence scores for each page.
  • B. Use the Prediction API call on the documents. Return the signatures and confidence scores for each page.
  • C. Use the StartDocumentAnalysis API action to detect the signatures. Return the confidence scores for each page.
  • D. Use the GetDocumentAnalysis API action to detect the signatures. Return the confidence scores for each page.
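For option A, a boto3 sketch: AnalyzeDocument with the SIGNATURES feature type returns SIGNATURE blocks, each with the page it was found on and a confidence score. The bucket and document names are placeholders.

    import boto3

    textract = boto3.client("textract")

    response = textract.analyze_document(
        Document={"S3Object": {"Bucket": "example-contracts", "Name": "contract-0001.pdf"}},  # placeholders
        FeatureTypes=["SIGNATURES"],
    )

    # Each detected signature block carries a page number and a confidence score
    for block in response["Blocks"]:
        if block["BlockType"] == "SIGNATURE":
            print(f"page {block['Page']}: signature with confidence {block['Confidence']:.1f}")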
#70 (Accuracy: 95% / 14 votes)
A Machine Learning Specialist is working with a large cybersecurity company that manages security events in real time for companies around the world. The cybersecurity company wants to design a solution that will allow it to use machine learning to score malicious events as anomalies in the data as it is being ingested. The company also wants to be able to save the results in its data lake for later processing and analysis.
What is the MOST efficient way to accomplish these tasks?
  • A. Ingest the data using Amazon Kinesis Data Firehose, and use Amazon Kinesis Data Analytics Random Cut Forest (RCF) for anomaly detection. Then use Kinesis Data Firehose to stream the results to Amazon S3.
  • B. Ingest the data into Apache Spark Streaming using Amazon EMR, and use Spark MLlib with k-means to perform anomaly detection. Then store the results in an Apache Hadoop Distributed File System (HDFS) using Amazon EMR with a replication factor of three as the data lake.
  • C. Ingest the data and store it in Amazon S3. Use AWS Batch along with the AWS Deep Learning AMIs to train a k-means model using TensorFlow on the data in Amazon S3.
  • D. Ingest the data and store it in Amazon S3. Have an AWS Glue job that is triggered on demand transform the new data. Then use the built-in Random Cut Forest (RCF) model within Amazon SageMaker to detect anomalies in the data.
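In option A, the anomaly scoring itself runs as SQL inside the Kinesis Data Analytics application. A hedged boto3 sketch of registering that application code is below; the application name and column names are assumptions, and the input (the ingestion Firehose stream) and output (a second Firehose stream that delivers results to Amazon S3) still have to be attached separately.

    import boto3

    kda = boto3.client("kinesisanalytics")

    # SQL run by the Kinesis Data Analytics application: RANDOM_CUT_FOREST assigns an
    # anomaly score to each record as it streams through. Column names are assumptions.
    application_code = """
    CREATE OR REPLACE STREAM "DESTINATION_SQL_STREAM" ("event_value" DOUBLE, "ANOMALY_SCORE" DOUBLE);
    CREATE OR REPLACE PUMP "STREAM_PUMP" AS INSERT INTO "DESTINATION_SQL_STREAM"
    SELECT STREAM "event_value", "ANOMALY_SCORE"
    FROM TABLE(RANDOM_CUT_FOREST(CURSOR(SELECT STREAM * FROM "SOURCE_SQL_STREAM_001")));
    """

    kda.create_application(
        ApplicationName="security-event-anomaly-scoring",  # placeholder
        ApplicationCode=application_code,
        # Inputs (the ingestion Firehose stream) and Outputs (the Firehose stream to S3)
        # would be attached with add_application_input / add_application_output.
    )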