Amazon AWS Certified Machine Learning - Specialty
There are 204 results
#201 (Accuracy: 100% / 4 votes)
A company wants to segment a large group of customers into subgroups based on shared characteristics. The company’s data scientist is planning to use the Amazon SageMaker built-in k-means clustering algorithm for this task. The data scientist needs to determine the optimal number of subgroups (k) to use.

Which data visualization approach will MOST accurately determine the optimal value of k?
  • A. Calculate the principal component analysis (PCA) components. Run the k-means clustering algorithm for a range of k by using only the first two PCA components. For each value of k, create a scatter plot with a different color for each cluster. The optimal value of k is the value where the clusters start to look reasonably separated.
  • B. Calculate the principal component analysis (PCA) components. Create a line plot of the number of components against the explained variance. The optimal value of k is the number of PCA components after which the curve starts decreasing in a linear fashion.
  • C. Create a t-distributed stochastic neighbor embedding (t-SNE) plot for a range of perplexity values. The optimal value of k is the perplexity value at which the clusters start to look reasonably separated.
  • D. Run the k-means clustering algorithm for a range of k. For each value of k, calculate the sum of squared errors (SSE). Plot a line chart of the SSE for each value of k. The optimal value of k is the point after which the curve starts decreasing in a linear fashion.
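The SSE-versus-k procedure described in option D (often called the elbow method) can be sketched in a few lines. This is a minimal illustration only, not the SageMaker built-in algorithm: the `kmeans_sse` helper, the farthest-point initialisation, and the synthetic three-blob data are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_sse(points, k, iterations=50):
    """Run a basic Lloyd's k-means and return the final sum of squared errors.

    Hypothetical helper for illustration. Initial centers are chosen with a
    deterministic farthest-point rule so that results are reproducible.
    """
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centers[i]))
            clusters[nearest].append(p)
        # Update step: move each center to the mean of its cluster.
        new_centers = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


# Synthetic data (an assumption): three well-separated 2-D blobs.
rng = random.Random(0)
blob_centers = [(0.0, 0.0), (10.0, 0.0), (5.0, 9.0)]
points = [
    (cx + rng.gauss(0, 1), cy + rng.gauss(0, 1))
    for cx, cy in blob_centers
    for _ in range(50)
]

# SSE for each candidate k; these are the values one would plot as a line chart.
sse_by_k = {k: kmeans_sse(points, k) for k in range(1, 7)}
for k, sse in sse_by_k.items():
    print(f"k={k}  SSE={sse:.1f}")
```

On data like this, the SSE drops steeply up to the true number of clusters and only slowly afterwards; the bend in that curve is what the line chart is meant to reveal.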
#202 (Accuracy: 100% / 4 votes)
A sports analytics company is providing services at a marathon. Each runner in the marathon will have their race ID printed as text on the front of their shirt. The company needs to extract race IDs from images of the runners.

Which solution will meet these requirements with the LEAST operational overhead?
  • A. Use Amazon Rekognition.
  • B. Use a custom convolutional neural network (CNN).
  • C. Use the Amazon SageMaker Object Detection algorithm.
  • D. Use Amazon Lookout for Vision.
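For context on the managed option, Amazon Rekognition exposes a `DetectText` operation that returns text found in an image. The sketch below shows one way the response might be filtered for race IDs. The assumption that race IDs are all-digit strings, the 90% confidence threshold, the helper names, and the sample response are hypothetical and not taken from the question.

```python
def extract_race_ids(response, min_confidence=90.0):
    """Return distinct all-digit LINE detections from a DetectText-style response.

    Assumes (hypothetically) that a race ID is a string of digits only.
    """
    ids = []
    for detection in response.get("TextDetections", []):
        text = detection.get("DetectedText", "").strip()
        if (
            detection.get("Type") == "LINE"
            and detection.get("Confidence", 0.0) >= min_confidence
            and text.isdigit()
            and text not in ids
        ):
            ids.append(text)
    return ids


def detect_race_ids(bucket, key):
    """Call Rekognition on an image stored in S3 (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the parsing helper works without it

    client = boto3.client("rekognition")
    response = client.detect_text(Image={"S3Object": {"Bucket": bucket, "Name": key}})
    return extract_race_ids(response)


# Hand-written sample in the shape of a DetectText response (illustrative only).
sample_response = {
    "TextDetections": [
        {"DetectedText": "1042", "Type": "LINE", "Confidence": 99.1},
        {"DetectedText": "1042", "Type": "WORD", "Confidence": 99.1},
        {"DetectedText": "MARATHON", "Type": "LINE", "Confidence": 98.0},
        {"DetectedText": "77", "Type": "LINE", "Confidence": 42.0},
    ]
}
print(extract_race_ids(sample_response))
```

No model training or hosting is involved in this flow, which is the sense in which a managed text-detection API carries little operational overhead.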
#203 (Accuracy: 100% / 4 votes)
A retail company is selling products through a global online marketplace. The company wants to use machine learning (ML) to analyze customer feedback and identify specific areas for improvement. A developer has built a tool that collects customer reviews from the online marketplace and stores them in an Amazon S3 bucket. This process yields a dataset of 40 reviews. A data scientist building the ML models must identify additional sources of data to increase the size of the dataset.
Which data sources should the data scientist use to augment the dataset of reviews? (Choose three.)
  • A. Emails exchanged by customers and the company's customer service agents
  • B. Social media posts containing the name of the company or its products
  • C. A publicly available collection of news articles
  • D. A publicly available collection of customer reviews
  • E. Product sales revenue figures for the company
  • F. Instruction manuals for the company's products
#204 (Accuracy: 100% / 4 votes)
A Machine Learning Specialist is building a prediction model for a large number of features using linear models, such as linear regression and logistic regression. During exploratory data analysis, the Specialist observes that many features are highly correlated with each other. This may make the model unstable.

What should be done to reduce the impact of having such a large number of features?
  • A. Perform one-hot encoding on highly correlated features.
  • B. Use matrix multiplication on highly correlated features.
  • C. Create a new feature space using principal component analysis (PCA).
  • D. Apply the Pearson correlation coefficient.
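To illustrate how PCA handles correlated features, the sketch below builds a small synthetic dataset in which two columns are nearly collinear, then projects it onto its leading principal components. It assumes NumPy is available. The `pca_transform` helper and the data are hypothetical, and the SVD-based approach is one common way to compute PCA rather than any specific library's implementation.

```python
import numpy as np

# Synthetic data (an assumption): features 0 and 1 are almost collinear,
# feature 2 is independent noise.
rng = np.random.default_rng(0)
base = rng.normal(size=500)
X = np.column_stack([
    base + 0.05 * rng.normal(size=500),
    2.0 * base + 0.05 * rng.normal(size=500),
    rng.normal(size=500),
])


def pca_transform(X, n_components):
    """Project X onto its first n_components principal components via SVD."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    explained_variance_ratio = (S ** 2) / np.sum(S ** 2)
    Z = X_centered @ Vt[:n_components].T
    return Z, explained_variance_ratio[:n_components]


Z, ratio = pca_transform(X, n_components=2)

corr_before = np.corrcoef(X, rowvar=False)
corr_after = np.corrcoef(Z, rowvar=False)

print("correlation between features 0 and 1:", round(float(corr_before[0, 1]), 4))
print("correlation between PCA components 0 and 1:", float(corr_after[0, 1]))
print("explained variance ratio:", np.round(ratio, 4))
```

The projected components are uncorrelated by construction, and fewer of them are needed to retain most of the variance, which is why a PCA feature space can stabilise a linear model fitted on highly correlated inputs.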