Databricks Certified Machine Learning Associate Exam Questions

Total questions: 278

Last updated: September 2025


Question 1 Single Choice

How would you obtain summary statistics for a Spark DataFrame to support comprehensive data analysis?

Question 2 Single Choice

A team is formulating guidelines on when to apply various metrics for evaluating classification models. They need to decide under what circumstances the F1 score should be favored over accuracy. The F1 score formula is given as follows:

F1 = 2 * (precision * recall) / (precision + recall)

What recommendations should the team incorporate into their guidelines?
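The formula can be checked with a short, dependency-free Python sketch on hypothetical labels. It illustrates the usual argument for preferring F1 when the positive class is rare: a model that never predicts the positive class can still score high accuracy, while its F1 collapses to zero.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true positives, false positives, and false negatives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp, fp, fn


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    tp, fp, fn = confusion_counts(y_true, y_pred, positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    # F1 = 2 * (precision * recall) / (precision + recall)
    return 2 * (precision * recall) / (precision + recall)


# Hypothetical imbalanced labels: 5 positives, 95 negatives.
y_true = [1] * 5 + [0] * 95

# A trivial model that always predicts the majority (negative) class.
always_negative = [0] * 100

# A model that finds 4 of the 5 positives but raises 2 false alarms.
useful_model = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93

for name, y_pred in [("always_negative", always_negative), ("useful_model", useful_model)]:
    print(name, accuracy(y_true, y_pred), round(f1_score(y_true, y_pred), 3))
```

Here accuracy moves only from 0.95 to 0.97 between the two models, while F1 moves from 0.0 to about 0.73, which is the gap the F1 score is designed to expose on imbalanced data.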

Question 3 Single Choice

Which of the following is an example of a distributed machine learning framework?

Question 4 Single Choice

A data scientist is using a feature store. In one of the feature tables, she wants to replace missing values with the median value of the respective feature.

A colleague suggests that the data scientist is throwing away valuable information by doing this. Which of the following approaches can they take to include as much information as possible in the feature set?

Choose only ONE best answer.
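One widely used way to keep the information that median imputation discards is to add a binary indicator column recording which values were missing before filling them. A minimal plain-Python sketch, assuming rows are dictionaries and using a hypothetical `age` feature; in Spark, the median fill itself could be done with `pyspark.ml.feature.Imputer` (`strategy="median"`), with the indicator column added separately.

```python
from statistics import median


def impute_median_with_indicator(rows, feature):
    """Fill missing values of `feature` with its median and flag which rows were filled."""
    observed = [row[feature] for row in rows if row[feature] is not None]
    fill_value = median(observed)
    result = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are not mutated
        was_missing = row[feature] is None
        # Binary indicator keeps the "this value was missing" signal for the model.
        new_row[f"{feature}_was_missing"] = int(was_missing)
        if was_missing:
            new_row[feature] = fill_value
        result.append(new_row)
    return result


# Hypothetical feature table rows; "age" is an illustrative column name.
rows = [{"age": 30}, {"age": None}, {"age": 50}, {"age": 40}]
imputed = impute_median_with_indicator(rows, "age")
print(imputed)
```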

Question 5 Single Choice

In PySpark, the _________ library is provided, which makes integrating Python with Apache Spark easy.

Question 6 Single Choice

Which of the following describes the relationship between a native Spark DataFrame and a pandas API on Spark DataFrame?

Choose only ONE best answer.

Question 7 Single Choice

Which of the following tools can be used to parallelize the hyperparameter tuning process for single-node machine learning models using a Spark cluster?

Choose only ONE best answer.

Question 8 Single Choice

How does Spark ML tackle a linear regression problem for an extraordinarily large dataset?

Which one of the options is correct?

Choose only ONE best answer.

Question 9 Single Choice

Binning is the process of converting numeric data into categorical data by grouping continuous data into discrete bins or intervals.
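A minimal plain-Python sketch of binning with explicit boundaries, using hypothetical age cut-offs. Spark offers the same idea through `pyspark.ml.feature.Bucketizer` (explicit splits) and `QuantileDiscretizer` (quantile-based splits); this stdlib version only illustrates the mapping from continuous values to discrete intervals.

```python
from bisect import bisect_right


def bin_values(values, edges, labels):
    """Map each numeric value to the label of the interval it falls into.

    `edges` are sorted interior boundaries; the bins are
    (-inf, edges[0]), [edges[0], edges[1]), ..., [edges[-1], +inf).
    """
    if len(labels) != len(edges) + 1:
        raise ValueError("need exactly one more label than edges")
    return [labels[bisect_right(edges, v)] for v in values]


# Hypothetical ages binned into three categories.
ages = [5, 18, 40, 70]
print(bin_values(ages, edges=[18, 65], labels=["child", "adult", "senior"]))
```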

Question 10 Single Choice

A machine learning engineer is trying to scale an ML pipeline by distributing its single-node model tuning procedure. After the entire training dataset is broadcast to each core, each core in the cluster can train one model at a time. Because the tuning process is still slow, the engineer plans to increase the parallelism from 4 to 8 cores to speed it up. However, the total memory in the cluster cannot be increased.

Under which conditions would increasing the parallelism from 4 to 8 cores speed up the tuning process?

Choose only ONE best answer.
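Because every core holds a full broadcast copy of the training data and total memory is fixed, doubling the cores halves each core's share of memory. A back-of-the-envelope check, assuming memory is split evenly across cores and using hypothetical sizes, shows when 8 concurrent models still fit:

```python
def per_core_memory_gb(total_memory_gb, cores):
    # Assumption: cluster memory is split evenly across the cores that train models.
    return total_memory_gb / cores


def fits_in_core(total_memory_gb, cores, data_gb, training_overhead_gb):
    """True if one full copy of the data plus per-model working memory fits in a core's share."""
    return data_gb + training_overhead_gb <= per_core_memory_gb(total_memory_gb, cores)


# Hypothetical numbers: 64 GB cluster, 1 GB of working memory per model.
TOTAL_GB = 64
OVERHEAD_GB = 1

for data_gb in (6, 10):
    for cores in (4, 8):
        fits = fits_in_core(TOTAL_GB, cores, data_gb, OVERHEAD_GB)
        print(f"data={data_gb} GB, cores={cores}: fits={fits}")
```

With these numbers, a 6 GB dataset fits at both 4 and 8 cores, while a 10 GB dataset fits only at 4 cores. Even when the data fits, the extra cores only help if there are enough independent tuning trials to keep all 8 busy.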
