Professional Data Engineer - Google Cloud Certified Exam Questions

Total questions: 658

Last updated: September 2025

Question 11 Single Choice

At WeatherSense Labs, you're developing a machine learning model to predict rainfall for a given day. Your dataset contains thousands of input features, and you want to explore ways to speed up model training while keeping the impact on accuracy minimal.

What approach should you consider?

Question 12 Single Choice

NovaTech Manufacturing is streaming real-time sensor data from its production floor into Cloud Bigtable. However, the team has observed severe performance issues, particularly with queries used to power real-time dashboards.

To improve query performance, how should the row key design be modified?
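
As background for this question, the sketch below contrasts two row key layouts. It is an illustration under stated assumptions, not the answer key: the sensor IDs, the `#` separator, the `MAX_TS_MS` constant, and both helper functions are hypothetical. Bigtable stores rows sorted by key, so a key that begins with the timestamp places all concurrent writes next to each other, while a key that begins with the sensor ID groups each sensor's readings together.

```python
# Hypothetical upper bound for millisecond timestamps, used only to reverse them.
MAX_TS_MS = 10**13


def timestamp_first_key(sensor_id: str, ts_ms: int) -> str:
    """Key that starts with the timestamp: concurrent writes sort side by side."""
    return f"{ts_ms:013d}#{sensor_id}"


def sensor_first_key(sensor_id: str, ts_ms: int) -> str:
    """Key that starts with the sensor ID, then a reversed timestamp (newest first)."""
    reversed_ts = MAX_TS_MS - ts_ms
    return f"{sensor_id}#{reversed_ts:013d}"


# Two made-up sensors, each reporting at the same two instants.
readings = [
    ("sensor-a", 1_700_000_000_000),
    ("sensor-b", 1_700_000_000_000),
    ("sensor-a", 1_700_000_001_000),
    ("sensor-b", 1_700_000_001_000),
]

# Sorting the keys mimics the lexicographic order in which rows are stored.
ts_keys = sorted(timestamp_first_key(s, t) for s, t in readings)
sensor_keys = sorted(sensor_first_key(s, t) for s, t in readings)

print(ts_keys)      # sensors interleaved, ordered by time
print(sensor_keys)  # each sensor contiguous, newest reading first
```

With the sensor-first layout, a dashboard query for one sensor's latest readings becomes a short prefix scan over adjacent rows.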

Question 13 Single Choice

You’re building a clothing recommendation system at StyleSense AI, which must adapt to changing fashion trends and user preferences over time. You've already implemented a streaming data pipeline that delivers new interaction data to the model as it becomes available.

How should you incorporate this new data into your model training strategy?

Question 14 Single Choice

At DataVista Corp, you've built a critical dashboard in Looker Studio (formerly Google Data Studio) for your large internal team. The dashboard pulls data from BigQuery as its source. However, you've observed that the visualizations are not displaying data generated within the last hour.

What should you do to resolve this and ensure the dashboard shows the most up-to-date data?

Question 15 Single Choice

You have a dataset with two dimensions, X and Y, and each data point is shaded to represent its class. To accurately classify this data using a linear algorithm, you plan to introduce a synthetic feature. What should be the value of that feature?



Question 16 Single Choice

You are migrating a large set of files from a public HTTPS source to a Cloud Storage bucket for NexaData Corp. Access to the files is secured using signed URLs, and you’ve prepared a TSV file listing all the URLs. You initiated the migration using Storage Transfer Service (STS).

The transfer ran successfully for a while but then failed, and job logs show HTTP 403 errors for the remaining files. You verified that nothing changed on the source system. You now need to resolve the issue and resume the migration.

What should you do?

Question 17 Single Choice

Your company's customer_order table in BigQuery contains the order history for 10 million customers, with a table size of 10 PB. You're tasked with creating a dashboard for the support team to view order history. The dashboard includes two filters, country_name and username, both stored as string data types in the BigQuery table. However, the dashboard query that applies these filters runs slowly:

SELECT date, order, status
FROM customer_order
WHERE country = '<country_name>' AND username = '<username>'


How should you redesign the BigQuery table to facilitate faster access?
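
One technique often discussed for equality filters on string columns is clustering, where BigQuery keeps rows sorted by the clustering columns so that storage blocks outside the filter range can be skipped. The toy model below illustrates that pruning idea only. It is not BigQuery's actual implementation and not necessarily the intended answer; the block size, sample rows, and helper functions are all hypothetical.

```python
from itertools import product

BLOCK_SIZE = 4  # hypothetical number of rows per storage block


def make_blocks(rows, block_size=BLOCK_SIZE):
    """Split rows into fixed-size blocks, as a stand-in for storage blocks."""
    return [rows[i:i + block_size] for i in range(0, len(rows), block_size)]


def blocks_scanned(blocks, country, username):
    """Count blocks whose min/max key range could contain the filter values.

    Only the smallest and largest (country, username) pair of each block is
    consulted, so a block is skipped when the target lies outside that range.
    """
    target = (country, username)
    scanned = 0
    for block in blocks:
        keys = [(r["country"], r["username"]) for r in block]
        if min(keys) <= target <= max(keys):
            scanned += 1
    return scanned


countries = ["BR", "DE", "IN", "US"]
users = ["ana", "bob", "chen", "dee"]
# Arrival order mixes every country into every block.
rows = [
    {"country": c, "username": u, "status": "shipped"}
    for u, c in product(users, countries)
]

unclustered = blocks_scanned(make_blocks(rows), "DE", "chen")
clustered_rows = sorted(rows, key=lambda r: (r["country"], r["username"]))
clustered = blocks_scanned(make_blocks(clustered_rows), "DE", "chen")

print(unclustered, clustered)  # 4 1
```

In this toy data set the unsorted layout has to read all four blocks, while the layout sorted by (country, username) reads one.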

Question 18 Single Choice

You're utilizing Google BigQuery as your data warehouse. Users have reported that a seemingly simple query runs exceptionally slowly, regardless of when they execute it:

Upon inspecting the query plan, you observe the following output in the Read section of Stage:1:

What is the most probable cause of the delay for this query?

Question 19 Single Choice

You have VM inventory data stored in a BigQuery table called dataset.inventory_vm. To prepare the data for regular reporting in the most cost-effective manner, you need to exclude rows for VMs with fewer than 8 vCPUs. What action should you take?



Question 20 Single Choice

You are overseeing the data lake infrastructure for DataNova Corp, which is built on BigQuery. The data ingestion pipelines pull messages from Pub/Sub and write the incoming data into BigQuery tables. After rolling out a new version of the ingestion pipelines, you notice that the daily data volume stored in BigQuery has surged by 50%, even though the data volume in Pub/Sub hasn't changed. Only certain BigQuery tables show a doubling in the size of their daily partitions.

How should you investigate and resolve the root cause of this increase?
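
One hypothesis consistent with daily partitions doubling while Pub/Sub volume stays flat is that some rows are being written more than once. The sketch below shows how such a check could look on rows already read from a single partition. It assumes, hypothetically, that each row carries the originating Pub/Sub message ID in a `message_id` field; the function name and sample rows are illustrative only.

```python
from collections import Counter


def duplicate_ratio(rows, id_field="message_id"):
    """Return (ids seen more than once, extra rows per distinct id)."""
    counts = Counter(row[id_field] for row in rows)
    duplicated = {mid: n for mid, n in counts.items() if n > 1}
    extra_rows = sum(n - 1 for n in duplicated.values())
    ratio = extra_rows / len(counts) if counts else 0.0
    return duplicated, ratio


# Made-up partition in which every message was written twice.
partition_rows = [
    {"message_id": "m1"}, {"message_id": "m2"}, {"message_id": "m3"},
    {"message_id": "m1"}, {"message_id": "m2"}, {"message_id": "m3"},
]

duplicated, ratio = duplicate_ratio(partition_rows)
print(duplicated)  # {'m1': 2, 'm2': 2, 'm3': 2}
print(ratio)       # 1.0, i.e. the partition holds twice the distinct messages
```

A ratio near 1.0 on the affected tables and near 0.0 on the others would point toward duplicate writes from the new pipeline version rather than a genuine increase in data.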
