Professional Data Engineer - Google Cloud Certified Exam Questions

Total Questions: 658

Last Updated: Sep 2025

Question 1 Single Choice

Your company uses wildcard tables to query data across multiple tables with similar names. However, the SQL statement below currently fails with the error shown in its first line:

# Syntax error: Expected end of statement but got "-" at [4:11]
SELECT age
FROM bigquery-data.noaa_gsod.gsod
WHERE
  age != 199
  AND _TABLE_SUFFIX = '2929'
ORDER BY age DESC


Which table name will enable the SQL statement to function correctly?
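As background, BigQuery requires backticks around table paths that contain hyphens and around wildcard table names ending in `*`. The Python sketch below only formats strings using the names from the statement above; the helpers `wildcard_table` and `build_query` are hypothetical and nothing here calls BigQuery.

```python
def wildcard_table(project: str, dataset: str, prefix: str) -> str:
    """Return a backtick-quoted wildcard table name of the form `p.d.prefix*`."""
    return f"`{project}.{dataset}.{prefix}*`"


def build_query(table: str, suffix: str) -> str:
    """Assemble the statement from the question around the given table name."""
    # _TABLE_SUFFIX restricts which tables the wildcard matches.
    return (
        "SELECT age\n"
        f"FROM {table}\n"
        "WHERE age != 199\n"
        f"  AND _TABLE_SUFFIX = '{suffix}'\n"
        "ORDER BY age DESC"
    )


table = wildcard_table("bigquery-data", "noaa_gsod", "gsod")
query = build_query(table, "2929")
print(query)
```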

Question 2 Single Choice

You are working at RetailNova Corp., managing a BigQuery table that holds millions of rows of sales transactions, partitioned by date. This table is queried frequently—multiple times per minute—by various applications and users.

The queries compute aggregations such as AVG, MAX, and SUM, and they only need data from the past year, although the full historical data must be retained in the base table. The goal is to always return up-to-date results while also minimizing query costs, reducing maintenance overhead, and improving performance.

What is the best approach?

Question 3 Single Choice

At Invex Systems, a proprietary platform sends inventory data every 6 hours to a cloud-based ingestion service. Each transmission includes a payload with multiple fields and a timestamp. If a transmission issue is suspected, the system may re-send the same data.

As a data engineer, how can you efficiently deduplicate the incoming data?
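To make the scenario concrete, here is a minimal Python sketch of one possible deduplication idea: derive a stable key from each transmission and drop anything already seen. It assumes a re-sent transmission carries exactly the same fields and timestamp as the original. The field names and the `Deduplicator` class are hypothetical, and the in-memory set stands in for whatever durable store a real pipeline would use.

```python
import hashlib
import json


def payload_key(payload: dict) -> str:
    """Hash all payload fields, including the timestamp, in a stable key order."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class Deduplicator:
    """Remembers the keys of accepted payloads (in memory only, for illustration)."""

    def __init__(self):
        self._seen = set()

    def accept(self, payload: dict) -> bool:
        """Return True for a new payload and False for a repeat."""
        key = payload_key(payload)
        if key in self._seen:
            return False
        self._seen.add(key)
        return True


dedup = Deduplicator()
first = {"sku": "A-1", "qty": 4, "timestamp": "2025-01-01T00:00:00Z"}  # hypothetical fields
resent = dict(first)  # the same data transmitted again
results = [dedup.accept(first), dedup.accept(resent)]
print(results)
```

An alternative design is to have the sender attach a unique identifier to each entry, which avoids hashing but requires changing the sending platform.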

Question 4 Single Choice

At CloudWare Systems, you're using a production-grade Memorystore for Redis (Standard Tier) instance. As part of your disaster recovery (DR) planning, you need to test failover behavior realistically on this instance. The goal is to ensure no data loss during this failover test.

What is the best approach?

Question 5 Single Choice

You're preparing data for your machine learning team to train a model using BigQuery ML. The objective is to predict the price per square foot of real estate. The training data includes columns for price and square footage. However, the 'feature1' column contains null values due to missing data. To retain more data points, you want to replace the nulls with zeros. Which query should you use?
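The null-replacement step can be tried locally. The sketch below uses Python's built-in `sqlite3` module as a stand-in for BigQuery, relying on the fact that both support `IFNULL(expr, replacement)`. The table name `training_data`, the column name `square_feet`, and the sample rows are assumptions made for illustration only.

```python
import sqlite3

# In-memory SQLite database standing in for a BigQuery table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE training_data (price REAL, square_feet REAL, feature1 REAL)"
)
conn.executemany(
    "INSERT INTO training_data VALUES (?, ?, ?)",
    [(300000.0, 1500.0, 2.5), (450000.0, 1800.0, None)],  # second row has a null
)

# IFNULL keeps the row and substitutes 0 where feature1 is missing.
rows = conn.execute(
    """
    SELECT price / square_feet AS price_per_sqft,
           IFNULL(feature1, 0) AS feature1_cleaned
    FROM training_data
    ORDER BY price
    """
).fetchall()
conn.close()
print(rows)
```

Filtering with `WHERE feature1 IS NOT NULL` would instead discard the second row, which is the opposite of retaining more data points.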

Question 6 Single Choice

You're managing a Dataplex environment for DataSpring Inc., which includes both raw and curated zones. The data engineering team is uploading CSV and JSON files to a Cloud Storage bucket asset within the curated zone. However, these files are not being automatically discovered by Dataplex.

What should you do to ensure Dataplex can automatically discover these files?

Question 7 Multiple Choice

Your company, SecureTrust Analytics, operates in a tightly regulated industry and must enforce strict access controls so that users can access only the minimum data necessary for their responsibilities. You’re using Google BigQuery and need to apply this principle of least privilege effectively.

Which three of the following strategies would help enforce this requirement?

Question 8 Single Choice

Your organization, DataLink Global, follows a multi-cloud strategy by storing data in both Google Cloud Storage and Amazon S3 buckets, with all data residing in US-based regions. You need to allow your teams to query the most up-to-date data from either cloud using BigQuery, but without granting direct access to the Cloud Storage or S3 buckets themselves.

What should you do?

Question 9 Single Choice

You are a data engineer at AutoNova Motors, and you've built a data pipeline using Google Cloud Pub/Sub to capture sensor anomalies from connected vehicles. A push subscription is configured to send these events to a custom HTTPS endpoint that you’ve developed to take immediate action when anomalies are detected.

However, you notice that your HTTPS endpoint is receiving an unusually high number of duplicate messages.

What is the most likely reason for this issue?
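Independently of the root cause, Pub/Sub delivers messages at least once by default, so push endpoints are commonly written to tolerate repeats. The Python sketch below shows an idempotent handler that keys on the `messageId` field of the push envelope and returns a success status code. The `AnomalyHandler` class, the `make_push_body` helper, and the subscription name are hypothetical, and the in-memory set is a simplification of what would be a shared store in production.

```python
import base64
import json


class AnomalyHandler:
    """Processes each Pub/Sub push message at most once, keyed by messageId."""

    def __init__(self):
        self.processed_ids = set()
        self.actions = []

    def handle(self, body: str) -> int:
        """Handle one push request body and return the HTTP status to send."""
        message = json.loads(body)["message"]
        message_id = message["messageId"]
        if message_id not in self.processed_ids:
            event = base64.b64decode(message["data"]).decode("utf-8")
            self.actions.append(event)  # placeholder for the real action
            self.processed_ids.add(message_id)
        # A success status acknowledges the message, including repeats.
        return 204


def make_push_body(message_id: str, text: str) -> str:
    """Build a request body shaped like a Pub/Sub push envelope (test helper)."""
    data = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return json.dumps({
        "message": {"messageId": message_id, "data": data},
        "subscription": "projects/example/subscriptions/anomalies",  # hypothetical
    })


handler = AnomalyHandler()
body = make_push_body("1001", "brake sensor anomaly")
statuses = [handler.handle(body), handler.handle(body)]  # second call is a duplicate
print(statuses, handler.actions)
```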

Question 10 Single Choice

You are working with Skyline Realty Corp, a large real estate company, and preparing 6 TB of property sales data for a machine learning use case. You plan to use SQL for data transformation and BigQuery ML to build the ML model. The model will be used to generate predictions on unprocessed raw data.

How should you design the workflow to avoid training-serving skew during predictions?
