Databricks Certified Data Engineer Associate Practice Exams
Last updated on Apr 09, 2025
- Exam Code: Databricks Certified Data Engineer Associate
- Exam Name: Databricks Certified Data Engineer Associate Exam
- Certification Provider: Databricks
A data engineer has a Job with multiple tasks that runs nightly. Each of the tasks runs slowly because the clusters take a long time to start.
Which of the following actions can the data engineer perform to improve the start up time for the clusters used for the Job?
- A . They can use endpoints available in Databricks SQL
- B . They can use jobs clusters instead of all-purpose clusters
- C . They can configure the clusters to be single-node
- D . They can use clusters that are from a cluster pool
- E . They can configure the clusters to autoscale for larger data sizes
Which of the following describes the relationship between Gold tables and Silver tables?
- A . Gold tables are more likely to contain aggregations than Silver tables.
- B . Gold tables are more likely to contain valuable data than Silver tables.
- C . Gold tables are more likely to contain a less refined view of data than Silver tables.
- D . Gold tables are more likely to contain more data than Silver tables.
- E . Gold tables are more likely to contain truthful data than Silver tables.
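As a study aid, the Silver-to-Gold step in a medallion layout can be pictured as an aggregation over already-cleaned rows. The sketch below is a plain-Python toy, not Databricks code; the `silver_orders` rows, the column names, and `build_gold_daily_revenue` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_gold_daily_revenue(silver_rows):
    """Aggregate cleaned, row-level Silver records into one total per day."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["order_date"]] += row["amount"]
    # Sort by date so the summary is stable for reporting.
    return dict(sorted(totals.items()))


# Hypothetical Silver rows: validated, one record per order.
silver_orders = [
    {"order_id": 1, "order_date": "2025-04-08", "amount": 20.0},
    {"order_id": 2, "order_date": "2025-04-08", "amount": 5.0},
    {"order_id": 3, "order_date": "2025-04-09", "amount": 12.5},
]

# Hypothetical Gold output: one aggregated row per day, ready for a dashboard.
gold_daily_revenue = build_gold_daily_revenue(silver_orders)
print(gold_daily_revenue)
```

The Gold result has fewer rows than its Silver input because it summarizes rather than stores individual records.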
A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.
Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?
- A . They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."
- B . They can turn on the Auto Stop feature for the SQL endpoint.
- C . They can increase the cluster size of the SQL endpoint.
- D . They can turn on the Serverless feature for the SQL endpoint.
- E . They can increase the maximum bound of the SQL endpoint’s scaling range.
A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.
Which approach can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?
- A . They can reduce the cluster size of the SQL endpoint.
- B . They can turn on the Auto Stop feature for the SQL endpoint.
- C . They can set up the dashboard’s SQL endpoint to be serverless.
- D . They can ensure the dashboard’s SQL endpoint matches each of the queries’ SQL endpoints.
What command in Databricks allows you to remove data files that are no longer referenced by a Delta table?
- A . DELETE FROM
- B . TRUNCATE
- C . VACUUM
- D . DROP
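For background on file cleanup in Delta Lake: a `VACUUM` operation deletes data files that the current table version no longer references and that are older than a retention threshold (7 days by default). The plain-Python sketch below models only that selection rule. It is a simplification rather than the Delta implementation, and the `DataFile` type, the file names, and `files_to_vacuum` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Default VACUUM retention threshold in Delta Lake: 7 days (168 hours).
DEFAULT_RETENTION = timedelta(hours=168)


@dataclass(frozen=True)
class DataFile:
    path: str                # hypothetical file name in the table directory
    last_modified: datetime  # simplified stand-in for the file's age


def files_to_vacuum(all_files, referenced_paths, now, retention=DEFAULT_RETENTION):
    """Return paths a VACUUM-like cleanup would delete in this toy model."""
    cutoff = now - retention
    return sorted(
        f.path
        for f in all_files
        # Delete only files that are both unreferenced and past the cutoff.
        if f.path not in referenced_paths and f.last_modified < cutoff
    )


now = datetime(2025, 4, 9)
files = [
    DataFile("part-0001.parquet", now - timedelta(days=30)),  # still referenced
    DataFile("part-0002.parquet", now - timedelta(days=10)),  # unreferenced, old
    DataFile("part-0003.parquet", now - timedelta(days=1)),   # unreferenced, recent
]
referenced = {"part-0001.parquet"}

stale = files_to_vacuum(files, referenced, now)
print(stale)
```

In this model the recently unreferenced file survives, which mirrors why time travel to recent versions keeps working until the retention window has passed.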
In which of the following file formats is data from Delta Lake tables primarily stored?
- A . Delta
- B . CSV
- C . Parquet
- D . JSON
- E . A proprietary, optimized format specific to Databricks
Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?
- A . DROP
- B . IGNORE
- C . MERGE
- D . APPEND
- E . INSERT
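For background on duplicate-free writes: an upsert matches incoming rows to existing rows on a key, updates the matches, and inserts the rest, which is the pattern SQL expresses with `MERGE INTO ... WHEN MATCHED ... WHEN NOT MATCHED ...`. The sketch below is an in-memory Python toy of that behavior, not the Delta Lake API; `merge_by_key` and the sample rows are hypothetical.

```python
def merge_by_key(target, source, key="id"):
    """Upsert `source` rows into `target` rows, matching on `key`."""
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)   # like WHEN MATCHED THEN UPDATE
        else:
            merged[row[key]] = dict(row)   # like WHEN NOT MATCHED THEN INSERT
    return list(merged.values())


target = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Lima"}]
source = [{"id": 2, "city": "Quito"}, {"id": 3, "city": "Cairo"}]

result = merge_by_key(target, source)
print(result)

# Re-applying the same source adds nothing new, unlike a blind append.
assert merge_by_key(result, source) == result
```

A plain append of `source` would have left two rows with `id` 2; matching on the key is what prevents the duplicate.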
A data engineer has configured a Structured Streaming job to read from a table, manipulate the data, and then perform a streaming write into a new table.
The code block used by the data engineer is below:
If the data engineer only wants the query to process all of the available data in as many batches as required, which of the following lines of code should the data engineer use to fill in the blank?
- A . processingTime(1)
- B . trigger(availableNow=True)
- C . trigger(parallelBatch=True)
- D . trigger(processingTime="once")
- E . trigger(continuous="once")
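For background on trigger behavior: an "available now" style run processes everything that is already waiting, possibly split across several micro-batches, and then stops instead of polling for new data. The plain-Python sketch below models only that drain-then-stop pattern. It is not Structured Streaming code, and `run_available_now` and `max_records_per_batch` are hypothetical names.

```python
def run_available_now(backlog, max_records_per_batch):
    """Drain everything in `backlog` in bounded micro-batches, then stop."""
    if max_records_per_batch <= 0:
        raise ValueError("max_records_per_batch must be positive")
    batches = []
    for start in range(0, len(backlog), max_records_per_batch):
        # Each slice stands in for one micro-batch of the run.
        batches.append(backlog[start:start + max_records_per_batch])
    return batches  # the run ends once the backlog is exhausted


backlog = list(range(7))  # seven records waiting at start time
batches = run_available_now(backlog, max_records_per_batch=3)
print(batches)
```

Here seven waiting records are handled in three batches, and no further batch is scheduled after the last one.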
Which of the following data workloads will utilize a Gold table as its source?
- A . A job that enriches data by parsing its timestamps into a human-readable format
- B . A job that aggregates uncleaned data to create standard summary statistics
- C . A job that cleans data by removing malformatted records
- D . A job that queries aggregated data designed to feed into a dashboard
- E . A job that ingests raw data from a streaming source into the Lakehouse
A data engineer needs to create a table in Databricks using data from a CSV file at location /path/to/csv.
They run the following command:
Which of the following lines of code fills in the above blank to successfully complete the task?
- A . None of these lines of code are needed to successfully complete the task
- B . USING CSV
- C . FROM CSV
- D . USING DELTA
- E . FROM "path/to/csv"