Databricks Databricks Certified Data Engineer Associate Practice Exams
Last updated on Apr 01, 2025
- Exam Code: Databricks Certified Data Engineer Associate
- Exam Name: Databricks Certified Data Engineer Associate Exam
- Certification Provider: Databricks
Which of the following describes the relationship between Bronze tables and raw data?
- A . Bronze tables contain less data than raw data files.
- B . Bronze tables contain more truthful data than raw data.
- C . Bronze tables contain aggregates while raw data is unaggregated.
- D . Bronze tables contain a less refined view of data than raw data.
- E . Bronze tables contain raw data with a schema applied.
Which of the following benefits of using the Databricks Lakehouse Platform is provided by Delta Lake?
- A . The ability to manipulate the same data using a variety of languages
- B . The ability to collaborate in real time on a single notebook
- C . The ability to set up alerts for query failures
- D . The ability to support batch and streaming workloads
- E . The ability to distribute complex data operations
Which of the following describes the type of workloads that are always compatible with Auto Loader?
- A . Dashboard workloads
- B . Streaming workloads
- C . Machine learning workloads
- D . Serverless workloads
- E . Batch workloads
In which of the following scenarios should a data engineer use the MERGE INTO command instead of the INSERT INTO command?
- A . When the location of the data needs to be changed
- B . When the target table is an external table
- C . When the source table can be deleted
- D . When the target table cannot contain duplicate records
- E . When the source is not a Delta table
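As background for the question above, the difference between appending and upserting can be sketched in plain Python. This is only an illustration of the semantics, not Spark or Delta Lake code; the helper names `insert_into` and `merge_into`, the `id` key, and the sample rows are all hypothetical.

```python
# Plain-Python sketch of append vs. upsert semantics (not Spark code).
# All names and sample rows here are hypothetical.

def insert_into(target, source):
    """Append every source row, roughly like INSERT INTO: duplicates are kept."""
    return target + source

def merge_into(target, source, key):
    """Upsert source rows by key, roughly like MERGE INTO with
    WHEN MATCHED THEN UPDATE and WHEN NOT MATCHED THEN INSERT clauses."""
    merged = {row[key]: row for row in target}
    for row in source:
        merged[row[key]] = row  # overwrite if the key matched, add otherwise
    return list(merged.values())

target = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Bo"}]
source = [{"id": 2, "name": "Bo"}, {"id": 3, "name": "Cy"}]

appended = insert_into(target, source)
upserted = merge_into(target, source, key="id")

print(len(appended))  # 4 rows: id 2 now appears twice
print(len(upserted))  # 3 rows: id 2 appears once
```

In this toy model the append path leaves the repeated `id` 2 row in place, while the key-based merge keeps one row per key.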
A data engineer has realized that the data files associated with a Delta table are incredibly small.
They want to compact the small files to form larger files to improve performance.
Which of the following keywords can be used to compact the small files?
- A . REDUCE
- B . OPTIMIZE
- C . COMPACTION
- D . REPARTITION
- E . VACUUM
A data engineer is running code in a Databricks Repo that is cloned from a central Git repository. A colleague of the data engineer informs them that changes have been made and synced to the central Git repository. The data engineer now needs to sync their Databricks Repo to get the changes from the central Git repository.
Which of the following Git operations does the data engineer need to run to accomplish this task?
- A . Merge
- B . Push
- C . Pull
- D . Commit
- E . Clone
A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite database.
They run the following command:
Which of the following lines of code fills in the above blank to successfully complete the task?
- A . org.apache.spark.sql.jdbc
- B . autoloader
- C . DELTA
- D . sqlite
- E . org.apache.spark.sql.sqlite
A data engineer is attempting to drop a Spark SQL table my_table. The data engineer wants to delete all table metadata and data.
They run the following command:
DROP TABLE IF EXISTS my_table
While the object no longer appears when they run SHOW TABLES, the data files still exist.
Which of the following describes why the data files still exist and the metadata files were deleted?
- A . The table’s data was larger than 10 GB
- B . The table’s data was smaller than 10 GB
- C . The table was external
- D . The table did not have a location
- E . The table was managed
A data engineer needs to apply custom logic to identify employees with more than 5 years of experience in array column employees in table stores. The custom logic should create a new column exp_employees that is an array of all of the employees with more than 5 years of experience for each row. In order to apply this custom logic at scale, the data engineer wants to use the FILTER higher-order function.
Which of the following code blocks successfully completes this task?
- A . Option A
- B . Option B
- C . Option C
- D . Option D
- E . Option E
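The code for the options above is not reproduced in this text. As a rough guide to what the question is asking for, the Spark SQL higher-order function takes the form `FILTER(array, x -> predicate)` and returns the elements for which the predicate holds. The plain-Python sketch below mimics that behavior per row; it is not Spark code, and the field name `years_exp`, the helper `with_exp_employees`, and the sample data are assumptions made for illustration.

```python
# Plain-Python sketch of FILTER over an array column (not Spark code).
# The field name years_exp and the sample rows are hypothetical.

stores = [
    {"store_id": 1, "employees": [{"name": "Ana", "years_exp": 7},
                                  {"name": "Bo", "years_exp": 3}]},
    {"store_id": 2, "employees": [{"name": "Cy", "years_exp": 5}]},
]

def with_exp_employees(rows, min_years=5):
    """Add an exp_employees array holding only the employees whose
    years_exp is strictly greater than min_years, one array per row."""
    out = []
    for row in rows:
        kept = [e for e in row["employees"] if e["years_exp"] > min_years]
        out.append({**row, "exp_employees": kept})
    return out

result = with_exp_employees(stores)
for row in result:
    print(row["store_id"], [e["name"] for e in row["exp_employees"]])
```

Each row keeps its original `employees` array and gains a new `exp_employees` array, which may be empty when no element passes the predicate.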
Which of the following is a benefit of the Databricks Lakehouse Platform embracing open source technologies?
- A . Cloud-specific integrations
- B . Simplified governance
- C . Ability to scale storage
- D . Ability to scale workloads
- E . Avoiding vendor lock-in