Databricks Certified Data Engineer Professional Practice Exams
- Exam Code: Databricks Certified Data Engineer Professional
- Exam Name: Databricks Certified Data Engineer Professional Exam
- Certification Provider: Databricks
- Latest update: Apr 09, 2025
Which distribution does Databricks support for installing custom Python code packages?
- A . sbt
- B . CRAN
- C . CRAM
- D . npm
- E . Wheels
- F . jars
A Delta Lake table in the Lakehouse named customer_churn_params is used in churn prediction by the machine learning team. The table contains information about customers derived from a number of upstream sources. Currently, the data engineering team populates this table nightly by overwriting the table with the current valid values derived from upstream data sources.
Immediately after each update succeeds, the data engineering team would like to determine the difference between the new version and the previous version of the table.
Given the current implementation, which method can be used?
- A . Parse the Delta Lake transaction log to identify all newly written data files.
- B . Execute DESCRIBE HISTORY customer_churn_params to obtain the full operation metrics for the update, including a log of all records that have been added or modified.
- C . Execute a query to calculate the difference between the new version and the previous version using Delta Lake’s built-in versioning and time travel functionality.
- D . Parse the Spark event logs to identify those rows that were updated, inserted, or deleted.
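For orientation, a version-to-version comparison like the one described in option C can be expressed with Delta Lake's VERSION AS OF time travel syntax. A minimal sketch, assuming the relevant version numbers are known (the numbers below are illustrative):

```sql
-- Rows present in the new version (2) but not the previous version (1);
-- reversing the operands would return rows that were removed.
SELECT * FROM customer_churn_params VERSION AS OF 2
EXCEPT
SELECT * FROM customer_churn_params VERSION AS OF 1;
```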
The data architect has mandated that all tables in the Lakehouse should be configured as external (also known as "unmanaged") Delta Lake tables.
Which approach will ensure that this requirement is met?
- A . When a database is being created, make sure that the LOCATION keyword is used.
- B . When configuring an external data warehouse for all table storage, leverage Databricks for all ELT.
- C . When data is saved to a table, make sure that a full file path is specified alongside the Delta format.
- D . When tables are created, make sure that the EXTERNAL keyword is used in the CREATE TABLE statement.
- E . When the workspace is being configured, make sure that external cloud object storage has been mounted.
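For context, an external (unmanaged) Delta Lake table is one whose data lives at an explicitly supplied storage path. A minimal sketch, with hypothetical table name and path:

```sql
-- Supplying LOCATION makes the table external: dropping it later
-- removes only the metastore entry, not the underlying data files.
CREATE TABLE sales.transactions (
  id BIGINT,
  amount DOUBLE
)
USING DELTA
LOCATION 'abfss://data@myaccount.dfs.core.windows.net/delta/transactions';
```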
A Databricks SQL dashboard has been configured to monitor the total number of records present in a collection of Delta Lake tables using the following query pattern:
SELECT COUNT(*) FROM table_name
Which of the following describes how results are generated each time the dashboard is updated?
- A . The total count of rows is calculated by scanning all data files
- B . The total count of rows will be returned from cached results unless REFRESH is run
- C . The total count of records is calculated from the Delta transaction logs
- D . The total count of records is calculated from the parquet file metadata
- E . The total count of records is calculated from the Hive metastore
The DevOps team has configured a production workload as a collection of notebooks scheduled to run daily using the Jobs UI. A new data engineering hire is onboarding to the team and has requested access to one of these notebooks to review the production logic.
What are the maximum notebook permissions that can be granted to the user without allowing accidental changes to production code or data?
- A . Can Manage
- B . Can Edit
- C . No permissions
- D . Can Read
- E . Can Run
Which statement describes Delta Lake Auto Compaction?
- A . An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 1 GB.
- B . Before a Jobs cluster terminates, optimize is executed on all tables modified during the most recent job.
- C . Optimized writes use logical partitions instead of directory partitions; because partition boundaries are only represented in metadata, fewer small files are written.
- D . Data is queued in a messaging bus instead of committing data directly to memory; all data is committed from the messaging bus in one batch once the job is complete.
- E . An asynchronous job runs after the write completes to detect if files could be further compacted; if yes, an optimize job is executed toward a default of 128 MB.
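As a reference point, optimized writes and auto compaction can be enabled per table through Delta table properties. A minimal sketch, assuming an existing table named events (the name is hypothetical):

```sql
-- Enable optimized writes and auto compaction on an existing Delta table.
ALTER TABLE events SET TBLPROPERTIES (
  'delta.autoOptimize.optimizeWrite' = 'true',
  'delta.autoOptimize.autoCompact'  = 'true'
);
```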
The view updates represents an incremental batch of all newly ingested data to be inserted or updated in the customers table.
The following logic is used to process these records.
Which statement describes this implementation?
- A . The customers table is implemented as a Type 3 table; old values are maintained as a new column alongside the current value.
- B . The customers table is implemented as a Type 2 table; old values are maintained but marked as no longer current and new values are inserted.
- C . The customers table is implemented as a Type 0 table; all writes are append only with no changes to existing values.
- D . The customers table is implemented as a Type 1 table; old values are overwritten by new values and no history is maintained.
- E . The customers table is implemented as a Type 2 table; old values are overwritten and new customers are appended.
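For orientation, a Type 2 pattern preserves history by expiring the current row and inserting a new one. A minimal two-statement sketch against hypothetical columns (address as the tracked attribute, plus current, effective_date, and end_date):

```sql
-- Step 1: close out current rows whose tracked attribute changed.
MERGE INTO customers c
USING updates u
ON c.customer_id = u.customer_id AND c.current = true
WHEN MATCHED AND c.address <> u.address THEN
  UPDATE SET current = false, end_date = u.effective_date;

-- Step 2: insert new versions and brand-new customers. Customers expired
-- in step 1 no longer have a current row, so the anti join matches them too.
INSERT INTO customers
SELECT u.customer_id, u.address, true AS current, u.effective_date, NULL AS end_date
FROM updates u
LEFT ANTI JOIN customers c
  ON u.customer_id = c.customer_id AND c.current = true;
```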
The data governance team has instituted a requirement that all tables containing Personally Identifiable Information (PII) must be clearly annotated. This includes adding column comments, table comments, and setting the custom table property "contains_pii" = true.
The following SQL DDL statement is executed to create a new table:
Which command allows manual confirmation that these three requirements have been met?
- A . DESCRIBE EXTENDED dev.pii_test
- B . DESCRIBE DETAIL dev.pii_test
- C . SHOW TBLPROPERTIES dev.pii_test
- D . DESCRIBE HISTORY dev.pii_test
- E . SHOW TABLES dev
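For reference, all three annotations can be applied in a single DDL statement and then inspected together. A minimal sketch (the column definitions are hypothetical):

```sql
CREATE TABLE dev.pii_test (
  id INT,
  name STRING COMMENT 'PII: customer name'
)
COMMENT 'Contains personally identifiable information'
TBLPROPERTIES ('contains_pii' = true);

-- DESCRIBE EXTENDED returns column comments plus a detail section
-- that includes the table comment and table properties.
DESCRIBE EXTENDED dev.pii_test;
```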
What statement is true regarding the retention of job run history?
- A . It is retained until you export or delete job run logs
- B . It is retained for 30 days, during which time you can deliver job run logs to DBFS or S3
- C . It is retained for 60 days, during which you can export notebook run results to HTML
- D . It is retained for 60 days, after which logs are archived
- E . It is retained for 90 days or until the run-id is re-used through custom run configuration
The Databricks workspace administrator has configured interactive clusters for each of the data engineering groups. To control costs, clusters are set to terminate after 30 minutes of inactivity. Each user should be able to execute workloads against their assigned clusters at any time of the day.
Assuming users have been added to a workspace but not granted any permissions, which of the following describes the minimal permissions a user would need to start and attach to an already configured cluster?
- A . "Can Manage" privileges on the required cluster
- B . Workspace Admin privileges, cluster creation allowed. "Can Attach To" privileges on the required cluster
- C . Cluster creation allowed. "Can Attach To" privileges on the required cluster
- D . "Can Restart" privileges on the required cluster
- E . Cluster creation allowed. "Can Restart" privileges on the required cluster