Summer Sale - Special Limited Time 65% Discount Offer - Ends in 0d 00h 00m 00s - Coupon code: dpt65

Databricks-Certified-Data-Engineer-Associate Questions and Answers

Question # 6

A data engineer has realized that they made a mistake when making a daily update to a table. They need to use Delta time travel to restore the table to a version that is 3 days old. However, when the data engineer attempts to time travel to the older version, they are unable to restore the data because the data files have been deleted.

Which of the following explains why the data files are no longer present?

A.

The VACUUM command was run on the table

B.

The TIME TRAVEL command was run on the table

C.

The DELETE HISTORY command was run on the table

D.

The OPTIMIZE command was nun on the table

E.

The HISTORY command was run on the table

Full Access
Question # 7

Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?

A.

DROP

B.

IGNORE

C.

MERGE

D.

APPEND

E.

INSERT

Full Access
Question # 8

In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which of the following two approaches is used by Spark to record the offset range of the data being processed in each trigger?

A.

Checkpointing and Write-ahead Logs

B.

Structured Streaming cannot record the offset range of the data being processed in each trigger.

C.

Replayable Sources and Idempotent Sinks

D.

Write-ahead Logs and Idempotent Sinks

E.

Checkpointing and Idempotent Sinks

Full Access
Question # 9

A data architect has determined that a table of the following format is necessary:

Which of the following code blocks uses SQL DDL commands to create an empty Delta table in the above format regardless of whether a table already exists with this name?

A.

Option A

B.

Option B

C.

Option C

D.

Option D

E.

Option E

Full Access
Question # 10

A data engineer has left the organization. The data team needs to transfer ownership of the data engineer’s Delta tables to a new data engineer. The new data engineer is the lead engineer on the data team.

Assuming the original data engineer no longer has access, which of the following individuals must be the one to transfer ownership of the Delta tables in Data Explorer?

A.

Databricks account representative

B.

This transfer is not possible

C.

Workspace administrator

D.

New lead data engineer

E.

Original data engineer

Full Access
Question # 11

A data engineer runs a statement every day to copy the previous day’s sales into the table transactions. Each day’s sales are in their own file in the location "/transactions/raw".

Today, the data engineer runs the following command to complete this task:

After running the command today, the data engineer notices that the number of records in table transactions has not changed.

Which of the following describes why the statement might not have copied any new records into the table?

A.

The format of the files to be copied were not included with the FORMAT_OPTIONS keyword.

B.

The names of the files to be copied were not included with the FILES keyword.

C.

The previous day’s file has already been copied into the table.

D.

The PARQUET file format does not support COPY INTO.

E.

The COPY INTO statement requires the table to be refreshed to view the copied rows.

Full Access
Question # 12

Which file format is used for storing Delta Lake Table?

A.

Parquet

B.

Delta

C.

SV

D.

JSON

Full Access
Question # 13

Which of the following Git operations must be performed outside of Databricks Repos?

A.

Commit

B.

Pull

C.

Push

D.

Clone

E.

Merge

Full Access
Question # 14

A data engineer needs to create a table in Databricks using data from their organization’s existing SQLite database.

They run the following command:

Which of the following lines of code fills in the above blank to successfully complete the task?

A.

org.apache.spark.sql.jdbc

B.

autoloader

C.

DELTA

D.

sqlite

E.

org.apache.spark.sql.sqlite

Full Access
Question # 15

A data engineer needs to use a Delta table as part of a data pipeline, but they do not know if they have the appropriate permissions.

In which of the following locations can the data engineer review their permissions on the table?

A.

Databricks Filesystem

B.

Jobs

C.

Dashboards

D.

Repos

E.

Data Explorer

Full Access
Question # 16

What is stored in a Databricks customer's cloud account?

A.

Data

B.

Cluster management metadata

C.

Databricks web application

D.

Notebooks

Full Access
Question # 17

Which tool is used by Auto Loader to process data incrementally?

A.

Spark Structured Streaming

B.

Unity Catalog

C.

Checkpointing

D.

Databricks SQL

Full Access
Question # 18

A data analyst has developed a query that runs against Delta table. They want help from the data engineering team to implement a series of tests to ensure the data returned by the query is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following operations could the data engineering team use to run the query and operate with the results in PySpark?

A.

SELECT * FROM sales

B.

spark.delta.table

C.

spark.sql

D.

There is no way to share data between PySpark and SQL.

E.

spark.table

Full Access
Question # 19

A data engineer is maintaining a data pipeline. Upon data ingestion, the data engineer notices that the source data is starting to have a lower level of quality. The data engineer would like to automate the process of monitoring the quality level.

Which of the following tools can the data engineer use to solve this problem?

A.

Unity Catalog

B.

Data Explorer

C.

Delta Lake

D.

Delta Live Tables

E.

Auto Loader

Full Access
Question # 20

Which of the following describes a benefit of creating an external table from Parquet rather than CSV when using a CREATE TABLE AS SELECT statement?

A.

Parquet files can be partitioned

B.

CREATE TABLE AS SELECT statements cannot be used on files

C.

Parquet files have a well-defined schema

D.

Parquet files have the ability to be optimized

E.

Parquet files will become Delta tables

Full Access
Question # 21

A data engineer wants to schedule their Databricks SQL dashboard to refresh every hour, but they only want the associated SQL endpoint to be running when it is necessary. The dashboard has multiple queries on multiple datasets associated with it. The data that feeds the dashboard is automatically processed using a Databricks Job.

Which of the following approaches can the data engineer use to minimize the total running time of the SQL endpoint used in the refresh schedule of their dashboard?

A.

They can turn on the Auto Stop feature for the SQL endpoint.

B.

They can ensure the dashboard's SQL endpoint is not one of the included query's SQL endpoint.

C.

They can reduce the cluster size of the SQL endpoint.

D.

They can ensure the dashboard's SQL endpoint matches each of the queries' SQL endpoints.

E.

They can set up the dashboard's SQL endpoint to be serverless.

Full Access
Question # 22

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to an ELT job. The ELT job has its Databricks SQL query that returns the number of input records containing unexpected NULL values. The data engineer wants their entire team to be notified via a messaging webhook whenever this value reaches 100.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of NULL values reaches 100?

A.

They can set up an Alert with a custom template.

B.

They can set up an Alert with a new email alert destination.

C.

They can set up an Alert with a new webhook alert destination.

D.

They can set up an Alert with one-time notifications.

E.

They can set up an Alert without notifications.

Full Access
Question # 23

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

A.

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."

B.

They can turn on the Auto Stop feature for the SQL endpoint.

C.

They can increase the cluster size of the SQL endpoint.

D.

They can turn on the Serverless feature for the SQL endpoint.

E.

They can increase the maximum bound of the SQL endpoint's scaling range

Full Access
Question # 24

Which of the following data lakehouse features results in improved data quality over a traditional data lake?

A.

A data lakehouse provides storage solutions for structured and unstructured data.

B.

A data lakehouse supports ACID-compliant transactions.

C.

A data lakehouse allows the use of SQL queries to examine data.

D.

A data lakehouse stores data in open formats.

E.

A data lakehouse enables machine learning and artificial Intelligence workloads.

Full Access
Question # 25

A data engineer needs access to a table new_uable, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which approach can be used to identify the owner of new_table?

A.

There is no way to identify the owner of the table

B.

Review the Owner field in the table's page in the cloud storage solution

C.

Review the Permissions tab in the table's page in Data Explorer

D.

Review the Owner field in the table’s page in Data Explorer

Full Access
Question # 26

A data analyst has a series of queries in a SQL program. The data analyst wants this program to run every day. They only want the final query in the program to run on Sundays. They ask for help from the data engineering team to complete this task.

Which of the following approaches could be used by the data engineering team to complete this task?

A.

They could submit a feature request with Databricks to add this functionality.

B.

They could wrap the queries using PySpark and use Python’s control flow system to determine when to run the final query.

C.

They could only run the entire program on Sundays.

D.

They could automatically restrict access to the source table in the final query so that it is only accessible on Sundays.

E.

They could redesign the data model to separate the data used in the final query into a new table.

Full Access
Question # 27

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The table is configured to run in Development mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist until the pipeline is shut down.

C.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

E.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

Full Access
Question # 28

Which of the following describes the relationship between Gold tables and Silver tables?

A.

Gold tables are more likely to contain aggregations than Silver tables.

B.

Gold tables are more likely to contain valuable data than Silver tables.

C.

Gold tables are more likely to contain a less refined view of data than Silver tables.

D.

Gold tables are more likely to contain more data than Silver tables.

E.

Gold tables are more likely to contain truthful data than Silver tables.

Full Access
Question # 29

A data engineer has realized that the data files associated with a Delta table are incredibly small. They want to compact the small files to form larger files to improve performance.

Which of the following keywords can be used to compact the small files?

A.

REDUCE

B.

OPTIMIZE

C.

COMPACTION

D.

REPARTITION

E.

VACUUM

Full Access