
Databricks-Certified-Data-Engineer-Associate Questions and Answers

Question # 6

A data engineer is attempting to drop a Spark SQL table my_table and runs the following command:

DROP TABLE IF EXISTS my_table;

After running this command, the engineer notices that the data files and metadata files have been deleted from the file system.

Which of the following describes why all of these files were deleted?

A.

The table was managed

B.

The table's data was smaller than 10 GB

C.

The table's data was larger than 10 GB

D.

The table was external

E.

The table did not have a location
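
Note: a minimal PySpark sketch of the distinction this question tests, assuming a Databricks notebook where spark is predefined; the table names and storage path are hypothetical. Dropping a managed table removes both metadata and data files, while dropping an external table (one created with a LOCATION) removes only the metadata.

# Managed table: Databricks controls the storage location.
# DROP TABLE deletes the metadata AND the underlying data files.
spark.sql("CREATE TABLE IF NOT EXISTS my_managed_table (id INT, name STRING)")
spark.sql("DROP TABLE IF EXISTS my_managed_table")
# External table: data lives at a path supplied by the user (hypothetical path).
# DROP TABLE removes only the metadata; the files at the LOCATION remain.
spark.sql("""
  CREATE TABLE IF NOT EXISTS my_external_table (id INT, name STRING)
  LOCATION 'dbfs:/mnt/demo/my_external_table'
""")
spark.sql("DROP TABLE IF EXISTS my_external_table")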

Question # 7

Which tool is used by Auto Loader to process data incrementally?

A.

Spark Structured Streaming

B.

Unity Catalog

C.

Checkpointing

D.

Databricks SQL
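
Note: Auto Loader is exposed as the cloudFiles source of Spark Structured Streaming. A minimal sketch, assuming a Databricks notebook where spark is predefined; the input, schema, and checkpoint paths and the target table name are hypothetical.

# Auto Loader ingests new files incrementally through Structured Streaming.
df = (spark.readStream
      .format("cloudFiles")                                          # Auto Loader source
      .option("cloudFiles.format", "json")                           # format of incoming files
      .option("cloudFiles.schemaLocation", "dbfs:/tmp/demo/schema")  # hypothetical path
      .load("dbfs:/tmp/demo/landing/"))                              # hypothetical input directory
(df.writeStream
   .option("checkpointLocation", "dbfs:/tmp/demo/checkpoint")        # progress tracked between runs
   .trigger(availableNow=True)                                       # process available files, then stop
   .toTable("bronze_events"))                                        # hypothetical target table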

Question # 8

A data engineering team has noticed that their Databricks SQL queries are running too slowly when they are submitted to a non-running SQL endpoint. The data engineering team wants this issue to be resolved.

Which of the following approaches can the team use to reduce the time it takes to return results in this scenario?

A.

They can turn on the Serverless feature for the SQL endpoint and change the Spot Instance Policy to "Reliability Optimized."

B.

They can turn on the Auto Stop feature for the SQL endpoint.

C.

They can increase the cluster size of the SQL endpoint.

D.

They can turn on the Serverless feature for the SQL endpoint.

E.

They can increase the maximum bound of the SQL endpoint's scaling range

Question # 9

Which of the following describes the storage organization of a Delta table?

A.

Delta tables are stored in a single file that contains data, history, metadata, and other attributes.

B.

Delta tables store their data in a single file and all metadata in a collection of files in a separate location.

C.

Delta tables are stored in a collection of files that contain data, history, metadata, and other attributes.

D.

Delta tables are stored in a collection of files that contain only the data stored within the table.

E.

Delta tables are stored in a single file that contains only the data stored within the table.
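
Note: a minimal sketch for inspecting how a Delta table is laid out on storage, assuming a Databricks notebook where spark and dbutils are predefined; the table name my_delta_table is hypothetical.

# DESCRIBE DETAIL returns, among other things, the table's storage location.
location = spark.sql("DESCRIBE DETAIL my_delta_table").first()["location"]
# A Delta table is a collection of files: Parquet data files plus a _delta_log
# directory holding the transaction log (metadata, history, and other attributes).
for f in dbutils.fs.ls(location):
    print(f.name)   # e.g. part-*.parquet data files and the _delta_log/ directory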

Question # 10

A data engineer wants to create a relational object by pulling data from two tables. The relational object does not need to be used by other data engineers in other sessions. In order to save on storage costs, the data engineer wants to avoid copying and storing physical data.

Which of the following relational objects should the data engineer create?

A.

Spark SQL Table

B.

View

C.

Database

D.

Temporary view

E.

Delta Table
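
Note: a minimal sketch of a temporary view built from two tables without persisting any data, assuming a Databricks notebook where spark is predefined; the source tables, columns, and view name are hypothetical. A temporary view is visible only in the current session.

spark.sql("""
  CREATE OR REPLACE TEMPORARY VIEW combined_tmp AS
  SELECT o.order_id, o.amount, c.customer_name
  FROM orders o
  JOIN customers c ON o.customer_id = c.customer_id
""")
# The view stores no physical data and disappears when the session ends.
spark.table("combined_tmp").show()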

Question # 11

Which of the following benefits is provided by the array functions from Spark SQL?

A.

An ability to work with data in a variety of types at once

B.

An ability to work with data within certain partitions and windows

C.

An ability to work with time-related data in specified intervals

D.

An ability to work with complex, nested data ingested from JSON files

E.

An ability to work with an array of tables for procedural automation
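
Note: a minimal sketch of Spark SQL array functions applied to nested data such as parsed JSON, assuming a Databricks notebook where spark is predefined; the records and column names are hypothetical.

from pyspark.sql import Row
from pyspark.sql.functions import explode, array_contains, size

# Hypothetical nested records, as they might look after ingesting JSON.
df = spark.createDataFrame([
    Row(order_id=1, items=["apple", "pear"]),
    Row(order_id=2, items=["banana"]),
])
# explode flattens one array element per row; size and array_contains
# operate on the array column directly.
df.select("order_id", explode("items").alias("item")).show()
df.select("order_id", size("items").alias("n_items"),
          array_contains("items", "apple").alias("has_apple")).show()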

Question # 12

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which of the following changes will need to be made to the pipeline when migrating to Delta Live Tables?

A.

None of these changes will need to be made

B.

The pipeline will need to stop using the medallion-based multi-hop architecture

C.

The pipeline will need to be written entirely in SQL

D.

The pipeline will need to use a batch source in place of a streaming source

E.

The pipeline will need to be written entirely in Python

Question # 13

In order for Structured Streaming to reliably track the exact progress of the processing so that it can handle any kind of failure by restarting and/or reprocessing, which of the following two approaches is used by Spark to record the offset range of the data being processed in each trigger?

A.

Checkpointing and Write-ahead Logs

B.

Structured Streaming cannot record the offset range of the data being processed in each trigger.

C.

Replayable Sources and Idempotent Sinks

D.

Write-ahead Logs and Idempotent Sinks

E.

Checkpointing and Idempotent Sinks
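
Note: a minimal Structured Streaming sketch showing where the checkpoint location is configured; Spark records offsets and write-ahead-log information under this path so a restarted query resumes from where it stopped. Assumes a Databricks notebook where spark is predefined; the paths and table name are hypothetical.

# The built-in "rate" source generates test rows for demonstration.
stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()
query = (stream.writeStream
         .format("delta")
         .option("checkpointLocation", "dbfs:/tmp/demo/rate_checkpoint")  # offsets tracked here
         .toTable("rate_events"))                                         # hypothetical target table
# Restarting the query with the same checkpointLocation resumes from the recorded offsets.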

Question # 14

A data engineer has a single-task Job that runs each morning before they begin working. After identifying an upstream data issue, they need to set up another task to run a new notebook prior to the original task.

Which of the following approaches can the data engineer use to set up the new task?

A.

They can clone the existing task in the existing Job and update it to run the new notebook.

B.

They can create a new task in the existing Job and then add it as a dependency of the original task.

C.

They can create a new task in the existing Job and then add the original task as a dependency of the new task.

D.

They can create a new job from scratch and add both tasks to run concurrently.

E.

They can clone the existing task to a new Job and then edit it to run the new notebook.

Question # 15

Identify how the count_if function and count where x is null can be used.

Consider a table random_values with the following data in its column col1:

col1
0
1
2
NULL
2
3

What would be the output of the below query?

SELECT count_if(col1 > 1) AS count_a, count(*) AS count_b, count(col1) AS count_c FROM random_values;

A.

count_a = 3, count_b = 6, count_c = 5

B.

count_a = 4, count_b = 6, count_c = 5

C.

count_a = 3, count_b = 6, count_c = 6

D.

count_a = 4, count_b = 6, count_c = 6
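
Note: a minimal sketch that rebuilds the question's table so the three counts can be checked directly, assuming a Databricks notebook where spark is predefined. count(*) counts every row, count(col1) skips NULL values, and count_if counts only rows matching the predicate.

spark.createDataFrame(
    [(0,), (1,), (2,), (None,), (2,), (3,)], "col1 INT"
).createOrReplaceTempView("random_values")
spark.sql("""
  SELECT count_if(col1 > 1) AS count_a,   -- rows where col1 > 1 (the NULL row never matches)
         count(*)           AS count_b,   -- all rows, including the NULL row
         count(col1)        AS count_c    -- non-NULL values of col1 only
  FROM random_values
""").show()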

Question # 16

A data engineer and data analyst are working together on a data pipeline. The data engineer is working on the raw, bronze, and silver layers of the pipeline using Python, and the data analyst is working on the gold layer of the pipeline using SQL. The raw source of the pipeline is a streaming input. They now want to migrate their pipeline to use Delta Live Tables.

Which change will need to be made to the pipeline when migrating to Delta Live Tables?

A.

The pipeline can have different notebook sources in SQL & Python.

B.

The pipeline will need to be written entirely in SQL.

C.

The pipeline will need to be written entirely in Python.

D.

The pipeline will need to use a batch source in place of a streaming source.

Question # 17

A data engineer needs access to a table new_table, but they do not have the correct permissions. They can ask the table owner for permission, but they do not know who the table owner is.

Which approach can be used to identify the owner of new_table?

A.

There is no way to identify the owner of the table

B.

Review the Owner field in the table's page in the cloud storage solution

C.

Review the Permissions tab in the table's page in Data Explorer

D.

Review the Owner field in the table’s page in Data Explorer

Question # 18

Identify the impact of ON VIOLATION DROP ROW and ON VIOLATION FAIL UPDATE for a constraint violation.

A data engineer has created an ETL pipeline using Delta Live Tables to manage their company's travel reimbursement details. They want to ensure that if the location details have not been provided by the employee, the pipeline is terminated.

How can the scenario be implemented?

A.

CONSTRAINT valid_location EXPECT (location = NULL)

B.

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL UPDATE

C.

CONSTRAINT valid_location EXPECT (location != NULL) ON DROP ROW

D.

CONSTRAINT valid_location EXPECT (location != NULL) ON VIOLATION FAIL
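
Note: a minimal Python sketch of Delta Live Tables expectations; it runs only inside a DLT pipeline, and the table name, upstream dataset, and expectation name are hypothetical. ON VIOLATION DROP ROW discards offending rows, while ON VIOLATION FAIL UPDATE stops the update; dlt.expect_or_drop and dlt.expect_or_fail are the corresponding Python decorators.

import dlt  # available only inside a Delta Live Tables pipeline

# Fail the update if any record is missing a location, analogous to
# CONSTRAINT valid_location EXPECT (location IS NOT NULL) ON VIOLATION FAIL UPDATE.
@dlt.table(name="travel_reimbursements_clean")
@dlt.expect_or_fail("valid_location", "location IS NOT NULL")
def travel_reimbursements_clean():
    return dlt.read("travel_reimbursements_raw")  # hypothetical upstream dataset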

Question # 19

A Delta Live Table pipeline includes two datasets defined using STREAMING LIVE TABLE. Three datasets are defined against Delta Lake table sources using LIVE TABLE.

The pipeline is configured to run in Development mode using the Continuous Pipeline Mode.

Assuming previously unprocessed data exists and all definitions are valid, what is the expected outcome after clicking Start to update the pipeline?

A.

All datasets will be updated once and the pipeline will shut down. The compute resources will be terminated.

B.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist until the pipeline is shut down.

C.

All datasets will be updated once and the pipeline will persist without any processing. The compute resources will persist but go unused.

D.

All datasets will be updated once and the pipeline will shut down. The compute resources will persist to allow for additional testing.

E.

All datasets will be updated at set intervals until the pipeline is shut down. The compute resources will persist to allow for additional testing.

Question # 20

A data analyst has created a Delta table sales that is used by the entire data analysis team. They want help from the data engineering team to implement a series of tests to ensure the data is clean. However, the data engineering team uses Python for its tests rather than SQL.

Which of the following commands could the data engineering team use to access sales in PySpark?

A.

SELECT * FROM sales

B.

There is no way to share data between PySpark and SQL.

C.

spark.sql("sales")

D.

spark.delta.table("sales")

E.

spark.table("sales")
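
Note: a minimal sketch of reading a SQL-defined table from PySpark for Python-based tests, assuming a Databricks notebook where spark is predefined; the amount column and the check itself are hypothetical.

sales_df = spark.table("sales")                    # returns the registered table as a DataFrame
# spark.sql("SELECT * FROM sales") also returns a DataFrame.
# Hypothetical cleanliness check: no negative sale amounts.
assert sales_df.filter("amount < 0").count() == 0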

Question # 21

A data engineer needs to create a table in Databricks using data from their organization's existing SQLite database. They run the following command:

CREATE TABLE jdbc_customer360
USING ____
OPTIONS (
  url "jdbc:sqlite:/customers.db",
  dbtable "customer360"
)

Which line of code fills in the above blank to successfully complete the task?

A.

autoloader

B.

org.apache.spark.sql.jdbc

C.

sqlite

D.

org.apache.spark.sql.sqlite
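
Note: a sketch of the DataFrame-API route to the same SQLite source through Spark's generic JDBC data source, assuming a Databricks notebook where spark is predefined and a SQLite JDBC driver is installed on the cluster; it illustrates the JDBC connector rather than filling in the blank itself.

customer360_df = (spark.read
                  .format("jdbc")                             # Spark's generic JDBC data source
                  .option("url", "jdbc:sqlite:/customers.db")
                  .option("dbtable", "customer360")
                  .option("driver", "org.sqlite.JDBC")        # driver class, if not auto-detected
                  .load())
customer360_df.show()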

Question # 22

Which of the following describes a scenario in which a data engineer will want to use a single-node cluster?

A.

When they are working interactively with a small amount of data

B.

When they are running automated reports to be refreshed as quickly as possible

C.

When they are working with SQL within Databricks SQL

D.

When they are concerned about the ability to automatically scale with larger data

E.

When they are manually running reports with a large amount of data

Question # 23

A data engineer has created a new database using the following command:

CREATE DATABASE IF NOT EXISTS customer360;

In which of the following locations will the customer360 database be located?

A.

dbfs:/user/hive/database/customer360

B.

dbfs:/user/hive/warehouse

C.

dbfs:/user/hive/customer360

D.

More information is needed to determine the correct response
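
Note: a minimal sketch for checking where a database (schema) was created, assuming a Databricks notebook where spark is predefined. When no LOCATION clause is given, the database is placed under the Hive metastore's default warehouse directory.

spark.sql("CREATE DATABASE IF NOT EXISTS customer360")
# The 'Location' row of the output shows where the database was created;
# the Hive metastore default warehouse directory is dbfs:/user/hive/warehouse.
spark.sql("DESCRIBE DATABASE EXTENDED customer360").show(truncate=False)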

Question # 24

A data engineer has been given a new record of data:

id STRING = 'a1'

rank INTEGER = 6

rating FLOAT = 9.4

Which of the following SQL commands can be used to append the new record to an existing Delta table my_table?

A.

INSERT INTO my_table VALUES ('a1', 6, 9.4)

B.

my_table UNION VALUES ('a1', 6, 9.4)

C.

INSERT VALUES ('a1', 6, 9.4) INTO my_table

D.

UPDATE my_table VALUES ('a1', 6, 9.4)

E.

UPDATE VALUES ('a1', 6, 9.4) my_table
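
Note: a minimal sketch of appending a single record with INSERT INTO, assuming a Databricks notebook where spark is predefined and my_table already exists with columns (id STRING, rank INT, rating FLOAT) in that order.

spark.sql("INSERT INTO my_table VALUES ('a1', 6, 9.4)")   # appends one row; existing rows are untouched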

Question # 25

The Delta transaction log for the ‘students’ table is shown using the ‘DESCRIBE HISTORY students’ command. A Data Engineer needs to query the table as it existed before the UPDATE operation listed in the log.

Which command should the Data Engineer use to achieve this? (Choose two.)

A.

SELECT * FROM students@v4

B.

SELECT * FROM students TIMESTAMP AS OF ‘2024-04-22T14:32:47.000+00:00’

C.

SELECT * FROM students FROM HISTORY VERSION AS OF 3

D.

SELECT * FROM students VERSION AS OF 5

E.

SELECT * FROM students TIMESTAMP AS OF ‘2024-04-22T14:32:58.000+00:00’
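
Note: a minimal sketch of Delta time travel, assuming a Databricks notebook where spark is predefined; the version number and timestamp are hypothetical and would normally be read from the DESCRIBE HISTORY output.

# Query the table as of a specific commit version (hypothetical version number).
spark.sql("SELECT * FROM students VERSION AS OF 3").show()
# Or as of a timestamp taken from the history (hypothetical value).
spark.sql("SELECT * FROM students TIMESTAMP AS OF '2024-04-22T14:32:47.000+00:00'").show()
# DataFrame-API equivalent:
spark.read.option("versionAsOf", 3).table("students").show()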

Question # 26

Which type of workloads are compatible with Auto Loader?

A.

Streaming workloads

B.

Machine learning workloads

C.

Serverless workloads

D.

Batch workloads

Question # 27

Which two components function in the Databricks platform architecture’s control plane? (Choose two.)

A.

Virtual Machines

B.

Compute Orchestration

C.

Serverless Compute

D.

Compute

E.

Unity Catalog

Question # 28

What is stored in a Databricks customer's cloud account?

A.

Data

B.

Cluster management metadata

C.

Databricks web application

D.

Notebooks

Question # 29

Which of the following commands can be used to write data into a Delta table while avoiding the writing of duplicate records?

A.

DROP

B.

IGNORE

C.

MERGE

D.

APPEND

E.

INSERT
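
Note: a minimal sketch of MERGE INTO inserting only records whose key is not already present, which is how duplicate writes are avoided; assumes a Databricks notebook where spark is predefined, and the table and column names are hypothetical.

spark.sql("""
  MERGE INTO users AS target
  USING new_users AS source
  ON target.user_id = source.user_id
  WHEN NOT MATCHED THEN INSERT *   -- insert only rows not already in the target
""")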

Question # 30

Which of the following SQL keywords can be used to convert a table from a long format to a wide format?

A.

PIVOT

B.

CONVERT

C.

WHERE

D.

TRANSFORM

E.

SUM
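
Note: a minimal sketch of PIVOT turning a long table (one row per store and quarter) into a wide table (one column per quarter), assuming a Databricks notebook where spark is predefined; quarterly_sales and its columns are hypothetical.

spark.sql("""
  SELECT * FROM quarterly_sales
  PIVOT (
    SUM(amount) FOR quarter IN ('Q1', 'Q2', 'Q3', 'Q4')
  )
""").show()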

Question # 31

In which of the following scenarios should a data engineer select a Task in the Depends On field of a new Databricks Job Task?

A.

When another task needs to be replaced by the new task

B.

When another task needs to fail before the new task begins

C.

When another task has the same dependency libraries as the new task

D.

When another task needs to use as little compute resources as possible

E.

When another task needs to successfully complete before the new task begins

Question # 32

A data engineer has been using a Databricks SQL dashboard to monitor the cleanliness of the input data to a data analytics dashboard for a retail use case. The job has a Databricks SQL query that returns the number of store-level records where sales is equal to zero. The data engineer wants their entire team to be notified via a messaging webhook whenever this value is greater than 0.

Which of the following approaches can the data engineer use to notify their entire team via a messaging webhook whenever the number of stores with $0 in sales is greater than zero?

A.

They can set up an Alert with a custom template.

B.

They can set up an Alert with a new email alert destination.

C.

They can set up an Alert with one-time notifications.

D.

They can set up an Alert with a new webhook alert destination.

E.

They can set up an Alert without notifications.
