
DAS-C01 Questions and Answers

Question # 6

A large ride-sharing company has thousands of drivers globally serving millions of unique customers every day. The company has decided to migrate an existing data mart to Amazon Redshift. The existing schema includes the following tables.

  • A trips fact table for information on completed rides
  • A drivers dimension table for driver profiles
  • A customers fact table holding customer profile information

The company analyzes trip details by date and destination to examine profitability by region. The drivers data rarely changes. The customers data frequently changes.

What table design provides optimal query performance?

A.

Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers and customers tables.

B.

Use DISTSTYLE EVEN for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.

C.

Use DISTSTYLE KEY (destination) for the trips table and sort by date. Use DISTSTYLE ALL for the drivers table. Use DISTSTYLE EVEN for the customers table.

D.

Use DISTSTYLE EVEN for the drivers table and sort by date. Use DISTSTYLE ALL for both fact tables.
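
For reference, the DISTSTYLE and SORTKEY choices discussed in these options are declared at table-creation time. Below is a minimal sketch, assuming placeholder column names and cluster details (run here through the Redshift Data API, though any SQL client works):

    import boto3

    # Placeholder DDL illustrating how the distribution and sort settings in
    # the options are expressed; column names and the cluster are assumed.
    trips_ddl = """
    CREATE TABLE trips (
        trip_id     BIGINT,
        destination VARCHAR(64),
        trip_date   DATE
    )
    DISTSTYLE KEY
    DISTKEY (destination)
    SORTKEY (trip_date);
    """

    drivers_ddl = """
    CREATE TABLE drivers (
        driver_id INT,
        profile   VARCHAR(256)
    )
    DISTSTYLE ALL;
    """

    boto3.client("redshift-data").batch_execute_statement(
        ClusterIdentifier="example-cluster",  # assumed cluster name
        Database="dev",
        DbUser="admin",
        Sqls=[trips_ddl, drivers_ddl],
    )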

Question # 7

A company that produces network devices has millions of users. Data is collected from the devices on an hourly basis and stored in an Amazon S3 data lake.

The company runs analyses on the last 24 hours of data flow logs for abnormality detection and to troubleshoot and resolve user issues. The company also analyzes historical logs dating back 2 years to discover patterns and look for improvement opportunities.

The data flow logs contain many metrics, such as date, timestamp, source IP, and target IP. There are about 10 billion events every day.

How should this data be stored for optimal performance?

A.

In Apache ORC partitioned by date and sorted by source IP

B.

In compressed .csv partitioned by date and sorted by source IP

C.

In Apache Parquet partitioned by source IP and sorted by date

D.

In compressed nested JSON partitioned by source IP and sorted by date
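
As background, a columnar, date-partitioned layout of the kind described in these options is typically produced by a Spark job. A minimal PySpark sketch, assuming a DataFrame of flow-log events with illustrative column names (log_date, source_ip) and placeholder S3 paths:

    from pyspark.sql import SparkSession

    # Read raw flow logs and rewrite them as columnar files partitioned by
    # date; column names and S3 paths are placeholders.
    spark = SparkSession.builder.appName("flow-logs").getOrCreate()

    logs = spark.read.json("s3://example-bucket/raw/flow-logs/")

    (logs
        .sortWithinPartitions("source_ip")   # keep rows for the same source IP together
        .write
        .partitionBy("log_date")             # one S3 prefix per day
        .mode("append")
        .orc("s3://example-bucket/curated/flow-logs/"))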

Question # 8

A hospital is building a research data lake to ingest data from electronic health records (EHR) systems from multiple hospitals and clinics. The EHR systems are independent of each other and do not have a common patient identifier. The data engineering team is not experienced in machine learning (ML) and has been asked to generate a unique patient identifier for the ingested records.

Which solution will accomplish this task?

A.

An AWS Glue ETL job with the FindMatches transform

B.

Amazon Kendra

C.

Amazon SageMaker Ground Truth

D.

An AWS Glue ETL job with the ResolveChoice transform
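
For context, the FindMatches transform mentioned in option A is an ML transform that runs inside an ordinary AWS Glue ETL script. A minimal sketch, assuming a placeholder transform ID, catalog database, and table names:

    import sys
    from awsglue.context import GlueContext
    from awsglue.utils import getResolvedOptions
    from awsglueml.transforms import FindMatches
    from pyspark.context import SparkContext

    args = getResolvedOptions(sys.argv, ["JOB_NAME"])
    glue_context = GlueContext(SparkContext.getOrCreate())

    # Load the ingested EHR records from the Data Catalog (assumed names).
    records = glue_context.create_dynamic_frame.from_catalog(
        database="ehr_raw", table_name="patients"
    )

    # FindMatches links records that likely refer to the same patient; the
    # transform ID below is a placeholder for a transform trained in Glue.
    matched = FindMatches.apply(frame=records, transformId="tfm-0123456789abcdef")

    glue_context.write_dynamic_frame.from_options(
        frame=matched,
        connection_type="s3",
        connection_options={"path": "s3://example-bucket/curated/patients/"},
        format="parquet",
    )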

Question # 9

A data engineering team within a shared workspace company wants to build a centralized logging system for all weblogs generated by the space reservation system. The company has a fleet of Amazon EC2 instances that process requests for shared space reservations on its website. The data engineering team wants to ingest all weblogs into a service that will provide a near-real-time search engine. The team does not want to manage the maintenance and operation of the logging system.

Which solution allows the data engineering team to efficiently set up the web logging system within AWS?

A.

Set up the Amazon CloudWatch agent to stream weblogs to CloudWatch Logs and subscribe the Amazon Kinesis data stream to CloudWatch. Choose Amazon Elasticsearch Service as the end destination of the weblogs.

B.

Set up the Amazon CloudWatch agent to stream weblogs to CloudWatch Logs and subscribe the Amazon Kinesis Data Firehose delivery stream to CloudWatch. Choose Amazon Elasticsearch Service as the end destination of the weblogs.

C.

Set up the Amazon CloudWatch agent to stream weblogs to CloudWatch Logs and subscribe the Amazon Kinesis data stream to CloudWatch. Configure Splunk as the end destination of the weblogs.

D.

Set up the Amazon CloudWatch agent to stream weblogs to CloudWatch Logs and subscribe the Amazon Kinesis Data Firehose delivery stream to CloudWatch. Configure Amazon DynamoDB as the end destination of the weblogs.
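
All four options begin by subscribing a stream to a CloudWatch Logs log group. A minimal sketch of that subscription step, assuming a Kinesis Data Firehose destination and placeholder names and ARNs:

    import boto3

    # Forward every event from the web-server log group to a Firehose
    # delivery stream; the log group, ARNs, and role are placeholders.
    boto3.client("logs").put_subscription_filter(
        logGroupName="/weblogs/reservation-service",
        filterName="weblogs-to-firehose",
        filterPattern="",  # an empty pattern matches all log events
        destinationArn="arn:aws:firehose:us-east-1:123456789012:deliverystream/weblogs",
        roleArn="arn:aws:iam::123456789012:role/CWLtoFirehoseRole",
    )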

Question # 10

A large financial company is running its ETL process. Part of this process is to move data from Amazon S3 into an Amazon Redshift cluster. The company wants to use the most cost-efficient method to load the dataset into Amazon Redshift.

Which combination of steps would meet these requirements? (Choose two.)

A.

Use the COPY command with the manifest file to load data into Amazon Redshift.

B.

Use S3DistCp to load files into Amazon Redshift.

C.

Use temporary staging tables during the loading process.

D.

Use the UNLOAD command to upload data into Amazon Redshift.

E.

Use Amazon Redshift Spectrum to query files from Amazon S3.
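
For reference, the COPY-with-manifest approach in option A looks like the following minimal sketch; the table, manifest location, cluster, and IAM role are placeholders:

    import boto3

    copy_sql = """
    COPY sales
    FROM 's3://example-bucket/manifests/sales.manifest'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS CSV
    MANIFEST;
    """

    # Run the COPY through the Redshift Data API (any SQL client would do).
    boto3.client("redshift-data").execute_statement(
        ClusterIdentifier="example-cluster",  # assumed cluster name
        Database="dev",
        DbUser="admin",
        Sql=copy_sql,
    )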

Question # 11

A company currently uses Amazon Athena to query its global datasets. The regional data is stored in Amazon S3 in the us-east-1 and us-west-2 Regions. The data is not encrypted. To simplify the query process and manage it centrally, the company wants to use Athena in us-west-2 to query data from Amazon S3 in both Regions. The solution should be as low-cost as possible.

What should the company do to achieve this goal?

A.

Use AWS DMS to migrate the AWS Glue Data Catalog from us-east-1 to us-west-2. Run Athena queries in us-west-2.

B.

Run the AWS Glue crawler in us-west-2 to catalog datasets in all Regions. Once the data is crawled, run Athena queries in us-west-2.

C.

Enable cross-Region replication for the S3 buckets in us-east-1 to replicate data in us-west-2. Once the data is replicated in us-west-2, run the AWS Glue crawler there to update the AWS Glue Data Catalog in us-west-2 and run Athena queries.

D.

Update AWS Glue resource policies to provide us-east-1 AWS Glue Data Catalog access to us-west-2. Once the catalog in us-west-2 has access to the catalog in us-east-1, run Athena queries in us-west-2.
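
Several of these options rely on an AWS Glue crawler to populate the Data Catalog in us-west-2. A minimal sketch of creating and starting such a crawler, with placeholder names, role, and S3 path:

    import boto3

    glue = boto3.client("glue", region_name="us-west-2")

    glue.create_crawler(
        Name="global-datasets-crawler",
        Role="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # assumed role
        DatabaseName="global_datasets",
        Targets={"S3Targets": [{"Path": "s3://example-bucket/datasets/"}]},
    )
    glue.start_crawler(Name="global-datasets-crawler")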

Question # 12

A company uses Amazon Redshift as its data warehouse. A new table includes some columns that contain sensitive data and some columns that contain non-sensitive data. The data in the table eventually will be referenced by several existing queries that run many times each day.

A data analytics specialist must ensure that only members of the company's auditing team can read the columns that contain sensitive data. All other users must have read-only access to the columns that contain non-sensitive data.

Which solution will meet these requirements with the LEAST operational overhead?

A.

Grant the auditing team permission to read from the table. Load the columns that contain non-sensitive data into a second table. Grant the appropriate users read-only permissions to the second table.

B.

Grant all users read-only permissions to the columns that contain non-sensitive data. Use the GRANT SELECT command to allow the auditing team to access the columns that contain sensitive data.

C.

Grant all users read-only permissions to the columns that contain non-sensitive data. Attach an IAM policy to the auditing team with an explicit Allow action that grants access to the columns that contain sensitive data.

D.

Grant the auditing team permission to read from the table. Create a view of the table that includes the columns that contain non-sensitive data. Grant the appropriate users read-only permissions to that view.
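
For context, the two Redshift-native access patterns these options describe, a column-level GRANT SELECT and a view over the non-sensitive columns, look roughly like the following sketch; table, column, and group names are placeholders:

    import boto3

    statements = [
        # Column-level grant: only the auditors can read the sensitive columns.
        "GRANT SELECT (ssn, salary) ON customer_accounts TO GROUP auditors;",
        # View-based pattern: expose only the non-sensitive columns to other users.
        "CREATE VIEW customer_accounts_public AS "
        "SELECT account_id, account_name FROM customer_accounts;",
        "GRANT SELECT ON customer_accounts_public TO GROUP analysts;",
    ]

    boto3.client("redshift-data").batch_execute_statement(
        ClusterIdentifier="example-cluster",  # assumed cluster name
        Database="dev",
        DbUser="admin",
        Sqls=statements,
    )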

Question # 13

A company that monitors weather conditions from remote construction sites is setting up a solution to collect temperature data from the following two weather stations.

  • Station A, which has 10 sensors
  • Station B, which has five sensors

These weather stations were placed by onsite subject-matter experts.

Each sensor has a unique ID. The data collected from each sensor will be collected using Amazon Kinesis Data Streams.

Based on the total incoming and outgoing data throughput, a single Amazon Kinesis data stream with two shards is created. Two partition keys are created based on the station names. During testing, there is a bottleneck on data coming from Station A, but not from Station B. Upon review, it is confirmed that the total stream throughput is still less than the allocated Kinesis Data Streams throughput.

How can this bottleneck be resolved without increasing the overall cost and complexity of the solution, while retaining the data collection quality requirements?

A.

Increase the number of shards in Kinesis Data Streams to increase the level of parallelism.

B.

Create a separate Kinesis data stream for Station A with two shards, and stream Station A sensor data to the new stream.

C.

Modify the partition key to use the sensor ID instead of the station name.

D.

Reduce the number of sensors in Station A from 10 to 5 sensors.
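
For reference, the partition key is supplied by the producer on every PutRecord call, so changing it is purely a producer-side code change. A minimal sketch, assuming each reading carries a unique sensor ID and using a placeholder stream name:

    import json
    import boto3

    kinesis = boto3.client("kinesis")

    def publish_reading(reading: dict) -> None:
        # Keying on the sensor ID distributes a single station's sensors
        # across shards, whereas keying on the station name pins each
        # station's traffic to one shard.
        kinesis.put_record(
            StreamName="weather-telemetry",            # assumed stream name
            Data=json.dumps(reading).encode("utf-8"),
            PartitionKey=reading["sensor_id"],
        )

    publish_reading({"sensor_id": "station-a-07", "temperature_c": 21.4})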

Question # 14

A bank is using Amazon Managed Streaming for Apache Kafka (Amazon MSK) to populate real-time data into a data lake. The data lake is built on Amazon S3, and data must be accessible from the data lake within 24 hours. Different microservices produce messages to different topics in the cluster. The cluster is created with 8 TB of Amazon Elastic Block Store (Amazon EBS) storage and a retention period of 7 days.

The customer transaction volume has tripled recently, and disk monitoring has provided an alert that the cluster is almost out of storage capacity.

What should a data analytics specialist do to prevent the cluster from running out of disk space?

A.

Use the Amazon MSK console to triple the broker storage and restart the cluster.

B.

Create an Amazon CloudWatch alarm that monitors the KafkaDataLogsDiskUsed metric. Automatically flush the oldest messages when the value of this metric exceeds 85%.

C.

Create a custom Amazon MSK configuration. Set the log.retention.hours parameter to 48. Update the cluster with the new configuration file.

D.

Triple the number of consumers to ensure that data is consumed as soon as it is added to a topic.
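
For context, broker-level retention is controlled through a custom Amazon MSK configuration that is then applied to the cluster. A minimal sketch, assuming placeholder ARNs and cluster version:

    import boto3

    kafka = boto3.client("kafka")

    # Create a custom configuration that shortens message retention to 48 hours.
    config = kafka.create_configuration(
        Name="retention-48h",
        ServerProperties=b"log.retention.hours=48\n",
    )

    # Apply it to the existing cluster; the ARN and version are placeholders.
    kafka.update_cluster_configuration(
        ClusterArn="arn:aws:kafka:us-east-1:123456789012:cluster/example/abcd-1234",
        ConfigurationInfo={
            "Arn": config["Arn"],
            "Revision": config["LatestRevision"]["Revision"],
        },
        CurrentVersion="K3AEGXETSR30VB",  # placeholder current cluster version
    )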

Question # 15

A market data company aggregates external data sources to create a detailed view of product consumption in different countries. The company wants to sell this data to external parties through a subscription. To achieve this goal, the company needs to make its data securely available to external parties who are also AWS users.

What should the company do to meet these requirements with the LEAST operational overhead?

A.

Store the data in Amazon S3. Share the data by using presigned URLs for security.

B.

Store the data in Amazon S3. Share the data by using S3 bucket ACLs.

C.

Upload the data to AWS Data Exchange for storage. Share the data by using presigned URLs for security.

D.

Upload the data to AWS Data Exchange for storage. Share the data by using the AWS Data Exchange sharing wizard.

Question # 16

A manufacturing company has many IoT devices in different facilities across the world. The company is using Amazon Kinesis Data Streams to collect the data from the devices.

The company's operations team has started to observe many WriteThroughputExceeded exceptions. The operations team determines that the reason is the number of records that are being written to certain shards. The data contains device ID, capture date, measurement type, measurement value, and facility ID. The facility ID is used as the partition key.

Which action will resolve this issue?

A.

Change the partition key from facility ID to a randomly generated key.

B.

Increase the number of shards.

C.

Archive the data on the producers' side.

D.

Change the partition key from facility ID to capture date.

Question # 17

A company with a video streaming website wants to analyze user behavior to make recommendations to users in real time. Clickstream data is being sent to Amazon Kinesis Data Streams, and reference data is stored in Amazon S3. The company wants a solution that can use standard SQL queries. The solution must also provide a way to look up pre-calculated reference data while making recommendations.

Which solution meets these requirements?

A.

Use an AWS Glue Python shell job to process incoming data from Kinesis Data Streams. Use the Boto3 library to write data to Amazon Redshift.

B.

Use AWS Glue streaming and Scala to process incoming data from Kinesis Data Streams. Use the AWS Glue connector to write data to Amazon Redshift.

C.

Use Amazon Kinesis Data Analytics to create an in-application table based upon the reference data. Process incoming data from Kinesis Data Streams. Use a data stream to write results to Amazon Redshift.

D.

Use Amazon Kinesis Data Analytics to create an in-application table based upon the reference data. Process incoming data from Kinesis Data Streams. Use an Amazon Kinesis Data Firehose delivery stream to write results to Amazon Redshift.
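
As background, Kinesis Data Analytics (SQL) applications load S3 reference data into an in-application table that the streaming SQL can join against. A minimal sketch of attaching such a reference data source, with placeholder application, bucket, role, and schema values:

    import boto3

    boto3.client("kinesisanalytics").add_application_reference_data_source(
        ApplicationName="clickstream-recommendations",   # assumed application
        CurrentApplicationVersionId=1,                   # placeholder version
        ReferenceDataSource={
            "TableName": "REFERENCE_CONTENT",
            "S3ReferenceDataSource": {
                "BucketARN": "arn:aws:s3:::example-bucket",
                "FileKey": "reference/content.csv",
                "ReferenceRoleARN": "arn:aws:iam::123456789012:role/KdaS3ReadRole",
            },
            "ReferenceSchema": {
                "RecordFormat": {
                    "RecordFormatType": "CSV",
                    "MappingParameters": {
                        "CSVMappingParameters": {
                            "RecordRowDelimiter": "\n",
                            "RecordColumnDelimiter": ",",
                        }
                    },
                },
                "RecordColumns": [
                    {"Name": "content_id", "SqlType": "VARCHAR(32)"},
                    {"Name": "popularity_score", "SqlType": "DOUBLE"},
                ],
            },
        },
    )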

Question # 18

A company is building a data lake and needs to ingest data from a relational database that has time-series data. The company wants to use managed services to accomplish this. The process needs to be scheduled daily and bring incremental data only from the source into Amazon S3.

What is the MOST cost-effective approach to meet these requirements?

A.

Use AWS Glue to connect to the data source using JDBC Drivers. Ingest incremental records only using job bookmarks.

B.

Use AWS Glue to connect to the data source using JDBC Drivers. Store the last updated key in an Amazon DynamoDB table and ingest the data using the updated key as a filter.

C.

Use AWS Glue to connect to the data source using JDBC Drivers and ingest the entire dataset. Use appropriate Apache Spark libraries to compare the dataset, and find the delta.

D.

Use AWS Glue to connect to the data source using JDBC Drivers and ingest the full data. Use AWS DataSync to ensure the delta only is written into Amazon S3.
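
For reference, job bookmarks (option A) are enabled through a job argument, and the daily schedule is a separate Glue trigger. A minimal sketch with placeholder names, role, and script location (the script itself must pass transformation_ctx to its reads for bookmarks to take effect):

    import boto3

    glue = boto3.client("glue")

    glue.create_job(
        Name="daily-incremental-ingest",
        Role="arn:aws:iam::123456789012:role/GlueJobRole",   # assumed role
        Command={
            "Name": "glueetl",
            "ScriptLocation": "s3://example-bucket/scripts/ingest.py",
            "PythonVersion": "3",
        },
        DefaultArguments={"--job-bookmark-option": "job-bookmark-enable"},
        GlueVersion="3.0",
    )

    # Run the job once a day at 07:00 UTC.
    glue.create_trigger(
        Name="daily-ingest-schedule",
        Type="SCHEDULED",
        Schedule="cron(0 7 * * ? *)",
        Actions=[{"JobName": "daily-incremental-ingest"}],
        StartOnCreation=True,
    )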

Question # 19

A data analyst runs a large number of data manipulation language (DML) queries by using Amazon Athena with the JDBC driver. Recently, a query failed after it ran for 30 minutes. The query returned the following message:

java.sql.SQLException: Query timeout

The data analyst does not immediately need the query results. However, the data analyst needs a long-term solution for this problem.

Which solution will meet these requirements?

A.

Split the query into smaller queries to search smaller subsets of data.

B.

In the settings for Athena, adjust the DML query timeout limit.

C.

In the Service Quotas console, request an increase for the DML query timeout.

D.

Save the tables as compressed .csv files.

Question # 20

A company uses Amazon Elasticsearch Service (Amazon ES) to store and analyze its website clickstream data. The company ingests 1 TB of data daily using Amazon Kinesis Data Firehose and stores one day’s worth of data in an Amazon ES cluster.

The company has very slow query performance on the Amazon ES index and occasionally sees errors from Kinesis Data Firehose when attempting to write to the index. The Amazon ES cluster has 10 nodes running a single index and 3 dedicated master nodes. Each data node has 1.5 TB of Amazon EBS storage attached and the cluster is configured with 1,000 shards. Occasionally, JVMMemoryPressure errors are found in the cluster logs.

Which solution will improve the performance of Amazon ES?

A.

Increase the memory of the Amazon ES master nodes.

B.

Decrease the number of Amazon ES data nodes.

C.

Decrease the number of Amazon ES shards for the index.

D.

Increase the number of Amazon ES shards for the index.
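
For scale, roughly 1 TB of daily data spread across 1,000 shards works out to only about 1 GB per shard, well below the commonly cited 10-50 GB per shard guideline, and every shard consumes JVM heap on the data nodes. A minimal sketch of creating the daily index with a smaller primary shard count; the endpoint, index name, and settings are illustrative, and request signing/authentication is omitted:

    import requests

    ES_ENDPOINT = "https://search-example-domain.us-east-1.es.amazonaws.com"

    # Create the next daily index with far fewer primary shards than 1,000.
    requests.put(
        f"{ES_ENDPOINT}/clickstream-2023-01-01",
        json={
            "settings": {
                "index": {
                    "number_of_shards": 20,
                    "number_of_replicas": 1,
                }
            }
        },
        timeout=30,
    )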

Question # 21

An analytics software as a service (SaaS) provider wants to offer its customers business intelligence.

The provider wants to give customers two user role options:

• Read-only users for individuals who only need to view dashboards

• Power users for individuals who are allowed to create and share new dashboards with other users

Which QuickSight feature allows the provider to meet these requirements?

A.

Embedded dashboards

B.

Table calculations

C.

Isolated namespaces

D.

SPICE
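
For context, QuickSight namespaces isolate sets of users, and each registered user is assigned a role such as READER or AUTHOR. A minimal sketch, assuming a placeholder account ID, namespace, and user e-mail addresses:

    import boto3

    qs = boto3.client("quicksight")
    ACCOUNT_ID = "123456789012"  # placeholder AWS account ID

    qs.create_namespace(
        AwsAccountId=ACCOUNT_ID,
        Namespace="customer-acme",
        IdentityStore="QUICKSIGHT",
    )

    # Read-only user: can view dashboards but not create them.
    qs.register_user(
        AwsAccountId=ACCOUNT_ID,
        Namespace="customer-acme",
        IdentityType="QUICKSIGHT",
        Email="viewer@example.com",
        UserName="viewer@example.com",
        UserRole="READER",
    )

    # Power user: can create and share new dashboards.
    qs.register_user(
        AwsAccountId=ACCOUNT_ID,
        Namespace="customer-acme",
        IdentityType="QUICKSIGHT",
        Email="author@example.com",
        UserName="author@example.com",
        UserRole="AUTHOR",
    )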

Question # 22

A data analyst is using Amazon QuickSight for data visualization across multiple datasets generated by applications. Each application stores files within a separate Amazon S3 bucket. AWS Glue Data Catalog is used as a central catalog across all application data in Amazon S3. A new application stores its data within a separate S3 bucket. After updating the catalog to include the new application data source, the data analyst created a new Amazon QuickSight data source from an Amazon Athena table, but the import into SPICE failed.

How should the data analyst resolve the issue?

A.

Edit the permissions for the AWS Glue Data Catalog from within the Amazon QuickSight console.

B.

Edit the permissions for the new S3 bucket from within the Amazon QuickSight console.

C.

Edit the permissions for the AWS Glue Data Catalog from within the AWS Glue console.

D.

Edit the permissions for the new S3 bucket from within the S3 console.
