Which of the following ModelArts training parameters is used to customize hyperparameters?
Vision transformer (ViT) performs well in image classification tasks. Which of the following is the main advantage of ViT?
A text classification task has only one final output, while a sequence labeling task has an output in each input position.
Which of the following are required for the image object detection algorithm?
-------- is a text representation method based on the bag of words (BoW) model. It decomposes words into subwords and then adds the vector representations of the subwords to obtain word vectors, fully utilizing character N-gram information. (Fill in the blank.)
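The subword decomposition described in this question can be sketched as follows. This is a minimal illustration, not a reference implementation: the helper names `char_ngrams` and `word_vector`, the `<` and `>` boundary markers, the fixed n-gram length of 3, and the randomly initialised 4-dimensional vectors are all assumptions made for the example.

```python
import random

def char_ngrams(word, n_min=3, n_max=3):
    """Split a word into character n-grams, with '<' and '>' marking its boundaries."""
    token = f"<{word}>"
    return [
        token[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(token) - n + 1)
    ]

def word_vector(word, table, dim=4, rng=None):
    """Sum the vectors of all subwords of `word` to obtain its word vector.

    `table` maps each n-gram to its vector; unseen n-grams get a random
    vector here, standing in for parameters that would normally be learned.
    """
    rng = rng or random.Random(0)
    vec = [0.0] * dim
    for gram in char_ngrams(word):
        if gram not in table:
            table[gram] = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        vec = [a + b for a, b in zip(vec, table[gram])]
    return vec

table = {}
print(char_ngrams("where"))
print(word_vector("where", table))
```

Because the word vector is a sum over subword vectors, a word that never appeared in training can still be represented as long as some of its character n-grams were seen.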
Mel-frequency cepstral coefficients (MFCCs) take into account human auditory characteristics by first mapping the linear spectrum to the Mel nonlinear spectrum based on auditory perception, and then converting it to the cepstral domain.
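The two steps named in this statement (mapping to the Mel scale, then moving to the cepstral domain) can be sketched as follows. This is a simplified illustration under stated assumptions: it uses one common form of the Hz-to-Mel formula, m = 2595 log10(1 + f/700), and an unnormalised DCT-II for the cepstral step. The function names are hypothetical, and framing, windowing, the FFT and the Mel filter bank are omitted.

```python
import math

def hz_to_mel(f_hz):
    """Map a linear frequency in Hz to the Mel scale (one common variant of the formula)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def dct2(x):
    """Unnormalised DCT-II, used here to turn log filter-bank energies into cepstral coefficients."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * k * (2 * i + 1) / (2 * n)) for i in range(n))
        for k in range(n)
    ]

# Hypothetical Mel filter-bank energies for a single frame (made-up values).
energies = [1.0, 2.0, 4.0, 8.0]
log_energies = [math.log(e) for e in energies]
print(hz_to_mel(1000.0))
print(dct2(log_energies))
```

The Mel mapping is roughly linear at low frequencies and logarithmic at high frequencies, which is how it reflects human auditory perception.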
Which of the following statements about the functions of the encoder and decoder is true?
In the deep neural network (DNN)–hidden Markov model (HMM), the DNN is mainly used for feature processing, while the HMM is mainly used for sequence modeling.
What are the adjacency relationships between two pixels whose coordinates are (21,13) and (22,12)?
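The geometric part of the adjacency definitions can be checked with a short sketch. It assumes the usual definitions of 4-, D- (diagonal) and 8-neighbourhoods and ignores the pixel-value condition, which the question does not specify. m-adjacency is left out because it depends on pixel values. The function name `adjacency` is hypothetical.

```python
def adjacency(p, q):
    """Return the geometric adjacency types between pixels p and q, given as (x, y) tuples."""
    dx, dy = abs(p[0] - q[0]), abs(p[1] - q[1])
    kinds = set()
    if dx + dy == 1:
        kinds.add("4-adjacent")   # horizontal or vertical neighbour
    if dx == 1 and dy == 1:
        kinds.add("D-adjacent")   # diagonal neighbour
    if max(dx, dy) == 1:
        kinds.add("8-adjacent")   # union of the two neighbourhoods above
    return kinds

print(sorted(adjacency((21, 13), (22, 12))))
```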
In cases where the bright and dark areas of an image are too extreme, which of the following techniques can be used to improve the image?
In 2017, the Google machine translation team proposed the Transformer in their paper "Attention Is All You Need". The Transformer consists of an encoder and a(n) --------. (Fill in the blank.)
In natural language processing tasks, word vector evaluation is an important aspect for measuring the performance of a word embedding model. Which of the following statements about word vector evaluation are true?