Transformer Model¶
class openspeech.models.transformer.model.JointCTCTransformerModel(configs: omegaconf.dictconfig.DictConfig, tokenizer: openspeech.tokenizers.tokenizer.Tokenizer)[source]¶
A Speech Transformer model trained with joint CTC-attention. Users can modify the attributes as needed. The model is based on the paper “Attention Is All You Need”.
- Parameters
configs (DictConfig) – configuration set.
tokenizer (Tokenizer) – tokenizer is in charge of preparing the inputs for a model.
- Inputs:
inputs (torch.FloatTensor): An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
input_lengths (torch.LongTensor): The length of each input sequence, a LongTensor of size (batch).
- Returns
Result of model predictions.
- Return type
outputs (dict)
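A minimal usage sketch (not taken from the library itself) may help. It assumes the DictConfig carries at least a model section built from JointCTCTransformerConfigs; a real run assembles the full configuration (audio, criterion, etc.) with Hydra, and the tokenizer placeholder stands in for a concrete openspeech Tokenizer:

    import torch
    from omegaconf import OmegaConf
    from openspeech.models.transformer.configurations import JointCTCTransformerConfigs
    from openspeech.models.transformer.model import JointCTCTransformerModel

    # Assumption: the model reads hyperparameters from a nested `model` section;
    # a full training run would populate the remaining sections via Hydra.
    configs = OmegaConf.create({"model": OmegaConf.structured(JointCTCTransformerConfigs())})
    tokenizer = ...  # placeholder: a concrete openspeech Tokenizer built from your vocabulary

    model = JointCTCTransformerModel(configs=configs, tokenizer=tokenizer)

    inputs = torch.randn(4, 300, 80)                         # padded features: (batch, seq_length, dimension)
    input_lengths = torch.full((4,), 300, dtype=torch.long)  # valid frames per utterance: (batch)
    outputs = model(inputs, input_lengths)                   # dict of model predictions

The same instantiation and call pattern applies to TransformerModel, TransformerWithCTCModel, and VGGTransformerModel below, each paired with its own configuration class.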
class openspeech.models.transformer.model.TransformerModel(configs: omegaconf.dictconfig.DictConfig, tokenizer: openspeech.tokenizers.tokenizer.Tokenizer)[source]¶
A Speech Transformer model. Users can modify the attributes as needed. The model is based on the paper “Attention Is All You Need”.
- Parameters
configs (DictConfig) – configuration set.
tokenizer (Tokenizer) – tokenizer is in charge of preparing the inputs for a model.
- Inputs:
inputs (torch.FloatTensor): An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
input_lengths (torch.LongTensor): The length of each input sequence, a LongTensor of size (batch).
- Returns
Result of model predictions.
- Return type
outputs (dict)
class openspeech.models.transformer.model.TransformerWithCTCModel(configs: omegaconf.dictconfig.DictConfig, tokenizer: openspeech.tokenizers.tokenizer.Tokenizer)[source]¶
A Transformer encoder-only model trained with CTC.
- Parameters
configs (DictConfig) – configuration set.
tokenizer (Tokenizer) – tokenizer is in charge of preparing the inputs for a model.
- Inputs:
inputs (torch.FloatTensor): An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
input_lengths (torch.LongTensor): The length of each input sequence, a LongTensor of size (batch).
- Returns
Result of model predictions, containing y_hats, logits, and output_lengths.
- Return type
outputs (dict)
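As a hedged sketch of consuming that dict, reusing the inputs, input_lengths, and tokenizer placeholders from the earlier sketch (the key names follow the return note above; a tokenizer decode() that maps label indices back to text is assumed):

    outputs = model(inputs, input_lengths)
    y_hats = outputs["y_hats"]                  # predicted label indices per frame
    logits = outputs["logits"]                  # per-frame class scores
    output_lengths = outputs["output_lengths"]  # valid output frames per utterance

    for hyp, length in zip(y_hats, output_lengths):
        # Assumption: the tokenizer exposes a decode() mapping indices to text.
        print(tokenizer.decode(hyp[:length]))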
test_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]¶
Forward propagates an (inputs, targets) pair for testing.
- Inputs:
batch (tuple): A test batch containing inputs, targets, input_lengths, and target_lengths.
batch_idx (int): The index of the batch.
- Returns
loss for testing
- Return type
loss (torch.Tensor)
training_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]¶
Forward propagates an (inputs, targets) pair for training.
- Inputs:
batch (tuple): A train batch containing inputs, targets, input_lengths, and target_lengths.
batch_idx (int): The index of the batch.
- Returns
loss for training
- Return type
loss (torch.Tensor)
validation_step(batch: tuple, batch_idx: int) → collections.OrderedDict[source]¶
Forward propagates an (inputs, targets) pair for validation.
- Inputs:
batch (tuple): A validation batch containing inputs, targets, input_lengths, and target_lengths.
batch_idx (int): The index of the batch.
- Returns
loss for validation
- Return type
loss (torch.Tensor)
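These *_step hooks are normally driven by a PyTorch Lightning Trainer rather than called by hand. A minimal sketch, where data_module is a placeholder for a LightningDataModule yielding (inputs, targets, input_lengths, target_lengths) batches:

    import pytorch_lightning as pl

    trainer = pl.Trainer(max_epochs=1)
    trainer.fit(model, datamodule=data_module)   # invokes training_step / validation_step
    trainer.test(model, datamodule=data_module)  # invokes test_step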
class openspeech.models.transformer.model.VGGTransformerModel(configs: omegaconf.dictconfig.DictConfig, tokenizer: openspeech.tokenizers.tokenizer.Tokenizer)[source]¶
A Speech Transformer model with a VGG-style convolutional front-end. Users can modify the attributes as needed. The model is based on the paper “Attention Is All You Need”.
- Parameters
configs (DictConfig) – configuration set.
tokenizer (Tokenizer) – tokenizer is in charge of preparing the inputs for a model.
- Inputs:
inputs (torch.FloatTensor): An input sequence passed to the encoder. Typically a padded FloatTensor of size (batch, seq_length, dimension).
input_lengths (torch.LongTensor): The length of each input sequence, a LongTensor of size (batch).
- Returns
Result of model predictions.
- Return type
outputs (dict)
Transformer Configuration¶
class openspeech.models.transformer.configurations.JointCTCTransformerConfigs(model_name: str = 'joint_ctc_transformer', extractor: str = 'conv2d_subsample', d_model: int = 512, d_ff: int = 2048, num_attention_heads: int = 8, num_encoder_layers: int = 12, num_decoder_layers: int = 6, encoder_dropout_p: float = 0.3, decoder_dropout_p: float = 0.3, ffnet_style: str = 'ff', max_length: int = 128, teacher_forcing_ratio: float = 1.0, joint_ctc_attention: bool = True, optimizer: str = 'adam')[source]¶
This is the configuration class to store the configuration of a JointCTCTransformer. It is used to instantiate a JointCTCTransformer model.
Configuration objects inherit from openspeech.dataclass.configs.OpenspeechDataclass.
- Parameters
model_name (str) – Model name (default: joint_ctc_transformer)
extractor (str) – The CNN feature extractor. (default: conv2d_subsample)
d_model (int) – Dimension of model. (default: 512)
d_ff (int) – Dimension of feed forward network. (default: 2048)
num_attention_heads (int) – The number of attention heads. (default: 8)
num_encoder_layers (int) – The number of encoder layers. (default: 12)
num_decoder_layers (int) – The number of decoder layers. (default: 6)
encoder_dropout_p (float) – The dropout probability of encoder. (default: 0.3)
decoder_dropout_p (float) – The dropout probability of decoder. (default: 0.3)
ffnet_style (str) – Style of feed forward network. (ff, conv) (default: ff)
max_length (int) – Max decoding length. (default: 128)
teacher_forcing_ratio (float) – The ratio of teacher forcing. (default: 1.0)
joint_ctc_attention (bool) – Flag indicating whether to use joint CTC-attention. (default: True)
optimizer (str) – Optimizer for training. (default: adam)
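A minimal sketch of building this dataclass and overriding a few defaults (OmegaConf.structured converts it into a DictConfig); the same pattern applies to the other configuration classes below:

    from omegaconf import OmegaConf
    from openspeech.models.transformer.configurations import JointCTCTransformerConfigs

    # Override a couple of defaults; unspecified fields keep their documented values.
    cfg = JointCTCTransformerConfigs(num_encoder_layers=6, encoder_dropout_p=0.1)
    model_cfg = OmegaConf.structured(cfg)
    print(model_cfg.d_model)  # 512 (the unchanged default)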
class openspeech.models.transformer.configurations.TransformerConfigs(model_name: str = 'transformer', d_model: int = 512, d_ff: int = 2048, num_attention_heads: int = 8, num_encoder_layers: int = 12, num_decoder_layers: int = 6, encoder_dropout_p: float = 0.3, decoder_dropout_p: float = 0.3, ffnet_style: str = 'ff', max_length: int = 128, teacher_forcing_ratio: float = 1.0, joint_ctc_attention: bool = False, optimizer: str = 'adam')[source]¶
This is the configuration class to store the configuration of a Transformer. It is used to instantiate a Transformer model.
Configuration objects inherit from openspeech.dataclass.configs.OpenspeechDataclass.
- Parameters
model_name (str) – Model name (default: transformer)
d_model (int) – Dimension of model. (default: 512)
d_ff (int) – Dimension of feed forward network. (default: 2048)
num_attention_heads (int) – The number of attention heads. (default: 8)
num_encoder_layers (int) – The number of encoder layers. (default: 12)
num_decoder_layers (int) – The number of decoder layers. (default: 6)
encoder_dropout_p (float) – The dropout probability of encoder. (default: 0.3)
decoder_dropout_p (float) – The dropout probability of decoder. (default: 0.3)
ffnet_style (str) – Style of feed forward network. (ff, conv) (default: ff)
max_length (int) – Max decoding length. (default: 128)
teacher_forcing_ratio (float) – The ratio of teacher forcing. (default: 1.0)
joint_ctc_attention (bool) – Flag indicating whether to use joint CTC-attention. (default: False)
optimizer (str) – Optimizer for training. (default: adam)
class openspeech.models.transformer.configurations.TransformerWithCTCConfigs(model_name: str = 'transformer_with_ctc', d_model: int = 512, d_ff: int = 2048, num_attention_heads: int = 8, num_encoder_layers: int = 12, encoder_dropout_p: float = 0.3, ffnet_style: str = 'ff', optimizer: str = 'adam')[source]¶
This is the configuration class to store the configuration of a TransformerWithCTC. It is used to instantiate a TransformerWithCTC model.
Configuration objects inherit from openspeech.dataclass.configs.OpenspeechDataclass.
- Parameters
model_name (str) – Model name (default: transformer_with_ctc)
d_model (int) – Dimension of model. (default: 512)
d_ff (int) – Dimension of feed forward network. (default: 2048)
num_attention_heads (int) – The number of attention heads. (default: 8)
num_encoder_layers (int) – The number of encoder layers. (default: 12)
encoder_dropout_p (float) – The dropout probability of encoder. (default: 0.3)
ffnet_style (str) – Style of feed forward network. (ff, conv) (default: ff)
optimizer (str) – Optimizer for training. (default: adam)
class openspeech.models.transformer.configurations.VGGTransformerConfigs(model_name: str = 'vgg_transformer', extractor: str = 'vgg', d_model: int = 512, d_ff: int = 2048, num_attention_heads: int = 8, num_encoder_layers: int = 12, num_decoder_layers: int = 6, encoder_dropout_p: float = 0.3, decoder_dropout_p: float = 0.3, ffnet_style: str = 'ff', max_length: int = 128, teacher_forcing_ratio: float = 1.0, joint_ctc_attention: bool = False, optimizer: str = 'adam')[source]¶
This is the configuration class to store the configuration of a VGGTransformer. It is used to instantiate a VGGTransformer model.
Configuration objects inherit from openspeech.dataclass.configs.OpenspeechDataclass.
- Parameters
model_name (str) – Model name (default: vgg_transformer)
extractor (str) – The CNN feature extractor. (default: vgg)
d_model (int) – Dimension of model. (default: 512)
d_ff (int) – Dimension of feed forward network. (default: 2048)
num_attention_heads (int) – The number of attention heads. (default: 8)
num_encoder_layers (int) – The number of encoder layers. (default: 12)
num_decoder_layers (int) – The number of decoder layers. (default: 6)
encoder_dropout_p (float) – The dropout probability of encoder. (default: 0.3)
decoder_dropout_p (float) – The dropout probability of decoder. (default: 0.3)
ffnet_style (str) – Style of feed forward network. (ff, conv) (default: ff)
max_length (int) – Max decoding length. (default: 128)
teacher_forcing_ratio (float) – The ratio of teacher forcing. (default: 1.0)
joint_ctc_attention (bool) – Flag indicating whether to use joint CTC-attention. (default: False)
optimizer (str) – Optimizer for training. (default: adam)