Openspeech’s configurations¶

This page describes all configurations in Openspeech.

`audio`¶

`mfcc`¶

name : Name of dataset.
sample_rate : Sampling rate of audio
frame_length : Frame length for spectrogram
frame_shift : Length of hop between STFT
del_silence : Flag indication whether to apply delete silence or not
num_mels : The number of mfc coefficients to retain.
apply_spec_augment : Flag indication whether to apply spec augment or not
apply_noise_augment : Flag indication whether to apply noise augment or not
apply_time_stretch_augment : Flag indication whether to apply time stretch augment or not
apply_joining_augment : Flag indication whether to apply audio joining augment or not

`melspectrogram`¶

name : Name of dataset.
sample_rate : Sampling rate of audio
frame_length : Frame length for spectrogram
frame_shift : Length of hop between STFT
del_silence : Flag indication whether to apply delete silence or not
num_mels : The number of mfc coefficients to retain.
apply_spec_augment : Flag indication whether to apply spec augment or not
apply_noise_augment : Flag indication whether to apply noise augment or not
apply_time_stretch_augment : Flag indication whether to apply time stretch augment or not
apply_joining_augment : Flag indication whether to apply audio joining augment or not

`fbank`¶

name : Name of dataset.
sample_rate : Sampling rate of audio
frame_length : Frame length for spectrogram
frame_shift : Length of hop between STFT
del_silence : Flag indication whether to apply delete silence or not
num_mels : The number of mfc coefficients to retain.
apply_spec_augment : Flag indication whether to apply spec augment or not
apply_noise_augment : Flag indication whether to apply noise augment or not
apply_time_stretch_augment : Flag indication whether to apply time stretch augment or not
apply_joining_augment : Flag indication whether to apply audio joining augment or not

`spectrogram`¶

name : Name of dataset.
sample_rate : Sampling rate of audio
frame_length : Frame length for spectrogram
frame_shift : Length of hop between STFT
del_silence : Flag indication whether to apply delete silence or not
num_mels : Spectrogram is independent of mel, but uses the ‘num_mels’ variable to unify feature size variables
apply_spec_augment : Flag indication whether to apply spec augment or not
apply_noise_augment : Flag indication whether to apply noise augment or not
apply_time_stretch_augment : Flag indication whether to apply time stretch augment or not
apply_joining_augment : Flag indication whether to apply audio joining augment or not

`augment`¶

`default`¶

apply_spec_augment : Flag indication whether to apply spec augment or not
apply_noise_augment : Flag indication whether to apply noise augment or not Noise augment requires noise_dataset_path. noise_dataset_dir should be contain audio files.
apply_joining_augment : Flag indication whether to apply joining augment or not If true, create a new audio file by connecting two audio randomly
apply_time_stretch_augment : Flag indication whether to apply spec augment or not
freq_mask_para : Hyper Parameter for freq masking to limit freq masking length
freq_mask_num : How many freq-masked area to make
time_mask_num : How many time-masked area to make
noise_dataset_dir : How many time-masked area to make
noise_level : Noise adjustment level
time_stretch_min_rate : Minimum rate of audio time stretch
time_stretch_max_rate : Maximum rate of audio time stretch

`dataset`¶

`kspon`¶

dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)
dataset_path : Path of dataset
test_dataset_path : Path of evaluation dataset
manifest_file_path : Path of manifest file
test_manifest_dir : Path of directory contains test manifest files
preprocess_mode : KsponSpeech preprocess mode {phonetic, spelling}

`libri`¶

dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)
dataset_path : Path of dataset
dataset_download : Flag indication whether to download dataset or not.
manifest_file_path : Path of manifest file

`aishell`¶

dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)
dataset_path : Path of dataset
dataset_download : Flag indication whether to download dataset or not.
manifest_file_path : Path of manifest file

`ksponspeech`¶

dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)
dataset_path : Path of dataset
test_dataset_path : Path of evaluation dataset
manifest_file_path : Path of manifest file
test_manifest_dir : Path of directory contains test manifest files
preprocess_mode : KsponSpeech preprocess mode {phonetic, spelling}

`librispeech`¶

dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)
dataset_path : Path of dataset
dataset_download : Flag indication whether to download dataset or not.
manifest_file_path : Path of manifest file

`lm`¶

dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)
dataset_path : Path of dataset
valid_ratio : Ratio of validation data
test_ratio : Ratio of test data

`model`¶

`listen_attend_spell`¶

model_name : Model name
num_encoder_layers : The number of encoder layers.
num_decoder_layers : The number of decoder layers.
hidden_state_dim : The hidden state dimension of encoder.
encoder_dropout_p : The dropout probability of encoder.
encoder_bidirectional : If True, becomes a bidirectional encoders
rnn_type : Type of rnn cell (rnn, lstm, gru)
joint_ctc_attention : Flag indication joint ctc attention or not
max_length : Max decoding length.
num_attention_heads : The number of attention heads.
decoder_dropout_p : The dropout probability of decoder.
decoder_attn_mechanism : The attention mechanism for decoder.
teacher_forcing_ratio : The ratio of teacher forcing.
optimizer : Optimizer for training.

`listen_attend_spell_with_location_aware`¶

model_name : Model name
num_encoder_layers : The number of encoder layers.
num_decoder_layers : The number of decoder layers.
hidden_state_dim : The hidden state dimension of encoder.
encoder_dropout_p : The dropout probability of encoder.
encoder_bidirectional : If True, becomes a bidirectional encoders
rnn_type : Type of rnn cell (rnn, lstm, gru)
joint_ctc_attention : Flag indication joint ctc attention or not
max_length : Max decoding length.
num_attention_heads : The number of attention heads.
decoder_dropout_p : The dropout probability of decoder.
decoder_attn_mechanism : The attention mechanism for decoder.
teacher_forcing_ratio : The ratio of teacher forcing.
optimizer : Optimizer for training.

`listen_attend_spell_with_multi_head`¶

model_name : Model name
num_encoder_layers : The number of encoder layers.
num_decoder_layers : The number of decoder layers.
hidden_state_dim : The hidden state dimension of encoder.
encoder_dropout_p : The dropout probability of encoder.
encoder_bidirectional : If True, becomes a bidirectional encoders
rnn_type : Type of rnn cell (rnn, lstm, gru)
joint_ctc_attention : Flag indication joint ctc attention or not
max_length : Max decoding length.
num_attention_heads : The number of attention heads.
decoder_dropout_p : The dropout probability of decoder.
decoder_attn_mechanism : The attention mechanism for decoder.
teacher_forcing_ratio : The ratio of teacher forcing.
optimizer : Optimizer for training.

`joint_ctc_listen_attend_spell`¶

model_name : Model name
num_encoder_layers : The number of encoder layers.
num_decoder_layers : The number of decoder layers.
hidden_state_dim : The hidden state dimension of encoder.
encoder_dropout_p : The dropout probability of encoder.
encoder_bidirectional : If True, becomes a bidirectional encoders
rnn_type : Type of rnn cell (rnn, lstm, gru)
joint_ctc_attention : Flag indication joint ctc attention or not
max_length : Max decoding length.
num_attention_heads : The number of attention heads.
decoder_dropout_p : The dropout probability of decoder.
decoder_attn_mechanism : The attention mechanism for decoder.
teacher_forcing_ratio : The ratio of teacher forcing.
optimizer : Optimizer for training.

`deep_cnn_with_joint_ctc_listen_attend_spell`¶

model_name : Model name
num_encoder_layers : The number of encoder layers.
num_decoder_layers : The number of decoder layers.
hidden_state_dim : The hidden state dimension of encoder.
encoder_dropout_p : The dropout probability of encoder.
encoder_bidirectional : If True, becomes a bidirectional encoders
rnn_type : Type of rnn cell (rnn, lstm, gru)
extractor : The CNN feature extractor.
activation : Type of activation function
joint_ctc_attention : Flag indication joint ctc attention or not
max_length : Max decoding length.
num_attention_heads : The number of attention heads.
decoder_dropout_p : The dropout probability of decoder.
decoder_attn_mechanism : The attention mechanism for decoder.
teacher_forcing_ratio : The ratio of teacher forcing.
optimizer : Optimizer for training.

`deepspeech2`¶

model_name : Model name
rnn_type : Type of rnn cell (rnn, lstm, gru)
num_rnn_layers : The number of rnn layers
rnn_hidden_dim : Hidden state dimenstion of RNN.
dropout_p : The dropout probability of model.
bidirectional : If True, becomes a bidirectional encoders
activation : Type of activation function
optimizer : Optimizer for training.

`lstm_lm`¶

model_name : Model name
num_layers : The number of encoder layers.
hidden_state_dim : The hidden state dimension of encoder.
dropout_p : The dropout probability of encoder.
rnn_type : Type of rnn cell (rnn, lstm, gru)
max_length : Max decoding length.
teacher_forcing_ratio : The ratio of teacher forcing.
optimizer : Optimizer for training.

`rnn_transducer`¶

model_name : Model name
encoder_hidden_state_dim : Dimension of encoder.
decoder_hidden_state_dim : Dimension of decoder.
num_encoder_layers : The number of encoder layers.
num_decoder_layers : The number of decoder layers.
encoder_dropout_p : The dropout probability of encoder.
decoder_dropout_p : The dropout probability of decoder.
bidirectional : If True, becomes a bidirectional encoders
rnn_type : Type of rnn cell (rnn, lstm, gru)
output_dim : Dimension of outputs
optimizer : Optimizer for training.

`transformer_lm`¶

model_name : Model name
num_layers : The number of encoder layers.
d_model : The dimension of model.
d_ff : The dimenstion of feed forward network.
num_attention_heads : The number of attention heads.
dropout_p : The dropout probability of encoder.
max_length : Max decoding length.
optimizer : Optimizer for training.

`transformer`¶

model_name : Model name
d_model : Dimension of model.
d_ff : Dimenstion of feed forward network.
num_attention_heads : The number of attention heads.
num_encoder_layers : The number of encoder layers.
num_decoder_layers : The number of decoder layers.
encoder_dropout_p : The dropout probability of encoder.
decoder_dropout_p : The dropout probability of decoder.
ffnet_style : Style of feed forward network. (ff, conv)
max_length : Max decoding length.
teacher_forcing_ratio : The ratio of teacher forcing.
joint_ctc_attention : Flag indication joint ctc attention or not
optimizer : Optimizer for training.

`joint_ctc_transformer`¶

model_name : Model name
extractor : The CNN feature extractor.
d_model : Dimension of model.
d_ff : Dimenstion of feed forward network.
num_attention_heads : The number of attention heads.
num_encoder_layers : The number of encoder layers.
num_decoder_layers : The number of decoder layers.
encoder_dropout_p : The dropout probability of encoder.
decoder_dropout_p : The dropout probability of decoder.
ffnet_style : Style of feed forward network. (ff, conv)
max_length : Max decoding length.
teacher_forcing_ratio : The ratio of teacher forcing.
joint_ctc_attention : Flag indication joint ctc attention or not
optimizer : Optimizer for training.

`transformer_with_ctc`¶

model_name : Model name
d_model : Dimension of model.
d_ff : Dimenstion of feed forward network.
num_attention_heads : The number of attention heads.
num_encoder_layers : The number of encoder layers.
encoder_dropout_p : The dropout probability of encoder.
ffnet_style : Style of feed forward network. (ff, conv)
optimizer : Optimizer for training.

`vgg_transformer`¶

model_name : Model name
extractor : The CNN feature extractor.
d_model : Dimension of model.
d_ff : Dimenstion of feed forward network.
num_attention_heads : The number of attention heads.
num_encoder_layers : The number of encoder layers.
num_decoder_layers : The number of decoder layers.
encoder_dropout_p : The dropout probability of encoder.
decoder_dropout_p : The dropout probability of decoder.
ffnet_style : Style of feed forward network. (ff, conv)
max_length : Max decoding length.
teacher_forcing_ratio : The ratio of teacher forcing.
joint_ctc_attention : Flag indication joint ctc attention or not
optimizer : Optimizer for training.

`conformer`¶

model_name : Model name
encoder_dim : Dimension of encoder.
num_encoder_layers : The number of encoder layers.
num_attention_heads : The number of attention heads.
feed_forward_expansion_factor : The expansion factor of feed forward module.
conv_expansion_factor : The expansion factor of convolution module.
input_dropout_p : The dropout probability of inputs.
feed_forward_dropout_p : The dropout probability of feed forward module.
attention_dropout_p : The dropout probability of attention module.
conv_dropout_p : The dropout probability of convolution module.
conv_kernel_size : The kernel size of convolution.
half_step_residual : Flag indication whether to use half step residual or not
optimizer : Optimizer for training.

`conformer_lstm`¶

model_name : Model name
encoder_dim : Dimension of encoder.
num_encoder_layers : The number of encoder layers.
num_attention_heads : The number of attention heads.
feed_forward_expansion_factor : The expansion factor of feed forward module.
conv_expansion_factor : The expansion factor of convolution module.
input_dropout_p : The dropout probability of inputs.
feed_forward_dropout_p : The dropout probability of feed forward module.
attention_dropout_p : The dropout probability of attention module.
conv_dropout_p : The dropout probability of convolution module.
conv_kernel_size : The kernel size of convolution.
half_step_residual : Flag indication whether to use half step residual or not
num_decoder_layers : The number of decoder layers.
decoder_dropout_p : The dropout probability of decoder.
max_length : Max decoding length.
teacher_forcing_ratio : The ratio of teacher forcing.
rnn_type : Type of rnn cell (rnn, lstm, gru)
decoder_attn_mechanism : The attention mechanism for decoder.
optimizer : Optimizer for training.

`conformer_transducer`¶

model_name : Model name
encoder_dim : Dimension of encoder.
num_encoder_layers : The number of encoder layers.
num_attention_heads : The number of attention heads.
feed_forward_expansion_factor : The expansion factor of feed forward module.
conv_expansion_factor : The expansion factor of convolution module.
input_dropout_p : The dropout probability of inputs.
feed_forward_dropout_p : The dropout probability of feed forward module.
attention_dropout_p : The dropout probability of attention module.
conv_dropout_p : The dropout probability of convolution module.
conv_kernel_size : The kernel size of convolution.
half_step_residual : Flag indication whether to use half step residual or not
num_decoder_layers : The number of decoder layers.
decoder_dropout_p : The dropout probability of decoder.
max_length : Max decoding length.
teacher_forcing_ratio : The ratio of teacher forcing.
rnn_type : Type of rnn cell (rnn, lstm, gru)
decoder_hidden_state_dim : Hidden state dimension of decoder.
decoder_output_dim : Output dimension of decoder.
optimizer : Optimizer for training.

`joint_ctc_conformer_lstm`¶

model_name : Model name
encoder_dim : Dimension of encoder.
num_encoder_layers : The number of encoder layers.
num_attention_heads : The number of attention heads.
feed_forward_expansion_factor : The expansion factor of feed forward module.
conv_expansion_factor : The expansion factor of convolution module.
input_dropout_p : The dropout probability of inputs.
feed_forward_dropout_p : The dropout probability of feed forward module.
attention_dropout_p : The dropout probability of attention module.
conv_dropout_p : The dropout probability of convolution module.
conv_kernel_size : The kernel size of convolution.
half_step_residual : Flag indication whether to use half step residual or not
num_decoder_layers : The number of decoder layers.
decoder_dropout_p : The dropout probability of decoder.
num_decoder_attention_heads : The number of decoder attention heads.
max_length : Max decoding length.
teacher_forcing_ratio : The ratio of teacher forcing.
rnn_type : Type of rnn cell (rnn, lstm, gru)
decoder_attn_mechanism : The attention mechanism for decoder.
optimizer : Optimizer for training.

`transformer_transducer`¶

model_name : Model name
encoder_dim : Dimension of encoder name
d_ff : Dimension of feed forward network
num_audio_layers : Number of audio layers
num_label_layers : Number of label layers
num_attention_heads : Number of attention heads
audio_dropout_p : Dropout probability of audio layer
label_dropout_p : Dropout probability of label layer
decoder_hidden_state_dim : Hidden state dimension of decoder
decoder_output_dim : Dimension of model output.
conv_kernel_size : Kernel size of convolution layer.
max_positional_length : Max length of positional encoding.
optimizer : Optimizer for training.

`quartznet5x5`¶

model_name : Model name
num_blocks : Number of quartznet blocks
num_sub_blocks : Number of quartznet sub blocks
in_channels : Input channels of jasper blocks
out_channels : Output channels of jasper block’s convolution
kernel_size : Kernel size of jasper block’s convolution
dilation : Dilation of jasper block’s convolution
dropout_p : Dropout probability
optimizer : Optimizer for training.

`quartznet10x5`¶

model_name : Model name
num_blocks : Number of quartznet blocks
num_sub_blocks : Number of quartznet sub blocks
in_channels : Input channels of jasper blocks
out_channels : Output channels of jasper block’s convolution
kernel_size : Kernel size of jasper block’s convolution
dilation : Dilation of jasper block’s convolution
dropout_p : Dropout probability
optimizer : Optimizer for training.

`quartznet15x5`¶

model_name : Model name
num_blocks : Number of quartznet5x5 blocks
num_sub_blocks : Number of quartznet5x5 sub blocks
in_channels : Input channels of jasper blocks
out_channels : Output channels of jasper block’s convolution
kernel_size : Kernel size of jasper block’s convolution
dilation : Dilation of jasper block’s convolution
dropout_p : Dropout probability
optimizer : Optimizer for training.

`contextnet`¶

model_name : Model name
model_size : Model size
input_dim : Dimension of input vector
num_encoder_layers : The number of convolution layers
kernel_size : Value of convolution kernel size
num_channels : The number of channels in the convolution filter
encoder_dim : Dimension of encoder output vector
optimizer : Optimizer for training

`contextnet_lstm`¶

model_name : Model name
model_size : Model size
input_dim : Dimension of input vector
num_encoder_layers : The number of convolution layers
num_decoder_layers : The number of decoder layers.
kernel_size : Value of convolution kernel size
num_channels : The number of channels in the convolution filter
encoder_dim : Dimension of encoder output vector
num_attention_heads : The number of attention heads.
attention_dropout_p : The dropout probability of attention module.
decoder_dropout_p : The dropout probability of decoder.
max_length : Max decoding length.
teacher_forcing_ratio : The ratio of teacher forcing.
rnn_type : Type of rnn cell (rnn, lstm, gru)
decoder_attn_mechanism : The attention mechanism for decoder.
optimizer : Optimizer for training.

`contextnet_transducer`¶

model_name : Model name
model_size : Model size
input_dim : Dimension of input vector
num_encoder_layers : The number of convolution layers
num_decoder_layers : The number of rnn layers
kernel_size : Value of convolution kernel size
num_channels : The number of channels in the convolution filter
hidden_dim : The number of features in the decoder hidden state
encoder_dim : Dimension of encoder output vector
decoder_output_dim : Dimension of decoder output vector
dropout : Dropout probability of decoder
rnn_type : Type of rnn cell
optimizer : Optimizer for training

`jasper5x3`¶

model_name : Model name
num_blocks : Number of jasper blocks
num_sub_blocks : Number of jasper sub blocks
in_channels : Input channels of jasper blocks
out_channels : Output channels of jasper block’s convolution
kernel_size : Kernel size of jasper block’s convolution
dilation : Dilation of jasper block’s convolution
dropout_p : Dropout probability
optimizer : Optimizer for training.

`jasper10x5`¶

model_name : Model name
num_blocks : Number of jasper blocks
num_sub_blocks : Number of jasper sub blocks
in_channels : Input channels of jasper blocks
out_channels : Output channels of jasper block’s convolution
kernel_size : Kernel size of jasper block’s convolution
dilation : Dilation of jasper block’s convolution
dropout_p : Dropout probability
optimizer : Optimizer for training.

`criterion`¶

`label_smoothed_cross_entropy`¶

criterion_name : Criterion name for training.
reduction : Reduction method of criterion
smoothing : Ratio of smoothing loss (confidence = 1.0 - smoothing)

`joint_ctc_cross_entropy`¶

criterion_name : Criterion name for training.
reduction : Reduction method of criterion
ctc_weight : Weight of ctc loss for training.
cross_entropy_weight : Weight of cross entropy loss for training.
smoothing : Ratio of smoothing loss (confidence = 1.0 - smoothing)
zero_infinity : Whether to zero infinite losses and the associated gradients.

`perplexity`¶

criterion_name : Criterion name for training
reduction : Reduction method of criterion

`transducer`¶

criterion_name : Criterion name for training.
reduction : Reduction method of criterion
gather : Reduce memory consumption.

`ctc`¶

criterion_name : Criterion name for training
reduction : Reduction method of criterion
zero_infinity : Whether to zero infinite losses and the associated gradients.

`cross_entropy`¶

criterion_name : Criterion name for training
reduction : Reduction method of criterion

`lr_scheduler`¶

`reduce_lr_on_plateau`¶

lr : Learning rate
scheduler_name : Name of learning rate scheduler.
lr_patience : Number of epochs with no improvement after which learning rate will be reduced.
lr_factor : Factor by which the learning rate will be reduced. new_lr = lr * factor.

`warmup`¶

lr : Learning rate
scheduler_name : Name of learning rate scheduler.
peak_lr : Maximum learning rate.
init_lr : Initial learning rate.
warmup_steps : Warmup the learning rate linearly for the first N updates
total_steps : Total training steps.

`warmup_reduce_lr_on_plateau`¶

lr : Learning rate
scheduler_name : Name of learning rate scheduler.
lr_patience : Number of epochs with no improvement after which learning rate will be reduced.
lr_factor : Factor by which the learning rate will be reduced. new_lr = lr * factor.
peak_lr : Maximum learning rate.
init_lr : Initial learning rate.
warmup_steps : Warmup the learning rate linearly for the first N updates

`tri_stage`¶

lr : Learning rate
scheduler_name : Name of learning rate scheduler.
init_lr : Initial learning rate.
init_lr_scale : Initial learning rate scale.
final_lr_scale : Final learning rate scale
phase_ratio : Automatically sets warmup/hold/decay steps to the ratio specified here from max_updates. the ratios must add up to 1.0
total_steps : Total training steps.

`transformer`¶

lr : Learning rate
scheduler_name : Name of learning rate scheduler.
peak_lr : Maximum learning rate.
final_lr : Final learning rate.
final_lr_scale : Final learning rate scale
warmup_steps : Warmup the learning rate linearly for the first N updates
decay_steps : Steps in decay stages

`trainer`¶

`cpu`¶

seed : Seed for training.
accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.
num_workers : The number of cpu cores
batch_size : Size of batch
check_val_every_n_epoch : Check val every n train epochs.
gradient_clip_val : 0 means don’t clip.
logger : Training logger. {wandb, tensorboard}
max_epochs : Stop training once this number of epochs is reached.
auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
name : Trainer name
device : Training device.
use_cuda : If set True, will train with GPU

`gpu`¶

seed : Seed for training.
accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.
num_workers : The number of cpu cores
batch_size : Size of batch
check_val_every_n_epoch : Check val every n train epochs.
gradient_clip_val : 0 means don’t clip.
logger : Training logger. {wandb, tensorboard}
max_epochs : Stop training once this number of epochs is reached.
auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
name : Trainer name
device : Training device.
use_cuda : If set True, will train with GPU
auto_select_gpus : If enabled and gpus is an integer, pick available gpus automatically.

`tpu`¶

seed : Seed for training.
accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.
num_workers : The number of cpu cores
batch_size : Size of batch
check_val_every_n_epoch : Check val every n train epochs.
gradient_clip_val : 0 means don’t clip.
logger : Training logger. {wandb, tensorboard}
max_epochs : Stop training once this number of epochs is reached.
auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
name : Trainer name
device : Training device.
use_cuda : If set True, will train with GPU
use_tpu : If set True, will train with GPU
tpu_cores : Number of TPU cores

`gpu-fp16`¶

seed : Seed for training.
accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.
num_workers : The number of cpu cores
batch_size : Size of batch
check_val_every_n_epoch : Check val every n train epochs.
gradient_clip_val : 0 means don’t clip.
logger : Training logger. {wandb, tensorboard}
max_epochs : Stop training once this number of epochs is reached.
auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
name : Trainer name
device : Training device.
use_cuda : If set True, will train with GPU
auto_select_gpus : If enabled and gpus is an integer, pick available gpus automatically.
precision : Double precision (64), full precision (32) or half precision (16). Can be used on CPU, GPU or TPUs.
amp_backend : The mixed precision backend to use (“native” or “apex”)

`tpu-fp16`¶

seed : Seed for training.
accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.
num_workers : The number of cpu cores
batch_size : Size of batch
check_val_every_n_epoch : Check val every n train epochs.
gradient_clip_val : 0 means don’t clip.
logger : Training logger. {wandb, tensorboard}
max_epochs : Stop training once this number of epochs is reached.
auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
name : Trainer name
device : Training device.
use_cuda : If set True, will train with GPU
use_tpu : If set True, will train with GPU
tpu_cores : Number of TPU cores
precision : Double precision (64), full precision (32) or half precision (16). Can be used on CPU, GPU or TPUs.
amp_backend : The mixed precision backend to use (“native” or “apex”)

`cpu-fp64`¶

seed : Seed for training.
accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.
num_workers : The number of cpu cores
batch_size : Size of batch
check_val_every_n_epoch : Check val every n train epochs.
gradient_clip_val : 0 means don’t clip.
logger : Training logger. {wandb, tensorboard}
max_epochs : Stop training once this number of epochs is reached.
auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
name : Trainer name
device : Training device.
use_cuda : If set True, will train with GPU
precision : Double precision (64), full precision (32) or half precision (16). Can be used on CPU, GPU or TPUs.
amp_backend : The mixed precision backend to use (“native” or “apex”)

`tokenizer`¶

`libri_subword`¶

sos_token : Start of sentence token
eos_token : End of sentence token
pad_token : Pad token
blank_token : Blank token (for CTC training)
encoding : Encoding of vocab
unit : Unit of vocabulary.
vocab_size : Size of vocabulary.
vocab_path : Path of vocabulary file.

`libri_character`¶

sos_token : Start of sentence token
eos_token : End of sentence token
pad_token : Pad token
blank_token : Blank token (for CTC training)
encoding : Encoding of vocab
unit : Unit of vocabulary.
vocab_path : Path of vocabulary file.

`aishell_character`¶

sos_token : Start of sentence token
eos_token : End of sentence token
pad_token : Pad token
blank_token : Blank token (for CTC training)
encoding : Encoding of vocab
unit : Unit of vocabulary.
vocab_path : Path of vocabulary file.

`kspon_subword`¶

sos_token : Start of sentence token
eos_token : End of sentence token
pad_token : Pad token
blank_token : Blank token (for CTC training)
encoding : Encoding of vocab
unit : Unit of vocabulary.
sp_model_path : Path of sentencepiece model.
vocab_size : Size of vocabulary.

`kspon_grapheme`¶

sos_token : Start of sentence token
eos_token : End of sentence token
pad_token : Pad token
blank_token : Blank token (for CTC training)
encoding : Encoding of vocab
unit : Unit of vocabulary.
vocab_path : Path of vocabulary file.

`kspon_character`¶

sos_token : Start of sentence token
eos_token : End of sentence token
pad_token : Pad token
blank_token : Blank token (for CTC training)
encoding : Encoding of vocab
unit : Unit of vocabulary.
vocab_path : Path of vocabulary file.

Openspeech’s configurations¶

audio¶

mfcc¶

melspectrogram¶

fbank¶

spectrogram¶

augment¶

default¶

dataset¶

kspon¶

libri¶

aishell¶

ksponspeech¶

librispeech¶

lm¶

model¶

listen_attend_spell¶

listen_attend_spell_with_location_aware¶

listen_attend_spell_with_multi_head¶

joint_ctc_listen_attend_spell¶

deep_cnn_with_joint_ctc_listen_attend_spell¶

deepspeech2¶

lstm_lm¶

rnn_transducer¶

transformer_lm¶

transformer¶

joint_ctc_transformer¶

transformer_with_ctc¶

vgg_transformer¶

conformer¶

conformer_lstm¶

conformer_transducer¶

joint_ctc_conformer_lstm¶

transformer_transducer¶

quartznet5x5¶

quartznet10x5¶

quartznet15x5¶

contextnet¶

contextnet_lstm¶

contextnet_transducer¶

jasper5x3¶

jasper10x5¶

criterion¶

label_smoothed_cross_entropy¶

joint_ctc_cross_entropy¶

perplexity¶

transducer¶

ctc¶

cross_entropy¶

lr_scheduler¶

reduce_lr_on_plateau¶

warmup¶

warmup_reduce_lr_on_plateau¶

tri_stage¶

transformer¶

trainer¶

cpu¶

gpu¶

tpu¶

gpu-fp16¶

tpu-fp16¶

cpu-fp64¶

tokenizer¶

libri_subword¶

libri_character¶

aishell_character¶

kspon_subword¶

kspon_grapheme¶

kspon_character¶

`audio`¶

`mfcc`¶

`melspectrogram`¶

`fbank`¶

`spectrogram`¶

`augment`¶

`default`¶

`dataset`¶

`kspon`¶

`libri`¶

`aishell`¶

`ksponspeech`¶

`librispeech`¶

`lm`¶

`model`¶

`listen_attend_spell`¶

`listen_attend_spell_with_location_aware`¶

`listen_attend_spell_with_multi_head`¶

`joint_ctc_listen_attend_spell`¶

`deep_cnn_with_joint_ctc_listen_attend_spell`¶

`deepspeech2`¶

`lstm_lm`¶

`rnn_transducer`¶

`transformer_lm`¶

`transformer`¶

`joint_ctc_transformer`¶

`transformer_with_ctc`¶

`vgg_transformer`¶

`conformer`¶

`conformer_lstm`¶

`conformer_transducer`¶

`joint_ctc_conformer_lstm`¶

`transformer_transducer`¶

`quartznet5x5`¶

`quartznet10x5`¶

`quartznet15x5`¶

`contextnet`¶

`contextnet_lstm`¶

`contextnet_transducer`¶

`jasper5x3`¶

`jasper10x5`¶

`criterion`¶

`label_smoothed_cross_entropy`¶

`joint_ctc_cross_entropy`¶

`perplexity`¶

`transducer`¶

`ctc`¶

`cross_entropy`¶

`lr_scheduler`¶

`reduce_lr_on_plateau`¶

`warmup`¶

`warmup_reduce_lr_on_plateau`¶

`tri_stage`¶

`transformer`¶

`trainer`¶

`cpu`¶

`gpu`¶

`tpu`¶

`gpu-fp16`¶

`tpu-fp16`¶

`cpu-fp64`¶

`tokenizer`¶

`libri_subword`¶

`libri_character`¶

`aishell_character`¶

`kspon_subword`¶

`kspon_grapheme`¶

`kspon_character`¶