Openspeech configurations

This page describes every configuration group available in Openspeech.

audio

mfcc

  • name : Name of the audio feature transform.

  • sample_rate : Sampling rate of the audio.

  • frame_length : Frame length of the spectrogram.

  • frame_shift : Length of the hop between STFT windows.

  • del_silence : Flag indicating whether to delete silence.

  • num_mels : The number of MFC coefficients to retain.

  • apply_spec_augment : Flag indicating whether to apply SpecAugment.

  • apply_noise_augment : Flag indicating whether to apply noise augmentation.

  • apply_time_stretch_augment : Flag indicating whether to apply time-stretch augmentation.

  • apply_joining_augment : Flag indicating whether to apply audio-joining augmentation.

melspectrogram

  • name : Name of the audio feature transform.

  • sample_rate : Sampling rate of the audio.

  • frame_length : Frame length of the spectrogram.

  • frame_shift : Length of the hop between STFT windows.

  • del_silence : Flag indicating whether to delete silence.

  • num_mels : The number of mel filterbank channels to retain.

  • apply_spec_augment : Flag indicating whether to apply SpecAugment.

  • apply_noise_augment : Flag indicating whether to apply noise augmentation.

  • apply_time_stretch_augment : Flag indicating whether to apply time-stretch augmentation.

  • apply_joining_augment : Flag indicating whether to apply audio-joining augmentation.

fbank

  • name : Name of the audio feature transform.

  • sample_rate : Sampling rate of the audio.

  • frame_length : Frame length of the spectrogram.

  • frame_shift : Length of the hop between STFT windows.

  • del_silence : Flag indicating whether to delete silence.

  • num_mels : The number of mel filterbank channels to retain.

  • apply_spec_augment : Flag indicating whether to apply SpecAugment.

  • apply_noise_augment : Flag indicating whether to apply noise augmentation.

  • apply_time_stretch_augment : Flag indicating whether to apply time-stretch augmentation.

  • apply_joining_augment : Flag indicating whether to apply audio-joining augmentation.

spectrogram

  • name : Name of the audio feature transform.

  • sample_rate : Sampling rate of the audio.

  • frame_length : Frame length of the spectrogram.

  • frame_shift : Length of the hop between STFT windows.

  • del_silence : Flag indicating whether to delete silence.

  • num_mels : The spectrogram is independent of mel filterbanks, but the num_mels variable is still used to unify the feature-size variables.

  • apply_spec_augment : Flag indicating whether to apply SpecAugment.

  • apply_noise_augment : Flag indicating whether to apply noise augmentation.

  • apply_time_stretch_augment : Flag indicating whether to apply time-stretch augmentation.

  • apply_joining_augment : Flag indicating whether to apply audio-joining augmentation.
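
The framing parameters above determine the STFT geometry. As a rough sketch (assuming the common speech-toolkit convention that frame_length and frame_shift are given in milliseconds), the window and hop sizes in samples, and the resulting number of frames, follow from the sampling rate:

```python
def stft_params(sample_rate: int, frame_length_ms: float, frame_shift_ms: float):
    # Convert millisecond framing parameters to sample counts.
    win_length = int(sample_rate * frame_length_ms / 1000)  # samples per frame
    hop_length = int(sample_rate * frame_shift_ms / 1000)   # samples per hop
    return win_length, hop_length

def num_frames(num_samples: int, win_length: int, hop_length: int) -> int:
    # Number of full frames that fit without padding.
    return 1 + (num_samples - win_length) // hop_length

win, hop = stft_params(16000, 20.0, 10.0)
print(win, hop)                     # 320 160
print(num_frames(16000, win, hop))  # one second of audio -> 99 frames
```

With a 16 kHz corpus and a 10 ms shift this yields roughly 100 feature frames per second, each of size num_mels.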

augment

default

  • apply_spec_augment : Flag indicating whether to apply SpecAugment.

  • apply_noise_augment : Flag indicating whether to apply noise augmentation. Noise augmentation requires noise_dataset_dir, which should contain audio files.

  • apply_joining_augment : Flag indicating whether to apply joining augmentation. If true, a new audio sample is created by concatenating two randomly chosen audio samples.

  • apply_time_stretch_augment : Flag indicating whether to apply time-stretch augmentation.

  • freq_mask_para : Hyperparameter limiting the maximum width of a frequency mask.

  • freq_mask_num : The number of frequency-masked areas to create.

  • time_mask_num : The number of time-masked areas to create.

  • noise_dataset_dir : Directory containing the noise audio files.

  • noise_level : Noise adjustment level.

  • time_stretch_min_rate : Minimum rate of audio time stretching.

  • time_stretch_max_rate : Maximum rate of audio time stretching.
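
The masking options above can be pictured with a small, dependency-free sketch of SpecAugment-style masking. freq_mask_para caps the width of each frequency mask, and freq_mask_num / time_mask_num set how many masks are drawn; the time-mask cap time_mask_para is a hypothetical constant here, not an Openspeech option:

```python
import random

def spec_augment(spec, freq_mask_para, freq_mask_num, time_mask_num, time_mask_para=20):
    # spec is a (time x freq) grid, e.g. a mel spectrogram as nested lists.
    num_frames, num_bins = len(spec), len(spec[0])
    for _ in range(freq_mask_num):
        # Zero out a random band of at most freq_mask_para frequency bins.
        width = random.randint(0, freq_mask_para)
        start = random.randint(0, max(0, num_bins - width))
        for t in range(num_frames):
            for f in range(start, start + width):
                spec[t][f] = 0.0
    for _ in range(time_mask_num):
        # Zero out a random span of at most time_mask_para frames.
        width = random.randint(0, time_mask_para)
        start = random.randint(0, max(0, num_frames - width))
        for t in range(start, start + width):
            spec[t] = [0.0] * num_bins
    return spec
```

The masked areas keep the feature shape unchanged; only their contents are zeroed.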

dataset

kspon

  • dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)

  • dataset_path : Path of dataset

  • test_dataset_path : Path of evaluation dataset

  • manifest_file_path : Path of manifest file

  • test_manifest_dir : Path of the directory containing test manifest files

  • preprocess_mode : KsponSpeech preprocess mode {phonetic, spelling}

libri

  • dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)

  • dataset_path : Path of dataset

  • dataset_download : Flag indicating whether to download the dataset.

  • manifest_file_path : Path of manifest file

aishell

  • dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)

  • dataset_path : Path of dataset

  • dataset_download : Flag indicating whether to download the dataset.

  • manifest_file_path : Path of manifest file

ksponspeech

  • dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)

  • dataset_path : Path of dataset

  • test_dataset_path : Path of evaluation dataset

  • manifest_file_path : Path of manifest file

  • test_manifest_dir : Path of the directory containing test manifest files

  • preprocess_mode : KsponSpeech preprocess mode {phonetic, spelling}

librispeech

  • dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)

  • dataset_path : Path of dataset

  • dataset_download : Flag indicating whether to download the dataset.

  • manifest_file_path : Path of manifest file

lm

  • dataset : Select dataset for training (librispeech, ksponspeech, aishell, lm)

  • dataset_path : Path of dataset

  • valid_ratio : Ratio of validation data

  • test_ratio : Ratio of test data
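
For the lm dataset, valid_ratio and test_ratio carve the corpus into three parts. A hypothetical illustration of the split (the real loader may round or shuffle differently):

```python
def split_sizes(num_examples: int, valid_ratio: float, test_ratio: float):
    # Everything not assigned to validation or test remains training data.
    num_valid = int(num_examples * valid_ratio)
    num_test = int(num_examples * test_ratio)
    num_train = num_examples - num_valid - num_test
    return num_train, num_valid, num_test

print(split_sizes(10000, 0.05, 0.05))  # (9000, 500, 500)
```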

model

listen_attend_spell

  • model_name : Model name

  • num_encoder_layers : The number of encoder layers.

  • num_decoder_layers : The number of decoder layers.

  • hidden_state_dim : The hidden state dimension of encoder.

  • encoder_dropout_p : The dropout probability of encoder.

  • encoder_bidirectional : If True, the encoder becomes bidirectional.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • joint_ctc_attention : Flag indicating whether to use joint CTC-attention.

  • max_length : Max decoding length.

  • num_attention_heads : The number of attention heads.

  • decoder_dropout_p : The dropout probability of decoder.

  • decoder_attn_mechanism : The attention mechanism for decoder.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • optimizer : Optimizer for training.

listen_attend_spell_with_location_aware

  • model_name : Model name

  • num_encoder_layers : The number of encoder layers.

  • num_decoder_layers : The number of decoder layers.

  • hidden_state_dim : The hidden state dimension of encoder.

  • encoder_dropout_p : The dropout probability of encoder.

  • encoder_bidirectional : If True, the encoder becomes bidirectional.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • joint_ctc_attention : Flag indicating whether to use joint CTC-attention.

  • max_length : Max decoding length.

  • num_attention_heads : The number of attention heads.

  • decoder_dropout_p : The dropout probability of decoder.

  • decoder_attn_mechanism : The attention mechanism for decoder.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • optimizer : Optimizer for training.

listen_attend_spell_with_multi_head

  • model_name : Model name

  • num_encoder_layers : The number of encoder layers.

  • num_decoder_layers : The number of decoder layers.

  • hidden_state_dim : The hidden state dimension of encoder.

  • encoder_dropout_p : The dropout probability of encoder.

  • encoder_bidirectional : If True, the encoder becomes bidirectional.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • joint_ctc_attention : Flag indicating whether to use joint CTC-attention.

  • max_length : Max decoding length.

  • num_attention_heads : The number of attention heads.

  • decoder_dropout_p : The dropout probability of decoder.

  • decoder_attn_mechanism : The attention mechanism for decoder.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • optimizer : Optimizer for training.

joint_ctc_listen_attend_spell

  • model_name : Model name

  • num_encoder_layers : The number of encoder layers.

  • num_decoder_layers : The number of decoder layers.

  • hidden_state_dim : The hidden state dimension of encoder.

  • encoder_dropout_p : The dropout probability of encoder.

  • encoder_bidirectional : If True, the encoder becomes bidirectional.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • joint_ctc_attention : Flag indicating whether to use joint CTC-attention.

  • max_length : Max decoding length.

  • num_attention_heads : The number of attention heads.

  • decoder_dropout_p : The dropout probability of decoder.

  • decoder_attn_mechanism : The attention mechanism for decoder.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • optimizer : Optimizer for training.

deep_cnn_with_joint_ctc_listen_attend_spell

  • model_name : Model name

  • num_encoder_layers : The number of encoder layers.

  • num_decoder_layers : The number of decoder layers.

  • hidden_state_dim : The hidden state dimension of encoder.

  • encoder_dropout_p : The dropout probability of encoder.

  • encoder_bidirectional : If True, the encoder becomes bidirectional.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • extractor : The CNN feature extractor.

  • activation : Type of activation function

  • joint_ctc_attention : Flag indicating whether to use joint CTC-attention.

  • max_length : Max decoding length.

  • num_attention_heads : The number of attention heads.

  • decoder_dropout_p : The dropout probability of decoder.

  • decoder_attn_mechanism : The attention mechanism for decoder.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • optimizer : Optimizer for training.

deepspeech2

  • model_name : Model name

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • num_rnn_layers : The number of rnn layers

  • rnn_hidden_dim : Hidden state dimension of RNN.

  • dropout_p : The dropout probability of the model.

  • bidirectional : If True, the encoder becomes bidirectional.

  • activation : Type of activation function

  • optimizer : Optimizer for training.

lstm_lm

  • model_name : Model name

  • num_layers : The number of encoder layers.

  • hidden_state_dim : The hidden state dimension of encoder.

  • dropout_p : The dropout probability of encoder.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • max_length : Max decoding length.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • optimizer : Optimizer for training.

rnn_transducer

  • model_name : Model name

  • encoder_hidden_state_dim : Dimension of encoder.

  • decoder_hidden_state_dim : Dimension of decoder.

  • num_encoder_layers : The number of encoder layers.

  • num_decoder_layers : The number of decoder layers.

  • encoder_dropout_p : The dropout probability of encoder.

  • decoder_dropout_p : The dropout probability of decoder.

  • bidirectional : If True, the encoder becomes bidirectional.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • output_dim : Dimension of outputs

  • optimizer : Optimizer for training.

transformer_lm

  • model_name : Model name

  • num_layers : The number of encoder layers.

  • d_model : The dimension of model.

  • d_ff : The dimension of the feed forward network.

  • num_attention_heads : The number of attention heads.

  • dropout_p : The dropout probability of encoder.

  • max_length : Max decoding length.

  • optimizer : Optimizer for training.

transformer

  • model_name : Model name

  • d_model : Dimension of model.

  • d_ff : Dimension of feed forward network.

  • num_attention_heads : The number of attention heads.

  • num_encoder_layers : The number of encoder layers.

  • num_decoder_layers : The number of decoder layers.

  • encoder_dropout_p : The dropout probability of encoder.

  • decoder_dropout_p : The dropout probability of decoder.

  • ffnet_style : Style of feed forward network. (ff, conv)

  • max_length : Max decoding length.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • joint_ctc_attention : Flag indicating whether to use joint CTC-attention.

  • optimizer : Optimizer for training.

joint_ctc_transformer

  • model_name : Model name

  • extractor : The CNN feature extractor.

  • d_model : Dimension of model.

  • d_ff : Dimension of feed forward network.

  • num_attention_heads : The number of attention heads.

  • num_encoder_layers : The number of encoder layers.

  • num_decoder_layers : The number of decoder layers.

  • encoder_dropout_p : The dropout probability of encoder.

  • decoder_dropout_p : The dropout probability of decoder.

  • ffnet_style : Style of feed forward network. (ff, conv)

  • max_length : Max decoding length.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • joint_ctc_attention : Flag indicating whether to use joint CTC-attention.

  • optimizer : Optimizer for training.

transformer_with_ctc

  • model_name : Model name

  • d_model : Dimension of model.

  • d_ff : Dimension of feed forward network.

  • num_attention_heads : The number of attention heads.

  • num_encoder_layers : The number of encoder layers.

  • encoder_dropout_p : The dropout probability of encoder.

  • ffnet_style : Style of feed forward network. (ff, conv)

  • optimizer : Optimizer for training.

vgg_transformer

  • model_name : Model name

  • extractor : The CNN feature extractor.

  • d_model : Dimension of model.

  • d_ff : Dimension of feed forward network.

  • num_attention_heads : The number of attention heads.

  • num_encoder_layers : The number of encoder layers.

  • num_decoder_layers : The number of decoder layers.

  • encoder_dropout_p : The dropout probability of encoder.

  • decoder_dropout_p : The dropout probability of decoder.

  • ffnet_style : Style of feed forward network. (ff, conv)

  • max_length : Max decoding length.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • joint_ctc_attention : Flag indicating whether to use joint CTC-attention.

  • optimizer : Optimizer for training.

conformer

  • model_name : Model name

  • encoder_dim : Dimension of encoder.

  • num_encoder_layers : The number of encoder layers.

  • num_attention_heads : The number of attention heads.

  • feed_forward_expansion_factor : The expansion factor of feed forward module.

  • conv_expansion_factor : The expansion factor of convolution module.

  • input_dropout_p : The dropout probability of inputs.

  • feed_forward_dropout_p : The dropout probability of feed forward module.

  • attention_dropout_p : The dropout probability of attention module.

  • conv_dropout_p : The dropout probability of convolution module.

  • conv_kernel_size : The kernel size of convolution.

  • half_step_residual : Flag indicating whether to use half-step residual connections.

  • optimizer : Optimizer for training.

conformer_lstm

  • model_name : Model name

  • encoder_dim : Dimension of encoder.

  • num_encoder_layers : The number of encoder layers.

  • num_attention_heads : The number of attention heads.

  • feed_forward_expansion_factor : The expansion factor of feed forward module.

  • conv_expansion_factor : The expansion factor of convolution module.

  • input_dropout_p : The dropout probability of inputs.

  • feed_forward_dropout_p : The dropout probability of feed forward module.

  • attention_dropout_p : The dropout probability of attention module.

  • conv_dropout_p : The dropout probability of convolution module.

  • conv_kernel_size : The kernel size of convolution.

  • half_step_residual : Flag indicating whether to use half-step residual connections.

  • num_decoder_layers : The number of decoder layers.

  • decoder_dropout_p : The dropout probability of decoder.

  • max_length : Max decoding length.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • decoder_attn_mechanism : The attention mechanism for decoder.

  • optimizer : Optimizer for training.

conformer_transducer

  • model_name : Model name

  • encoder_dim : Dimension of encoder.

  • num_encoder_layers : The number of encoder layers.

  • num_attention_heads : The number of attention heads.

  • feed_forward_expansion_factor : The expansion factor of feed forward module.

  • conv_expansion_factor : The expansion factor of convolution module.

  • input_dropout_p : The dropout probability of inputs.

  • feed_forward_dropout_p : The dropout probability of feed forward module.

  • attention_dropout_p : The dropout probability of attention module.

  • conv_dropout_p : The dropout probability of convolution module.

  • conv_kernel_size : The kernel size of convolution.

  • half_step_residual : Flag indicating whether to use half-step residual connections.

  • num_decoder_layers : The number of decoder layers.

  • decoder_dropout_p : The dropout probability of decoder.

  • max_length : Max decoding length.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • decoder_hidden_state_dim : Hidden state dimension of decoder.

  • decoder_output_dim : Output dimension of decoder.

  • optimizer : Optimizer for training.

joint_ctc_conformer_lstm

  • model_name : Model name

  • encoder_dim : Dimension of encoder.

  • num_encoder_layers : The number of encoder layers.

  • num_attention_heads : The number of attention heads.

  • feed_forward_expansion_factor : The expansion factor of feed forward module.

  • conv_expansion_factor : The expansion factor of convolution module.

  • input_dropout_p : The dropout probability of inputs.

  • feed_forward_dropout_p : The dropout probability of feed forward module.

  • attention_dropout_p : The dropout probability of attention module.

  • conv_dropout_p : The dropout probability of convolution module.

  • conv_kernel_size : The kernel size of convolution.

  • half_step_residual : Flag indicating whether to use half-step residual connections.

  • num_decoder_layers : The number of decoder layers.

  • decoder_dropout_p : The dropout probability of decoder.

  • num_decoder_attention_heads : The number of decoder attention heads.

  • max_length : Max decoding length.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • decoder_attn_mechanism : The attention mechanism for decoder.

  • optimizer : Optimizer for training.

transformer_transducer

  • model_name : Model name

  • encoder_dim : Dimension of encoder

  • d_ff : Dimension of feed forward network

  • num_audio_layers : Number of audio layers

  • num_label_layers : Number of label layers

  • num_attention_heads : Number of attention heads

  • audio_dropout_p : Dropout probability of audio layer

  • label_dropout_p : Dropout probability of label layer

  • decoder_hidden_state_dim : Hidden state dimension of decoder

  • decoder_output_dim : Dimension of model output.

  • conv_kernel_size : Kernel size of convolution layer.

  • max_positional_length : Max length of positional encoding.

  • optimizer : Optimizer for training.

quartznet5x5

  • model_name : Model name

  • num_blocks : Number of quartznet blocks

  • num_sub_blocks : Number of quartznet sub blocks

  • in_channels : Input channels of jasper blocks

  • out_channels : Output channels of jasper block’s convolution

  • kernel_size : Kernel size of jasper block’s convolution

  • dilation : Dilation of jasper block’s convolution

  • dropout_p : Dropout probability

  • optimizer : Optimizer for training.

quartznet10x5

  • model_name : Model name

  • num_blocks : Number of quartznet blocks

  • num_sub_blocks : Number of quartznet sub blocks

  • in_channels : Input channels of jasper blocks

  • out_channels : Output channels of jasper block’s convolution

  • kernel_size : Kernel size of jasper block’s convolution

  • dilation : Dilation of jasper block’s convolution

  • dropout_p : Dropout probability

  • optimizer : Optimizer for training.

quartznet15x5

  • model_name : Model name

  • num_blocks : Number of quartznet blocks

  • num_sub_blocks : Number of quartznet sub blocks

  • in_channels : Input channels of jasper blocks

  • out_channels : Output channels of jasper block’s convolution

  • kernel_size : Kernel size of jasper block’s convolution

  • dilation : Dilation of jasper block’s convolution

  • dropout_p : Dropout probability

  • optimizer : Optimizer for training.

contextnet

  • model_name : Model name

  • model_size : Model size

  • input_dim : Dimension of input vector

  • num_encoder_layers : The number of convolution layers

  • kernel_size : Value of convolution kernel size

  • num_channels : The number of channels in the convolution filter

  • encoder_dim : Dimension of encoder output vector

  • optimizer : Optimizer for training

contextnet_lstm

  • model_name : Model name

  • model_size : Model size

  • input_dim : Dimension of input vector

  • num_encoder_layers : The number of convolution layers

  • num_decoder_layers : The number of decoder layers.

  • kernel_size : Value of convolution kernel size

  • num_channels : The number of channels in the convolution filter

  • encoder_dim : Dimension of encoder output vector

  • num_attention_heads : The number of attention heads.

  • attention_dropout_p : The dropout probability of attention module.

  • decoder_dropout_p : The dropout probability of decoder.

  • max_length : Max decoding length.

  • teacher_forcing_ratio : The ratio of teacher forcing.

  • rnn_type : Type of rnn cell (rnn, lstm, gru)

  • decoder_attn_mechanism : The attention mechanism for decoder.

  • optimizer : Optimizer for training.

contextnet_transducer

  • model_name : Model name

  • model_size : Model size

  • input_dim : Dimension of input vector

  • num_encoder_layers : The number of convolution layers

  • num_decoder_layers : The number of rnn layers

  • kernel_size : Value of convolution kernel size

  • num_channels : The number of channels in the convolution filter

  • hidden_dim : The number of features in the decoder hidden state

  • encoder_dim : Dimension of encoder output vector

  • decoder_output_dim : Dimension of decoder output vector

  • dropout : Dropout probability of decoder

  • rnn_type : Type of rnn cell

  • optimizer : Optimizer for training

jasper5x3

  • model_name : Model name

  • num_blocks : Number of jasper blocks

  • num_sub_blocks : Number of jasper sub blocks

  • in_channels : Input channels of jasper blocks

  • out_channels : Output channels of jasper block’s convolution

  • kernel_size : Kernel size of jasper block’s convolution

  • dilation : Dilation of jasper block’s convolution

  • dropout_p : Dropout probability

  • optimizer : Optimizer for training.

jasper10x5

  • model_name : Model name

  • num_blocks : Number of jasper blocks

  • num_sub_blocks : Number of jasper sub blocks

  • in_channels : Input channels of jasper blocks

  • out_channels : Output channels of jasper block’s convolution

  • kernel_size : Kernel size of jasper block’s convolution

  • dilation : Dilation of jasper block’s convolution

  • dropout_p : Dropout probability

  • optimizer : Optimizer for training.

criterion

label_smoothed_cross_entropy

  • criterion_name : Criterion name for training.

  • reduction : Reduction method of criterion

  • smoothing : Ratio of smoothing loss (confidence = 1.0 - smoothing)
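
As the note says, confidence = 1.0 - smoothing. A minimal sketch of the smoothed target distribution, assuming the usual formulation that spreads the smoothing mass uniformly over the remaining classes:

```python
def smoothed_targets(target_index: int, vocab_size: int, smoothing: float):
    # Put (1 - smoothing) on the gold label and spread the rest uniformly.
    confidence = 1.0 - smoothing
    fill = smoothing / (vocab_size - 1)
    dist = [fill] * vocab_size
    dist[target_index] = confidence
    return dist

dist = smoothed_targets(target_index=2, vocab_size=5, smoothing=0.1)
print(dist)  # [0.025, 0.025, 0.9, 0.025, 0.025]
```

The distribution still sums to 1, so the cross-entropy against it behaves like a regularized version of the hard-label loss.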

joint_ctc_cross_entropy

  • criterion_name : Criterion name for training.

  • reduction : Reduction method of criterion

  • ctc_weight : Weight of ctc loss for training.

  • cross_entropy_weight : Weight of cross entropy loss for training.

  • smoothing : Ratio of smoothing loss (confidence = 1.0 - smoothing)

  • zero_infinity : Whether to zero infinite losses and the associated gradients.

perplexity

  • criterion_name : Criterion name for training

  • reduction : Reduction method of criterion

transducer

  • criterion_name : Criterion name for training.

  • reduction : Reduction method of criterion

  • gather : If True, reduces memory consumption of the transducer loss.

ctc

  • criterion_name : Criterion name for training

  • reduction : Reduction method of criterion

  • zero_infinity : Whether to zero infinite losses and the associated gradients.

cross_entropy

  • criterion_name : Criterion name for training

  • reduction : Reduction method of criterion

lr_scheduler

reduce_lr_on_plateau

  • lr : Learning rate

  • scheduler_name : Name of learning rate scheduler.

  • lr_patience : Number of epochs with no improvement after which learning rate will be reduced.

  • lr_factor : Factor by which the learning rate will be reduced. new_lr = lr * factor.
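
lr_patience and lr_factor combine as follows; a simplified, stateless sketch of the plateau rule (assuming lower validation loss is better, whereas the real scheduler tracks best-score state across epochs):

```python
def reduce_on_plateau(lr, val_losses, lr_patience, lr_factor):
    # If the last `lr_patience` epochs brought no improvement over the best
    # loss seen before them, reduce the rate: new_lr = lr * lr_factor.
    if len(val_losses) <= lr_patience:
        return lr
    best_before = min(val_losses[:-lr_patience])
    if min(val_losses[-lr_patience:]) >= best_before:
        return lr * lr_factor
    return lr

print(reduce_on_plateau(1e-3, [1.0, 0.9, 0.95, 0.96], lr_patience=2, lr_factor=0.5))  # 0.0005
```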

warmup

  • lr : Learning rate

  • scheduler_name : Name of learning rate scheduler.

  • peak_lr : Maximum learning rate.

  • init_lr : Initial learning rate.

  • warmup_steps : Warmup the learning rate linearly for the first N updates

  • total_steps : Total training steps.
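
warmup_steps ramps the rate linearly from init_lr up to peak_lr; a minimal sketch of the warmup phase:

```python
def warmup_lr(step: int, init_lr: float, peak_lr: float, warmup_steps: int) -> float:
    # Linear warmup for the first `warmup_steps` updates, then hold peak_lr
    # (the real scheduler hands off to a decay stage afterwards).
    if step >= warmup_steps:
        return peak_lr
    return init_lr + (peak_lr - init_lr) * step / warmup_steps
```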

warmup_reduce_lr_on_plateau

  • lr : Learning rate

  • scheduler_name : Name of learning rate scheduler.

  • lr_patience : Number of epochs with no improvement after which learning rate will be reduced.

  • lr_factor : Factor by which the learning rate will be reduced. new_lr = lr * factor.

  • peak_lr : Maximum learning rate.

  • init_lr : Initial learning rate.

  • warmup_steps : Warmup the learning rate linearly for the first N updates

tri_stage

  • lr : Learning rate

  • scheduler_name : Name of learning rate scheduler.

  • init_lr : Initial learning rate.

  • init_lr_scale : Initial learning rate scale.

  • final_lr_scale : Final learning rate scale

  • phase_ratio : Automatically sets the warmup/hold/decay steps to the ratios specified here, out of total_steps. The ratios must add up to 1.0.

  • total_steps : Total training steps.
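
phase_ratio divides total_steps into warmup, hold, and decay stages. A hypothetical sketch of the split:

```python
def tri_stage_steps(total_steps: int, phase_ratio):
    # phase_ratio = [warmup, hold, decay]; the ratios must add up to 1.0.
    assert abs(sum(phase_ratio) - 1.0) < 1e-6
    warmup, hold, decay = (int(total_steps * r) for r in phase_ratio)
    return warmup, hold, decay

print(tri_stage_steps(100000, [0.1, 0.4, 0.5]))  # (10000, 40000, 50000)
```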

transformer

  • lr : Learning rate

  • scheduler_name : Name of learning rate scheduler.

  • peak_lr : Maximum learning rate.

  • final_lr : Final learning rate.

  • final_lr_scale : Final learning rate scale

  • warmup_steps : Warmup the learning rate linearly for the first N updates

  • decay_steps : Steps in decay stages

trainer

cpu

  • seed : Seed for training.

  • accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).

  • accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.

  • num_workers : The number of cpu cores

  • batch_size : Size of batch

  • check_val_every_n_epoch : Check val every n train epochs.

  • gradient_clip_val : 0 means don’t clip.

  • logger : Training logger. {wandb, tensorboard}

  • max_epochs : Stop training once this number of epochs is reached.

  • auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.

  • name : Trainer name

  • device : Training device.

  • use_cuda : If set True, will train with GPU
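
batch_size and accumulate_grad_batches together determine the effective batch size seen by each optimizer step; a small sketch (num_devices is an illustrative extra for data-parallel setups, not an Openspeech option):

```python
def effective_batch_size(batch_size: int, accumulate_grad_batches: int,
                         num_devices: int = 1) -> int:
    # Gradients are accumulated over `accumulate_grad_batches` mini-batches
    # before each optimizer step, so the effective batch size scales with it
    # (and with the number of devices under data-parallel training).
    return batch_size * accumulate_grad_batches * num_devices

print(effective_batch_size(32, 4))  # 128
```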

gpu

  • seed : Seed for training.

  • accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).

  • accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.

  • num_workers : The number of cpu cores

  • batch_size : Size of batch

  • check_val_every_n_epoch : Check val every n train epochs.

  • gradient_clip_val : 0 means don’t clip.

  • logger : Training logger. {wandb, tensorboard}

  • max_epochs : Stop training once this number of epochs is reached.

  • auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.

  • name : Trainer name

  • device : Training device.

  • use_cuda : If set True, will train with GPU

  • auto_select_gpus : If enabled and gpus is an integer, pick available gpus automatically.

tpu

  • seed : Seed for training.

  • accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).

  • accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.

  • num_workers : The number of cpu cores

  • batch_size : Size of batch

  • check_val_every_n_epoch : Check val every n train epochs.

  • gradient_clip_val : 0 means don’t clip.

  • logger : Training logger. {wandb, tensorboard}

  • max_epochs : Stop training once this number of epochs is reached.

  • auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.

  • name : Trainer name

  • device : Training device.

  • use_cuda : If set True, will train with GPU

  • use_tpu : If set to True, will train with TPU

  • tpu_cores : Number of TPU cores

gpu-fp16

  • seed : Seed for training.

  • accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).

  • accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.

  • num_workers : The number of cpu cores

  • batch_size : Size of batch

  • check_val_every_n_epoch : Check val every n train epochs.

  • gradient_clip_val : 0 means don’t clip.

  • logger : Training logger. {wandb, tensorboard}

  • max_epochs : Stop training once this number of epochs is reached.

  • auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.

  • name : Trainer name

  • device : Training device.

  • use_cuda : If set True, will train with GPU

  • auto_select_gpus : If enabled and gpus is an integer, pick available gpus automatically.

  • precision : Double precision (64), full precision (32) or half precision (16). Can be used on CPU, GPU or TPUs.

  • amp_backend : The mixed precision backend to use (“native” or “apex”)

tpu-fp16

  • seed : Seed for training.

  • accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).

  • accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.

  • num_workers : The number of cpu cores

  • batch_size : Size of batch

  • check_val_every_n_epoch : Check val every n train epochs.

  • gradient_clip_val : 0 means don’t clip.

  • logger : Training logger. {wandb, tensorboard}

  • max_epochs : Stop training once this number of epochs is reached.

  • auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.

  • name : Trainer name

  • device : Training device.

  • use_cuda : If set True, will train with GPU

  • use_tpu : If set to True, will train with TPU

  • tpu_cores : Number of TPU cores

  • precision : Double precision (64), full precision (32) or half precision (16). Can be used on CPU, GPU or TPUs.

  • amp_backend : The mixed precision backend to use (“native” or “apex”)

cpu-fp64

  • seed : Seed for training.

  • accelerator : Previously known as distributed_backend (dp, ddp, ddp2, etc…).

  • accumulate_grad_batches : Accumulates grads every k batches or as set up in the dict.

  • num_workers : The number of cpu cores

  • batch_size : Size of batch

  • check_val_every_n_epoch : Check val every n train epochs.

  • gradient_clip_val : 0 means don’t clip.

  • logger : Training logger. {wandb, tensorboard}

  • max_epochs : Stop training once this number of epochs is reached.

  • auto_scale_batch_size : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.

  • name : Trainer name

  • device : Training device.

  • use_cuda : If set True, will train with GPU

  • precision : Double precision (64), full precision (32) or half precision (16). Can be used on CPU, GPU or TPUs.

  • amp_backend : The mixed precision backend to use (“native” or “apex”)

tokenizer

libri_subword

  • sos_token : Start of sentence token

  • eos_token : End of sentence token

  • pad_token : Pad token

  • blank_token : Blank token (for CTC training)

  • encoding : Encoding of vocab

  • unit : Unit of vocabulary.

  • vocab_size : Size of vocabulary.

  • vocab_path : Path of vocabulary file.

libri_character

  • sos_token : Start of sentence token

  • eos_token : End of sentence token

  • pad_token : Pad token

  • blank_token : Blank token (for CTC training)

  • encoding : Encoding of vocab

  • unit : Unit of vocabulary.

  • vocab_path : Path of vocabulary file.

aishell_character

  • sos_token : Start of sentence token

  • eos_token : End of sentence token

  • pad_token : Pad token

  • blank_token : Blank token (for CTC training)

  • encoding : Encoding of vocab

  • unit : Unit of vocabulary.

  • vocab_path : Path of vocabulary file.

kspon_subword

  • sos_token : Start of sentence token

  • eos_token : End of sentence token

  • pad_token : Pad token

  • blank_token : Blank token (for CTC training)

  • encoding : Encoding of vocab

  • unit : Unit of vocabulary.

  • sp_model_path : Path of sentencepiece model.

  • vocab_size : Size of vocabulary.

kspon_grapheme

  • sos_token : Start of sentence token

  • eos_token : End of sentence token

  • pad_token : Pad token

  • blank_token : Blank token (for CTC training)

  • encoding : Encoding of vocab

  • unit : Unit of vocabulary.

  • vocab_path : Path of vocabulary file.

kspon_character

  • sos_token : Start of sentence token

  • eos_token : End of sentence token

  • pad_token : Pad token

  • blank_token : Blank token (for CTC training)

  • encoding : Encoding of vocab

  • unit : Unit of vocabulary.

  • vocab_path : Path of vocabulary file.