# Openspeech's configurations

This page describes all configurations in `Openspeech`.

## `audio`

### `mfcc`

- `name` : Name of dataset.
- `sample_rate` : Sampling rate of audio
- `frame_length` : Frame length for spectrogram
- `frame_shift` : Length of hop between STFT windows
- `del_silence` : Flag indicating whether to delete silence
- `num_mels` : The number of mfc coefficients to retain.
- `apply_spec_augment` : Flag indicating whether to apply spec augment
- `apply_noise_augment` : Flag indicating whether to apply noise augment
- `apply_time_stretch_augment` : Flag indicating whether to apply time stretch augment
- `apply_joining_augment` : Flag indicating whether to apply audio joining augment

### `melspectrogram`

- `name` : Name of dataset.
- `sample_rate` : Sampling rate of audio
- `frame_length` : Frame length for spectrogram
- `frame_shift` : Length of hop between STFT windows
- `del_silence` : Flag indicating whether to delete silence
- `num_mels` : The number of mfc coefficients to retain.
- `apply_spec_augment` : Flag indicating whether to apply spec augment
- `apply_noise_augment` : Flag indicating whether to apply noise augment
- `apply_time_stretch_augment` : Flag indicating whether to apply time stretch augment
- `apply_joining_augment` : Flag indicating whether to apply audio joining augment

### `fbank`

- `name` : Name of dataset.
- `sample_rate` : Sampling rate of audio
- `frame_length` : Frame length for spectrogram
- `frame_shift` : Length of hop between STFT windows
- `del_silence` : Flag indicating whether to delete silence
- `num_mels` : The number of mfc coefficients to retain.
- `apply_spec_augment` : Flag indicating whether to apply spec augment
- `apply_noise_augment` : Flag indicating whether to apply noise augment
- `apply_time_stretch_augment` : Flag indicating whether to apply time stretch augment
- `apply_joining_augment` : Flag indicating whether to apply audio joining augment

### `spectrogram`

- `name` : Name of dataset.
- `sample_rate` : Sampling rate of audio
- `frame_length` : Frame length for spectrogram
- `frame_shift` : Length of hop between STFT windows
- `del_silence` : Flag indicating whether to delete silence
- `num_mels` : Spectrogram is independent of mel, but uses the `num_mels` variable to unify feature size variables
- `apply_spec_augment` : Flag indicating whether to apply spec augment
- `apply_noise_augment` : Flag indicating whether to apply noise augment
- `apply_time_stretch_augment` : Flag indicating whether to apply time stretch augment
- `apply_joining_augment` : Flag indicating whether to apply audio joining augment

## `augment`

### `default`

- `apply_spec_augment` : Flag indicating whether to apply spec augment
- `apply_noise_augment` : Flag indicating whether to apply noise augment. Noise augment requires `noise_dataset_dir`, which should contain audio files.
- `apply_joining_augment` : Flag indicating whether to apply joining augment. If true, creates a new audio file by randomly joining two audio files.
- `apply_time_stretch_augment` : Flag indicating whether to apply time stretch augment
- `freq_mask_para` : Hyperparameter that limits the frequency masking length
- `freq_mask_num` : Number of frequency-masked areas to make
- `time_mask_num` : Number of time-masked areas to make
- `noise_dataset_dir` : Path of the directory containing noise audio files
- `noise_level` : Noise adjustment level
- `time_stretch_min_rate` : Minimum rate of audio time stretch
- `time_stretch_max_rate` : Maximum rate of audio time stretch

## `dataset`

### `kspon`

- `dataset` : Select dataset for training (librispeech, ksponspeech, aishell, lm)
- `dataset_path` : Path of dataset
- `test_dataset_path` : Path of evaluation dataset
- `manifest_file_path` : Path of manifest file
- `test_manifest_dir` : Path of directory containing test manifest files
- `preprocess_mode` : KsponSpeech preprocess mode {phonetic, spelling}

### `libri`

- `dataset` : Select dataset for training (librispeech, ksponspeech, aishell, lm)
- `dataset_path` : Path of dataset
- `dataset_download` : Flag indicating whether to download the dataset.
- `manifest_file_path` : Path of manifest file

### `aishell`

- `dataset` : Select dataset for training (librispeech, ksponspeech, aishell, lm)
- `dataset_path` : Path of dataset
- `dataset_download` : Flag indicating whether to download the dataset.
- `manifest_file_path` : Path of manifest file

### `ksponspeech`

- `dataset` : Select dataset for training (librispeech, ksponspeech, aishell, lm)
- `dataset_path` : Path of dataset
- `test_dataset_path` : Path of evaluation dataset
- `manifest_file_path` : Path of manifest file
- `test_manifest_dir` : Path of directory containing test manifest files
- `preprocess_mode` : KsponSpeech preprocess mode {phonetic, spelling}

### `librispeech`

- `dataset` : Select dataset for training (librispeech, ksponspeech, aishell, lm)
- `dataset_path` : Path of dataset
- `dataset_download` : Flag indicating whether to download the dataset.
- `manifest_file_path` : Path of manifest file

### `lm`

- `dataset` : Select dataset for training (librispeech, ksponspeech, aishell, lm)
- `dataset_path` : Path of dataset
- `valid_ratio` : Ratio of validation data
- `test_ratio` : Ratio of test data

## `model`

### `listen_attend_spell`

- `model_name` : Model name
- `num_encoder_layers` : The number of encoder layers.
- `num_decoder_layers` : The number of decoder layers.
- `hidden_state_dim` : The hidden state dimension of encoder.
- `encoder_dropout_p` : The dropout probability of encoder.
- `encoder_bidirectional` : If True, becomes a bidirectional encoder
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `joint_ctc_attention` : Flag indicating whether to use joint CTC-attention
- `max_length` : Max decoding length.
- `num_attention_heads` : The number of attention heads.
- `decoder_dropout_p` : The dropout probability of decoder.
- `decoder_attn_mechanism` : The attention mechanism for decoder.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `optimizer` : Optimizer for training.

### `listen_attend_spell_with_location_aware`

- `model_name` : Model name
- `num_encoder_layers` : The number of encoder layers.
- `num_decoder_layers` : The number of decoder layers.
- `hidden_state_dim` : The hidden state dimension of encoder.
- `encoder_dropout_p` : The dropout probability of encoder.
- `encoder_bidirectional` : If True, becomes a bidirectional encoder
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `joint_ctc_attention` : Flag indicating whether to use joint CTC-attention
- `max_length` : Max decoding length.
- `num_attention_heads` : The number of attention heads.
- `decoder_dropout_p` : The dropout probability of decoder.
- `decoder_attn_mechanism` : The attention mechanism for decoder.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `optimizer` : Optimizer for training.

### `listen_attend_spell_with_multi_head`

- `model_name` : Model name
- `num_encoder_layers` : The number of encoder layers.
- `num_decoder_layers` : The number of decoder layers.
- `hidden_state_dim` : The hidden state dimension of encoder.
- `encoder_dropout_p` : The dropout probability of encoder.
- `encoder_bidirectional` : If True, becomes a bidirectional encoder
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `joint_ctc_attention` : Flag indicating whether to use joint CTC-attention
- `max_length` : Max decoding length.
- `num_attention_heads` : The number of attention heads.
- `decoder_dropout_p` : The dropout probability of decoder.
- `decoder_attn_mechanism` : The attention mechanism for decoder.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `optimizer` : Optimizer for training.

### `joint_ctc_listen_attend_spell`

- `model_name` : Model name
- `num_encoder_layers` : The number of encoder layers.
- `num_decoder_layers` : The number of decoder layers.
- `hidden_state_dim` : The hidden state dimension of encoder.
- `encoder_dropout_p` : The dropout probability of encoder.
- `encoder_bidirectional` : If True, becomes a bidirectional encoder
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `joint_ctc_attention` : Flag indicating whether to use joint CTC-attention
- `max_length` : Max decoding length.
- `num_attention_heads` : The number of attention heads.
- `decoder_dropout_p` : The dropout probability of decoder.
- `decoder_attn_mechanism` : The attention mechanism for decoder.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `optimizer` : Optimizer for training.

### `deep_cnn_with_joint_ctc_listen_attend_spell`

- `model_name` : Model name
- `num_encoder_layers` : The number of encoder layers.
- `num_decoder_layers` : The number of decoder layers.
- `hidden_state_dim` : The hidden state dimension of encoder.
- `encoder_dropout_p` : The dropout probability of encoder.
- `encoder_bidirectional` : If True, becomes a bidirectional encoder
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `extractor` : The CNN feature extractor.
- `activation` : Type of activation function
- `joint_ctc_attention` : Flag indicating whether to use joint CTC-attention
- `max_length` : Max decoding length.
- `num_attention_heads` : The number of attention heads.
- `decoder_dropout_p` : The dropout probability of decoder.
- `decoder_attn_mechanism` : The attention mechanism for decoder.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `optimizer` : Optimizer for training.

### `deepspeech2`

- `model_name` : Model name
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `num_rnn_layers` : The number of RNN layers
- `rnn_hidden_dim` : Hidden state dimension of RNN.
- `dropout_p` : The dropout probability of model.
- `bidirectional` : If True, becomes a bidirectional encoder
- `activation` : Type of activation function
- `optimizer` : Optimizer for training.

### `lstm_lm`

- `model_name` : Model name
- `num_layers` : The number of encoder layers.
- `hidden_state_dim` : The hidden state dimension of encoder.
- `dropout_p` : The dropout probability of encoder.
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `max_length` : Max decoding length.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `optimizer` : Optimizer for training.

### `rnn_transducer`

- `model_name` : Model name
- `encoder_hidden_state_dim` : Dimension of encoder.
- `decoder_hidden_state_dim` : Dimension of decoder.
- `num_encoder_layers` : The number of encoder layers.
- `num_decoder_layers` : The number of decoder layers.
- `encoder_dropout_p` : The dropout probability of encoder.
- `decoder_dropout_p` : The dropout probability of decoder.
- `bidirectional` : If True, becomes a bidirectional encoder
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `output_dim` : Dimension of outputs
- `optimizer` : Optimizer for training.

### `transformer_lm`

- `model_name` : Model name
- `num_layers` : The number of encoder layers.
- `d_model` : The dimension of model.
- `d_ff` : The dimension of feed forward network.
- `num_attention_heads` : The number of attention heads.
- `dropout_p` : The dropout probability of encoder.
- `max_length` : Max decoding length.
- `optimizer` : Optimizer for training.

### `transformer`

- `model_name` : Model name
- `d_model` : Dimension of model.
- `d_ff` : Dimension of feed forward network.
- `num_attention_heads` : The number of attention heads.
- `num_encoder_layers` : The number of encoder layers.
- `num_decoder_layers` : The number of decoder layers.
- `encoder_dropout_p` : The dropout probability of encoder.
- `decoder_dropout_p` : The dropout probability of decoder.
- `ffnet_style` : Style of feed forward network (ff, conv)
- `max_length` : Max decoding length.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `joint_ctc_attention` : Flag indicating whether to use joint CTC-attention
- `optimizer` : Optimizer for training.

### `joint_ctc_transformer`

- `model_name` : Model name
- `extractor` : The CNN feature extractor.
- `d_model` : Dimension of model.
- `d_ff` : Dimension of feed forward network.
- `num_attention_heads` : The number of attention heads.
- `num_encoder_layers` : The number of encoder layers.
- `num_decoder_layers` : The number of decoder layers.
- `encoder_dropout_p` : The dropout probability of encoder.
- `decoder_dropout_p` : The dropout probability of decoder.
- `ffnet_style` : Style of feed forward network (ff, conv)
- `max_length` : Max decoding length.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `joint_ctc_attention` : Flag indicating whether to use joint CTC-attention
- `optimizer` : Optimizer for training.

### `transformer_with_ctc`

- `model_name` : Model name
- `d_model` : Dimension of model.
- `d_ff` : Dimension of feed forward network.
- `num_attention_heads` : The number of attention heads.
- `num_encoder_layers` : The number of encoder layers.
- `encoder_dropout_p` : The dropout probability of encoder.
- `ffnet_style` : Style of feed forward network (ff, conv)
- `optimizer` : Optimizer for training.

### `vgg_transformer`

- `model_name` : Model name
- `extractor` : The CNN feature extractor.
- `d_model` : Dimension of model.
- `d_ff` : Dimension of feed forward network.
- `num_attention_heads` : The number of attention heads.
- `num_encoder_layers` : The number of encoder layers.
- `num_decoder_layers` : The number of decoder layers.
- `encoder_dropout_p` : The dropout probability of encoder.
- `decoder_dropout_p` : The dropout probability of decoder.
- `ffnet_style` : Style of feed forward network (ff, conv)
- `max_length` : Max decoding length.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `joint_ctc_attention` : Flag indicating whether to use joint CTC-attention
- `optimizer` : Optimizer for training.

### `conformer`

- `model_name` : Model name
- `encoder_dim` : Dimension of encoder.
- `num_encoder_layers` : The number of encoder layers.
- `num_attention_heads` : The number of attention heads.
- `feed_forward_expansion_factor` : The expansion factor of feed forward module.
- `conv_expansion_factor` : The expansion factor of convolution module.
- `input_dropout_p` : The dropout probability of inputs.
- `feed_forward_dropout_p` : The dropout probability of feed forward module.
- `attention_dropout_p` : The dropout probability of attention module.
- `conv_dropout_p` : The dropout probability of convolution module.
- `conv_kernel_size` : The kernel size of convolution.
- `half_step_residual` : Flag indicating whether to use half step residual
- `optimizer` : Optimizer for training.

### `conformer_lstm`

- `model_name` : Model name
- `encoder_dim` : Dimension of encoder.
- `num_encoder_layers` : The number of encoder layers.
- `num_attention_heads` : The number of attention heads.
- `feed_forward_expansion_factor` : The expansion factor of feed forward module.
- `conv_expansion_factor` : The expansion factor of convolution module.
- `input_dropout_p` : The dropout probability of inputs.
- `feed_forward_dropout_p` : The dropout probability of feed forward module.
- `attention_dropout_p` : The dropout probability of attention module.
- `conv_dropout_p` : The dropout probability of convolution module.
- `conv_kernel_size` : The kernel size of convolution.
- `half_step_residual` : Flag indicating whether to use half step residual
- `num_decoder_layers` : The number of decoder layers.
- `decoder_dropout_p` : The dropout probability of decoder.
- `max_length` : Max decoding length.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `decoder_attn_mechanism` : The attention mechanism for decoder.
- `optimizer` : Optimizer for training.

### `conformer_transducer`

- `model_name` : Model name
- `encoder_dim` : Dimension of encoder.
- `num_encoder_layers` : The number of encoder layers.
- `num_attention_heads` : The number of attention heads.
- `feed_forward_expansion_factor` : The expansion factor of feed forward module.
- `conv_expansion_factor` : The expansion factor of convolution module.
- `input_dropout_p` : The dropout probability of inputs.
- `feed_forward_dropout_p` : The dropout probability of feed forward module.
- `attention_dropout_p` : The dropout probability of attention module.
- `conv_dropout_p` : The dropout probability of convolution module.
- `conv_kernel_size` : The kernel size of convolution.
- `half_step_residual` : Flag indicating whether to use half step residual
- `num_decoder_layers` : The number of decoder layers.
- `decoder_dropout_p` : The dropout probability of decoder.
- `max_length` : Max decoding length.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `decoder_hidden_state_dim` : Hidden state dimension of decoder.
- `decoder_output_dim` : Output dimension of decoder.
- `optimizer` : Optimizer for training.

### `joint_ctc_conformer_lstm`

- `model_name` : Model name
- `encoder_dim` : Dimension of encoder.
- `num_encoder_layers` : The number of encoder layers.
- `num_attention_heads` : The number of attention heads.
- `feed_forward_expansion_factor` : The expansion factor of feed forward module.
- `conv_expansion_factor` : The expansion factor of convolution module.
- `input_dropout_p` : The dropout probability of inputs.
- `feed_forward_dropout_p` : The dropout probability of feed forward module.
- `attention_dropout_p` : The dropout probability of attention module.
- `conv_dropout_p` : The dropout probability of convolution module.
- `conv_kernel_size` : The kernel size of convolution.
- `half_step_residual` : Flag indicating whether to use half step residual
- `num_decoder_layers` : The number of decoder layers.
- `decoder_dropout_p` : The dropout probability of decoder.
- `num_decoder_attention_heads` : The number of decoder attention heads.
- `max_length` : Max decoding length.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `decoder_attn_mechanism` : The attention mechanism for decoder.
- `optimizer` : Optimizer for training.
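
To make the two expansion factors in the conformer configurations more concrete, here is a minimal sketch. It assumes, as in the usual Conformer formulation, that each factor multiplies `encoder_dim` to give the widened inner size of its module. The `ConformerDims` class and the numbers passed to it are hypothetical and are not part of Openspeech.

```python
from dataclasses import dataclass


@dataclass
class ConformerDims:
    """Hypothetical helper that derives inner sizes from three config fields."""

    encoder_dim: int
    feed_forward_expansion_factor: int
    conv_expansion_factor: int

    @property
    def feed_forward_inner_dim(self) -> int:
        # Assumption: the feed forward module widens to encoder_dim * factor.
        return self.encoder_dim * self.feed_forward_expansion_factor

    @property
    def conv_expanded_channels(self) -> int:
        # Assumption: the convolution module widens to encoder_dim * factor.
        return self.encoder_dim * self.conv_expansion_factor


if __name__ == "__main__":
    # Illustrative values only; they are not documented defaults.
    dims = ConformerDims(encoder_dim=512,
                         feed_forward_expansion_factor=4,
                         conv_expansion_factor=2)
    print(dims.feed_forward_inner_dim)   # 2048
    print(dims.conv_expanded_channels)   # 1024
```
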
### `transformer_transducer`

- `model_name` : Model name
- `encoder_dim` : Dimension of encoder
- `d_ff` : Dimension of feed forward network
- `num_audio_layers` : Number of audio layers
- `num_label_layers` : Number of label layers
- `num_attention_heads` : Number of attention heads
- `audio_dropout_p` : Dropout probability of audio layer
- `label_dropout_p` : Dropout probability of label layer
- `decoder_hidden_state_dim` : Hidden state dimension of decoder
- `decoder_output_dim` : Dimension of model output.
- `conv_kernel_size` : Kernel size of convolution layer.
- `max_positional_length` : Max length of positional encoding.
- `optimizer` : Optimizer for training.

### `quartznet5x5`

- `model_name` : Model name
- `num_blocks` : Number of quartznet blocks
- `num_sub_blocks` : Number of quartznet sub blocks
- `in_channels` : Input channels of jasper blocks
- `out_channels` : Output channels of jasper block's convolution
- `kernel_size` : Kernel size of jasper block's convolution
- `dilation` : Dilation of jasper block's convolution
- `dropout_p` : Dropout probability
- `optimizer` : Optimizer for training.

### `quartznet10x5`

- `model_name` : Model name
- `num_blocks` : Number of quartznet blocks
- `num_sub_blocks` : Number of quartznet sub blocks
- `in_channels` : Input channels of jasper blocks
- `out_channels` : Output channels of jasper block's convolution
- `kernel_size` : Kernel size of jasper block's convolution
- `dilation` : Dilation of jasper block's convolution
- `dropout_p` : Dropout probability
- `optimizer` : Optimizer for training.
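
As a rough illustration of how `kernel_size` and `dilation` interact in a block's 1D convolution, the sketch below computes the span covered by a dilated kernel and the resulting output length, using standard convolution arithmetic. The helper names and the `stride` and `padding` arguments are assumptions made for this example; they are not Openspeech configuration fields.

```python
def effective_kernel_size(kernel_size: int, dilation: int = 1) -> int:
    """Span, in frames, covered by one dilated 1D convolution kernel."""
    return dilation * (kernel_size - 1) + 1


def conv1d_output_length(length: int, kernel_size: int, dilation: int = 1,
                         stride: int = 1, padding: int = 0) -> int:
    """Output length of a 1D convolution over `length` input frames."""
    span = effective_kernel_size(kernel_size, dilation)
    return (length + 2 * padding - span) // stride + 1


if __name__ == "__main__":
    # Hypothetical values, chosen only for illustration.
    print(effective_kernel_size(kernel_size=11, dilation=2))                  # 21
    print(conv1d_output_length(100, kernel_size=11, dilation=2, padding=10))  # 100
```

With a dilation of 2, an 11-tap kernel covers 21 frames, so a padding of 10 on each side keeps the sequence length unchanged at stride 1.
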
### `quartznet15x5`

- `model_name` : Model name
- `num_blocks` : Number of quartznet blocks
- `num_sub_blocks` : Number of quartznet sub blocks
- `in_channels` : Input channels of jasper blocks
- `out_channels` : Output channels of jasper block's convolution
- `kernel_size` : Kernel size of jasper block's convolution
- `dilation` : Dilation of jasper block's convolution
- `dropout_p` : Dropout probability
- `optimizer` : Optimizer for training.

### `contextnet`

- `model_name` : Model name
- `model_size` : Model size
- `input_dim` : Dimension of input vector
- `num_encoder_layers` : The number of convolution layers
- `kernel_size` : Value of convolution kernel size
- `num_channels` : The number of channels in the convolution filter
- `encoder_dim` : Dimension of encoder output vector
- `optimizer` : Optimizer for training

### `contextnet_lstm`

- `model_name` : Model name
- `model_size` : Model size
- `input_dim` : Dimension of input vector
- `num_encoder_layers` : The number of convolution layers
- `num_decoder_layers` : The number of decoder layers.
- `kernel_size` : Value of convolution kernel size
- `num_channels` : The number of channels in the convolution filter
- `encoder_dim` : Dimension of encoder output vector
- `num_attention_heads` : The number of attention heads.
- `attention_dropout_p` : The dropout probability of attention module.
- `decoder_dropout_p` : The dropout probability of decoder.
- `max_length` : Max decoding length.
- `teacher_forcing_ratio` : The ratio of teacher forcing.
- `rnn_type` : Type of RNN cell (rnn, lstm, gru)
- `decoder_attn_mechanism` : The attention mechanism for decoder.
- `optimizer` : Optimizer for training.
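
The `teacher_forcing_ratio` field that appears in the decoder-based models can be read as a probability: the decoder is fed the ground-truth token with that probability and its own previous prediction otherwise. The sketch below shows one common way to apply such a ratio, drawing once per decoding step. This page does not say whether Openspeech draws per step or per batch, so treat the function `pick_decoder_inputs` and its arguments as hypothetical.

```python
import random
from typing import List


def pick_decoder_inputs(targets: List[int], predictions: List[int],
                        teacher_forcing_ratio: float,
                        rng: random.Random) -> List[int]:
    """Per step, pick the ground-truth token with probability
    `teacher_forcing_ratio`, otherwise the model's own prediction."""
    return [
        target if rng.random() < teacher_forcing_ratio else predicted
        for target, predicted in zip(targets, predictions)
    ]


if __name__ == "__main__":
    rng = random.Random(0)
    targets = [5, 6, 7, 8]        # ground-truth token ids (made up)
    predictions = [5, 9, 7, 1]    # model outputs from earlier steps (made up)
    # A ratio of 1.0 always feeds the ground truth; 0.0 always feeds predictions.
    print(pick_decoder_inputs(targets, predictions, 1.0, rng))  # [5, 6, 7, 8]
    print(pick_decoder_inputs(targets, predictions, 0.0, rng))  # [5, 9, 7, 1]
```
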
### `contextnet_transducer`

- `model_name` : Model name
- `model_size` : Model size
- `input_dim` : Dimension of input vector
- `num_encoder_layers` : The number of convolution layers
- `num_decoder_layers` : The number of rnn layers
- `kernel_size` : Value of convolution kernel size
- `num_channels` : The number of channels in the convolution filter
- `hidden_dim` : The number of features in the decoder hidden state
- `encoder_dim` : Dimension of encoder output vector
- `decoder_output_dim` : Dimension of decoder output vector
- `dropout` : Dropout probability of decoder
- `rnn_type` : Type of rnn cell
- `optimizer` : Optimizer for training

### `jasper5x3`

- `model_name` : Model name
- `num_blocks` : Number of jasper blocks
- `num_sub_blocks` : Number of jasper sub blocks
- `in_channels` : Input channels of jasper blocks
- `out_channels` : Output channels of jasper block's convolution
- `kernel_size` : Kernel size of jasper block's convolution
- `dilation` : Dilation of jasper block's convolution
- `dropout_p` : Dropout probability
- `optimizer` : Optimizer for training.

### `jasper10x5`

- `model_name` : Model name
- `num_blocks` : Number of jasper blocks
- `num_sub_blocks` : Number of jasper sub blocks
- `in_channels` : Input channels of jasper blocks
- `out_channels` : Output channels of jasper block's convolution
- `kernel_size` : Kernel size of jasper block's convolution
- `dilation` : Dilation of jasper block's convolution
- `dropout_p` : Dropout probability
- `optimizer` : Optimizer for training.

## `criterion`

### `label_smoothed_cross_entropy`

- `criterion_name` : Criterion name for training.
- `reduction` : Reduction method of criterion
- `smoothing` : Ratio of smoothing loss (confidence = 1.0 - smoothing)

### `joint_ctc_cross_entropy`

- `criterion_name` : Criterion name for training.
- `reduction` : Reduction method of criterion
- `ctc_weight` : Weight of ctc loss for training.
- `cross_entropy_weight` : Weight of cross entropy loss for training.
- `smoothing` : Ratio of smoothing loss (confidence = 1.0 - smoothing)
- `zero_infinity` : Whether to zero infinite losses and the associated gradients.

### `perplexity`

- `criterion_name` : Criterion name for training
- `reduction` : Reduction method of criterion

### `transducer`

- `criterion_name` : Criterion name for training.
- `reduction` : Reduction method of criterion
- `gather` : Reduce memory consumption.

### `ctc`

- `criterion_name` : Criterion name for training
- `reduction` : Reduction method of criterion
- `zero_infinity` : Whether to zero infinite losses and the associated gradients.

### `cross_entropy`

- `criterion_name` : Criterion name for training
- `reduction` : Reduction method of criterion

## `lr_scheduler`

### `reduce_lr_on_plateau`

- `lr` : Learning rate
- `scheduler_name` : Name of learning rate scheduler.
- `lr_patience` : Number of epochs with no improvement after which learning rate will be reduced.
- `lr_factor` : Factor by which the learning rate will be reduced: `new_lr = lr * factor`.

### `warmup`

- `lr` : Learning rate
- `scheduler_name` : Name of learning rate scheduler.
- `peak_lr` : Maximum learning rate.
- `init_lr` : Initial learning rate.
- `warmup_steps` : Warm up the learning rate linearly for the first N updates
- `total_steps` : Total training steps.

### `warmup_reduce_lr_on_plateau`

- `lr` : Learning rate
- `scheduler_name` : Name of learning rate scheduler.
- `lr_patience` : Number of epochs with no improvement after which learning rate will be reduced.
- `lr_factor` : Factor by which the learning rate will be reduced: `new_lr = lr * factor`.
- `peak_lr` : Maximum learning rate.
- `init_lr` : Initial learning rate.
- `warmup_steps` : Warm up the learning rate linearly for the first N updates

### `tri_stage`

- `lr` : Learning rate
- `scheduler_name` : Name of learning rate scheduler.
- `init_lr` : Initial learning rate.
- `init_lr_scale` : Initial learning rate scale.
- `final_lr_scale` : Final learning rate scale
- `phase_ratio` : Automatically sets warmup/hold/decay steps to the ratios specified here from max_updates. The ratios must add up to 1.0
- `total_steps` : Total training steps.

### `transformer`

- `lr` : Learning rate
- `scheduler_name` : Name of learning rate scheduler.
- `peak_lr` : Maximum learning rate.
- `final_lr` : Final learning rate.
- `final_lr_scale` : Final learning rate scale
- `warmup_steps` : Warm up the learning rate linearly for the first N updates
- `decay_steps` : Steps in decay stages

## `trainer`

### `cpu`

- `seed` : Seed for training.
- `accelerator` : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
- `accumulate_grad_batches` : Accumulates grads every k batches or as set up in the dict.
- `num_workers` : The number of cpu cores
- `batch_size` : Size of batch
- `check_val_every_n_epoch` : Check val every n train epochs.
- `gradient_clip_val` : Value at which to clip gradients; 0 means don’t clip.
- `logger` : Training logger. {wandb, tensorboard}
- `max_epochs` : Stop training once this number of epochs is reached.
- `auto_scale_batch_size` : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
- `name` : Trainer name
- `device` : Training device.
- `use_cuda` : If set True, will train with GPU

### `gpu`

- `seed` : Seed for training.
- `accelerator` : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
- `accumulate_grad_batches` : Accumulates grads every k batches or as set up in the dict.
- `num_workers` : The number of cpu cores
- `batch_size` : Size of batch
- `check_val_every_n_epoch` : Check val every n train epochs.
- `gradient_clip_val` : Value at which to clip gradients; 0 means don’t clip.
- `logger` : Training logger. {wandb, tensorboard}
- `max_epochs` : Stop training once this number of epochs is reached.
- `auto_scale_batch_size` : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
- `name` : Trainer name
- `device` : Training device.
- `use_cuda` : If set True, will train with GPU
- `auto_select_gpus` : If enabled and gpus is an integer, pick available gpus automatically.

### `tpu`

- `seed` : Seed for training.
- `accelerator` : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
- `accumulate_grad_batches` : Accumulates grads every k batches or as set up in the dict.
- `num_workers` : The number of cpu cores
- `batch_size` : Size of batch
- `check_val_every_n_epoch` : Check val every n train epochs.
- `gradient_clip_val` : Value at which to clip gradients; 0 means don’t clip.
- `logger` : Training logger. {wandb, tensorboard}
- `max_epochs` : Stop training once this number of epochs is reached.
- `auto_scale_batch_size` : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
- `name` : Trainer name
- `device` : Training device.
- `use_cuda` : If set True, will train with GPU
- `use_tpu` : If set True, will train with TPU
- `tpu_cores` : Number of TPU cores

### `gpu-fp16`

- `seed` : Seed for training.
- `accelerator` : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
- `accumulate_grad_batches` : Accumulates grads every k batches or as set up in the dict.
- `num_workers` : The number of cpu cores
- `batch_size` : Size of batch
- `check_val_every_n_epoch` : Check val every n train epochs.
- `gradient_clip_val` : Value at which to clip gradients; 0 means don’t clip.
- `logger` : Training logger. {wandb, tensorboard}
- `max_epochs` : Stop training once this number of epochs is reached.
- `auto_scale_batch_size` : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
- `name` : Trainer name
- `device` : Training device.
- `use_cuda` : If set True, will train with GPU
- `auto_select_gpus` : If enabled and gpus is an integer, pick available gpus automatically.
- `precision` : Double precision (64), full precision (32) or half precision (16).
  Can be used on CPU, GPU or TPUs.
- `amp_backend` : The mixed precision backend to use (“native” or “apex”)

### `tpu-fp16`

- `seed` : Seed for training.
- `accelerator` : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
- `accumulate_grad_batches` : Accumulates grads every k batches or as set up in the dict.
- `num_workers` : The number of cpu cores
- `batch_size` : Size of batch
- `check_val_every_n_epoch` : Check val every n train epochs.
- `gradient_clip_val` : Value at which to clip gradients; 0 means don’t clip.
- `logger` : Training logger. {wandb, tensorboard}
- `max_epochs` : Stop training once this number of epochs is reached.
- `auto_scale_batch_size` : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
- `name` : Trainer name
- `device` : Training device.
- `use_cuda` : If set True, will train with GPU
- `use_tpu` : If set True, will train with TPU
- `tpu_cores` : Number of TPU cores
- `precision` : Double precision (64), full precision (32) or half precision (16). Can be used on CPU, GPU or TPUs.
- `amp_backend` : The mixed precision backend to use (“native” or “apex”)

### `cpu-fp64`

- `seed` : Seed for training.
- `accelerator` : Previously known as distributed_backend (dp, ddp, ddp2, etc…).
- `accumulate_grad_batches` : Accumulates grads every k batches or as set up in the dict.
- `num_workers` : The number of cpu cores
- `batch_size` : Size of batch
- `check_val_every_n_epoch` : Check val every n train epochs.
- `gradient_clip_val` : Value at which to clip gradients; 0 means don’t clip.
- `logger` : Training logger. {wandb, tensorboard}
- `max_epochs` : Stop training once this number of epochs is reached.
- `auto_scale_batch_size` : If set to True, will initially run a batch size finder trying to find the largest batch size that fits into memory.
- `name` : Trainer name
- `device` : Training device.
- `use_cuda` : If set True, will train with GPU
- `precision` : Double precision (64), full precision (32) or half precision (16).
  Can be used on CPU, GPU or TPUs.
- `amp_backend` : The mixed precision backend to use (“native” or “apex”)

## `tokenizer`

### `libri_subword`

- `sos_token` : Start of sentence token
- `eos_token` : End of sentence token
- `pad_token` : Pad token
- `blank_token` : Blank token (for CTC training)
- `encoding` : Encoding of vocab
- `unit` : Unit of vocabulary.
- `vocab_size` : Size of vocabulary.
- `vocab_path` : Path of vocabulary file.

### `libri_character`

- `sos_token` : Start of sentence token
- `eos_token` : End of sentence token
- `pad_token` : Pad token
- `blank_token` : Blank token (for CTC training)
- `encoding` : Encoding of vocab
- `unit` : Unit of vocabulary.
- `vocab_path` : Path of vocabulary file.

### `aishell_character`

- `sos_token` : Start of sentence token
- `eos_token` : End of sentence token
- `pad_token` : Pad token
- `blank_token` : Blank token (for CTC training)
- `encoding` : Encoding of vocab
- `unit` : Unit of vocabulary.
- `vocab_path` : Path of vocabulary file.

### `kspon_subword`

- `sos_token` : Start of sentence token
- `eos_token` : End of sentence token
- `pad_token` : Pad token
- `blank_token` : Blank token (for CTC training)
- `encoding` : Encoding of vocab
- `unit` : Unit of vocabulary.
- `sp_model_path` : Path of sentencepiece model.
- `vocab_size` : Size of vocabulary.

### `kspon_grapheme`

- `sos_token` : Start of sentence token
- `eos_token` : End of sentence token
- `pad_token` : Pad token
- `blank_token` : Blank token (for CTC training)
- `encoding` : Encoding of vocab
- `unit` : Unit of vocabulary.
- `vocab_path` : Path of vocabulary file.

### `kspon_character`

- `sos_token` : Start of sentence token
- `eos_token` : End of sentence token
- `pad_token` : Pad token
- `blank_token` : Blank token (for CTC training)
- `encoding` : Encoding of vocab
- `unit` : Unit of vocabulary.
- `vocab_path` : Path of vocabulary file.
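
Reading each `##` heading on this page as a configuration group and each `###` heading as one selectable option within it, a selection can be checked against the documented names before a run. The sketch below is only an illustration of that idea: `AVAILABLE_OPTIONS` covers a subset of the groups (names copied from the headings above), and `validate_selection` is a hypothetical helper, not an Openspeech function.

```python
from typing import Dict, List

# Option names copied from the headings on this page (a subset of the groups).
AVAILABLE_OPTIONS: Dict[str, List[str]] = {
    "audio": ["mfcc", "melspectrogram", "fbank", "spectrogram"],
    "criterion": [
        "label_smoothed_cross_entropy", "joint_ctc_cross_entropy",
        "perplexity", "transducer", "ctc", "cross_entropy",
    ],
    "lr_scheduler": [
        "reduce_lr_on_plateau", "warmup", "warmup_reduce_lr_on_plateau",
        "tri_stage", "transformer",
    ],
    "trainer": ["cpu", "gpu", "tpu", "gpu-fp16", "tpu-fp16", "cpu-fp64"],
    "tokenizer": [
        "libri_subword", "libri_character", "aishell_character",
        "kspon_subword", "kspon_grapheme", "kspon_character",
    ],
}


def validate_selection(selection: Dict[str, str]) -> List[str]:
    """Return a list of problems; the list is empty if the selection is valid."""
    problems = []
    for group, option in selection.items():
        if group not in AVAILABLE_OPTIONS:
            problems.append(f"unknown group: {group}")
        elif option not in AVAILABLE_OPTIONS[group]:
            problems.append(f"unknown option for {group}: {option}")
    return problems


if __name__ == "__main__":
    selection = {"audio": "fbank", "trainer": "gpu", "tokenizer": "libri_subword"}
    print(validate_selection(selection))             # []
    print(validate_selection({"trainer": "gpu16"}))  # ['unknown option for trainer: gpu16']
```
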