Search

base

class openspeech.search.beam_search_base.OpenspeechBeamSearchBase(decoder, beam_size: int)

Openspeech’s beam search base class. It implements the methods required for beam search; subclasses must implement the forward method.

Note

Do not use this class directly; use one of the subclasses.
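
As a rough sketch of the contract, a subclass wraps a trained decoder and supplies its own forward; everything below apart from the OpenspeechBeamSearchBase constructor arguments is hypothetical.

    from openspeech.search.beam_search_base import OpenspeechBeamSearchBase

    class MyBeamSearch(OpenspeechBeamSearchBase):  # hypothetical subclass
        def __init__(self, decoder, beam_size: int = 3):
            super().__init__(decoder, beam_size)

        def forward(self, encoder_outputs, encoder_output_lengths):
            # A real implementation would expand each partial hypothesis with the
            # decoder and keep the `beam_size` best-scoring candidates at each step.
            raise NotImplementedError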

Beam Search CTC

class openspeech.search.beam_search_ctc.BeamSearchCTC(labels: list, lm_path: str = None, alpha: int = 0, beta: int = 0, cutoff_top_n: int = 40, cutoff_prob: float = 1.0, beam_size: int = 3, num_processes: int = 4, blank_id: int = 0)

Decodes probability output using the ctcdecode package.

Parameters
  • labels (list) – the tokens used to train the model.

  • lm_path (str) – path to an external KenLM language model (LM).

  • alpha (int) – weight applied to the language model probabilities.

  • beta (int) – weight applied to the number of words in each beam hypothesis.

  • cutoff_top_n (int) – cutoff number for pruning; only the cutoff_top_n characters with the highest probability in the vocabulary are considered during beam search.

  • cutoff_prob (float) – cutoff probability for pruning; 1.0 means no pruning.

  • beam_size (int) – width of the beam search.

  • num_processes (int) – number of worker processes used to parallelize over the batch.

  • blank_id (int) – index of the CTC blank token.

Inputs: logits, sizes
  • logits: tensor of character probabilities, where logits[c, t] is the probability of character c at time step t.

  • sizes: size of each sequence in the mini-batch.

Returns

sequences of the model’s best prediction

Return type

  • outputs

forward(logits, sizes=None)

Decodes probability output using the ctcdecode package.

Inputs: logits, sizes
  • logits: tensor of character probabilities, where logits[c, t] is the probability of character c at time step t.

  • sizes: size of each sequence in the mini-batch.

Returns

sequences of the model’s best prediction

Return type

outputs
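
A minimal usage sketch, assuming the ctcdecode package is installed, that logits are laid out as (batch, seq_length, num_labels), and that the label list below only stands in for a real vocabulary:

    import torch

    from openspeech.search.beam_search_ctc import BeamSearchCTC

    labels = ["_", " ", "a", "b", "c"]          # assumed vocabulary; index 0 is the CTC blank
    ctc_search = BeamSearchCTC(labels=labels, beam_size=3, blank_id=0)

    batch, seq_length = 2, 50
    logits = torch.randn(batch, seq_length, len(labels)).softmax(dim=-1)
    sizes = torch.full((batch,), seq_length, dtype=torch.long)

    outputs = ctc_search.forward(logits, sizes)  # best-scoring label sequences per utterance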

Beam Search LSTM

class openspeech.search.beam_search_lstm.BeamSearchLSTM(decoder: openspeech.decoders.lstm_attention_decoder.LSTMAttentionDecoder, beam_size: int)

LSTM Beam Search Decoder

Args: decoder, beam_size
  • decoder (LSTMAttentionDecoder): base decoder of the LSTM model.

  • beam_size (int): size of the beam.

Inputs: encoder_outputs, targets, encoder_output_lengths, teacher_forcing_ratio
  • encoder_outputs (torch.FloatTensor): An output sequence of the encoders. FloatTensor of size (batch, seq_length, dimension).

  • targets (torch.LongTensor): A target sequence passed to the decoders. IntTensor of size (batch, seq_length).

  • encoder_output_lengths (torch.LongTensor): The lengths of the encoder outputs. LongTensor of size (batch).

  • teacher_forcing_ratio (float): Ratio of teacher forcing.

Returns

Log probability of model predictions.

Return type

  • logits (torch.FloatTensor)

forward(encoder_outputs: torch.Tensor, encoder_output_lengths: torch.Tensor) → torch.Tensor

Beam search decoding.

Inputs: encoder_outputs, encoder_output_lengths
  • encoder_outputs (torch.FloatTensor): An output sequence of the encoders. FloatTensor of size (batch, seq_length, dimension).

  • encoder_output_lengths (torch.LongTensor): The lengths of the encoder outputs. LongTensor of size (batch).

Returns

Log probability of model predictions.

Return type

  • logits (torch.FloatTensor)
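
A hedged sketch of running the decoder at inference time; the helper function and tensor names are assumptions, while the constructor and forward arguments come from this page. The decoder is assumed to be the LSTMAttentionDecoder of an already trained model.

    from openspeech.search.beam_search_lstm import BeamSearchLSTM

    def lstm_beam_decode(decoder, encoder_outputs, encoder_output_lengths, beam_size=3):
        # `decoder`: LSTMAttentionDecoder taken from a trained encoder-decoder model (assumed).
        beam_search = BeamSearchLSTM(decoder=decoder, beam_size=beam_size)
        # Returns the log probabilities of the best hypotheses found by the beam.
        return beam_search.forward(encoder_outputs, encoder_output_lengths)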

Beam Search Transformer

class openspeech.search.beam_search_transformer.BeamSearchTransformer(decoder: openspeech.decoders.transformer_decoder.TransformerDecoder, beam_size: int = 3)

Transformer Beam Search Decoder

Args: decoder, beam_size
  • decoder (TransformerDecoder): base decoder of the transformer model.

  • beam_size (int): size of the beam.

Inputs: encoder_outputs, targets, encoder_output_lengths, teacher_forcing_ratio
  • encoder_outputs (torch.FloatTensor): An output sequence of the encoders. FloatTensor of size (batch, seq_length, dimension).

  • targets (torch.LongTensor): A target sequence passed to the decoders. IntTensor of size (batch, seq_length).

  • encoder_output_lengths (torch.LongTensor): The lengths of the encoder outputs. LongTensor of size (batch).

  • teacher_forcing_ratio (float): Ratio of teacher forcing.

Returns

Log probability of model predictions.

Return type

  • logits (torch.FloatTensor)
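
Usage is analogous to the LSTM beam search. The sketch below assumes the forward method mirrors BeamSearchLSTM.forward and takes encoder_outputs and encoder_output_lengths, since only the constructor is documented on this page.

    from openspeech.search.beam_search_transformer import BeamSearchTransformer

    def transformer_beam_decode(decoder, encoder_outputs, encoder_output_lengths, beam_size=3):
        # `decoder`: TransformerDecoder taken from a trained model (assumed).
        beam_search = BeamSearchTransformer(decoder=decoder, beam_size=beam_size)
        # Assumed call signature, mirroring BeamSearchLSTM.forward above.
        return beam_search.forward(encoder_outputs, encoder_output_lengths)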

Beam Search RNN Transducer

class openspeech.search.beam_search_rnn_transducer.BeamSearchRNNTransducer(joint, decoder: openspeech.decoders.rnn_transducer_decoder.RNNTransducerDecoder, beam_size: int = 3, expand_beam: float = 2.3, state_beam: float = 4.6, blank_id: int = 3)

RNN Transducer Beam Search.

Reference: RNN-T FOR LATENCY CONTROLLED ASR WITH IMPROVED BEAM SEARCH (https://arxiv.org/pdf/1911.01629.pdf)

Args: joint, decoder, beam_size, expand_beam, state_beam, blank_id
  • joint: joins encoder_outputs and decoder_outputs.

  • decoder (RNNTransducerDecoder): base decoder of the RNN transducer model.

  • beam_size (int): size of the beam.

  • expand_beam (float): threshold coefficient that limits the number of expanded hypotheses.

  • state_beam (float): threshold coefficient that decides whether hypotheses in A (process_hyps) are likely to compete with hypotheses in B (ongoing_beams).

  • blank_id (int): blank token id.

Inputs: encoder_output, max_length
  • encoder_output (torch.FloatTensor): An output sequence of the encoders. FloatTensor of size (seq_length, dimension).

  • max_length (int): maximum decoding time step.

Returns

model predictions.

Return type

  • predictions (torch.LongTensor)

forward(encoder_outputs: torch.Tensor, max_length: int)

Beam search decoding.

Inputs: encoder_outputs, max_length
  • encoder_outputs (torch.FloatTensor): An output sequence of the encoders. FloatTensor of size (batch, seq_length, dimension).

  • max_length (int): maximum decoding time step.

Returns

model predictions.

Return type

  • predictions (torch.LongTensor)
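
A sketch of wiring the searcher to a trained transducer; the helper function is an assumption, joint and decoder are assumed to come from an already trained RNN transducer model, and blank_id must match the tokenizer actually used.

    from openspeech.search.beam_search_rnn_transducer import BeamSearchRNNTransducer

    def rnnt_beam_decode(joint, decoder, encoder_outputs, max_length=128):
        searcher = BeamSearchRNNTransducer(
            joint=joint,            # joint network of the trained model (assumed)
            decoder=decoder,        # RNNTransducerDecoder of the trained model (assumed)
            beam_size=3,
            expand_beam=2.3,
            state_beam=4.6,
            blank_id=3,             # must match the tokenizer's blank index
        )
        # Returns the best token id sequences, decoded for up to `max_length` steps.
        return searcher.forward(encoder_outputs, max_length)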

Beam Search Transformer Transducer

class openspeech.search.beam_search_transformer_transducer.BeamSearchTransformerTransducer(joint, decoder: openspeech.decoders.transformer_transducer_decoder.TransformerTransducerDecoder, beam_size: int = 3, expand_beam: float = 2.3, state_beam: float = 4.6, blank_id: int = 3)

Transformer Transducer Beam Search.

Reference: RNN-T FOR LATENCY CONTROLLED ASR WITH IMPROVED BEAM SEARCH (https://arxiv.org/pdf/1911.01629.pdf)

Args: joint, decoder, beam_size, expand_beam, state_beam, blank_id
  • joint: joins encoder_outputs and decoder_outputs.

  • decoder (TransformerTransducerDecoder): base decoder of the transformer transducer model.

  • beam_size (int): size of the beam.

  • expand_beam (float): threshold coefficient that limits the number of expanded hypotheses added to A (process_hyps).

  • state_beam (float): threshold coefficient in log space that decides whether hypotheses in A (process_hyps) are likely to compete with hypotheses in B (ongoing_beams).

  • blank_id (int): blank token id.

Inputs: encoder_outputs, max_length
  • encoder_outputs (torch.FloatTensor): An output sequence of the encoders. FloatTensor of size (batch, seq_length, dimension).

  • max_length (int): maximum decoding time step.

Returns

model predictions.

Return type

  • predictions (torch.LongTensor)

forward(encoder_outputs: torch.Tensor, max_length: int)

Beam search decoding.

Inputs: encoder_outputs, max_length
  • encoder_outputs (torch.FloatTensor): An output sequence of the encoders. FloatTensor of size (batch, seq_length, dimension).

  • max_length (int): maximum decoding time step.

Returns

model predictions.

Return type

  • predictions (torch.LongTensor)
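
The call pattern mirrors the RNN transducer searcher above; a hedged sketch, with joint and decoder assumed to come from a trained transformer transducer model:

    from openspeech.search.beam_search_transformer_transducer import BeamSearchTransformerTransducer

    def transformer_transducer_beam_decode(joint, decoder, encoder_outputs, max_length=128):
        # `joint` and `decoder` come from a trained transformer transducer model (assumed).
        searcher = BeamSearchTransformerTransducer(joint=joint, decoder=decoder, beam_size=3, blank_id=3)
        return searcher.forward(encoder_outputs, max_length)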

Ensemble Search

class openspeech.search.ensemble_search.EnsembleSearch(models: Union[list, tuple])

Class for ensemble search.

Parameters

models (list or tuple) – the models to ensemble.

Inputs:
  • inputs (torch.FloatTensor): An input sequence passed to the encoders. Typically a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor): The lengths of the input tensors. LongTensor of size (batch).

Returns

predictions of the ensemble of models

Return type

  • predictions (torch.LongTensor)
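
A hedged sketch of ensembling several trained models at decode time; the helper function is an assumption, and the call follows the Inputs listed above (the class is assumed to be callable like a torch.nn.Module):

    from openspeech.search.ensemble_search import EnsembleSearch

    def ensemble_decode(models, inputs, input_lengths):
        # `models`: list/tuple of trained Openspeech models that accept (inputs, input_lengths).
        searcher = EnsembleSearch(models)
        return searcher(inputs, input_lengths)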

class openspeech.search.ensemble_search.WeightedEnsembleSearch(models: Union[list, tuple], weights: Union[list, tuple])

Parameters
  • models (list or tuple) – the models to ensemble.

  • weights (list or tuple) – the weight assigned to each model in the ensemble.

Inputs:
  • inputs (torch.FloatTensor): An input sequence passed to the encoders. Typically a padded FloatTensor of size (batch, seq_length, dimension).

  • input_lengths (torch.LongTensor): The lengths of the input tensors. LongTensor of size (batch).

Returns

predictions of the ensemble of models

Return type

  • predictions (torch.LongTensor)
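
The weighted variant follows the same pattern, with a per-model weight; a hedged sketch under the same assumptions as above:

    from openspeech.search.ensemble_search import WeightedEnsembleSearch

    def weighted_ensemble_decode(models, weights, inputs, input_lengths):
        # e.g. weights = [0.7, 0.3] to trust the first model more (illustrative values).
        searcher = WeightedEnsembleSearch(models=models, weights=weights)
        return searcher(inputs, input_lengths)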
