Search
base
Beam Search CTC
class openspeech.search.beam_search_ctc.BeamSearchCTC(labels: list, lm_path: str = None, alpha: int = 0, beta: int = 0, cutoff_top_n: int = 40, cutoff_prob: float = 1.0, beam_size: int = 3, num_processes: int = 4, blank_id: int = 0)

Decodes the probability output using the ctcdecode package.
- Parameters
labels (list) – the tokens you used to train your model
lm_path (str) – the path to your external KenLM language model (LM).
alpha (int) – weight associated with the LM's probabilities.
beta (int) – weight associated with the number of words within our beam.
cutoff_top_n (int) – cutoff number in pruning. Only the top cutoff_top_n characters with the highest probability in the vocab will be used in beam search.
cutoff_prob (float) – cutoff probability in pruning. 1.0 means no pruning.
beam_size (int) – this controls how broad the beam search is.
num_processes (int) – parallelize the batch using num_processes workers.
blank_id (int) – this should be the index of the CTC blank token.
- Inputs: logits, sizes
logits: Tensor of character probabilities, where logits[c, t] is the probability of character c at time t
sizes: Size of each sequence in the mini-batch
- Returns
sequences of the model’s best prediction
- Return type
outputs
forward(logits, sizes=None)

Decodes the probability output using the ctcdecode package.
- Inputs: logits, sizes
logits: Tensor of character probabilities, where logits[c, t] is the probability of character c at time t
sizes: Size of each sequence in the mini-batch
- Returns
sequences of the model’s best prediction
- Return type
outputs
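A minimal usage sketch follows. The toy label set, tensor shapes, and random probabilities are assumptions standing in for a real trained acoustic model, and the call relies on the usual torch.nn.Module convention (calling the object invokes forward); verify the exact logit layout against your installed ctcdecode version.

```python
import torch

from openspeech.search.beam_search_ctc import BeamSearchCTC

# Toy label set: index 0 is the CTC blank token (matches blank_id=0 below).
labels = ["_", " ", "a", "b", "c"]

decoder = BeamSearchCTC(
    labels=labels,
    lm_path=None,     # optionally a path to a KenLM binary for LM-fused decoding
    beam_size=3,
    blank_id=0,
)

# Random stand-in for acoustic-model output: assumed (batch, time, vocab) probabilities.
batch_size, seq_length = 2, 50
logits = torch.randn(batch_size, seq_length, len(labels)).softmax(dim=-1)
sizes = torch.tensor([50, 42])  # valid length of each sequence in the mini-batch

outputs = decoder(logits, sizes)  # best-scoring token sequences per utterance
```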
Beam Search LSTM
class openspeech.search.beam_search_lstm.BeamSearchLSTM(decoder: openspeech.decoders.lstm_attention_decoder.LSTMAttentionDecoder, beam_size: int)

LSTM beam search decoder.
- Args: decoder, beam_size
decoder (LSTMAttentionDecoder): base decoder of the LSTM model.
beam_size (int): size of beam.
- Inputs: encoder_outputs, targets, encoder_output_lengths, teacher_forcing_ratio
encoder_outputs (torch.FloatTensor): An output sequence from the encoders. FloatTensor of size (batch, seq_length, dimension).
targets (torch.LongTensor): A target sequence passed to the decoders. LongTensor of size (batch, seq_length).
encoder_output_lengths (torch.LongTensor): A sequence of encoder output lengths. LongTensor of size (batch).
teacher_forcing_ratio (float): Ratio of teacher forcing.
- Returns
Log probability of model predictions.
- Return type
logits (torch.FloatTensor)
forward(encoder_outputs: torch.Tensor, encoder_output_lengths: torch.Tensor) → torch.Tensor

Beam search decoding.
- Inputs: encoder_outputs, encoder_output_lengths
encoder_outputs (torch.FloatTensor): An output sequence from the encoders. FloatTensor of size (batch, seq_length, dimension).
encoder_output_lengths (torch.LongTensor): A sequence of encoder output lengths. LongTensor of size (batch).
- Returns
Log probability of model predictions.
- Return type
logits (torch.FloatTensor)
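A sketch of wrapping a trained decoder in beam search. `base_decoder` is assumed to be an already-built LSTMAttentionDecoder taken from a trained model, the shapes in the commented call are illustrative only, and the class is treated as a regular torch.nn.Module.

```python
import torch

from openspeech.search.beam_search_lstm import BeamSearchLSTM


def lstm_beam_decode(base_decoder, encoder_outputs, encoder_output_lengths, beam_size=3):
    """Wrap a trained LSTMAttentionDecoder in beam search and decode encoder outputs.

    `base_decoder` is assumed to come from an already-trained attention-based model.
    """
    beam_decoder = BeamSearchLSTM(decoder=base_decoder, beam_size=beam_size)
    # Returns log probabilities of the model's predictions.
    return beam_decoder(encoder_outputs, encoder_output_lengths)


# Illustrative call with (batch, seq_length, dimension) encoder outputs:
# encoder_outputs = torch.randn(2, 100, 512)
# encoder_output_lengths = torch.tensor([100, 87])
# predictions = lstm_beam_decode(base_decoder, encoder_outputs, encoder_output_lengths)
```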
Beam Search Transformer
class openspeech.search.beam_search_transformer.BeamSearchTransformer(decoder: openspeech.decoders.transformer_decoder.TransformerDecoder, beam_size: int = 3)

Transformer beam search decoder.
- Args: decoder, beam_size
decoder (TransformerDecoder): base decoder of the transformer model.
beam_size (int): size of beam.
- Inputs: encoder_outputs, targets, encoder_output_lengths, teacher_forcing_ratio
encoder_outputs (torch.FloatTensor): An output sequence from the encoders. FloatTensor of size (batch, seq_length, dimension).
targets (torch.LongTensor): A target sequence passed to the decoders. LongTensor of size (batch, seq_length).
encoder_output_lengths (torch.LongTensor): A sequence of encoder output lengths. LongTensor of size (batch).
teacher_forcing_ratio (float): Ratio of teacher forcing.
- Returns
Log probability of model predictions.
- Return type
logits (torch.FloatTensor)
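Since the decoding call itself is not spelled out above, the sketch below only shows how the beam-search decoder is constructed and swapped in. `model.decoder` is a hypothetical attribute name used for illustration, not a documented API.

```python
from openspeech.search.beam_search_transformer import BeamSearchTransformer


def attach_transformer_beam_search(model, beam_size: int = 3):
    """Swap a trained TransformerDecoder for its beam-search counterpart.

    Assumes the model exposes its decoder as `model.decoder`; adjust the
    attribute name to match the actual model class.
    """
    model.decoder = BeamSearchTransformer(decoder=model.decoder, beam_size=beam_size)
    return model
```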
Beam Search RNN Transducer
class openspeech.search.beam_search_rnn_transducer.BeamSearchRNNTransducer(joint, decoder: openspeech.decoders.rnn_transducer_decoder.RNNTransducerDecoder, beam_size: int = 3, expand_beam: float = 2.3, state_beam: float = 4.6, blank_id: int = 3)

RNN transducer beam search.
Reference: RNN-T FOR LATENCY CONTROLLED ASR WITH IMPROVED BEAM SEARCH (https://arxiv.org/pdf/1911.01629.pdf)
- Args: joint, decoder, beam_size, expand_beam, state_beam, blank_id
joint: joint network that combines encoder_outputs and decoder_outputs.
decoder (RNNTransducerDecoder): base decoder of the RNN transducer model.
beam_size (int): size of beam.
expand_beam (float): threshold coefficient that limits the number of expanded hypotheses.
state_beam (float): threshold coefficient that decides whether hypotheses in A (process_hyps) are likely to compete with hypotheses in B (ongoing_beams).
blank_id (int): blank id.
- Inputs: encoder_output, max_length
encoder_output (torch.FloatTensor): An output sequence from the encoders. FloatTensor of size (seq_length, dimension).
max_length (int): maximum decoding time step.
- Returns
model predictions.
- Return type
predictions (torch.LongTensor)
forward(encoder_outputs: torch.Tensor, max_length: int)

Beam search decoding.
- Inputs: encoder_outputs, max_length
encoder_outputs (torch.FloatTensor): An output sequence from the encoders. FloatTensor of size (batch, seq_length, dimension).
max_length (int): maximum decoding time step.
- Returns
model predictions.
- Return type
predictions (torch.LongTensor)
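A decoding sketch under the assumption that `joint` and `base_decoder` are the joint network and RNNTransducerDecoder of an already-trained transducer model; the hyperparameters simply repeat the constructor defaults shown above.

```python
from openspeech.search.beam_search_rnn_transducer import BeamSearchRNNTransducer


def rnnt_beam_decode(joint, base_decoder, encoder_outputs, max_length=128, beam_size=3):
    """Beam-search decode transducer encoder outputs into token id predictions.

    `joint` and `base_decoder` are assumed to come from a trained RNN transducer.
    """
    searcher = BeamSearchRNNTransducer(
        joint=joint,
        decoder=base_decoder,
        beam_size=beam_size,
        expand_beam=2.3,   # defaults from the constructor signature above
        state_beam=4.6,
        blank_id=3,
    )
    return searcher(encoder_outputs, max_length)  # LongTensor of predicted token ids
```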
Beam Search Transformer Transducer
class openspeech.search.beam_search_transformer_transducer.BeamSearchTransformerTransducer(joint, decoder: openspeech.decoders.transformer_transducer_decoder.TransformerTransducerDecoder, beam_size: int = 3, expand_beam: float = 2.3, state_beam: float = 4.6, blank_id: int = 3)

Transformer transducer beam search.
Reference: RNN-T FOR LATENCY CONTROLLED ASR WITH IMPROVED BEAM SEARCH (https://arxiv.org/pdf/1911.01629.pdf)
- Args: joint, decoder, beam_size, expand_beam, state_beam, blank_id
joint: joint network that combines encoder_outputs and decoder_outputs.
decoder (TransformerTransducerDecoder): base decoder of the transformer transducer model.
beam_size (int): size of beam.
expand_beam (float): threshold coefficient that limits the number of expanded hypotheses added to A (process_hyps).
state_beam (float): threshold coefficient in log space that decides whether hypotheses in A (process_hyps) are likely to compete with hypotheses in B (ongoing_beams).
blank_id (int): blank id.
- Inputs: encoder_outputs, max_length
encoder_outputs (torch.FloatTensor): An output sequence from the encoders. FloatTensor of size (batch, seq_length, dimension).
max_length (int): maximum decoding time step.
- Returns
model predictions.
- Return type
predictions (torch.LongTensor)
forward(encoder_outputs: torch.Tensor, max_length: int)

Beam search decoding.
- Inputs: encoder_outputs, max_length
encoder_outputs (torch.FloatTensor): An output sequence from the encoders. FloatTensor of size (batch, seq_length, dimension).
max_length (int): maximum decoding time step.
- Returns
model predictions.
- Return type
predictions (torch.LongTensor)
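The call pattern mirrors the RNN transducer sketch above; only the class and decoder type change. `joint` and `base_decoder` are again assumed to come from a trained model.

```python
from openspeech.search.beam_search_transformer_transducer import BeamSearchTransformerTransducer


def transformer_transducer_beam_decode(joint, base_decoder, encoder_outputs, max_length=128):
    """Beam-search decode with a trained TransformerTransducerDecoder and joint network."""
    searcher = BeamSearchTransformerTransducer(joint=joint, decoder=base_decoder, beam_size=3)
    return searcher(encoder_outputs, max_length)  # LongTensor of predicted token ids
```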
Ensemble Search
class openspeech.search.ensemble_search.EnsembleSearch(models: Union[list, tuple])

Class for ensemble search.
- Parameters
models (list or tuple) – list of ensemble models
- Inputs: inputs, input_lengths
inputs (torch.FloatTensor): An input sequence passed to the encoders. Typically a padded FloatTensor of size (batch, seq_length, dimension).
input_lengths (torch.LongTensor): The length of each input sequence. LongTensor of size (batch).
- Returns
predictions of the ensemble models
- Return type
predictions (torch.LongTensor)
class openspeech.search.ensemble_search.WeightedEnsembleSearch(models: Union[list, tuple], weights: Union[list, tuple])

Class for weighted ensemble search.
- Parameters
models (list or tuple) – list of ensemble models
weights (list or tuple) – list of the ensemble models' weights
- Inputs: inputs, input_lengths
inputs (torch.FloatTensor): An input sequence passed to the encoders. Typically a padded FloatTensor of size (batch, seq_length, dimension).
input_lengths (torch.LongTensor): The length of each input sequence. LongTensor of size (batch).
- Returns
predictions of the ensemble models
- Return type
predictions (torch.LongTensor)
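A combined sketch for both ensemble classes. The models are assumed to be trained Openspeech models sharing one vocabulary; `weights`, when given, holds one float per model, and the padded `inputs` / `input_lengths` tensors follow the shapes documented above.

```python
from openspeech.search.ensemble_search import EnsembleSearch, WeightedEnsembleSearch


def ensemble_decode(models, inputs, input_lengths, weights=None):
    """Decode a padded batch with an (optionally weighted) ensemble of trained models.

    `models` is assumed to be a list/tuple of trained models with a shared vocabulary;
    `weights`, if provided, contains one weight per model.
    """
    if weights is None:
        searcher = EnsembleSearch(models)
    else:
        searcher = WeightedEnsembleSearch(models, weights)
    # inputs: (batch, seq_length, dimension) FloatTensor; input_lengths: (batch) LongTensor.
    return searcher(inputs, input_lengths)  # LongTensor of ensemble predictions
```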