Datasets

Speech To Text Dataset

class openspeech.data.audio.dataset.SpeechToTextDataset(configs: omegaconf.dictconfig.DictConfig, dataset_path: str, audio_paths: list, transcripts: list, sos_id: int = 1, eos_id: int = 2, del_silence: bool = False, apply_spec_augment: bool = False, apply_noise_augment: bool = False, apply_time_stretch_augment: bool = False, apply_joining_augment: bool = False)

Dataset that pairs audio files with their transcripts.

Note

Do not use this class directly; use one of its subclasses.

Parameters
  • configs (DictConfig) – configuration object

  • dataset_path (str) – path to the LibriSpeech dataset

  • audio_paths (list) – list of audio paths

  • transcripts (list) – list of transcripts

  • sos_id (int) – ID of the start-of-sentence token

  • eos_id (int) – ID of the end-of-sentence token

  • del_silence (bool) – flag indicating whether to delete silence

  • apply_spec_augment (bool) – flag indicating whether to apply spec augment

  • apply_noise_augment (bool) – flag indicating whether to apply noise augment

  • apply_time_stretch_augment (bool) – flag indicating whether to apply time stretch augment

  • apply_joining_augment (bool) – flag indicating whether to apply audio joining augment
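
As a rough illustration of how audio_paths, transcripts, sos_id and eos_id might fit together, the sketch below pairs each audio path with a transcript wrapped in the two IDs. It is not the openspeech implementation. The helper name parse_transcript, the file names, and the assumption that each transcript is a string of space-separated integer token IDs are all hypothetical.

```python
from typing import List, Tuple


def parse_transcript(transcript: str, sos_id: int = 1, eos_id: int = 2) -> List[int]:
    """Wrap a space-separated string of token IDs with the SOS and EOS IDs.

    Hypothetical helper; assumes transcripts look like "5 9 12".
    """
    token_ids = [int(token) for token in transcript.split()]
    return [sos_id] + token_ids + [eos_id]


# Hypothetical inputs: the two lists are matched by index.
audio_paths = ["a.flac", "b.flac"]
transcripts = ["5 9 12", "7 3"]

pairs: List[Tuple[str, List[int]]] = [
    (path, parse_transcript(text)) for path, text in zip(audio_paths, transcripts)
]
```

The defaults mirror the signature above (sos_id=1, eos_id=2). The augmentation flags are left out because their behavior is not described here.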

Text Dataset

class openspeech.data.text.dataset.TextDataset(transcripts: list, tokenizer)

Dataset for language modeling.

Parameters
  • transcripts (list) – list of transcripts

  • tokenizer (Tokenizer) – tokenizer that prepares the inputs for a model
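
To make the language-modeling use concrete, here is a minimal sketch of a map-style dataset that turns each transcript into an (inputs, targets) pair for next-token prediction. It is a stand-in, not the TextDataset source. ToyTokenizer, ToyTextDataset, the encode method, the sos_id/eos_id attributes, and the shifted input/target layout are assumptions made for illustration.

```python
from typing import Dict, List, Tuple


class ToyTokenizer:
    """Hypothetical whitespace tokenizer standing in for a real Tokenizer."""

    def __init__(self, vocab: Dict[str, int], sos_id: int = 1, eos_id: int = 2):
        self.vocab = vocab
        self.sos_id = sos_id
        self.eos_id = eos_id

    def encode(self, sentence: str) -> List[int]:
        # Map each whitespace-separated word to its vocabulary ID.
        return [self.vocab[word] for word in sentence.split()]


class ToyTextDataset:
    """Map-style dataset yielding (inputs, targets) for next-token prediction."""

    def __init__(self, transcripts: List[str], tokenizer: ToyTokenizer):
        self.transcripts = transcripts
        self.tokenizer = tokenizer

    def __len__(self) -> int:
        return len(self.transcripts)

    def __getitem__(self, idx: int) -> Tuple[List[int], List[int]]:
        ids = self.tokenizer.encode(self.transcripts[idx])
        inputs = [self.tokenizer.sos_id] + ids   # input sequence starts with SOS
        targets = ids + [self.tokenizer.eos_id]  # target sequence ends with EOS
        return inputs, targets


tokenizer = ToyTokenizer({"hello": 3, "world": 4})
dataset = ToyTextDataset(["hello world", "world"], tokenizer)
inputs, targets = dataset[0]
```

Because the class defines __len__ and __getitem__, it follows the usual map-style dataset protocol and could be wrapped by a data loader that accepts such objects.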