Datasets

Speech To Text Dataset

class openspeech.data.audio.dataset.SpeechToTextDataset(configs: omegaconf.dictconfig.DictConfig, dataset_path: str, audio_paths: list, transcripts: list, sos_id: int = 1, eos_id: int = 2, del_silence: bool = False, apply_spec_augment: bool = False, apply_noise_augment: bool = False, apply_time_stretch_augment: bool = False, apply_joining_augment: bool = False)

Dataset that pairs audio files with their transcripts.

Note

Do not use this class directly; use one of its subclasses.

Parameters

configs (DictConfig) – configuration object (an omegaconf DictConfig)
dataset_path (str) – root path of the dataset (for example, LibriSpeech)
audio_paths (list) – list of audio file paths
transcripts (list) – list of transcripts, one per audio path
sos_id (int) – ID of the start-of-sentence token (default: 1)
eos_id (int) – ID of the end-of-sentence token (default: 2)
del_silence (bool) – flag indicating whether to delete silence from the audio (default: False)
apply_spec_augment (bool) – flag indicating whether to apply spec augment (default: False)
apply_noise_augment (bool) – flag indicating whether to apply noise augment (default: False)
apply_time_stretch_augment (bool) – flag indicating whether to apply time stretch augment (default: False)
apply_joining_augment (bool) – flag indicating whether to apply audio joining augment (default: False)
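
The role of sos_id and eos_id can be illustrated with a small, self-contained sketch. It does not call openspeech. The helper name wrap_transcript is hypothetical, and the assumption that each transcript is stored as a string of space-separated token IDs is ours, not something this page states.

```python
from typing import List


def wrap_transcript(transcript: str, sos_id: int = 1, eos_id: int = 2) -> List[int]:
    """Hypothetical helper: surround a transcript with SOS and EOS token IDs.

    Assumes the transcript is a string of space-separated integer token IDs.
    """
    token_ids = [int(token) for token in transcript.split()]
    return [sos_id] + token_ids + [eos_id]


# With the default IDs from the signature above (sos_id=1, eos_id=2):
encoded = wrap_transcript("5 9 12")
print(encoded)  # [1, 5, 9, 12, 2]
```

A decoder trained on such targets would see sos_id as its first input and learn to emit eos_id to end the sequence.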

Text Dataset

class openspeech.data.text.dataset.TextDataset(transcripts: list, tokenizer)

Dataset for language modeling.

Parameters

transcripts (list) – list of transcripts
tokenizer (Tokenizer) – tokenizer that prepares the inputs for the model
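
As a rough illustration of what a language-modeling dataset built from transcripts and a tokenizer might look like, here is a minimal pure-Python sketch. It is not the openspeech implementation: ToyTextDataset, toy_tokenizer, the toy vocabulary, and the shift-by-one input/target split are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


class ToyTextDataset:
    """Hypothetical stand-in for a language-modeling dataset (not openspeech code)."""

    def __init__(
        self,
        transcripts: List[str],
        tokenizer: Callable[[str], List[int]],
        sos_id: int = 1,
        eos_id: int = 2,
    ) -> None:
        self.transcripts = transcripts
        self.tokenizer = tokenizer
        self.sos_id = sos_id
        self.eos_id = eos_id

    def __len__(self) -> int:
        return len(self.transcripts)

    def __getitem__(self, idx: int) -> Tuple[List[int], List[int]]:
        # Tokenize, then add start/end markers.
        ids = [self.sos_id] + self.tokenizer(self.transcripts[idx]) + [self.eos_id]
        # Next-token prediction: targets are the inputs shifted left by one.
        return ids[:-1], ids[1:]


# Toy whitespace tokenizer over a two-word vocabulary (an assumption).
vocab = {"hello": 3, "world": 4}


def toy_tokenizer(text: str) -> List[int]:
    return [vocab[word] for word in text.split()]


dataset = ToyTextDataset(["hello world"], toy_tokenizer)
inputs, targets = dataset[0]
print(inputs)   # [1, 3, 4]
print(targets)  # [3, 4, 2]
```

Each target position holds the token that follows the corresponding input position, which is the usual setup for training a language model.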