Datasets
Speech To Text Dataset
class openspeech.data.audio.dataset.SpeechToTextDataset(configs: omegaconf.dictconfig.DictConfig, dataset_path: str, audio_paths: list, transcripts: list, sos_id: int = 1, eos_id: int = 2, del_silence: bool = False, apply_spec_augment: bool = False, apply_noise_augment: bool = False, apply_time_stretch_augment: bool = False, apply_joining_augment: bool = False)
Dataset for audio & transcript matching.
Note
Do not use this class directly; use one of its subclasses.
Parameters
dataset_path (str) – path to the LibriSpeech dataset
audio_paths (list) – list of audio file paths
transcripts (list) – list of transcripts
sos_id (int) – ID of the <startofsentence> token
eos_id (int) – ID of the <endofsentence> token
del_silence (bool) – flag indicating whether to delete silence
apply_spec_augment (bool) – flag indicating whether to apply spec augment
apply_noise_augment (bool) – flag indicating whether to apply noise augment
apply_time_stretch_augment (bool) – flag indicating whether to apply time stretch augment
apply_joining_augment (bool) – flag indicating whether to apply audio joining augment
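The sketch below illustrates how the path and token parameters above might fit together. It is not the openspeech implementation: `ToySpeechToTextDataset` and `wrap_transcript` are hypothetical names, audio loading and the augmentation flags are omitted, and it assumes each transcript is a string of space-separated integer token IDs.

```python
import os
from typing import List, Tuple


def wrap_transcript(transcript: str, sos_id: int = 1, eos_id: int = 2) -> List[int]:
    """Hypothetical helper: turn '5 9 12' into [sos_id, 5, 9, 12, eos_id]."""
    # Assumption: transcripts are stored as space-separated integer token IDs.
    token_ids = [int(token) for token in transcript.split()]
    return [sos_id] + token_ids + [eos_id]


class ToySpeechToTextDataset:
    """Illustrative stand-in that pairs audio paths with transcripts."""

    def __init__(
        self,
        dataset_path: str,
        audio_paths: List[str],
        transcripts: List[str],
        sos_id: int = 1,
        eos_id: int = 2,
    ) -> None:
        # Every audio file needs exactly one transcript.
        if len(audio_paths) != len(transcripts):
            raise ValueError("audio_paths and transcripts must have the same length")
        self.dataset_path = dataset_path
        self.audio_paths = list(audio_paths)
        self.transcripts = list(transcripts)
        self.sos_id = sos_id
        self.eos_id = eos_id

    def __len__(self) -> int:
        return len(self.audio_paths)

    def __getitem__(self, idx: int) -> Tuple[str, List[int]]:
        # Resolve the audio path relative to the dataset root.
        audio_path = os.path.join(self.dataset_path, self.audio_paths[idx])
        # Add start and end markers around the transcript tokens.
        target = wrap_transcript(self.transcripts[idx], self.sos_id, self.eos_id)
        return audio_path, target
```

A real subclass would load and featurize the audio at `audio_path` rather than return the path, and would apply the augmentations selected by the `apply_*` flags.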