KsponSpeech¶
-
class
openspeech.datasets.ksponspeech.lit_data_module.
LightningKsponSpeechDataModule
(*args: Any, **kwargs: Any)[source]¶ Lightning data module for KsponSpeech. The KsponSpeech corpus contains 969 hours of general open-domain dialog utterances, spoken by about 2,000 native Korean speakers in a clean environment. All data were collected by recording two people conversing freely on a variety of topics and manually transcribing the utterances. Each transcription is dual, giving both orthography and pronunciation, and includes disfluency tags for spontaneous speech phenomena such as filler words, repeated words, and word fragments.
- Parameters
configs (DictConfig) – configuration set.
-
prepare_data
()[source]¶ Prepare the KsponSpeech manifest file. If the manifest file does not exist, generate it.
- Returns
The tokenizer, which is in charge of preparing the inputs for a model.
- Return type
tokenizer (Tokenizer)
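The generate-if-missing behaviour described for prepare_data can be illustrated with a minimal standalone sketch. It does not use openspeech: the helper name `ensure_manifest`, the tab-separated manifest layout, and the sample file names are all hypothetical, chosen only to show the pattern.

```python
import os
import tempfile


def ensure_manifest(manifest_path, entries):
    """Write a manifest only if it does not already exist.

    Hypothetical helper: `entries` is a list of (audio_path, transcript)
    pairs, written one per line as tab-separated text.
    Returns True if the file was generated, False if it was already there.
    """
    if os.path.exists(manifest_path):
        return False
    with open(manifest_path, "w", encoding="utf-8") as f:
        for audio_path, transcript in entries:
            f.write(f"{audio_path}\t{transcript}\n")
    return True


# Usage with made-up entries in a temporary directory.
tmp_dir = tempfile.mkdtemp()
manifest = os.path.join(tmp_dir, "manifest.txt")
sample = [("a.pcm", "안녕하세요"), ("b.pcm", "감사합니다")]
first_call = ensure_manifest(manifest, sample)   # file is created
second_call = ensure_manifest(manifest, sample)  # existing file is reused
```

Checking for the file first makes the step safe to call repeatedly, which may matter because Lightning can invoke prepare_data more than once across runs.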
-
setup
(stage: Optional[str] = None, tokenizer: openspeech.tokenizers.tokenizer.Tokenizer = None)[source]¶ Split the dataset into training and validation sets.
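The source does not state how setup performs the split. As a rough illustration only, the sketch below assumes a seeded random hold-out over a list of utterance identifiers; the function `split_train_valid`, its `valid_ratio` and `seed` parameters, and the identifiers are hypothetical and are not part of openspeech.

```python
import random


def split_train_valid(items, valid_ratio=0.1, seed=0):
    """Shuffle a copy of `items` and split it into (train, valid) lists.

    Assumption: a random hold-out split; the real data module may
    split differently (for example, by a fixed number of utterances).
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    num_valid = int(len(shuffled) * valid_ratio)
    return shuffled[num_valid:], shuffled[:num_valid]


# Made-up utterance identifiers standing in for manifest entries.
utterances = [f"utt_{i:03d}" for i in range(100)]
train_items, valid_items = split_train_valid(utterances, valid_ratio=0.1)
```

Fixing the seed keeps the validation set identical between runs, so validation metrics stay comparable across experiments.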
-
test_dataloader
() → openspeech.data.audio.data_loader.AudioDataLoader[source]¶ Return data loader for testing.
-
train_dataloader
() → openspeech.data.audio.data_loader.AudioDataLoader[source]¶ Return data loader for training.
-
val_dataloader
() → openspeech.data.audio.data_loader.AudioDataLoader[source]¶ Return data loader for validation.