KsponSpeech
- 
class openspeech.datasets.ksponspeech.lit_data_module.LightningKsponSpeechDataModule(*args: Any, **kwargs: Any)
- Lightning data module for KsponSpeech. The KsponSpeech corpus contains 969 hours of general open-domain dialog utterances, spoken by about 2,000 native Korean speakers in a clean environment. All data were constructed by recording two people conversing freely on a variety of topics and manually transcribing the utterances. Each utterance has a dual transcription, consisting of orthography and pronunciation, plus disfluency tags that mark the spontaneity of speech, such as filler words, repeated words, and word fragments.
- Parameters
- configs (DictConfig) – configuration set. 
 - 
prepare_data()
- Prepare the KsponSpeech manifest file. If the manifest file does not exist, generate it.
- Returns
- tokenizer is in charge of preparing the inputs for a model. 
- Return type
- tokenizer (Tokenizer) 
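
prepare_data() is described as generating the manifest file only when none exists. A minimal sketch of that check-then-generate pattern follows. The helper name `ensure_manifest` and the tab-separated line layout are assumptions made for illustration; they are not taken from the OpenSpeech source.

```python
from pathlib import Path
from typing import Iterable, Tuple


def ensure_manifest(manifest_path: Path, pairs: Iterable[Tuple[str, str]]) -> bool:
    """Write a manifest only if it is missing.

    Hypothetical helper: returns True when a new file was generated,
    False when an existing manifest was left untouched.
    """
    if manifest_path.exists():
        return False

    # Create the parent directory so the write below cannot fail on a missing path.
    manifest_path.parent.mkdir(parents=True, exist_ok=True)

    with manifest_path.open("w", encoding="utf-8") as f:
        for audio_path, transcript in pairs:
            # Assumed layout: one utterance per line, audio path and
            # transcript separated by a tab.
            f.write(f"{audio_path}\t{transcript}\n")
    return True
```

Calling the helper a second time with the same path leaves the existing file unchanged, which mirrors the "generate only if missing" behavior described above.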
 
 - 
setup(stage: Optional[str] = None, tokenizer: openspeech.tokenizers.tokenizer.Tokenizer = None)
- Split the data into train and valid datasets for training. 
 - 
test_dataloader() → openspeech.data.audio.data_loader.AudioDataLoader
- Return data loader for testing. 
 - 
train_dataloader() → openspeech.data.audio.data_loader.AudioDataLoader
- Return data loader for training. 
 - 
val_dataloader() → openspeech.data.audio.data_loader.AudioDataLoader
- Return data loader for validation.
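
The methods above suggest a call order: prepare_data() returns a tokenizer, setup() accepts it, and the three loader methods are then available. The sketch below illustrates that order under stated assumptions. `run_lifecycle` and `StubDataModule` are hypothetical names introduced here so the example runs without OpenSpeech installed; a real run would pass a LightningKsponSpeechDataModule built from a DictConfig instead of the stub.

```python
from typing import Any, Dict, List, Optional


def run_lifecycle(data_module: Any) -> Dict[str, Any]:
    """Drive a data module through the documented hooks, in order."""
    # prepare_data() is documented to return the tokenizer.
    tokenizer = data_module.prepare_data()
    # setup() is documented to accept that tokenizer.
    data_module.setup(stage=None, tokenizer=tokenizer)
    return {
        "train": data_module.train_dataloader(),
        "valid": data_module.val_dataloader(),
        "test": data_module.test_dataloader(),
    }


class StubDataModule:
    """Hypothetical stand-in that mimics the documented method names only."""

    def __init__(self) -> None:
        self.calls: List[str] = []
        self.tokenizer: Optional[str] = None

    def prepare_data(self) -> str:
        self.calls.append("prepare_data")
        return "stub-tokenizer"

    def setup(self, stage: Optional[str] = None, tokenizer: Optional[str] = None) -> None:
        self.calls.append("setup")
        self.tokenizer = tokenizer

    def train_dataloader(self) -> List[str]:
        self.calls.append("train_dataloader")
        return ["train-batch"]

    def val_dataloader(self) -> List[str]:
        self.calls.append("val_dataloader")
        return ["valid-batch"]

    def test_dataloader(self) -> List[str]:
        self.calls.append("test_dataloader")
        return ["test-batch"]


loaders = run_lifecycle(StubDataModule())
```

Because `run_lifecycle` relies only on the method names listed in this reference, the same function could in principle be handed the real data module, though that has not been verified here.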