Data Loaders

Speech To Text Data Loader

class openspeech.data.audio.data_loader.AudioDataLoader(dataset: torch.utils.data.dataset.Dataset, num_workers: int, batch_sampler: torch.utils.data.sampler.Sampler, **kwargs)

Audio Data Loader

Parameters

dataset (torch.utils.data.Dataset) – dataset from which to load the data.
num_workers (int) – how many subprocesses to use for data loading.
batch_sampler (torch.utils.data.sampler.Sampler) – defines the strategy to draw samples from the dataset.
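
The role of batch_sampler can be sketched without PyTorch: a batch sampler yields lists of dataset indices, and the loader fetches the corresponding samples for each list. The helper `iterate_batches` and the toy data below are hypothetical and only mimic that contract; they are not part of openspeech.

```python
from typing import Iterable, Iterator, List, Sequence


def iterate_batches(dataset: Sequence, batch_sampler: Iterable[List[int]]) -> Iterator[list]:
    """Hypothetical stand-in for a loader driven by a batch sampler."""
    for indices in batch_sampler:
        # Each item from the sampler is one batch worth of dataset indices.
        yield [dataset[i] for i in indices]


# Toy dataset of (feature_length, transcript) pairs, purely illustrative.
toy_dataset = [(3, "a"), (4, "b"), (9, "c"), (10, "d")]
toy_sampler = [[0, 1], [2, 3]]  # two batches of two indices each

batches = list(iterate_batches(toy_dataset, toy_sampler))
print(batches)
```

With a real `torch.utils.data.DataLoader`, passing `batch_sampler` is mutually exclusive with `batch_size`, `shuffle`, `sampler`, and `drop_last`, since the sampler already decides batch composition.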

openspeech.data.audio.data_loader.load_dataset(manifest_file_path: str) → Tuple[list, list]

Loads filenames and their labels from a manifest file.

Parameters: manifest_file_path (str) – evaluation manifest file path.

Returns: a tuple of two lists, matching the annotated return type Tuple[list, list]: the filenames and their labels.
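
As an illustration of the kind of parsing a manifest loader performs, here is a minimal sketch that reads a manifest into two parallel lists, as the `Tuple[list, list]` annotation suggests. The function name `read_manifest` and the one-`filename<TAB>label`-pair-per-line format are assumptions made for this example; the actual manifest layout used by openspeech is not described in this section.

```python
import os
import tempfile
from typing import List, Tuple


def read_manifest(manifest_file_path: str) -> Tuple[List[str], List[str]]:
    """Hypothetical reader; assumes one `filename<TAB>label` pair per line."""
    filenames: List[str] = []
    labels: List[str] = []
    with open(manifest_file_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            filename, label = line.split("\t", 1)
            filenames.append(filename)
            labels.append(label)
    return filenames, labels


# Write a tiny manifest to a temporary file and read it back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False, encoding="utf-8") as tmp:
    tmp.write("a.wav\thello world\nb.wav\tgood morning\n")

filenames, labels = read_manifest(tmp.name)
os.remove(tmp.name)
print(filenames, labels)
```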

Text Data Loader

class openspeech.data.text.data_loader.TextDataLoader(dataset: torch.utils.data.dataset.Dataset, num_workers: int, batch_sampler: torch.utils.data.sampler.Sampler, **kwargs)

Text Data Loader

Parameters

dataset (torch.utils.data.Dataset) – dataset from which to load the data.
num_workers (int) – how many subprocesses to use for data loading.
batch_sampler (torch.utils.data.sampler.Sampler) – defines the strategy to draw samples from the dataset.

Sampler

class openspeech.data.sampler.BucketingSampler(data_source, batch_size: int = 32, drop_last: bool = False)

Samples batches under the assumption that the dataset is already ordered by size, so that similarly sized samples end up in the same batch.

Parameters

data_source (torch.utils.data.Dataset) – dataset to sample from
batch_size (int) – size of batch
drop_last (bool) – flag indicating whether to drop the last batch
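
The bucketing idea can be sketched in plain Python: if samples are already sorted by size, cutting the index range into consecutive chunks of batch_size puts similarly sized samples in the same batch. The function `contiguous_batches` below is a hypothetical illustration, not the library's implementation; in particular, dropping an incomplete final batch when drop_last is set is an assumption about what the flag means, and any shuffling of batches is left out.

```python
from typing import List


def contiguous_batches(num_samples: int, batch_size: int = 32, drop_last: bool = False) -> List[List[int]]:
    """Hypothetical sketch: split indices 0..num_samples-1 into consecutive batches."""
    ids = list(range(num_samples))
    bins = [ids[i:i + batch_size] for i in range(0, num_samples, batch_size)]
    if drop_last and bins and len(bins[-1]) < batch_size:
        bins.pop()  # discard the incomplete final batch
    return bins


# Ten samples, assumed to be sorted by length already.
print(contiguous_batches(10, batch_size=4))
print(contiguous_batches(10, batch_size=4, drop_last=True))
```

Because neighbouring indices have similar sizes under the sorting assumption, each batch needs little padding.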