Optim

Optimizer

class openspeech.optim.optimizer.Optimizer(optim, scheduler=None, scheduler_period=None, max_grad_norm=0)[source]

This is a wrapper class for torch.optim.Optimizer. It provides learning rate scheduling and gradient norm clipping.

Parameters
  • optim (torch.optim.Optimizer) – optimizer object, the parameters to be optimized should be given when instantiating the object, e.g. torch.optim.Adam, torch.optim.SGD

  • scheduler (openspeech.optim.scheduler, optional) – learning rate scheduler

  • scheduler_period (int, optional) – number of timesteps over which the learning rate scheduler is applied

  • max_grad_norm (int, optional) – value used for gradient norm clipping
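
For example, wrapping a torch.optim.Adam instance (a minimal construction sketch following the signature documented above; the model and hyperparameter values are placeholders, and during training the wrapper is driven in place of the plain torch optimizer)::

    import torch

    from openspeech.optim.optimizer import Optimizer

    model = torch.nn.Linear(80, 10)  # placeholder model for illustration

    # The parameters to optimize are given to the torch optimizer itself;
    # the wrapper only adds LR scheduling and gradient norm clipping on top.
    adam = torch.optim.Adam(model.parameters(), lr=1e-4)
    optimizer = Optimizer(
        optim=adam,
        scheduler=None,        # e.g. an openspeech LR scheduler instance
        scheduler_period=None,
        max_grad_norm=5,       # clip gradients whose norm exceeds this value
    )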

Learning Rate Scheduler

class openspeech.optim.scheduler.lr_scheduler.LearningRateScheduler(optimizer, init_lr)[source]

Provides the interface for learning rate schedulers.

Note

Do not use this class directly; use one of its subclasses.
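
A sketch of the subclassing pattern (the step() hook name and the direct write into param_groups are illustrative assumptions; the bundled subclasses below define the actual interface)::

    from openspeech.optim.scheduler.lr_scheduler import LearningRateScheduler


    class ConstantDecayScheduler(LearningRateScheduler):
        """Illustrative subclass: multiply the LR by a fixed factor on every step()."""

        def __init__(self, optimizer, init_lr: float, decay: float = 0.99):
            super().__init__(optimizer, init_lr)  # base signature documented above
            self.optimizer = optimizer
            self.lr = init_lr
            self.decay = decay

        def step(self):
            self.lr *= self.decay
            # Write the new LR into the wrapped torch optimizer's param groups;
            # the real subclasses may use a base-class helper instead.
            for param_group in self.optimizer.param_groups:
                param_group["lr"] = self.lr
            return self.lr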

ReduceLROnPlateau Scheduler

class openspeech.optim.scheduler.reduce_lr_on_plateau_scheduler.ReduceLROnPlateauConfigs(lr: float = 0.0001, scheduler_name: str = 'reduce_lr_on_plateau', lr_patience: int = 1, lr_factor: float = 0.3)[source]
class openspeech.optim.scheduler.reduce_lr_on_plateau_scheduler.ReduceLROnPlateauScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)[source]

Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metric quantity and, if no improvement is seen for a ‘patience’ number of epochs, reduces the learning rate.

Parameters
  • optimizer (Optimizer) – wrapped optimizer.

  • configs (DictConfig) – configuration set.
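
A construction sketch using omegaconf; the field names mirror ReduceLROnPlateauConfigs above, but the lr_scheduler nesting inside the DictConfig is an assumption about how openspeech groups scheduler options, and the stepping call is only sketched in a comment rather than asserted::

    import torch
    from omegaconf import OmegaConf

    from openspeech.optim.scheduler.reduce_lr_on_plateau_scheduler import ReduceLROnPlateauScheduler

    model = torch.nn.Linear(80, 10)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    configs = OmegaConf.create({
        "lr_scheduler": {
            "lr": 1e-4,
            "scheduler_name": "reduce_lr_on_plateau",
            "lr_patience": 1,
            "lr_factor": 0.3,
        }
    })
    scheduler = ReduceLROnPlateauScheduler(optimizer, configs)
    # Typically stepped once per validation epoch with the monitored metric,
    # e.g. scheduler.step(val_loss); see the class for the exact step signature.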

Transformer Scheduler

class openspeech.optim.scheduler.transformer_lr_scheduler.TransformerLRScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)[source]

Transformer Learning Rate Scheduler proposed in “Attention Is All You Need”

Parameters
  • optimizer (Optimizer) – wrapped optimizer.

  • configs (DictConfig) – configuration set.

class openspeech.optim.scheduler.transformer_lr_scheduler.TransformerLRSchedulerConfigs(lr: float = 0.0001, scheduler_name: str = 'transformer', peak_lr: float = 0.0001, final_lr: float = 1e-07, final_lr_scale: float = 0.05, warmup_steps: int = 10000, decay_steps: int = 150000)[source]
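
For reference, the canonical schedule from “Attention Is All You Need” warms up linearly and then decays with the inverse square root of the step count. The sketch below computes that curve; the class above parameterizes the same shape with peak_lr, final_lr, warmup_steps and decay_steps instead of d_model, so treat this only as an illustration of the shape::

    def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 10000) -> float:
        """lr = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)."""
        step = max(step, 1)  # avoid division by zero at step 0
        return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)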

Tri-Stage Scheduler

class openspeech.optim.scheduler.tri_stage_lr_scheduler.TriStageLRScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)[source]

Tri-Stage Learning Rate Scheduler, implementing the learning rate schedule described in “SpecAugment”.

Similar to the inverse_square_root scheduler, but tri_stage learning rate employs three-stage LR scheduling:

  • warmup stage, starting from lr * init_lr_scale, linearly increase to lr over warmup_steps iterations

  • hold stage, after warmup_steps, keep the LR at lr for hold_steps iterations

  • decay stage, after the hold stage, decay the LR exponentially to lr * final_lr_scale over decay_steps iterations; after that the LR is kept at final_lr_scale * lr

During warmup::

    init_lr = cfg.init_lr_scale * cfg.lr
    lrs = torch.linspace(init_lr, cfg.lr, cfg.warmup_steps)
    lr = lrs[update_num]

During hold::

    lr = cfg.lr

During decay::

    decay_factor = - math.log(cfg.final_lr_scale) / cfg.decay_steps
    lr = cfg.lr * exp(- (update_num - warmup_steps - hold_steps) * decay_factor)

After that::

    lr = cfg.lr * cfg.final_lr_scale

Parameters
  • optimizer (Optimizer) – wrapped optimizer.

  • configs (DictConfig) – configuration set.

class openspeech.optim.scheduler.tri_stage_lr_scheduler.TriStageLRSchedulerConfigs(lr: float = 0.0001, scheduler_name: str = 'tri_stage', init_lr: float = 1e-07, init_lr_scale: float = 0.01, final_lr_scale: float = 0.01, phase_ratio: str = '(0.1, 0.4, 0.5)', total_steps: int = 400000)[source]
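
Putting the three stages together, a self-contained sketch of the schedule described above; the default stage lengths below are derived from the default phase_ratio (0.1, 0.4, 0.5) of total_steps = 400000, and the function is an illustration rather than the class implementation::

    import math


    def tri_stage_lr(
        update_num: int,
        lr: float = 1e-4,
        init_lr_scale: float = 0.01,
        final_lr_scale: float = 0.01,
        warmup_steps: int = 40000,   # 0.1 * total_steps
        hold_steps: int = 160000,    # 0.4 * total_steps
        decay_steps: int = 200000,   # 0.5 * total_steps
    ) -> float:
        if update_num < warmup_steps:
            # Warmup: linear increase from lr * init_lr_scale to lr.
            init_lr = init_lr_scale * lr
            return init_lr + (lr - init_lr) * update_num / warmup_steps
        if update_num < warmup_steps + hold_steps:
            # Hold: keep the LR constant.
            return lr
        # Decay: exponential decay towards lr * final_lr_scale, then hold it there.
        decay_factor = -math.log(final_lr_scale) / decay_steps
        steps_into_decay = min(update_num - warmup_steps - hold_steps, decay_steps)
        return lr * math.exp(-steps_into_decay * decay_factor)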

Warmup ReduceLROnPlateau Scheduler

class openspeech.optim.scheduler.warmup_reduce_lr_on_plateau_scheduler.WarmupReduceLROnPlateauConfigs(lr: float = 0.0001, scheduler_name: str = 'warmup_reduce_lr_on_plateau', lr_patience: int = 1, lr_factor: float = 0.3, peak_lr: float = 0.0001, init_lr: float = 1e-10, warmup_steps: int = 4000)[source]
class openspeech.optim.scheduler.warmup_reduce_lr_on_plateau_scheduler.WarmupReduceLROnPlateauScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)[source]

Warm up the learning rate until warmup_steps, then reduce the learning rate on plateau.

Parameters
  • optimizer (Optimizer) – wrapped optimizer.

  • configs (DictConfig) – configuration set.
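
A rough sketch of the resulting LR shape under the default WarmupReduceLROnPlateauConfigs; a linear warmup is assumed for illustration, and counting plateau reductions explicitly is purely illustrative, since the real scheduler tracks the monitored metric itself::

    def sketch_lr(
        step: int,
        n_plateau_reductions: int,
        init_lr: float = 1e-10,
        peak_lr: float = 1e-4,
        warmup_steps: int = 4000,
        lr_factor: float = 0.3,
    ) -> float:
        if step < warmup_steps:
            # Warmup phase: increase from init_lr to peak_lr.
            return init_lr + (peak_lr - init_lr) * step / warmup_steps
        # Afterwards each plateau detected by the ReduceLROnPlateau logic
        # multiplies the LR by lr_factor (0.3 by default).
        return peak_lr * (lr_factor ** n_plateau_reductions)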

Warmup Scheduler

class openspeech.optim.scheduler.warmup_scheduler.WarmupLRScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)[source]

Warm up the learning rate until warmup_steps.

Parameters
  • optimizer (Optimizer) – wrapped optimizer.

  • configs (DictConfig) – configuration set.

class openspeech.optim.scheduler.warmup_scheduler.WarmupLRSchedulerConfigs(lr: float = 0.0001, scheduler_name: str = 'warmup', peak_lr: float = 0.0001, init_lr: float = 1e-07, warmup_steps: int = 4000, total_steps: int = 200000)[source]
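
With the default WarmupLRSchedulerConfigs values the warmup runs from init_lr = 1e-07 to peak_lr = 1e-04 over warmup_steps = 4000; a quick sanity check of the per-step increment (a linear warmup is assumed for illustration)::

    init_lr, peak_lr, warmup_steps = 1e-07, 1e-04, 4000
    increment = (peak_lr - init_lr) / warmup_steps       # ~2.5e-08 added per step

    def lr_at(step: int) -> float:
        return init_lr + increment * min(step, warmup_steps)

    print(lr_at(0), lr_at(2000), lr_at(4000))            # 1e-07, ~5e-05, 1e-04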