Optim¶
Optimizer¶
class openspeech.optim.optimizer.Optimizer(optim, scheduler=None, scheduler_period=None, max_grad_norm=0)[source]¶
This is a wrapper class of torch.optim.Optimizer. It provides learning rate scheduling and gradient norm clipping.
- Parameters
optim (torch.optim.Optimizer) – optimizer object (e.g. torch.optim.Adam, torch.optim.SGD); the parameters to be optimized should be given when the optimizer is instantiated
scheduler (openspeech.optim.scheduler, optional) – learning rate scheduler
scheduler_period (int, optional) – number of timesteps for which the learning rate scheduler is applied
max_grad_norm (int, optional) – value used for gradient norm clipping
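A minimal, hedged construction sketch using only the parameters documented above; the toy model, the Adam settings, and the clipping value are illustrative, not openspeech defaults::

    import torch
    import torch.nn as nn
    from openspeech.optim.optimizer import Optimizer

    # Toy model whose parameters will be optimized.
    model = nn.Linear(80, 10)

    # The underlying torch optimizer is created first; the wrapper only adds
    # scheduling and gradient clipping on top of it.
    adam = torch.optim.Adam(model.parameters(), lr=1e-4)

    # scheduler=None keeps a constant learning rate; max_grad_norm=5.0 requests
    # gradient norm clipping at 5.0 (the default of 0 presumably leaves
    # gradients unclipped).
    optimizer = Optimizer(adam, scheduler=None, scheduler_period=None, max_grad_norm=5.0)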
Learning Rate Scheduler¶
ReduceLROnPlateau Scheduler¶
class openspeech.optim.scheduler.reduce_lr_on_plateau_scheduler.ReduceLROnPlateauConfigs(lr: float = 0.0001, scheduler_name: str = 'reduce_lr_on_plateau', lr_patience: int = 1, lr_factor: float = 0.3)[source]¶
class openspeech.optim.scheduler.reduce_lr_on_plateau_scheduler.ReduceLROnPlateauScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)[source]¶
Reduce the learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metric quantity and, if no improvement is seen for a ‘patience’ number of epochs, reduces the learning rate.
- Parameters
optimizer (Optimizer) – wrapped optimizer.
configs (DictConfig) – configuration set.
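A hedged construction sketch: the scheduler takes the wrapped optimizer and a DictConfig. Whether it reads its fields from the top level or from a nested lr_scheduler group of the composed training config depends on how the Hydra configuration is assembled; the nested layout below is an assumption::

    import torch
    import torch.nn as nn
    from omegaconf import OmegaConf
    from openspeech.optim.scheduler.reduce_lr_on_plateau_scheduler import (
        ReduceLROnPlateauConfigs,
        ReduceLROnPlateauScheduler,
    )

    model = nn.Linear(80, 10)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Field values follow the ReduceLROnPlateauConfigs defaults shown above;
    # the nested "lr_scheduler" group is an assumption about the config layout.
    configs = OmegaConf.create(
        {"lr_scheduler": OmegaConf.structured(ReduceLROnPlateauConfigs(lr=1e-4, lr_patience=1, lr_factor=0.3))}
    )

    scheduler = ReduceLROnPlateauScheduler(optimizer, configs)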
Transformer Scheduler¶
class openspeech.optim.scheduler.transformer_lr_scheduler.TransformerLRScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)[source]¶
Transformer learning rate scheduler, as proposed in “Attention Is All You Need”.
- Parameters
optimizer (Optimizer) – wrapped optimizer.
configs (DictConfig) – configuration set.
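The underlying schedule from “Attention Is All You Need” (Section 5.3) warms the learning rate up linearly for warmup_steps updates and then decays it proportionally to the inverse square root of the step number. A framework-free sketch of that formula follows; the d_model/warmup_steps parameterization comes from the paper, and openspeech’s configuration fields may expose it differently::

    import math

    def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
        # lrate = d_model^-0.5 * min(step^-0.5, step * warmup_steps^-1.5)
        step = max(step, 1)  # guard against step 0
        return (d_model ** -0.5) * min(step ** -0.5, step * warmup_steps ** -1.5)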
Tri-Stage Scheduler¶
class openspeech.optim.scheduler.tri_stage_lr_scheduler.TriStageLRScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)[source]¶
Tri-Stage Learning Rate Scheduler. Implements the learning rate schedule described in “SpecAugment”.
Similar to the inverse_square_root scheduler, but tri_stage employs three-stage LR scheduling:
warmup stage: starting from lr * init_lr_scale, the LR is linearly increased to lr over warmup_steps iterations
hold stage: after warmup_steps, the LR is kept at lr for hold_steps iterations
decay stage: after the hold stage, the LR decays exponentially to lr * final_lr_scale over decay_steps iterations; after that, the LR is kept at final_lr_scale * lr
- During warmup::
    init_lr = cfg.init_lr_scale * cfg.lr
    lrs = torch.linspace(init_lr, cfg.lr, cfg.warmup_steps)
    lr = lrs[update_num]
- During hold::
    lr = cfg.lr
- During decay::
    decay_factor = -math.log(cfg.final_lr_scale) / cfg.decay_steps
    lr = cfg.lr * exp(-(update_num - warmup_steps - hold_steps) * decay_factor)
- After that::
    lr = cfg.lr * cfg.final_lr_scale
- Parameters
optimizer (Optimizer) – wrapped optimizer.
configs (DictConfig) – configuration set.
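The stages above translate directly into a small piecewise function. The following is a framework-free sketch using the cfg field names from the pseudocode; exact behavior at the stage boundaries may differ from the openspeech implementation::

    import math

    def tri_stage_lr(update_num: int, lr: float, init_lr_scale: float,
                     final_lr_scale: float, warmup_steps: int,
                     hold_steps: int, decay_steps: int) -> float:
        init_lr = init_lr_scale * lr
        if update_num < warmup_steps:
            # Warmup: linear ramp from init_lr up to the peak lr.
            return init_lr + (lr - init_lr) * update_num / max(warmup_steps, 1)
        if update_num < warmup_steps + hold_steps:
            # Hold: keep the peak learning rate.
            return lr
        if update_num < warmup_steps + hold_steps + decay_steps:
            # Decay: exponential decay towards lr * final_lr_scale.
            decay_factor = -math.log(final_lr_scale) / decay_steps
            return lr * math.exp(-(update_num - warmup_steps - hold_steps) * decay_factor)
        # After decay: stay at the final learning rate.
        return lr * final_lr_scale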
Warmup ReduceLROnPlateau Scheduler¶
class openspeech.optim.scheduler.warmup_reduce_lr_on_plateau_scheduler.WarmupReduceLROnPlateauConfigs(lr: float = 0.0001, scheduler_name: str = 'warmup_reduce_lr_on_plateau', lr_patience: int = 1, lr_factor: float = 0.3, peak_lr: float = 0.0001, init_lr: float = 1e-10, warmup_steps: int = 4000)[source]¶
class openspeech.optim.scheduler.warmup_reduce_lr_on_plateau_scheduler.WarmupReduceLROnPlateauScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)[source]¶
Warm up the learning rate until warmup_steps, then reduce the learning rate on plateau afterwards.
- Parameters
optimizer (Optimizer) – wrapped optimizer.
configs (DictConfig) – configuration set.
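Putting the two phases together, here is a hypothetical, framework-free sketch of the behavior described above; the class and method names are illustrative rather than openspeech’s API, and the default values are taken from WarmupReduceLROnPlateauConfigs::

    class WarmupThenPlateauSketch:
        """Linear warmup from init_lr to peak_lr over warmup_steps updates,
        then multiply the LR by lr_factor whenever the validation metric
        fails to improve for more than lr_patience epochs."""

        def __init__(self, init_lr=1e-10, peak_lr=1e-4, warmup_steps=4000,
                     lr_patience=1, lr_factor=0.3):
            self.init_lr, self.peak_lr = init_lr, peak_lr
            self.warmup_steps = warmup_steps
            self.lr_patience, self.lr_factor = lr_patience, lr_factor
            self.lr = init_lr
            self.best = float("inf")
            self.bad_epochs = 0
            self.update_num = 0

        def step_update(self):
            # Per-update warmup phase.
            if self.update_num < self.warmup_steps:
                self.lr = self.init_lr + (self.peak_lr - self.init_lr) * self.update_num / self.warmup_steps
            self.update_num += 1
            return self.lr

        def step_epoch(self, val_loss):
            # Per-epoch plateau phase (meaningful once warmup has finished).
            if val_loss < self.best:
                self.best, self.bad_epochs = val_loss, 0
            else:
                self.bad_epochs += 1
                if self.bad_epochs > self.lr_patience:
                    self.lr *= self.lr_factor
                    self.bad_epochs = 0
            return self.lr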