Optim
Optimizer
class openspeech.optim.optimizer.Optimizer(optim, scheduler=None, scheduler_period=None, max_grad_norm=0)
- This is a wrapper class of torch.optim.Optimizer. It provides learning rate scheduling and gradient norm clipping on top of the wrapped optimizer. A behavioural sketch follows the parameter list.
- Parameters
- optim (torch.optim.Optimizer) – optimizer object; the parameters to be optimized should be given when the object is instantiated, e.g. torch.optim.Adam, torch.optim.SGD
- scheduler (openspeech.optim.scheduler, optional) – learning rate scheduler 
- scheduler_period (int, optional) – number of timesteps between learning rate scheduler updates
- max_grad_norm (int, optional) – value used for gradient norm clipping 
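
A minimal sketch of the behaviour described above, assuming step()/zero_grad() pass-throughs and a scheduler update every scheduler_period optimizer steps; the method names and period counting here are illustrative assumptions, not the exact openspeech API. For example::

    import torch.nn as nn
    from torch.nn.utils import clip_grad_norm_


    class SimpleOptimizerWrapper:
        """Illustrative stand-in for the wrapper described above (not the openspeech class)."""

        def __init__(self, optim, scheduler=None, scheduler_period=None, max_grad_norm=0):
            self.optim = optim
            self.scheduler = scheduler
            self.scheduler_period = scheduler_period
            self.max_grad_norm = max_grad_norm
            self.count = 0

        def step(self, model: nn.Module):
            # gradient norm clipping, applied only when max_grad_norm > 0 (assumption)
            if self.max_grad_norm > 0:
                clip_grad_norm_(model.parameters(), self.max_grad_norm)
            self.optim.step()
            # advance the learning rate scheduler every `scheduler_period` steps
            if self.scheduler is not None:
                self.count += 1
                if self.count % self.scheduler_period == 0:
                    self.scheduler.step()

        def zero_grad(self):
            self.optim.zero_grad()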
 
 
Learning Rate Scheduler
ReduceLROnPlateau Scheduler
class openspeech.optim.scheduler.reduce_lr_on_plateau_scheduler.ReduceLROnPlateauConfigs(lr: float = 0.0001, scheduler_name: str = 'reduce_lr_on_plateau', lr_patience: int = 1, lr_factor: float = 0.3)
class openspeech.optim.scheduler.reduce_lr_on_plateau_scheduler.ReduceLROnPlateauScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)
- Reduce learning rate when a metric has stopped improving. Models often benefit from reducing the learning rate by a factor of 2-10 once learning stagnates. This scheduler reads a metric quantity and, if no improvement is seen for a ‘patience’ number of epochs, reduces the learning rate. An illustration of the behaviour follows the parameter list.
- Parameters
- optimizer (Optimizer) – wrapped optimizer. 
- configs (DictConfig) – configuration set. 
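
The behaviour described above matches PyTorch's built-in torch.optim.lr_scheduler.ReduceLROnPlateau, so the built-in is used below purely to illustrate how lr_factor and lr_patience act; this is an illustration, not the openspeech class, whose configs-driven constructor is shown in the signature above. For example::

    import torch

    model = torch.nn.Linear(80, 10)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    # factor/patience play the same roles as lr_factor/lr_patience above
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.3, patience=1
    )

    for epoch in range(4):
        val_loss = 1.0  # dummy, stagnating validation loss
        scheduler.step(val_loss)  # lr is multiplied by 0.3 once patience is exceeded
        print(epoch, optimizer.param_groups[0]["lr"])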
 
 
Transformer Scheduler
class openspeech.optim.scheduler.transformer_lr_scheduler.TransformerLRScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)
- Transformer Learning Rate Scheduler proposed in “Attention Is All You Need”. A transcription of the schedule formula follows the parameter list.
- Parameters
- optimizer (Optimizer) – wrapped optimizer. 
- configs (DictConfig) – configuration set. 
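
The schedule from the paper increases the learning rate linearly for the first warmup_steps steps and then decays it proportionally to the inverse square root of the step number. Below is a standalone transcription of that formula; d_model and warmup_steps are the paper's parameters, and the openspeech config may expose different names. For example::

    def transformer_lr(step: int, d_model: int = 512, warmup_steps: int = 4000) -> float:
        # lr = d_model**-0.5 * min(step**-0.5, step * warmup_steps**-1.5)
        step = max(step, 1)  # guard against step == 0
        return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

    # the peak learning rate is reached at step == warmup_steps
    print(transformer_lr(4000))  # ~7.0e-4 for d_model=512, warmup_steps=4000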
 
 
Tri-Stage Scheduler
class openspeech.optim.scheduler.tri_stage_lr_scheduler.TriStageLRScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)
- Tri-Stage Learning Rate Scheduler. Implements the learning rate scheduler described in “SpecAugment”. Similar to the inverse_square_root scheduler, but the tri_stage schedule employs three stages of LR scheduling (a standalone transcription of the formulas follows the parameter list):
- warmup stage, starting from lr * init_lr_scale, linearly increased to lr over warmup_steps iterations
- hold stage, after warmup_steps, keep the LR at lr for hold_steps iterations
- decay stage, after the hold stage, decay the LR exponentially to lr * final_lr_scale over decay_steps; after that, the LR is kept at final_lr_scale * lr
- During warmup::
    init_lr = cfg.init_lr_scale * cfg.lr
    lrs = torch.linspace(init_lr, cfg.lr, cfg.warmup_steps)
    lr = lrs[update_num]
- During hold::
    lr = cfg.lr
- During decay::
    decay_factor = -math.log(cfg.final_lr_scale) / cfg.decay_steps
    lr = cfg.lr * exp(-(update_num - warmup_steps - hold_steps) * decay_factor)
- After that::
    lr = cfg.lr * cfg.final_lr_scale
 - Parameters
- optimizer (Optimizer) – wrapped optimizer. 
- configs (DictConfig) – configuration set. 
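
The staged formulas above can be written as one standalone function, shown below; the field names follow the docstring (warmup_steps, hold_steps, decay_steps, init_lr_scale, final_lr_scale) and the default values are placeholders, not openspeech defaults. For example::

    import math

    def tri_stage_lr(update_num, lr=1e-3, init_lr_scale=0.01, final_lr_scale=0.01,
                     warmup_steps=4000, hold_steps=20000, decay_steps=40000):
        if update_num < warmup_steps:
            # warmup: linear ramp from lr * init_lr_scale up to lr
            init_lr = init_lr_scale * lr
            return init_lr + (lr - init_lr) * update_num / warmup_steps
        if update_num < warmup_steps + hold_steps:
            # hold: keep the peak learning rate
            return lr
        if update_num < warmup_steps + hold_steps + decay_steps:
            # decay: exponential decay towards lr * final_lr_scale
            decay_factor = -math.log(final_lr_scale) / decay_steps
            return lr * math.exp(-(update_num - warmup_steps - hold_steps) * decay_factor)
        # afterwards: the learning rate stays at lr * final_lr_scale
        return lr * final_lr_scale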
 
 
Warmup ReduceLROnPlateau Scheduler
class openspeech.optim.scheduler.warmup_reduce_lr_on_plateau_scheduler.WarmupReduceLROnPlateauConfigs(lr: float = 0.0001, scheduler_name: str = 'warmup_reduce_lr_on_plateau', lr_patience: int = 1, lr_factor: float = 0.3, peak_lr: float = 0.0001, init_lr: float = 1e-10, warmup_steps: int = 4000)
class openspeech.optim.scheduler.warmup_reduce_lr_on_plateau_scheduler.WarmupReduceLROnPlateauScheduler(optimizer: torch.optim.optimizer.Optimizer, configs: omegaconf.dictconfig.DictConfig)
- Warms up the learning rate until warmup_steps, then reduces the learning rate on plateau. A sketch of the schedule shape follows the parameter list.
- Parameters
- optimizer (Optimizer) – wrapped optimizer. 
- configs (DictConfig) – configuration set.
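
The schedule shape described above (linear warmup from init_lr to peak_lr over warmup_steps, then factor-based reductions once validation stops improving) can be sketched in a few lines; the values mirror the config defaults shown above, and the patience bookkeeping is an assumption, not the openspeech implementation. For example::

    class WarmupThenPlateauSketch:
        """Illustrative schedule shape only; not the openspeech class."""

        def __init__(self, init_lr=1e-10, peak_lr=1e-4, warmup_steps=4000,
                     lr_patience=1, lr_factor=0.3):
            self.init_lr, self.peak_lr = init_lr, peak_lr
            self.warmup_steps = warmup_steps
            self.lr_patience, self.lr_factor = lr_patience, lr_factor
            self.lr = init_lr
            self.best_loss = float("inf")
            self.bad_checks = 0

        def warmup_step(self, step: int) -> float:
            # linear warmup from init_lr to peak_lr over warmup_steps
            ratio = min(step / self.warmup_steps, 1.0)
            self.lr = self.init_lr + (self.peak_lr - self.init_lr) * ratio
            return self.lr

        def plateau_step(self, val_loss: float) -> float:
            # after warmup: multiply lr by lr_factor when val_loss stops improving
            if val_loss < self.best_loss:
                self.best_loss, self.bad_checks = val_loss, 0
            else:
                self.bad_checks += 1
                if self.bad_checks > self.lr_patience:
                    self.lr *= self.lr_factor
                    self.bad_checks = 0
            return self.lr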