oneflow.optim.lr_scheduler.CosineAnnealingWarmRestarts

class oneflow.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer: oneflow.nn.optimizer.optimizer.Optimizer, T_0: int, T_mult: int = 1, eta_min: float = 0.0, decay_rate: float = 1.0, restart_limit: int = 0, last_step: int = -1, verbose: bool = False)

Set the learning rate of each parameter group using a cosine annealing schedule, where \(\eta_{max}\) is set to the initial lr, \(T_{cur}\) is the number of steps since the last restart and \(T_{i}\) is the number of steps between two warm restarts in SGDR:

\[\eta_t = \eta_{min} + \frac{1}{2}(\eta_{max} - \eta_{min})\left(1 + \cos\left(\frac{T_{cur}}{T_{i}}\pi\right)\right)\]

When \(T_{cur}=T_{i}\), set \(\eta_t = \eta_{min}\). When \(T_{cur}=0\) after restart, set \(\eta_t=\eta_{max}\).
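
For a concrete reading of the formula (the values are illustrative): halfway through a cycle the cosine term vanishes and the learning rate sits exactly midway between \(\eta_{min}\) and \(\eta_{max}\). With \(\eta_{max}=0.1\), \(\eta_{min}=0\) and \(T_{i}=10\), at \(T_{cur}=5\):

\[\eta_t = 0 + \frac{1}{2}(0.1 - 0)\left(1 + \cos\frac{\pi}{2}\right) = 0.05\]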

It has been proposed in SGDR: Stochastic Gradient Descent with Warm Restarts.

Parameters
  • optimizer (Optimizer) – Wrapped optimizer.

  • T_0 (int) – Number of iterations for the first restart.

  • T_mult (int, optional) – A factor by which \(T_{i}\) increases after each restart. Default: 1.

  • eta_min (float, optional) – Minimum learning rate. Default: 0.

  • decay_rate (float, optional) – Decay rate applied to the learning rate at every restart. Default: 1.0.

  • restart_limit (int, optional) – The limit on the number of restarts; 0 indicates unlimited restarts. Default: 0.

  • last_step (int, optional) – The index of the last step. Default: -1.

  • verbose (bool) – If True, prints a message to stdout for each update. Default: False.
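
Example

A minimal usage sketch. The toy parameter, the SGD optimizer, and the particular T_0 / T_mult values are illustrative only; the forward and backward pass is elided.

    import oneflow as flow

    # A toy parameter and optimizer; any oneflow.optim optimizer works here.
    param = flow.nn.Parameter(flow.ones(2, 2))
    optimizer = flow.optim.SGD([param], lr=0.1)

    # First cycle lasts 10 steps; each later cycle is twice as long (T_mult=2).
    scheduler = flow.optim.lr_scheduler.CosineAnnealingWarmRestarts(
        optimizer, T_0=10, T_mult=2, eta_min=1e-5
    )

    for iteration in range(30):
        # ... forward pass and loss.backward() would go here ...
        optimizer.step()
        optimizer.zero_grad()
        scheduler.step()  # advance the cosine schedule by one step
        print(iteration, scheduler.get_last_lr())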

__init__(optimizer: oneflow.nn.optimizer.optimizer.Optimizer, T_0: int, T_mult: int = 1, eta_min: float = 0.0, decay_rate: float = 1.0, restart_limit: int = 0, last_step: int = -1, verbose: bool = False)

Initialize self. See help(type(self)) for accurate signature.

Methods

_generate_conf_for_graph(lr_conf)

_init_base_lrs()

get_last_lr()

Return last computed learning rate by current scheduler.

get_lr(base_lr, step)

Compute the learning rate using the chainable form of the scheduler.

load_state_dict(state_dict)

Load the schedulers state.

print_lr(group, lr)

Display the current learning rate.

state_dict()

Return the state of the scheduler as a dict; a checkpointing sketch follows this list.

step()

Perform one scheduling step and update the learning rate of each parameter group.

update_lrs(lrs)
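
The state_dict and load_state_dict methods above let the schedule be checkpointed and resumed. A minimal sketch, assuming the scheduler was built as in the example above:

    # Save the scheduler state alongside the model/optimizer checkpoint.
    checkpoint = {"scheduler": scheduler.state_dict()}

    # ... later, after re-creating the optimizer and a fresh scheduler ...
    scheduler.load_state_dict(checkpoint["scheduler"])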