oneflow.optim.RMSprop
class oneflow.optim.RMSprop(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, alpha: float = 0.99, eps: float = 1e-08, weight_decay: float = 0, momentum: float = 0.0, centered: bool = False)

Implements the RMSprop algorithm.
Root Mean Squared Propagation (RMSProp) is an unpublished, adaptive learning rate method. RMSProp was originally proposed on Slide 29 of http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf .
The original equation is as follows:
\[\begin{aligned}
r(w, t) &= \alpha r(w, t-1) + (1 - \alpha)(\nabla Q_{i}(w))^2 \\
w &= w - \frac{\eta}{\sqrt{r(w,t) + \epsilon}} \nabla Q_{i}(w)
\end{aligned}\]

The first equation calculates a moving average of the squared gradient for each weight; the gradient is then divided by \(\sqrt{r(w,t) + \epsilon}\). In some cases, adding a momentum term \(\beta\) is beneficial. In our implementation, Nesterov momentum is used:
\[\begin{aligned}
r(w, t) &= \alpha r(w, t-1) + (1 - \alpha)(\nabla Q_{i}(w))^2 \\
v(w, t) &= \beta v(w, t-1) + \frac{\eta}{\sqrt{r(w,t) + \epsilon}} \nabla Q_{i}(w) \\
w &= w - v(w, t)
\end{aligned}\]

If centered is True:
\[\begin{aligned}
r(w, t) &= \alpha r(w, t-1) + (1 - \alpha)(\nabla Q_{i}(w))^2 \\
g(w, t) &= \alpha g(w, t-1) + (1 - \alpha)\nabla Q_{i}(w) \\
v(w, t) &= \beta v(w, t-1) + \frac{\eta}{\sqrt{r(w,t) - (g(w, t))^2 + \epsilon}} \nabla Q_{i}(w) \\
w &= w - v(w, t)
\end{aligned}\]

where \(\alpha\) is a hyperparameter with typical values such as 0.99 or 0.95, \(\beta\) is the momentum term, and \(\epsilon\) is a smoothing term to avoid division by zero, usually set somewhere in the range from 1e-4 to 1e-8.
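As a concrete illustration of the update rules above, here is a minimal NumPy sketch of a single step. This is illustrative only, not OneFlow's actual kernel; the function name and the buffers r, g, and v are local to this example:

    import numpy as np

    def rmsprop_step(w, grad, r, g, v, lr=1e-3, alpha=0.99, beta=0.9,
                     eps=1e-8, centered=True):
        # Moving average of the squared gradient: r(w, t).
        r = alpha * r + (1 - alpha) * grad ** 2
        if centered:
            # Moving average of the gradient: g(w, t).
            g = alpha * g + (1 - alpha) * grad
            denom = np.sqrt(r - g ** 2 + eps)  # variance estimate of the gradient
        else:
            denom = np.sqrt(r + eps)
        # Momentum buffer v(w, t), then the weight update.
        v = beta * v + lr * grad / denom
        w = w - v
        return w, r, g, v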
- Parameters
params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
lr (float, optional) – learning rate (default: 1e-3)
momentum (float, optional) – momentum factor (default: 0.0; OneFlow does not support momentum > 0 yet)
alpha (float, optional) – smoothing constant (default: 0.99)
eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)
centered (bool, optional) – if True, compute the centered RMSProp: the gradient is normalized by an estimation of its variance (default: False)
weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
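All of the keyword arguments above map directly onto the constructor. A minimal construction sketch (net stands in for any existing model; the non-default values are illustrative):

    rmsprop = flow.optim.RMSprop(
        net.parameters(),
        lr=1e-3,
        alpha=0.99,
        eps=1e-8,
        weight_decay=1e-4,  # L2 penalty
        momentum=0.0,       # values > 0 are not supported yet
        centered=False,
    )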
For example:
Example 1:
    # Assume net is a custom model.
    rmsprop = flow.optim.RMSprop(net.parameters(), lr=1e-3)

    for epoch in range(epochs):
        # Read data, compute the loss and so on.
        # ...
        loss.backward()
        rmsprop.step()
        rmsprop.zero_grad()
Example 2:
    # Assume net is a custom model.
    rmsprop = flow.optim.RMSprop(
        [
            {
                "params": net.parameters(),
                "lr": learning_rate,
                "clip_grad_max_norm": 0.5,
                "clip_grad_norm_type": 2.0,
            }
        ],
    )

    for epoch in range(epochs):
        # Read data, compute the loss and so on.
        # ...
        loss.backward()
        rmsprop.clip_grad()
        rmsprop.step()
        rmsprop.zero_grad()
If you want to use clip_grad, you can refer to this example. For more details on clip_grad_max_norm and clip_grad_norm_type, refer to oneflow.nn.utils.clip_grad_norm_().
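Clipping can also be applied outside the optimizer. A minimal sketch, assuming oneflow.nn.utils.clip_grad_norm_ takes the familiar (parameters, max_norm, norm_type) arguments; net, loss, and rmsprop are the same placeholders as in the examples above:

    loss.backward()
    # Clip the global gradient norm to 0.5 before stepping.
    flow.nn.utils.clip_grad_norm_(net.parameters(), max_norm=0.5, norm_type=2.0)
    rmsprop.step()
    rmsprop.zero_grad()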
__init__(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, alpha: float = 0.99, eps: float = 1e-08, weight_decay: float = 0, momentum: float = 0.0, centered: bool = False)

Initialize self. See help(type(self)) for accurate signature.
Methods

__delattr__(name, /) – Implement delattr(self, name).
__dir__() – Default dir() implementation.
__eq__(value, /) – Return self==value.
__format__(format_spec, /) – Default object formatter.
__ge__(value, /) – Return self>=value.
__getattribute__(name, /) – Return getattr(self, name).
__gt__(value, /) – Return self>value.
__hash__() – Return hash(self).
__init__(params[, lr, alpha, eps, …]) – Initialize self.
__init_subclass__ – This method is called when a class is subclassed.
__le__(value, /) – Return self<=value.
__lt__(value, /) – Return self<value.
__ne__(value, /) – Return self!=value.
__new__(**kwargs) – Create and return a new object.
__reduce__() – Helper for pickle.
__reduce_ex__(protocol, /) – Helper for pickle.
__repr__() – Return repr(self).
__setattr__(name, value, /) – Implement setattr(self, name, value).
__sizeof__() – Size of object in memory, in bytes.
__str__() – Return str(self).
__subclasshook__ – Abstract classes can override this to customize issubclass().
_check_variables_in_graph(vars_conf)
_check_variables_optimizer_bound(vars_conf)
_generate_conf_for_graph(train_conf, vars_conf)
_generate_grad_clip_conf_for_optim_conf(…)
_generate_indexed_slices_optimizer_conf(…)
_parse_input_parameters(parameters) – Supports such parameters:
add_param_group(param_group) – Add a param group to the Optimizer's param_groups.
clip_grad() – Clips gradient norm of an iterable of parameters.
load_state_dict(state_dict) – Load the state of the optimizer which is created by the state_dict function.
state_dict() – Returns the state of the optimizer as a dict.
step([closure]) – Performs a single optimization step.
zero_grad([set_to_none]) – Sets the gradients of all optimized oneflow.Tensors to zero.

Attributes
support_sparse – Whether the Optimizer supports sparse update.
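The state_dict / load_state_dict pair listed above is what you would use for checkpointing the optimizer. A minimal round-trip sketch, assuming oneflow.save and oneflow.load behave like their PyTorch counterparts; the path "rmsprop_state" is a placeholder:

    import oneflow as flow

    # Save: capture the optimizer state alongside the model.
    flow.save(rmsprop.state_dict(), "rmsprop_state")

    # Restore: rebuild the optimizer, then load the saved state.
    rmsprop = flow.optim.RMSprop(net.parameters(), lr=1e-3)
    rmsprop.load_state_dict(flow.load("rmsprop_state"))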