oneflow.optim.RMSprop

class oneflow.optim.RMSprop(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, alpha: float = 0.99, eps: float = 1e-08, weight_decay: float = 0, momentum: float = 0.0, centered: bool = False)

Implements the RMSprop algorithm.

Root Mean Square Propagation (RMSProp) is an unpublished adaptive learning rate method. It was originally proposed in lecture slides (slide 29): http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf

The original equation is as follows:

\begin{align}\begin{aligned}r(w, t) = \alpha r(w, t-1) + (1 - \alpha)(\nabla Q_{i}(w))^2\\w = w - \frac{\eta}{\sqrt{r(w,t) + \epsilon}} \nabla Q_{i}(w)\end{aligned}\end{align}

The first equation computes a moving average of the squared gradient for each weight. The gradient is then divided by $$\sqrt{r(w,t) + \epsilon}$$. In some cases, adding a momentum term $$\beta$$ is beneficial. In our implementation, Nesterov momentum is used:

\begin{align}\begin{aligned}r(w, t) = \alpha r(w, t-1) + (1 - \alpha)(\nabla Q_{i}(w))^2\\v(w, t) = \beta v(w, t-1) + \frac{\eta}{\sqrt{r(w,t) + \epsilon}} \nabla Q_{i}(w)\\w = w - v(w, t)\end{aligned}\end{align}

If centered is True:

\begin{align}\begin{aligned}r(w, t) = \alpha r(w, t-1) + (1 - \alpha)(\nabla Q_{i}(w))^2\\g(w, t) = \alpha g(w, t-1) + (1 - \alpha)\nabla Q_{i}(w)\\v(w, t) = \beta v(w, t-1) + \frac{\eta}{\sqrt{r(w,t) - (g(w, t))^2 + \epsilon}} \nabla Q_{i}(w)\\w = w - v(w, t)\end{aligned}\end{align}

Here, $$\alpha$$ is the smoothing constant, with typical values such as 0.99 or 0.95; $$\beta$$ is the momentum term; $$\eta$$ is the learning rate; and $$\epsilon$$ is a smoothing term that avoids division by zero, usually set somewhere between 1e-4 and 1e-8.
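The update equations above can be sketched for a single scalar weight in plain Python. This is a minimal illustration of the formulas as written on this page, not OneFlow's actual kernel: the names `rmsprop_step` and `minimize_quadratic` are hypothetical, the state is kept in an ordinary dict, and $$\epsilon$$ is placed inside the square root because that is how the equations show it.

```python
import math


def rmsprop_step(w, grad, state, lr=0.001, alpha=0.99, eps=1e-8,
                 beta=0.0, centered=False):
    """Apply one RMSprop update to a scalar weight (hypothetical helper).

    `state` is a dict holding the running averages r, g and the velocity v;
    missing entries are treated as zero, i.e. t = 0.
    """
    # r(w, t): moving average of the squared gradient.
    state["r"] = alpha * state.get("r", 0.0) + (1 - alpha) * grad ** 2
    denom_sq = state["r"]
    if centered:
        # g(w, t): moving average of the gradient itself.
        state["g"] = alpha * state.get("g", 0.0) + (1 - alpha) * grad
        # r - g^2 estimates the gradient variance.
        denom_sq = state["r"] - state["g"] ** 2
    # v(w, t): momentum-accumulated step; with beta = 0 it is the plain step.
    state["v"] = beta * state.get("v", 0.0) + lr / math.sqrt(denom_sq + eps) * grad
    return w - state["v"]


def minimize_quadratic(steps=100, lr=0.01, centered=False):
    """Run RMSprop on Q(w) = w^2 starting from w = 1 and return the final w."""
    w, state = 1.0, {}
    for _ in range(steps):
        grad = 2.0 * w  # dQ/dw for Q(w) = w^2
        w = rmsprop_step(w, grad, state, lr=lr, centered=centered)
    return w


if __name__ == "__main__":
    print(minimize_quadratic())
    print(minimize_quadratic(centered=True))
```

Because the running average r starts at zero, the very first step has size of roughly lr / sqrt(1 - alpha) regardless of the gradient's magnitude, which is why RMSprop is often described as normalizing the step by the gradient scale.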

Parameters
• params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

• lr (float, optional) – learning rate (default: 1e-3)

• momentum (float, optional) – momentum factor (default: 0; OneFlow does not currently support momentum > 0)

• alpha (float, optional) – smoothing constant (default: 0.99)

• eps (float, optional) – term added to the denominator to improve numerical stability (default: 1e-8)

• centered (bool, optional) – if True, compute the centered RMSProp, in which the gradient is normalized by an estimate of its variance (default: False)

• weight_decay (float, optional) – weight decay (L2 penalty) (default: 0)
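As a rough illustration of what "weight decay (L2 penalty)" usually means, the sketch below folds the penalty into the gradient before the optimizer update. This page does not spell out how OneFlow applies weight_decay internally, so treat this as the common convention under that assumption; `apply_weight_decay` is a hypothetical helper, not a OneFlow function.

```python
def apply_weight_decay(grad, w, weight_decay=0.0):
    """Return the gradient of Q(w) + (weight_decay / 2) * w**2.

    `grad` is the gradient of the unpenalized loss Q at w. Adding
    weight_decay * w is equivalent to adding an L2 penalty to the loss.
    """
    return grad + weight_decay * w


if __name__ == "__main__":
    # With weight_decay = 0 the gradient is unchanged.
    print(apply_weight_decay(0.5, 2.0))
    # With weight_decay > 0 the gradient is pushed toward shrinking w.
    print(apply_weight_decay(0.5, 2.0, weight_decay=0.1))
```

The penalized gradient would then be passed to the RMSprop update in place of the raw gradient.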

For example:

Example 1:

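import oneflow as flow

# Assume net is a custom model.
rmsprop = flow.optim.RMSprop(net.parameters(), lr=1e-3)

for epoch in range(epochs):
    # Read data, compute the loss, and so on.
    # ...
    loss.backward()
    rmsprop.step()
    # Clear gradients so they do not accumulate across iterations.
    rmsprop.zero_grad()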

Example 2:

# Assume net is a custom model and learning_rate is a float.
rmsprop = flow.optim.RMSprop(
    [
        {
            "params": net.parameters(),
            "lr": learning_rate,
        }
    ],
)

for epoch in range(epochs):
    # Read data, compute the loss, and so on.
    # ...
    loss.backward()
    rmsprop.step()
    # Clear gradients so they do not accumulate across iterations.
    rmsprop.zero_grad()

If you want to use clip_grad, you can refer to this example.