
class oneflow.optim.SGD(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, momentum: float = 0.0, dampening: float = 0.0, weight_decay: float = 0.0, nesterov: bool = False, maximize: bool = False, fused: bool = False)

Implements SGD algorithm.

In mini-batch gradient descent, this algorithm uses the gradient computed on a random sample as an approximation of the overall gradient.

When momentum = 0, the parameter update is:

\[param_{new} = param_{old} - learning\_rate * grad\]

With momentum, the parameter update is as follows, where β is the momentum factor and g_t is the gradient at step t:

\[ \begin{align}\begin{aligned}& V_t = \beta * V_{t-1} - learning\_rate * (g_t + param_{old} * weight\_decay)\\& param_{new} = param_{old} + V_t\end{aligned}\end{align} \]
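
The momentum equations above can be traced by hand with a small scalar sketch. This is only an illustration of the documented formulas in plain Python, not OneFlow's actual kernel; the function name `sgd_update` and its argument names are hypothetical.

```python
def sgd_update(param, grad, velocity, lr=1e-3, momentum=0.0, weight_decay=0.0):
    """One scalar SGD step following the documented momentum equations.

    Hypothetical helper for illustration only; not part of oneflow.
    """
    # V_t = beta * V_{t-1} - learning_rate * (g_t + param_old * weight_decay)
    new_velocity = momentum * velocity - lr * (grad + param * weight_decay)
    # param_new = param_old + V_t
    new_param = param + new_velocity
    return new_param, new_velocity


# Two steps with a constant gradient, starting from zero velocity.
p, v = 1.0, 0.0
for _ in range(2):
    p, v = sgd_update(p, grad=2.0, velocity=v, lr=0.1, momentum=0.5)
print(p, v)
```

With momentum = 0 and weight_decay = 0, the velocity term reduces to -learning_rate * grad, so the step matches the plain update rule.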

Parameters

  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • momentum (float, optional) – Momentum factor (default: 0.0)

  • weight_decay (float, optional) – weight decay (L2 penalty) (default: 0.0)

  • fused (bool, optional) – whether to divide all the parameters into several groups, then update each group of parameters with the fused kernel. (default: False)

For example:

Example 1:

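import oneflow as flow

# Assume net is a custom model.
sgd = flow.optim.SGD(net.parameters(), lr=1e-3)

for epoch in range(epochs):
    # Read data, compute the loss, and so on.
    # ...
    loss.backward()  # compute gradients
    sgd.step()       # update the parameters
    sgd.zero_grad()  # reset gradients for the next iteration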

Example 2:

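import oneflow as flow

# Assume net is a custom model.
sgd = flow.optim.SGD(
    [
        {
            "params": net.parameters(),
            "lr": learning_rate,
            "clip_grad_max_norm": 0.5,
            "clip_grad_norm_type": 2.0,
        }
    ],
)

for epoch in range(epochs):
    # Read data, compute the loss, and so on.
    # ...
    loss.backward()
    sgd.clip_grad()  # clip gradients using the settings of the parameter group
    sgd.step()
    sgd.zero_grad()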

If you want to use clip_grad, you can refer to this example.

For more details of clip_grad_max_norm and clip_grad_norm_type, you can refer to oneflow.nn.utils.clip_grad_norm_().
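
To give a rough idea of what the two clipping options control, the following plain-Python sketch shows the general norm-clipping technique: compute the p-norm of all gradients (p = clip_grad_norm_type) and rescale them if that norm exceeds clip_grad_max_norm. It is an assumption-laden illustration, not OneFlow's implementation; the helper name `clip_grad_norm` and the `eps` stabilizer are hypothetical.

```python
def clip_grad_norm(grads, max_norm, norm_type=2.0, eps=1e-6):
    """Rescale a list of scalar gradients in place so their p-norm is at most max_norm.

    Hypothetical helper for illustration only; returns the norm before clipping.
    """
    # Total p-norm over all gradient values.
    total_norm = sum(abs(g) ** norm_type for g in grads) ** (1.0 / norm_type)
    # Scaling factor; eps avoids division by zero when all gradients are zero.
    clip_coef = max_norm / (total_norm + eps)
    if clip_coef < 1.0:
        for i in range(len(grads)):
            grads[i] *= clip_coef
    return total_norm


grads = [3.0, 4.0]
norm_before = clip_grad_norm(grads, max_norm=0.5, norm_type=2.0)
norm_after = sum(g ** 2 for g in grads) ** 0.5
print(norm_before, norm_after)
```

Gradients whose norm is already below the limit are left unchanged, so clipping only affects unusually large updates.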



Whether the SGD optimizer supports sparse updates.