oneflow.optim.Adagrad

class oneflow.optim.Adagrad(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, lr_decay: float = 0.0, weight_decay: float = 0, initial_accumulator_value: float = 0.0, eps: float = 1e-10, contiguous_params: bool = False)

Implements the Adagrad optimizer.
The formula is:
\[
\begin{aligned}
& S_{t} = S_{t-1} + grad \odot grad \\
& decay\_lr = \frac{learning\_rate}{1 + (train\_step - 1) * lr\_decay} \\
& X_{t} = X_{t-1} - \frac{decay\_lr}{\sqrt{S_{t} + \epsilon}} \odot grad
\end{aligned}
\]
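As a plain illustration of this update rule (illustration only, not OneFlow API code; all variable names below are hypothetical), the following sketch applies one Adagrad step to a single scalar parameter:

# Illustration only: one scalar Adagrad step following the formula above.
learning_rate = 0.001
lr_decay = 0.0
eps = 1e-10

x = 1.0         # parameter X_{t-1}
s = 0.0         # accumulator S_{t-1}, i.e. initial_accumulator_value
grad = 0.5      # gradient of the loss w.r.t. x
train_step = 1

s = s + grad * grad                                    # S_t = S_{t-1} + grad * grad
decay_lr = learning_rate / (1 + (train_step - 1) * lr_decay)
x = x - decay_lr / (s + eps) ** 0.5 * grad             # X_t = X_{t-1} - decay_lr / sqrt(S_t + eps) * grad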
Parameters
params (Union[Iterator[Parameter], List[Dict]]) – iterable of parameters to optimize or dicts defining parameter groups.
lr (float, optional) – The learning rate. Defaults to 0.001.
lr_decay (float, optional) – The decay factor of learning rate. Defaults to 0.0.
weight_decay (float, optional) – The weight decay. Defaults to 0.
initial_accumulator_value (float, optional) – The initial value of S. Defaults to 0.0.
eps (float, optional) – A small constant added to the denominator to improve numerical stability. Defaults to 1e-10.
contiguous_params (bool, optional) – whether to use a contiguous ParamGroup, which places all parameters of the same type, device, and group into a single tensor and updates them together. Defaults to False.
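To enable contiguous parameter updates, pass contiguous_params=True when constructing the optimizer; a minimal sketch (assuming net is a custom model, as in the examples below):

# Assume net is a custom model; parameters of the same type, device and group
# are packed into one tensor and updated together.
adagrad = flow.optim.Adagrad(net.parameters(), lr=1e-3, contiguous_params=True)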
For example:
Example 1:
import oneflow as flow

# Assume net is a custom model.
adagrad = flow.optim.Adagrad(net.parameters(), lr=1e-3)

for epoch in range(epochs):
    # Read data, compute the loss and so on.
    # ...
    loss.backward()
    adagrad.step()
    adagrad.zero_grad()
Example 2:
import oneflow as flow

# Assume net is a custom model.
adagrad = flow.optim.Adagrad(
    [
        {
            "params": net.parameters(),
            "lr": learning_rate,
            "clip_grad_max_norm": 0.5,
            "clip_grad_norm_type": 2.0,
        }
    ],
)

for epoch in range(epochs):
    # Read data, compute the loss and so on.
    # ...
    loss.backward()
    adagrad.clip_grad()
    adagrad.step()
    adagrad.zero_grad()
If you want to use clip_grad, you can refer to this example (Example 2).
For more details on clip_grad_max_norm and clip_grad_norm_type, see oneflow.nn.utils.clip_grad_norm_().
__init__(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, lr_decay: float = 0.0, weight_decay: float = 0, initial_accumulator_value: float = 0.0, eps: float = 1e-10, contiguous_params: bool = False)

Initialize self. See help(type(self)) for accurate signature.
Methods
__delattr__(name, /) – Implement delattr(self, name).
__dir__() – Default dir() implementation.
__eq__(value, /) – Return self==value.
__format__(format_spec, /) – Default object formatter.
__ge__(value, /) – Return self>=value.
__getattribute__(name, /) – Return getattr(self, name).
__gt__(value, /) – Return self>value.
__hash__() – Return hash(self).
__init__(params[, lr, lr_decay, …]) – Initialize self.
__init_subclass__ – This method is called when a class is subclassed.
__le__(value, /) – Return self<=value.
__lt__(value, /) – Return self<value.
__ne__(value, /) – Return self!=value.
__new__(**kwargs) – Create and return a new object.
__reduce__() – Helper for pickle.
__reduce_ex__(protocol, /) – Helper for pickle.
__repr__() – Return repr(self).
__setattr__(name, value, /) – Implement setattr(self, name, value).
__sizeof__() – Size of object in memory, in bytes.
__str__() – Return str(self).
__subclasshook__ – Abstract classes can override this to customize issubclass().
_check_variables_in_graph(vars_conf)
_check_variables_optimizer_bound(vars_conf)
_generate_conf_for_graph(train_conf, vars_conf)
_generate_grad_clip_conf_for_optim_conf(…)
_generate_indexed_slices_optimizer_conf(…)
_generate_lr_scale_for_optim_conf(…)
_parse_input_parameters(parameters) – Supports such parameters.
add_param_group(param_group) – Add a param group to the Optimizer's param_groups.
clip_grad([error_if_nonfinite]) – Clips gradient norm of an iterable of parameters.
load_state_dict(state_dict) – Load the state of the optimizer which is created by the state_dict function.
state_dict() – Returns the state of the optimizer as a dict.
step([closure]) – Performs a single optimization step.
zero_grad([set_to_none]) – Sets the gradients of all optimized oneflow.Tensor objects to zero.
Attributes
support_sparse – Whether the Optimizer supports sparse updates.
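The state_dict and load_state_dict methods listed above can be used to checkpoint and restore the optimizer; a minimal sketch (assuming net is a custom model, as in the examples above):

import oneflow as flow

# Assume net is a custom model.
adagrad = flow.optim.Adagrad(net.parameters(), lr=1e-3)

# ... train for a while, then capture the optimizer state.
state = adagrad.state_dict()

# Later, rebuild the optimizer and restore its state to resume training.
adagrad = flow.optim.Adagrad(net.parameters(), lr=1e-3)
adagrad.load_state_dict(state)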