oneflow.optim.LBFGS
class oneflow.optim.LBFGS(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, max_iter: int = 20, max_eval: Optional[int] = None, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn=None, contiguous_params: bool = False)

Implements the L-BFGS algorithm.
It was proposed in On the limited memory BFGS method for large scale optimization. The implementation uses the two-loop recursion proposed in Updating Quasi-Newton Matrices with Limited Storage, and the strong_wolfe line search follows Numerical_Optimization_v2.
This algorithm maintains an estimate of the inverse Hessian matrix and uses it to steer the search through variable space toward a descent direction.
The line search terminates with a step length that satisfies the strong Wolfe conditions.
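In the standard formulation (the constants c_1 and c_2 below are the usual Wolfe parameters, not arguments of this class), a step length t along a search direction d is accepted when both

    f(x + t d) \le f(x) + c_1 t \nabla f(x)^\top d        (sufficient decrease)
    |\nabla f(x + t d)^\top d| \le c_2 |\nabla f(x)^\top d|    (curvature)

hold, with 0 < c_1 < c_2 < 1.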
This optimizer supports only a single parameter group.
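For intuition, below is a minimal sketch of the two-loop recursion: it applies the estimated inverse Hessian to a flattened gradient using the stored curvature pairs, without ever forming the matrix explicitly. The function name, tensor layout, and initial scaling here are illustrative assumptions, not the optimizer's internal API:

    import oneflow as flow

    def two_loop_recursion(grad, s_hist, y_hist):
        """Approximate H_inv @ grad from curvature pairs (s_i, y_i).

        s_hist[i] is a past parameter step x_{i+1} - x_i (flattened);
        y_hist[i] is the matching gradient change g_{i+1} - g_i.
        """
        if not s_hist:  # no history yet: fall back to the raw gradient
            return grad.clone()
        rhos = [1.0 / (y * s).sum() for s, y in zip(s_hist, y_hist)]
        q = grad.clone()
        alphas = [None] * len(s_hist)
        # First loop: walk the history from newest to oldest.
        for i in range(len(s_hist) - 1, -1, -1):
            alphas[i] = rhos[i] * (s_hist[i] * q).sum()
            q = q - alphas[i] * y_hist[i]
        # A common choice of initial inverse-Hessian scaling: gamma * I.
        gamma = (s_hist[-1] * y_hist[-1]).sum() / (y_hist[-1] ** 2).sum()
        r = gamma * q
        # Second loop: walk the history from oldest to newest.
        for i in range(len(s_hist)):
            beta = rhos[i] * (y_hist[i] * r).sum()
            r = r + (alphas[i] - beta) * s_hist[i]
        return r  # the search direction is -r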
Parameters
params (iterable) – iterable of parameters to optimize or dicts defining parameter groups
lr (float, optional) – learning rate (default: 1e-3)
max_iter (int, optional) – maximal number of iterations per optimization step (default: 20)
max_eval (int, optional) – maximal number of function evaluations per optimization step (default: max_iter * 1.25)
tolerance_grad (float, optional) – termination tolerance on first-order optimality (default: 1e-7)
tolerance_change (float, optional) – termination tolerance on parameter changes (default: 1e-9)
history_size (int, optional) – parameter update history size (default: 100)
line_search_fn (str, optional) – line search function, either "strong_wolfe" or None (default: None)
contiguous_params (bool, optional) – whether to use a contiguous ParamGroup, which places all parameters of the same type, device, and group into a single tensor and updates them together (default: False)
For example:

    import oneflow as flow

    # Assume net is a custom model.
    lbfgs = flow.optim.LBFGS(net.parameters())

    for epoch in range(epochs):

        def closure():
            lbfgs.zero_grad()
            # Read data, compute the loss, and so on.
            loss.backward()
            return loss

        lbfgs.step(closure)
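To enable the line search, pass line_search_fn="strong_wolfe" at construction. A minimal runnable sketch, where the quadratic objective and the chosen hyperparameters are purely illustrative:

    import oneflow as flow

    # A toy least-squares objective; x is the single parameter being optimized.
    x = flow.nn.Parameter(flow.ones(2) * 5.0)
    target = flow.tensor([1.0, -2.0])
    lbfgs = flow.optim.LBFGS([x], lr=1.0, max_iter=20, history_size=10,
                             line_search_fn="strong_wolfe")

    def closure():
        lbfgs.zero_grad()
        loss = ((x - target) ** 2).sum()
        loss.backward()
        return loss

    for _ in range(5):
        lbfgs.step(closure)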
__init__(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, max_iter: int = 20, max_eval: Optional[int] = None, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn=None, contiguous_params: bool = False)

Initialize self. See help(type(self)) for accurate signature.
Methods

__delattr__(name, /) – Implement delattr(self, name).
__dir__() – Default dir() implementation.
__eq__(value, /) – Return self==value.
__format__(format_spec, /) – Default object formatter.
__ge__(value, /) – Return self>=value.
__getattribute__(name, /) – Return getattr(self, name).
__gt__(value, /) – Return self>value.
__hash__() – Return hash(self).
__init__(params[, lr, max_iter, max_eval, …]) – Initialize self.
__init_subclass__ – This method is called when a class is subclassed.
__le__(value, /) – Return self<=value.
__lt__(value, /) – Return self<value.
__ne__(value, /) – Return self!=value.
__new__(**kwargs) – Create and return a new object.
__reduce__() – Helper for pickle.
__reduce_ex__(protocol, /) – Helper for pickle.
__repr__() – Return repr(self).
__setattr__(name, value, /) – Implement setattr(self, name, value).
__sizeof__() – Size of object in memory, in bytes.
__str__() – Return str(self).
__subclasshook__ – Abstract classes can override this to customize issubclass().
_check_variables_in_graph(vars_conf)
_check_variables_optimizer_bound(vars_conf)
_gather_flat_grad()
_generate_grad_clip_conf_for_optim_conf(…)
_generate_indexed_slices_optimizer_conf(…)
_generate_lr_scale_for_optim_conf(…)
_numel()
_parse_input_parameters(parameters) – Supports such parameters:
_try_direction(closure, x, t, d)
_update(step_size, direction)
add_param_group(param_group) – Add a param group to the Optimizer's param_groups.
clip_grad([error_if_nonfinite]) – Clips gradient norm of an iterable of parameters.
load_state_dict(state_dict) – Load the state of the optimizer, which is created by the state_dict function.
state_dict() – Returns the state of the optimizer as a dict.
step([closure]) – Performs a single optimization step.
zero_grad([set_to_none]) – Sets the gradients of all optimized oneflow.Tensors to zero.

Attributes

support_sparse – Whether the Optimizer supports sparse updates.
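As a usage note, the state_dict / load_state_dict pair listed above can checkpoint the optimizer alongside the model. A minimal sketch, assuming the net model from the earlier example; the checkpoint path is a placeholder:

    import oneflow as flow

    # Save the optimizer state (the path is a placeholder).
    flow.save(lbfgs.state_dict(), "./lbfgs_state")

    # Later: rebuild an equivalent optimizer and restore its state.
    new_lbfgs = flow.optim.LBFGS(net.parameters())
    new_lbfgs.load_state_dict(flow.load("./lbfgs_state"))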