oneflow.optim.LBFGS

class oneflow.optim.LBFGS(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, max_iter: int = 20, max_eval: Optional[int] = None, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn=None, contiguous_params: bool = False)

Implements the L-BFGS algorithm.

It was proposed in On the limited memory BFGS method for large scale optimization. The implementation follows the two-loop recursion proposed in Updating Quasi-Newton Matrices with Limited Storage.

The strong_wolfe line search implementation follows the one proposed in Numerical_Optimization_v2.

This algorithm uses an estimated inverse Hessian matrix to steer its search through variable space and determine the optimal direction.
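The inverse Hessian is never formed explicitly; a two-loop recursion applies it to the gradient using only the stored curvature pairs. Below is a minimal illustrative sketch in plain NumPy (the function and variable names are made up for illustration; this is not OneFlow's internal implementation):

import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    # Approximates H_k @ grad from the stored pairs s_i = x_{i+1} - x_i and
    # y_i = grad_{i+1} - grad_i, ordered oldest first. Illustrative only.
    rhos = [1.0 / y.dot(s) for s, y in zip(s_list, y_list)]
    q = grad.copy()
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * s.dot(q)
        alphas.append(alpha)
        q -= alpha * y
    # Scale by gamma = s^T y / y^T y of the newest pair (initial inverse Hessian guess).
    gamma = s_list[-1].dot(y_list[-1]) / y_list[-1].dot(y_list[-1]) if s_list else 1.0
    r = gamma * q
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * y.dot(r)
        r += (alpha - beta) * s
    return r  # the search direction is -r

# Tiny usage example with hand-picked arrays (curvature y.dot(s) > 0):
g = np.array([0.3, -0.1])
direction = -two_loop_recursion(g, [np.array([0.1, 0.2])], [np.array([0.05, 0.3])])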

The line search algorithm terminates with a step length that satisfies the strong Wolfe conditions.
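For reference, in standard notation (x_k the current point, p_k the search direction, c_1 and c_2 constants with 0 < c_1 < c_2 < 1; these symbols are not taken from the OneFlow source), a step length t satisfies the strong Wolfe conditions when

    f(x_k + t p_k) \le f(x_k) + c_1 t \nabla f(x_k)^\top p_k
    |\nabla f(x_k + t p_k)^\top p_k| \le c_2 |\nabla f(x_k)^\top p_k|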

This optimizer only supports a single parameter group.

Parameters
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • max_iter (int, optional) – maximal number of iterations per optimization step (default: 20)

  • max_eval (int, optional) – maximal number of function evaluations per optimization step (default: max_iter * 1.25)

  • tolerance_grad (float, optional) – termination tolerance on first-order optimality (default: 1e-7)

  • tolerance_change (float, optional) – termination tolerance on parameter changes (default: 1e-9)

  • history_size (int, optional) – parameter update history size (default: 100)

  • line_search_fn (str, optional) – line search function, either "strong_wolfe" or None (default: None)

  • contiguous_params (bool, optional) – whether to use a contiguous ParamGroup, which places all parameters of the same type, device and group into a single tensor and updates them together (default: False)

For example:

import oneflow as flow

# A minimal runnable sketch; the model, data and loss below stand in for a custom model and training set.
net = flow.nn.Linear(4, 1)
x, y = flow.randn(8, 4), flow.randn(8, 1)
loss_fn = flow.nn.MSELoss()
lbfgs = flow.optim.LBFGS(net.parameters())

for epoch in range(5):
    def closure():
        # LBFGS may re-evaluate the closure several times per step,
        # so clear the gradients and recompute the loss on every call.
        lbfgs.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        return loss
    lbfgs.step(closure)
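
The constructor options can also be set explicitly; the values below are purely illustrative, not tuned settings:

lbfgs = flow.optim.LBFGS(
    net.parameters(),
    lr=1.0,
    max_iter=20,
    history_size=100,
    line_search_fn="strong_wolfe",
)

With line_search_fn="strong_wolfe", each update uses a step length that satisfies the strong Wolfe conditions described above.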
__init__(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, max_iter: int = 20, max_eval: Optional[int] = None, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn=None, contiguous_params: bool = False)

Initialize self. See help(type(self)) for accurate signature.

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__(params[, lr, max_iter, max_eval, …])

Initialize self.

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

Create and return a new object.

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

_check_variables_in_graph(vars_conf)

_check_variables_optimizer_bound(vars_conf)

_gather_flat_grad()

_generate_grad_clip_conf_for_optim_conf(…)

_generate_indexed_slices_optimizer_conf(…)

_generate_lr_scale_for_optim_conf(…)

_numel()

_parse_input_parameters(parameters)

Supports such parameters:

_try_direction(closure, x, t, d)

_update(step_size, direction)

add_param_group(param_group)

Add a param group to the Optimizer's param_groups.

clip_grad([error_if_nonfinite])

Clips gradient norm of an iterable of parameters.

load_state_dict(state_dict)

Load the state of the optimizer which is created by state_dict function.

state_dict()

Returns the state of the optimizer as a dict.

step([closure])

Performs a single optimization step.

zero_grad([set_to_none])

Sets the gradients of all optimized oneflow.Tensor objects to zero.

Attributes

support_sparse

Whether the Optimizer supports sparse updates.