oneflow.optim.LBFGS

class oneflow.optim.LBFGS(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, max_iter: int = 20, max_eval: Optional[int] = None, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn=None, contiguous_params: bool = False)

Implements the L-BFGS algorithm.

It was proposed in On the limited memory BFGS method for large scale optimization. The implementation follows the two-loop recursion proposed in Updating Quasi-Newton Matrices with Limited Storage.

The strong_wolfe line search implementation follows the one proposed in Numerical_Optimization_v2.

This algorithm uses an estimated inverse Hessian matrix to steer its search through variable space and determine the optimal direction.
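The inverse Hessian is never formed explicitly; a two-loop recursion applies it to the gradient using only the stored curvature pairs. Below is a minimal illustrative sketch in plain NumPy (the function and variable names are made up for illustration; this is not OneFlow's internal implementation):

import numpy as np

def two_loop_recursion(grad, s_list, y_list):
    # Approximates H_k @ grad from the stored pairs s_i = x_{i+1} - x_i and
    # y_i = grad_{i+1} - grad_i, ordered oldest first. Illustrative only.
    rhos = [1.0 / y.dot(s) for s, y in zip(s_list, y_list)]
    q = grad.copy()
    alphas = []
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * s.dot(q)
        alphas.append(alpha)
        q -= alpha * y
    # Scale by gamma = s^T y / y^T y of the newest pair (initial inverse Hessian guess).
    gamma = s_list[-1].dot(y_list[-1]) / y_list[-1].dot(y_list[-1]) if s_list else 1.0
    r = gamma * q
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * y.dot(r)
        r += (alpha - beta) * s
    return r  # the search direction is -r

# Tiny usage example with hand-picked arrays (curvature y.dot(s) > 0):
g = np.array([0.3, -0.1])
direction = -two_loop_recursion(g, [np.array([0.1, 0.2])], [np.array([0.05, 0.3])])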

The line search algorithm terminates with a step length that satisfies the strong Wolfe conditions.
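For reference, in standard notation (x_k the current point, p_k the search direction, c_1 and c_2 constants with 0 < c_1 < c_2 < 1; these symbols are not taken from the OneFlow source), a step length t satisfies the strong Wolfe conditions when

    f(x_k + t p_k) \le f(x_k) + c_1 t \nabla f(x_k)^\top p_k
    |\nabla f(x_k + t p_k)^\top p_k| \le c_2 |\nabla f(x_k)^\top p_k|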

This optimizer only supports a single parameter group.

Parameters
  • params (iterable) – iterable of parameters to optimize or dicts defining parameter groups

  • lr (float, optional) – learning rate (default: 1e-3)

  • max_iter (int, optional) – maximal number of iterations per optimization step (default: 20)

  • max_eval (int, optional) – maximal number of function evaluations per optimization step (default: max_iter * 1.25)

  • tolerance_grad (float, optional) – termination tolerance on first-order optimality (default: 1e-7)

  • tolerance_change (float, optional) – termination tolerance on parameter changes (default: 1e-9)

  • history_size (int, optional) – parameter update history size (default: 100)

  • line_search_fn (str, optional) – line search function, either "strong_wolfe" or None (default: None)

  • contiguous_params (bool, optional) – whether to use a contiguous ParamGroup, which places all parameters of the same type, device and group into a single tensor and updates them together (default: False)

For example:

import oneflow as flow

# A minimal runnable sketch; the model, data and loss below stand in for a custom model and training set.
net = flow.nn.Linear(4, 1)
x, y = flow.randn(8, 4), flow.randn(8, 1)
loss_fn = flow.nn.MSELoss()
lbfgs = flow.optim.LBFGS(net.parameters())

for epoch in range(5):
    def closure():
        # LBFGS may re-evaluate the closure several times per step,
        # so clear the gradients and recompute the loss on every call.
        lbfgs.zero_grad()
        loss = loss_fn(net(x), y)
        loss.backward()
        return loss
    lbfgs.step(closure)
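
The constructor options can also be set explicitly; the values below are purely illustrative, not tuned settings:

lbfgs = flow.optim.LBFGS(
    net.parameters(),
    lr=1.0,
    max_iter=20,
    history_size=100,
    line_search_fn="strong_wolfe",
)

With line_search_fn="strong_wolfe", each update uses a step length that satisfies the strong Wolfe conditions described above.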
__init__(params: Union[Iterator[oneflow.nn.Parameter], List[Dict]], lr: float = 0.001, max_iter: int = 20, max_eval: Optional[int] = None, tolerance_grad: float = 1e-07, tolerance_change: float = 1e-09, history_size: int = 100, line_search_fn=None, contiguous_params: bool = False)

Initialize self. See help(type(self)) for accurate signature.

Methods

__delattr__(name, /)

Implement delattr(self, name).

__dir__()

Default dir() implementation.

__eq__(value, /)

Return self==value.

__format__(format_spec, /)

Default object formatter.

__ge__(value, /)

Return self>=value.

__getattribute__(name, /)

Return getattr(self, name).

__gt__(value, /)

Return self>value.

__hash__()

Return hash(self).

__init__(params[, lr, max_iter, max_eval, …])

Initialize self.

__init_subclass__

This method is called when a class is subclassed.

__le__(value, /)

Return self<=value.

__lt__(value, /)

Return self<value.

__ne__(value, /)

Return self!=value.

__new__(**kwargs)

Create and return a new object.

__reduce__()

Helper for pickle.

__reduce_ex__(protocol, /)

Helper for pickle.

__repr__()

Return repr(self).

__setattr__(name, value, /)

Implement setattr(self, name, value).

__sizeof__()

Size of object in memory, in bytes.

__str__()

Return str(self).

__subclasshook__

Abstract classes can override this to customize issubclass().

_check_variables_in_graph(vars_conf)

_check_variables_optimizer_bound(vars_conf)

_gather_flat_grad()

_generate_grad_clip_conf_for_optim_conf(…)

_generate_indexed_slices_optimizer_conf(…)

_generate_lr_scale_for_optim_conf(…)

_numel()

_parse_input_parameters(parameters)

Supports such parameters:

_try_direction(closure, x, t, d)

_update(step_size, direction)

add_param_group(param_group)

Add a param group to the Optimizer's param_groups.

clip_grad([error_if_nonfinite])

Clips gradient norm of an iterable of parameters.

load_state_dict(state_dict)

Load the state of the optimizer which is created by state_dict function.

state_dict()

Returns the state of the optimizer as a dict.

step([closure])

Performs a single optimization step.

zero_grad([set_to_none])

Sets the gradients of all optimized oneflow.Tensor objects to zero.

Attributes

support_sparse

Whether the Optimizer supports sparse updates.