oneflow.autograd
Functions and classes for autograd.
class oneflow.autograd.Function(self)

    Base class to create a custom autograd.Function.

    To create a custom autograd.Function, subclass this class and implement
    the forward() and backward() static methods. Then, to use your custom op
    in the forward pass, call the class method apply() or __call__(). Do not
    call forward() directly.

    For example:

        class Exp(Function):
            @staticmethod
            def forward(ctx, i):
                result = i.exp()
                ctx.save_for_backward(result)
                return result

            @staticmethod
            def backward(ctx, grad_output):
                result, = ctx.saved_tensors
                return grad_output * result

        # Use it by calling the apply method or __call__ method
        output = Exp.apply(input)  # output = Exp()(input)
    __call__(*inputs)

        See self.apply().
    classmethod apply(*inputs)

        Calculate the output tensors and build the backward graph.
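    The following is a minimal usage sketch (not part of the original
    docstring) showing how the Exp Function from the example above
    participates in a backward pass; it assumes the standard OneFlow tensor
    API (oneflow.tensor, requires_grad, Tensor.backward):

        import oneflow as flow
        from oneflow.autograd import Function

        class Exp(Function):
            @staticmethod
            def forward(ctx, i):
                result = i.exp()
                ctx.save_for_backward(result)
                return result

            @staticmethod
            def backward(ctx, grad_output):
                result, = ctx.saved_tensors
                return grad_output * result

        x = flow.tensor([1.0, 2.0], requires_grad=True)
        y = Exp.apply(x)      # forward() runs; apply() also records the backward graph
        y.sum().backward()    # triggers Exp.backward via the recorded graph
        print(x.grad)         # expected to match exp(x)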
oneflow.autograd.backward(tensors: Union[oneflow.Tensor, Sequence[oneflow.Tensor]], grad_tensors: Optional[Union[oneflow.Tensor, Sequence[oneflow.Tensor]]], retain_graph: bool = False, create_graph: bool = False) → None

    The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.autograd.backward.html.

    Computes the sum of gradients of the given tensors with respect to graph leaves.

    The graph is differentiated using the chain rule. If any of tensors are
    non-scalar (i.e. their data has more than one element) and require
    gradient, then the Jacobian-vector product is computed; in this case the
    function additionally requires grad_tensors. It should be a sequence of
    matching length that contains the “vector” in the Jacobian-vector product,
    usually the gradient of the differentiated function w.r.t. the
    corresponding tensors. (None is an acceptable value for all tensors that
    don’t need gradient.)

    This function accumulates gradients in the leaves - you might need to zero
    the .grad attributes or set them to None before calling it.

    Note

    Using this method with create_graph=True will create a reference cycle
    between the parameter and its gradient, which can cause a memory leak. We
    recommend using autograd.grad when creating the graph to avoid this. If
    you have to use this function, make sure to reset the .grad fields of your
    parameters to None after use to break the cycle and avoid the leak.

    Parameters
        tensors (Tensor or Sequence[Tensor]) – Tensors of which the derivative will be computed.

        grad_tensors (Tensor or Sequence[Tensor], optional) – The “vector” in the Jacobian-vector product, usually the gradient w.r.t. each element of the corresponding tensors. (None values can be specified for scalar tensors or ones that don’t require grad.)

        retain_graph (bool, optional) – If False, the graph used to compute the grads will be reset after backward is complete. Note that in nearly all cases setting this option to True is not needed and can often be worked around in a much more efficient way. Defaults to False.

        create_graph (bool, optional) – If True, the graph of the derivative will be constructed, allowing higher order derivative products to be computed. Defaults to False.
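    As an illustrative sketch (not taken from the original docstring), a call
    to backward on a non-scalar tensor with an explicit grad_tensors argument;
    it assumes the standard OneFlow tensor API (oneflow.tensor, oneflow.ones_like):

        import oneflow as flow

        x = flow.tensor([1.0, 2.0, 3.0], requires_grad=True)
        y = x * x                       # non-scalar output

        # For a non-scalar `tensors` argument, a matching “vector” is required.
        flow.autograd.backward(y, grad_tensors=flow.ones_like(y))

        print(x.grad)                   # expected: 2 * x, i.e. [2., 4., 6.]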
oneflow.autograd.grad(outputs: Union[oneflow.Tensor, Sequence[oneflow.Tensor]], inputs: Union[oneflow.Tensor, Sequence[oneflow.Tensor]], grad_outputs: Optional[Union[oneflow.Tensor, Sequence[oneflow.Tensor]]] = None, retain_graph: bool = False, create_graph: bool = False) → Tuple[oneflow.Tensor]

    The documentation is referenced from: https://pytorch.org/docs/1.10/generated/torch.autograd.grad.html.

    Computes and returns the sum of gradients of outputs with respect to the inputs.

    The graph is differentiated using the chain rule. grad_outputs should be a
    sequence of length matching outputs, containing the “vector” in the
    Jacobian-vector product. (None is an acceptable value for tensors that
    don’t require gradient.)

    Parameters
        outputs (Sequence[Tensor]) – Tensors of which the derivative will be computed.

        inputs (Sequence[Tensor]) – Inputs w.r.t. which the derivative will be returned (and not accumulated into .grad).

        grad_outputs (Sequence[Tensor], optional) – The “vector” in the Jacobian-vector product, usually gradients w.r.t. each output. None values can be specified for scalar tensors or ones that don’t require grad. Defaults to None.

        retain_graph (bool, optional) – If False, the graph used to compute the grads will be reset after backward is complete. Note that in nearly all cases setting this option to True is not needed and can often be worked around in a much more efficient way. Defaults to False.

        create_graph (bool, optional) – If True, the graph of the derivative will be constructed, allowing higher order derivative products to be computed. Defaults to False.
    Returns
        A tuple of tensors containing the gradients for each of the inputs.

    Return type
        Tuple[Tensor]
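    A minimal usage sketch (not taken from the original docstring): unlike
    backward(), grad() returns the gradients instead of accumulating them into
    .grad. It assumes the standard OneFlow tensor API (oneflow.tensor,
    oneflow.ones_like):

        import oneflow as flow

        x = flow.tensor([1.0, 2.0, 3.0], requires_grad=True)
        y = x * 2                        # non-scalar output

        # grad_outputs supplies the “vector” in the Jacobian-vector product.
        (dx,) = flow.autograd.grad(y, x, grad_outputs=flow.ones_like(y))

        print(dx)      # expected: [2., 2., 2.]
        print(x.grad)  # expected: None - grad() does not populate .grad

    Passing create_graph=True instead would build a graph of the derivative
    itself, which is what allows higher order derivative products to be computed.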