oneflow.autograd.grad(outputs: Union[oneflow.Tensor, Sequence[oneflow.Tensor]], inputs: Union[oneflow.Tensor, Sequence[oneflow.Tensor]], grad_outputs: Optional[Union[oneflow.Tensor, Sequence[oneflow.Tensor]]] = None, retain_graph: bool = False, create_graph: bool = False, allow_unused: bool = False) → Tuple[oneflow.Tensor]

Computes and returns the sum of gradients of outputs with respect to the inputs.

The documentation is referenced from:

The graph is differentiated using the chain rule. grad_outputs should be a sequence of length matching outputs, containing the “vector” in the Jacobian-vector product. (None is an acceptable value for tensors that don’t require gradient.)

Parameters

  • outputs (Sequence[Tensor]) – Tensors of which the derivative will be computed.

  • inputs (Sequence[Tensor]) – Inputs w.r.t. which the derivative will be returned (and not accumulated into .grad).

  • grad_outputs (Sequence[Tensor], optional) – The “vector” in the Jacobian-vector product. Usually gradients w.r.t. each output. None values can be specified for scalar Tensors or ones that don’t require grad. Defaults to None.

  • retain_graph (bool, optional) – If False, the graph used to compute the grads will be freed after the backward pass completes. Note that in nearly all cases setting this option to True is not needed and often can be worked around in a much more efficient way. Defaults to False.

  • create_graph (bool, optional) – If True, the graph of the derivative will be constructed, allowing higher order derivative products to be computed. Defaults to False.

  • allow_unused (bool, optional) – If False, specifying inputs that were not used when computing outputs (and therefore their grad is always zero) is an error. Defaults to False.


Returns

  A tuple of tensors containing the gradients for each input.

Return type

  Tuple[oneflow.Tensor]