oneflow.comm.all_reduce

oneflow.comm.all_reduce(tensor)

Reduces the tensor data across all processes so that every process receives the final result. The reduction is applied in place: after the call, tensor is bitwise identical on all processes. In the example below, the reduction is an element-wise sum.

Parameters

tensor (Tensor) – the input tensor; it is overwritten in place with the reduced result

For example:

>>> # We have 1 process group, 2 ranks.
>>> import oneflow as flow

>>> tensor = flow.tensor([[1, 2], [3, 4]], device="cuda") + flow.env.get_local_rank()
>>> # tensor on rank0
>>> tensor
tensor([[1, 2],
        [3, 4]], device='cuda:0', dtype=oneflow.int64)
>>> # tensor on rank1
>>> tensor
tensor([[2, 3],
        [4, 5]], device='cuda:1', dtype=oneflow.int64)
>>> flow.comm.all_reduce(tensor)
>>> tensor.numpy()
array([[3, 5],
       [7, 9]], dtype=int64)
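
The semantics shown above can be sketched without any distributed runtime. The following is a minimal, single-process simulation in plain Python, not OneFlow code: the helper `simulate_all_reduce_sum` and the representation of each rank's tensor as a nested list are assumptions made for illustration, and the sum reduction is inferred from the example output above.

```python
from typing import List

Matrix = List[List[int]]


def simulate_all_reduce_sum(per_rank: List[Matrix]) -> List[Matrix]:
    """Hypothetical helper: return what each simulated rank holds after a sum all-reduce."""
    rows, cols = len(per_rank[0]), len(per_rank[0][0])
    # Element-wise sum over the tensors contributed by all ranks.
    total = [
        [sum(t[r][c] for t in per_rank) for c in range(cols)]
        for r in range(rows)
    ]
    # Every rank receives its own copy of the same reduced result.
    return [[row[:] for row in total] for _ in per_rank]


base = [[1, 2], [3, 4]]
world_size = 2  # two simulated ranks, as in the example above

# Each rank starts from the base tensor plus its own rank index.
inputs = [[[v + rank for v in row] for row in base] for rank in range(world_size)]
outputs = simulate_all_reduce_sum(inputs)

print(outputs[0])                # [[3, 5], [7, 9]]
print(outputs[0] == outputs[1])  # True
```

In a real run, each rank is a separate process and `flow.comm.all_reduce(tensor)` overwrites that rank's `tensor`; the list of outputs here only stands in for the per-rank state after the call.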