oneflow.comm.all_reduce
oneflow.comm.all_reduce(tensor)

Reduces the tensor data across all machines so that every process gets the final result. The reduction happens in place: after the call, tensor is bitwise identical in all processes.

Parameters
    tensor (Tensor) – the input tensor, overwritten with the reduced result
For example:
>>> # We have 1 process group, 2 ranks.
>>> import oneflow as flow
>>> tensor = flow.tensor([[1, 2], [3, 4]], device="cuda") + flow.env.get_local_rank()
>>> # tensor on rank0
>>> tensor
tensor([[1, 2],
        [3, 4]], device='cuda:0', dtype=oneflow.int64)
>>> # tensor on rank1
>>> tensor
tensor([[2, 3],
        [4, 5]], device='cuda:1', dtype=oneflow.int64)
>>> flow.comm.all_reduce(tensor)
>>> tensor.numpy()
array([[3, 5],
       [7, 9]], dtype=int64)
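
The result in the example above is consistent with an elementwise sum over the ranks. The following is a minimal single-process sketch of that behavior in plain Python, with no OneFlow or GPU required. It assumes a sum reduction, and the helper name simulate_all_reduce is hypothetical: it is not part of OneFlow and does not reflect how the library implements the collective.

```python
# Hypothetical helper: simulates a sum all-reduce across ranks in one process.
# This is NOT OneFlow's implementation; it only illustrates the semantics.
def simulate_all_reduce(per_rank_tensors):
    """Return the per-rank buffers as they would look after a sum all-reduce."""
    rows = len(per_rank_tensors[0])
    cols = len(per_rank_tensors[0][0])
    # Elementwise sum over the buffers held by all ranks.
    total = [
        [sum(t[r][c] for t in per_rank_tensors) for c in range(cols)]
        for r in range(rows)
    ]
    # Every rank ends up with its own copy of the same result.
    return [[row[:] for row in total] for _ in per_rank_tensors]


base = [[1, 2], [3, 4]]
# Rank i holds base + i, mirroring the two-rank example above.
inputs = [[[v + rank for v in row] for row in base] for rank in range(2)]
outputs = simulate_all_reduce(inputs)

print(outputs[0])  # [[3, 5], [7, 9]]
print(outputs[1])  # [[3, 5], [7, 9]]
```

Both simulated ranks hold the same values afterwards, which matches the "identical in all processes" guarantee described above.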