Auto Parallelism

As deep-learning models grow larger, distributed training, or parallelism, becomes necessary. Data parallelism and model parallelism have been designed to speed up training and solve memory issues.

In OneFlow, the SBP signature lets users configure a parallelism policy easily. However, users still need to specify the SBP property for each operator, or for most of them. Users might spend a couple of days digging into the details of parallelism and still get low throughput just because of a slight mistake in the SBP signature configuration.
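To make the three SBP states (split, broadcast, partial sum) concrete, here is a toy sketch in plain Python. It does not call OneFlow: the two "devices" are simulated as list entries, and the helper names `matmul` and `add` are made up for this illustration. It only shows that three different ways of distributing a matrix multiplication all recover the same logical result.

```python
# Toy illustration of split / broadcast / partial-sum using plain lists.
# Nothing here is OneFlow API; "devices" are just entries in a list.

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]

def add(a, b):
    """Element-wise sum of two equally shaped matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

X = [[1, 2], [3, 4], [5, 6], [7, 8]]  # logical input, shape (4, 2)
W = [[1, 0, 2], [0, 1, 3]]            # logical weight, shape (2, 3)
expected = matmul(X, W)               # what a single device would compute

# Data parallelism: X is split by rows, W is broadcast to both devices.
x_row_shards = [X[:2], X[2:]]
row_blocks = [matmul(x, W) for x in x_row_shards]
gathered_rows = row_blocks[0] + row_blocks[1]       # output is split by rows

# Model parallelism: X is broadcast, W is split by columns.
w_col_shards = [[row[:2] for row in W], [row[2:] for row in W]]
col_blocks = [matmul(X, w) for w in w_col_shards]
gathered_cols = [r0 + r1 for r0, r1 in zip(*col_blocks)]  # output is split by columns

# Partial sum: X is split by columns, W is split by rows.
x_col_shards = [[row[:1] for row in X], [row[1:] for row in X]]
w_row_shards = [W[:1], W[1:]]
partials = [matmul(x, w) for x, w in zip(x_col_shards, w_row_shards)]
reduced = add(partials[0], partials[1])             # each device held a partial sum
```

All three layouts give the same logical tensor, but they differ in how much memory each device needs and in how much communication is required to combine the pieces. That trade-off is what an SBP configuration has to get right for every operator.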


Note: auto parallelism only works in oneflow.nn.Graph mode.

Our strength

To get rid of all those configurations for SBP signatures, we developed auto parallelism. Placement still has to be configured, because auto placement is not supported yet. If you read this paragraph before you rush into any SBP stuff, then congratulations, you do not need to learn SBP. You can start writing your code as you did under CPU mode. Auto parallelism generates a fast strategy customized for your specific model, the size of its parameters, and the number of available GPUs.

How to use auto parallelism?

You only need to enable the configuration settings in your oneflow.nn.Graph subclass.


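import oneflow as flow


class SubclassGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()  # MUST be called
        # auto parallelism configuration
        self.config.enable_auto_parallel(True)
        # other configurations about auto parallelism
        # ......

    def build(self):
        # placeholder body; the forward computation of your model goes here
        pass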


Warning: if you enable auto parallelism, OneFlow takes care of the SBP configuration of every operator, except where you call to_global explicitly.

Configuration API for auto parallelism


If true, the graph uses the auto-parallel algorithm to select a parallelism strategy.


If true, it will ignore all user configurations of SBP.


Set the coefficient of computation cost in the auto-parallel algorithm.


Set the wait time for the auto-parallel algorithm.


Find the trunk of the SBP graph, then reduce the wait time for tributaries.


Use “sbp collector” to create “sbp proxy” for nodes with multiple downstream operators.


Whether to use a parallelism strategy that consumes less memory.
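The options above mention a computation cost coefficient and a wait time. One way to build intuition for such knobs is a toy cost model like the one below. This is a sketch under stated assumptions, not OneFlow's actual algorithm: the `Candidate` class, the cost formula, and all numbers are hypothetical and chosen only to show how changing the two parameters could change which strategy looks cheapest.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A hypothetical parallelism strategy with made-up cost figures."""
    name: str
    compute_cost: float   # assumed time spent computing
    transfer_cost: float  # assumed time spent moving data between devices
    num_transfers: int    # assumed number of communication events


def total_cost(c, computation_cost_ratio, wait_time):
    # Assumed formula: every communication event pays a fixed wait time
    # on top of the transfer itself; computation is scaled by the ratio.
    communication = c.transfer_cost + c.num_transfers * wait_time
    return computation_cost_ratio * c.compute_cost + communication


def pick_strategy(candidates, computation_cost_ratio, wait_time):
    """Return the candidate with the lowest total cost."""
    return min(
        candidates,
        key=lambda c: total_cost(c, computation_cost_ratio, wait_time),
    )


candidates = [
    Candidate("data_parallel", compute_cost=10.0, transfer_cost=4.0, num_transfers=1),
    Candidate("model_parallel", compute_cost=8.0, transfer_cost=3.0, num_transfers=4),
]

# With a small wait time, the strategy with more (but cheaper) transfers wins.
low_wait = pick_strategy(candidates, computation_cost_ratio=1.0, wait_time=0.1)
# With a large wait time, frequent communication is penalized.
high_wait = pick_strategy(candidates, computation_cost_ratio=1.0, wait_time=2.0)
```

In this toy model, raising the wait time shifts the choice toward strategies that communicate less often, and raising the computation coefficient shifts it toward strategies that compute less per device. How the real algorithm weighs these terms should be checked against the OneFlow API reference.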