Auto Parallelism

As deep-learning models grow larger and larger, distributed training, i.e. parallelism, becomes necessary. Data parallelism and model parallelism have been designed to speed up training and to solve memory issues.

In OneFlow, the SBP signature enables users to configure parallelism policies easily. However, users still need to specify the SBP property for each operator, or at least most of them. Users might spend days digging into the details of parallelism and still end up with low throughput, simply because of a slight mistake in the SBP signature configuration.

Note

Auto parallelism only works in oneflow.nn.Graph mode.

Our strength

To free users from all those SBP signature configurations, we developed auto parallelism. Placement configuration is still necessary, since auto placement is not supported yet (see the sketch below). If you are reading this paragraph before rushing into any SBP details, then congratulations: you do not need to learn SBP. You can write your code just as you would in CPU mode. Auto parallelism generates a fast strategy customized for your specific model, its parameter sizes, and the number of available GPUs.
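For example, the only distributed setting you still provide is a placement. The sketch below is illustrative (it assumes two GPUs on ranks 0 and 1); the sbp passed to to_global only sets the initial layout of the parameters, while the SBP of operators inside the graph is chosen by auto parallelism.

import oneflow as flow

# Illustrative placement: two GPUs on ranks 0 and 1 (adjust to your cluster).
placement = flow.placement(type="cuda", ranks=[0, 1])

# A toy model; broadcast here only describes the initial parameter layout.
model = flow.nn.Linear(1024, 1024)
model = model.to_global(placement=placement, sbp=flow.sbp.broadcast)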

How to use auto parallelism?

You simply need to enable the corresponding configuration setting in your oneflow.nn.Graph model.

Example:

import oneflow as flow
class SubclassGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__() # MUST be called
        # auto parallelism configuration
        self.config.enable_auto_parallel(True)
        # other configurations about auto parallelism
        # ......

    def build(self):
        pass
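A runnable usage sketch (the LinearGraph name, the toy model, and the two-GPU placement are illustrative, not part of the API; launch one process per GPU, e.g. via python3 -m oneflow.distributed.launch):

import oneflow as flow

placement = flow.placement(type="cuda", ranks=[0, 1])

# Toy model made global; broadcast only sets the initial parameter layout.
model = flow.nn.Linear(1024, 1024).to_global(placement=placement, sbp=flow.sbp.broadcast)

class LinearGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.config.enable_auto_parallel(True)
        self.model = model

    def build(self, x):
        return self.model(x)

graph = LinearGraph()
x = flow.randn(16, 1024).to_global(placement=placement, sbp=flow.sbp.split(0))
y = graph(x)  # operator SBPs inside the graph are selected automatically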

Warning

If you enable auto parallelism, OneFlow will take care of the SBP configuration of all operators, except for explicit to_global calls.
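For instance, in the hypothetical graph below (the MixedGraph name and the placement are only for illustration), the SBP given to the explicit to_global call is kept as-is, while every other operator's SBP is selected by the algorithm:

import oneflow as flow

P = flow.placement(type="cuda", ranks=[0, 1])

class MixedGraph(flow.nn.Graph):
    def __init__(self, model):
        super().__init__()
        self.config.enable_auto_parallel(True)
        self.model = model

    def build(self, x):
        # This explicit to_global keeps the user-specified SBP (split along the
        # batch dimension); the rest of the graph is configured automatically.
        x = x.to_global(placement=P, sbp=flow.sbp.split(0))
        return self.model(x)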

Configuration API for auto parallelism

enable_auto_parallel

If set to true, the graph will use the auto parallel algorithm to select a parallelism strategy.

enable_auto_parallel_ignore_user_sbp_config

If set to true, all user configurations of SBP will be ignored.

set_auto_parallel_computation_cost_ratio

Set the coefficient of computation cost in the auto-parallel algorithm.

set_auto_parallel_wait_time

Set the wait time for the auto-parallel algorithm.

enable_auto_parallel_trunk_algo

Find the trunk of the SBP graph, then reduce the wait time for tributaries.

enable_auto_parallel_sbp_collector

Use “sbp collector” to create “sbp proxy” for nodes with multiple downstream operators.

enable_auto_memory

Whether to use a parallelism strategy with lower memory consumption.
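Putting these options together, they are all set on self.config inside __init__, in the same way as enable_auto_parallel in the example above. The sketch below assumes each listed name is available as a method on self.config; the numeric values are illustrative placeholders, not recommended defaults.

import oneflow as flow

class TunedGraph(flow.nn.Graph):
    def __init__(self):
        super().__init__()
        self.config.enable_auto_parallel(True)
        # Let the algorithm override any user-specified SBP.
        self.config.enable_auto_parallel_ignore_user_sbp_config(True)
        # Illustrative values; tune them for your own model and cluster.
        self.config.set_auto_parallel_computation_cost_ratio(0.05)
        self.config.set_auto_parallel_wait_time(1.65e4)
        self.config.enable_auto_parallel_trunk_algo(True)
        self.config.enable_auto_parallel_sbp_collector(True)
        self.config.enable_auto_memory(True)

    def build(self):
        pass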