oneflow.one_embedding.make_cached_ssd_store_options

oneflow.one_embedding.make_cached_ssd_store_options(cache_budget_mb, persistent_path, capacity=None, size_factor=1, storage_dim=-1, physical_block_size=4096, host_cache_budget_mb=0)

Creates the store_options param of MultiTableEmbedding, using SSD as persistent storage with GPU and host memory as cache. If cache_budget_mb > 0 and host_cache_budget_mb > 0, GPU and host memory are used as a multi-level cache.

Parameters

cache_budget_mb (int) – the cache budget per GPU, in MB.
persistent_path (str, list) – persistent storage path of the Embedding. It must be on a fast SSD because training performs frequent random disk access. If a str is passed, the current rank's Embedding is saved in path/rank_id-num_ranks. If a list is passed, its length must equal num_ranks, and each element is the path for the Embedding of the corresponding rank_id.
capacity (int) – total capacity of the Embedding.
size_factor (int, optional) – storage size factor relative to embedding_dim. For SGD with momentum = 0 it should be 1; for SGD with momentum > 0 it should be 2; for Adam it should be 3. Defaults to 1.
storage_dim (int, optional) – number of elements in embedding storage. If storage_dim is set, the size_factor param has no effect. For SGD with momentum = 0, storage_dim should be embedding_size*1; for SGD with momentum > 0, it should be embedding_size*2; for Adam, it should be embedding_size*3. Defaults to -1.
physical_block_size (int, optional) – physical block size, which should match the disk sector size. Defaults to 4096.
host_cache_budget_mb (int, optional) – the host memory cache budget per rank, in MB. Defaults to 0.
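
The size_factor and storage_dim rules above can be sketched as a small helper. This is a minimal illustration, not part of OneFlow: the function name storage_params and the optimizer strings are hypothetical, and only the multipliers (1, 2, 3) come from the parameter descriptions.

```python
def storage_params(embedding_size, optimizer="sgd", momentum=0.0):
    """Return (size_factor, storage_dim) following the documented rules.

    Hypothetical helper for illustration; it is not a OneFlow API.
    """
    name = optimizer.lower()
    if name == "sgd":
        # SGD without momentum stores only the embedding (factor 1);
        # with momentum > 0 the documented factor is 2.
        size_factor = 2 if momentum > 0 else 1
    elif name == "adam":
        # The documented factor for Adam is 3.
        size_factor = 3
    else:
        raise ValueError(f"no documented size_factor for optimizer {optimizer!r}")
    return size_factor, embedding_size * size_factor


print(storage_params(128))
print(storage_params(128, "sgd", momentum=0.9))
print(storage_params(128, "adam"))
```

Either value could then be passed on: size_factor on its own, or storage_dim, in which case size_factor has no effect.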

Returns

The store_options param of MultiTableEmbedding, configured to use SSD storage with GPU and host memory as cache.

Return type

dict

For example:

>>> import oneflow as flow
>>> vocab_size = 1000000  # placeholder value; set to the total capacity of your Embedding
>>> store_options = flow.one_embedding.make_cached_ssd_store_options(
...     cache_budget_mb=8192, persistent_path="/your_path_to_ssd", capacity=vocab_size,
... )
>>> # pass the store_options to the "store_options" param of flow.one_embedding.MultiTableEmbedding
>>> # ...
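
As a further sketch, the keyword arguments for a multi-level cache with one persistent path per rank could be assembled as shown here. The helper name cached_ssd_kwargs, the mount points, and the budget values are hypothetical; only the parameter names and the rule that a list of paths must contain num_ranks entries come from the parameter descriptions.

```python
def cached_ssd_kwargs(paths, num_ranks, capacity, gpu_cache_mb, host_cache_mb=0):
    """Build keyword arguments for make_cached_ssd_store_options.

    Hypothetical helper for illustration; it is not a OneFlow API.
    """
    if isinstance(paths, (list, tuple)):
        # A list must provide exactly one path per rank.
        if len(paths) != num_ranks:
            raise ValueError(f"expected {num_ranks} paths, got {len(paths)}")
        paths = list(paths)
    return {
        "cache_budget_mb": gpu_cache_mb,
        "persistent_path": paths,
        "capacity": capacity,
        "host_cache_budget_mb": host_cache_mb,
    }


# Placeholder mount points and sizes; both budgets > 0 requests a GPU + host cache.
kwargs = cached_ssd_kwargs(
    paths=["/mnt/ssd0/embedding", "/mnt/ssd1/embedding"],
    num_ranks=2,
    capacity=1_000_000,
    gpu_cache_mb=8192,
    host_cache_mb=16384,
)
print(kwargs)
```

With OneFlow installed, the result would be passed as flow.one_embedding.make_cached_ssd_store_options(**kwargs).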