oneflow.one_embedding
===================================

Embedding is an important component of recommender systems, and it has also spread to many fields outside recommender systems. Each framework provides basic operators for Embedding, for example, ``flow.nn.Embedding`` in OneFlow:

::

    import numpy as np
    import oneflow as flow
    indices = flow.tensor([[1, 2, 4, 5], [4, 3, 2, 9]], dtype=flow.int)
    embedding = flow.nn.Embedding(10, 3)
    y = embedding(indices)

OneEmbedding is the large-scale Embedding solution that OneFlow provides for large-scale deep recommender systems.

OneEmbedding has the following advantages compared to ordinary operators:

- With flexible hierarchical storage, OneEmbedding can place the Embedding table in GPU memory, CPU memory, or on SSD, and allows high-speed devices to be used as caches for low-speed devices to achieve both speed and capacity.

- OneEmbedding supports dynamic expansion.

.. note ::
    Please refer to `Large-Scale Embedding Solution: OneEmbedding `__
    for a brief introduction to all features related to OneEmbedding.

Configure Embedding Table
----------------------------------

OneEmbedding supports the simultaneous creation of multiple Embedding tables. The following code configures three Embedding tables.

.. code-block::

    import oneflow as flow
    import oneflow.nn as nn
    import numpy as np

    tables = [
        flow.one_embedding.make_table_options(
            flow.one_embedding.make_uniform_initializer(low=-0.1, high=0.1)
        ),
        flow.one_embedding.make_table_options(
            flow.one_embedding.make_uniform_initializer(low=-0.05, high=0.05)
        ),
        flow.one_embedding.make_table_options(
            flow.one_embedding.make_uniform_initializer(low=-0.15, high=0.15)
        ),
    ]

When configuring an Embedding table, you need to specify its initialization method. The Embedding tables above are all initialized with the ``uniform`` method, each with a different range. The resulting configuration is stored in the ``tables`` variable.

.. autofunction:: oneflow.one_embedding.make_table_options
.. autofunction:: oneflow.one_embedding.make_table

Initialization Method
^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. currentmodule:: oneflow.one_embedding

.. autosummary::
    :toctree: generated
    :nosignatures:

    make_uniform_initializer
    make_normal_initializer

Configure the Storage Attribute of the Embedding Table
--------------------------------------------------------------------

Then run the following code to configure the storage attribute of the Embedding table:

.. code-block::

    store_options = flow.one_embedding.make_cached_ssd_store_options(
        cache_budget_mb=8142,
        persistent_path="/your_path_to_ssd",
        capacity=40000000,
        size_factor=1,
        physical_block_size=512
    )

Storage Method
^^^^^^^^^^^^^^^^^^^^

.. currentmodule:: oneflow.one_embedding

.. autosummary::
    :toctree: generated
    :nosignatures:

    make_device_mem_store_options
    make_cached_ssd_store_options
    make_cached_host_mem_store_options

.. note ::
    Please refer to `Large-Scale Embedding Solution: OneEmbedding `__
    to learn how to choose the proper storage configuration.

Instantiate Embedding
--------------------------------------------------------------------

After the above configuration is completed, you can use ``MultiTableEmbedding`` to get the instantiated Embedding layer.

.. code-block::

    embedding_size = 128
    embedding = flow.one_embedding.MultiTableEmbedding(
        name="my_embedding",
        embedding_dim=embedding_size,
        dtype=flow.float,
        key_type=flow.int64,
        tables=tables,
        store_options=store_options,
    )

    embedding.to("cuda")

.. note ::
    Please refer to `Large-Scale Embedding Solution: OneEmbedding `__
    to learn about Feature ID and Multi-Table Query.

MultiTableEmbedding
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autofunction:: oneflow.one_embedding.MultiTableEmbedding

.. currentmodule:: oneflow.one_embedding.MultiTableEmbedding

.. autosummary::
    :toctree: generated
    :nosignatures:

    forward
    save_snapshot
    load_snapshot

MultiTableMultiColumnEmbedding
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

.. autofunction:: oneflow.one_embedding.MultiTableMultiColumnEmbedding

.. currentmodule:: oneflow.one_embedding.MultiTableMultiColumnEmbedding

.. autosummary::
    :toctree: generated
    :nosignatures:

    forward
    save_snapshot
    load_snapshot

Construct Graph for Training
--------------------------------------------------------------------

OneEmbedding is only supported in Graph mode.

.. code-block::

    num_tables = 3
    mlp = flow.nn.FusedMLP(
        in_features=embedding_size * num_tables,
        hidden_features=[512, 256, 128],
        out_features=1,
        skip_final_activation=True,
    )
    mlp.to("cuda")

    class TrainGraph(flow.nn.Graph):
        def __init__(self):
            super().__init__()
            self.embedding_lookup = embedding
            self.mlp = mlp
            # One optimizer for the embedding tables, one for the MLP.
            self.add_optimizer(
                flow.optim.SGD(self.embedding_lookup.parameters(), lr=0.1, momentum=0.0)
            )
            self.add_optimizer(
                flow.optim.SGD(self.mlp.parameters(), lr=0.1, momentum=0.0)
            )

        def build(self, ids):
            embedding = self.embedding_lookup(ids)
            # Flatten the per-table embeddings into one feature vector per sample.
            loss = self.mlp(flow.reshape(embedding, (-1, num_tables * embedding_size)))
            loss = loss.sum()
            loss.backward()
            return loss

.. note ::
    Please refer to `Distributed Training: OneEmbedding `__
    to learn about constructing a Graph for training.

Persistent Read & Write
-----------------------------------------------

.. currentmodule:: oneflow.one_embedding

.. autosummary::
    :toctree: generated
    :nosignatures:

    make_persistent_table_reader
    make_persistent_table_writer

.. automodule:: oneflow.one_embedding
    :members: Ftrl
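
As a conceptual aside, the two advantages listed at the top of this page (a fast device acting as a cache for a slower one, and dynamic expansion) can be pictured with a toy model in plain Python. This is only a sketch under stated assumptions: the class ``TieredEmbeddingStore``, its attributes, and the LRU eviction policy are hypothetical names and choices made for illustration, and nothing here reflects how OneEmbedding is actually implemented.

```python
import random
from collections import OrderedDict


class TieredEmbeddingStore:
    """Toy model of a small fast cache in front of a large slow store.

    Hypothetical illustration only; not OneEmbedding's implementation.
    """

    def __init__(self, embedding_dim, cache_capacity, low=-0.1, high=0.1, seed=0):
        self.embedding_dim = embedding_dim
        self.cache_capacity = cache_capacity
        self.low, self.high = low, high
        self.rng = random.Random(seed)
        self.cache = OrderedDict()  # stands in for GPU or host memory
        self.slow_store = {}        # stands in for the SSD-backed table
        self.hits = 0
        self.misses = 0

    def _new_row(self):
        # Uniform initialization over [low, high], in the spirit of a
        # uniform initializer for an embedding table.
        return [self.rng.uniform(self.low, self.high) for _ in range(self.embedding_dim)]

    def lookup(self, feature_id):
        if feature_id in self.cache:
            self.hits += 1
            self.cache.move_to_end(feature_id)  # mark as most recently used
            return self.cache[feature_id]
        self.misses += 1
        # Dynamic expansion: an unseen ID gets a freshly initialized row.
        if feature_id not in self.slow_store:
            self.slow_store[feature_id] = self._new_row()
        row = self.slow_store[feature_id]
        self.cache[feature_id] = row
        if len(self.cache) > self.cache_capacity:
            self.cache.popitem(last=False)  # evict the least recently used row
        return row


store = TieredEmbeddingStore(embedding_dim=3, cache_capacity=2)
for feature_id in [1, 2, 1, 3, 2]:
    store.lookup(feature_id)
print(store.hits, store.misses, list(store.cache))
```

In this toy model the table grows only when a new feature ID is first looked up, and the cache holds at most ``cache_capacity`` rows, so repeated IDs are served from the fast tier while rarely used ones fall back to the slow tier.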