TorchRec

Latest version: v1.0.0


0.7.0

0.7.0rc1

0.6.0

VBE
TorchRec now natively supports VBE (variable batched embeddings) within the `EmbeddingBagCollection` module. This allows a variable batch size per feature, unlocking sparse input data deduplication, which can greatly speed up embedding lookup and all-to-all time. To enable it, simply initialize `KeyedJaggedTensor` with the `stride_per_key_per_rank` and `inverse_indices` fields, which specify the batch size per feature and the inverse indices to reindex the embedding output, respectively.
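
As a rough illustration, the sketch below constructs a single-rank VBE input with a batch size of 2 for one feature and 3 for another; the concrete values and the shape of `inverse_indices` are illustrative assumptions rather than canonical usage, so verify them against your TorchRec version.

```python
# Minimal sketch (assumed shapes): a VBE KeyedJaggedTensor with batch size 2
# for "f1" and 3 for "f2" on a single rank; inverse_indices maps the
# deduplicated per-feature batches back to a common output batch size of 3.
import torch
from torchrec.sparse.jagged_tensor import KeyedJaggedTensor

kjt = KeyedJaggedTensor(
    keys=["f1", "f2"],
    values=torch.tensor([10, 11, 12, 13, 14, 15]),
    lengths=torch.tensor([1, 2, 1, 1, 1]),      # 2 bags for f1, 3 bags for f2
    stride_per_key_per_rank=[[2], [3]],         # batch size per feature, per rank
    inverse_indices=(["f1", "f2"], torch.tensor([[0, 1, 1], [0, 1, 2]])),
)
```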

Embedding offloading
Embedding offloading is UVM caching (i.e. storing embedding tables in host memory with a cache in HBM) plus prefetching and optimal sizing of the cache. Embedding offloading allows running a larger model with fewer GPUs while maintaining competitive performance. To use it, one needs to use the prefetching pipeline ([PrefetchTrainPipelineSparseDist](https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/train_pipeline.py?#L1056)) and pass in the [per-table cache load factor](https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/types.py#L457) and the [prefetch_pipeline](https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/types.py#L460) flag through constraints in the planner.
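
For concreteness, here is a rough sketch of what such planner constraints could look like. The table name `"large_table"` and the numeric values are made up, and the `"fused_uvm_caching"` kernel string plus the `CacheParams` fields should be checked against your TorchRec version.

```python
# Sketch (assumed names/values): per-table cache load factor and the
# prefetch_pipeline flag passed to the planner through constraints.
from torchrec.distributed.planner import EmbeddingShardingPlanner, Topology
from torchrec.distributed.planner.types import ParameterConstraints
from torchrec.distributed.types import CacheParams

constraints = {
    "large_table": ParameterConstraints(        # hypothetical table name
        compute_kernels=["fused_uvm_caching"],  # UVM-cached TBE kernel
        cache_params=CacheParams(
            load_factor=0.2,                    # per-table cache load factor
            prefetch_pipeline=True,             # required for prefetching
        ),
    ),
}
planner = EmbeddingShardingPlanner(
    topology=Topology(world_size=2, compute_device="cuda"),
    constraints=constraints,
)
# The resulting plan is then trained with PrefetchTrainPipelineSparseDist.
```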

Trec.shard/shard_modules
These APIs replace embedding submodules with their sharded variants. The shard API applies to an individual embedding module, while the shard_modules API replaces all embedding modules and won't touch other, non-embedding submodules.
Embedding sharding follows similar behavior to the prior TorchRec DistributedModelParallel behavior, except the ShardedModules have been made composable, meaning the modules are backed by [TableBatchedEmbeddingSlices](https://github.com/pytorch/torchrec/blob/main/torchrec/distributed/composable/table_batched_embedding_slice.py#L15), which are views into the underlying TBE (including .grad). This means that fused parameters are now returned with named_parameters(), including in DistributedModelParallel.
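
A minimal sketch of the composable API, assuming a hypothetical `MyRecModel` that contains embedding modules and an already-initialized distributed process group; exact keyword arguments may vary by version.

```python
# Sketch: replace all embedding submodules with their sharded variants.
# `MyRecModel` is a hypothetical module holding an EmbeddingBagCollection.
import torch
from torchrec.distributed.shard import shard_modules

model = MyRecModel()
sharded_model = shard_modules(module=model, device=torch.device("cuda"))

# Because the sharded modules are backed by views into the underlying TBE,
# fused embedding parameters now show up in named_parameters().
for name, param in sharded_model.named_parameters():
    print(name, param.shape)
```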

0.6.0rc2

0.6.0rc1

This should support Python 3.8 - 3.11 and 3.12 (experimental):

pip install torchrec --index-url https://download.pytorch.org/whl/test/cpu
pip install torchrec --index-url https://download.pytorch.org/whl/test/cu118
pip install torchrec --index-url https://download.pytorch.org/whl/test/cu121

0.5.0

[Prototype] Zero Collision / Managed Collision Embedding Bags

A common constraint in recommender systems is that the sparse id input range is larger than the number of embeddings the model can learn for a given parameter size. To resolve this issue, the conventional solution is to hash sparse ids into the same range as the embedding table size. This ultimately leads to hash collisions, with multiple sparse ids sharing the same embedding space. We have developed a performant alternative algorithm that attempts to address this problem by tracking the N most common sparse ids and ensuring that they have a unique embedding representation. The module is defined [here](https://github.com/pytorch/torchrec/blob/b992eebd80e8ccfc3b96a7fd39cb072c17e8907d/torchrec/modules/mc_embedding_modules.py#L26) and an example can be found [here](https://github.com/pytorch/torchrec/blob/b992eebd80e8ccfc3b96a7fd39cb072c17e8907d/torchrec/modules/mc_embedding_modules.py#L26).
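
A heavily hedged sketch of wiring an `EmbeddingBagCollection` with a managed collision module follows. Constructor arguments are based on the current `mc_modules` API and may differ in the 0.5.0 prototype; the table/feature names, sizes, and eviction settings are illustrative assumptions.

```python
# Hedged sketch: wrap an EmbeddingBagCollection so incoming sparse ids are
# remapped by a managed collision (ZCH) module instead of plain hashing.
import torch
from torchrec.modules.embedding_configs import EmbeddingBagConfig
from torchrec.modules.embedding_modules import EmbeddingBagCollection
from torchrec.modules.mc_modules import (
    DistanceLFU_EvictionPolicy,
    ManagedCollisionCollection,
    MCHManagedCollisionModule,
)
from torchrec.modules.mc_embedding_modules import ManagedCollisionEmbeddingBagCollection

config = EmbeddingBagConfig(
    name="t1", embedding_dim=8, num_embeddings=1000, feature_names=["f1"]
)
ebc = EmbeddingBagCollection(tables=[config], device=torch.device("cpu"))
mcc = ManagedCollisionCollection(
    managed_collision_modules={
        "t1": MCHManagedCollisionModule(
            zch_size=1000,                           # matches the table size
            device=torch.device("cpu"),
            eviction_policy=DistanceLFU_EvictionPolicy(),
            eviction_interval=1,
        ),
    },
    embedding_configs=[config],
)
mc_ebc = ManagedCollisionEmbeddingBagCollection(ebc, mcc)
```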

[Prototype] UVM Caching - Prefetch Training Pipeline

For tables where on-device memory is insufficient to hold the entire embedding table, it is common to leverage a caching architecture where part of the embedding table is cached on device and the full embedding table is kept in host memory (typically DDR SDRAM). In practice, however, cache misses are common and hurt performance due to the relatively high latency of going to host memory. Building on TorchRec's existing data pipelining, we developed a new [_Prefetch Training Pipeline_](https://pytorch.org/torchrec/torchrec.distributed.html#torchrec.distributed.train_pipeline.PrefetchPipelinedForward) that avoids these cache misses by prefetching the relevant embeddings for the upcoming batch from host memory, effectively eliminating cache misses in the forward path.
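
A sketch of driving training with this pipeline; `sharded_model`, `optimizer`, and `train_dataloader` are assumed to already exist (a sharded model using UVM-caching kernels, its optimizer, and a dataloader yielding pipelineable batches).

```python
# Sketch (assumed surrounding objects): the prefetch pipeline overlaps the
# host-to-device embedding prefetch for the next batch with current compute.
import torch
from torchrec.distributed.train_pipeline import PrefetchTrainPipelineSparseDist

pipeline = PrefetchTrainPipelineSparseDist(
    model=sharded_model,
    optimizer=optimizer,
    device=torch.device("cuda"),
)
dataloader_iter = iter(train_dataloader)
for _ in range(1000):  # number of training iterations (illustrative)
    pipeline.progress(dataloader_iter)
```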
