Bug Fixes
CI
- Only run publish once on git tag
Core
- Fix compressed buffer failing to scatter to an odd number of ranks
Other
- Fix CI PyPI versioning
- Remove `__init__.py` and Python `__version__`, use the Cargo version instead
- Move the `bagua_install_library` import into the install-library function
- Merge `bagua_install_library` and `setup.py`, remove NCCL <= 2.6 support
- Fix `alltoall_v` parameter (17)
- Fix reduce and allgather Python interface
- Fix incorrect pointer in decompress and a typo in the error message
- Fix Python GIL deadlock when getting the data pointer
- Fix benchmark script requirements
- Fix `alltoall_v` parameter types (27)
- Always mark the Bagua padding tensor as ready
- Make compress/decompress of BaguaTensor `method` string consistent (33)
- Fix scatter and reduce_scatter implementation (40)
- Fix subtraction overflow error for decentralized op (39)
- Fix QAdam params (17)
- Fix assert precision (18)
- Replace mutex with atomic bool for async op and update the Aluminum submodule (67)
- Fix duplicated dependency downloading during installation (77)
- Fix async algorithm aborting and hanging (78, 81)
- Fix QAdam algorithm call (20)
- Fix missing symbols in the zip library (24)
- Fix random autotune server hang (206)
- Fix Bagua-Net library path mismatch and make `--enable_bagua_net` argument style consistent with other args (218)
Python
- Fix random autotune-service hang
- Handle conflicts caused by sklearn upgrade (225)
Features
CI
- Only publish to PyPI for master commits
Other
- Add async model average algorithm (110); see the usage sketch after this list
- Add cached dataset wrapper (148)
- Support sync batchnorm (151)
- Add `--enable-bagua-net` option in launcher (183)
- Add pytorch examples for MNIST, ImageNet, SQuAD training (1)
- Add requirements.txt, only download dataset on local rank 0 (2)
- Add Python packaging related files
- Add `__version__` variable
- Install NCCL deps in bagua-core and add generated `__version__` variable
- Add `version.py` placeholder to prevent a file-not-found error
- Initial support for Python ops (2)
- Add 5-minute timeout for buckets' comm op (5)
- Replace NCCL with Aluminum (7)
- Add synthetic benchmark script (5)
- Add elastic training example (7)
- Support `alltoall_v` (vector alltoall) (14)
- Add reduce and allgather Python interface
- Support reduce and allgather ops with a reduction op enum; see the reduce/allgather sketch after this list
- Support creating BaguaTensor by passing torch tensor directly (19)
- Add a compatible mode for getting PyTorch tensor info with the Python interpreter
- Improve debug logs to include tensor info when executing ops
- Add native low precision decentralized operator (26)
- Add scatter, gather, and scatter_reduce communication primitives, along with inplace versions of all of them (37)
- Make full precision decentralized op stateless (36)
- Add communication_primitives example (12)
- Use NCCL 2.10 avg op for all algorithms that use averaging (46, 45)
- Add OpenTelemetry to report tensor ready order (42)
- Add deterministic flag (15)
- Add native async model average algorithm (41)
- Add examples for async model average algorithm (14)
- Support packet splitting and multi-stream parallel transmission (5)
- Support ncclNet v3 and remove the dependency on NCCL in the installation environment (17)
- Add sync interval param to async examples (19)
- Support tokio backend (21)
- Support Bagua-Net (89)
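
The async model average entries above refer to an algorithm object that is attached to a model through Bagua's module wrapper. The following is a minimal usage sketch, assuming the module path `bagua.torch_api.algorithms.async_model_average`, the class name `AsyncModelAverageAlgorithm`, and a `sync_interval_ms` keyword; these names are taken from the entries above and are not an authoritative reference.

```python
# Minimal usage sketch (assumed API): attach the async model average algorithm
# to a model via Bagua's module wrapper. The module path, class name, and the
# sync_interval_ms parameter are assumptions based on the changelog entries.
import torch
import bagua.torch_api as bagua
from bagua.torch_api.algorithms import async_model_average

bagua.init_process_group()  # expects one process per GPU, started by the Bagua launcher

model = torch.nn.Linear(128, 10).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# The sync interval corresponds to the "sync interval param" added for the
# async examples; the exact keyword name here is illustrative.
algorithm = async_model_average.AsyncModelAverageAlgorithm(sync_interval_ms=500)
model = model.with_bagua([optimizer], algorithm)
```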
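The reduce and allgather Python interface with a reduction op enum (including the NCCL 2.10 avg op) can be used roughly as below. This is a hedged sketch: it assumes the functions live in `bagua.torch_api.communication` and the enum is called `ReduceOp`, so treat the exact signatures as assumptions rather than a reference.

```python
# Hedged sketch of the reduce/allgather Python interface; the signatures and
# the ReduceOp enum name are assumptions based on the changelog entries.
import torch
import bagua.torch_api as bagua
from bagua.torch_api.communication import allgather, reduce, ReduceOp

bagua.init_process_group()

send = torch.ones(4, device="cuda") * bagua.get_rank()
recv = torch.zeros(4 * bagua.get_world_size(), device="cuda")
allgather(send, recv)                      # gather `send` from every rank into `recv`

out = torch.zeros(4, device="cuda")
reduce(send, out, dst=0, op=ReduceOp.AVG)  # average across ranks onto rank 0
```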
Python
- Broadcast scalars for optimizers (202); see the conceptual sketch below
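
The entry above concerns keeping scalar optimizer state (for example Adam's `step` counter) identical across ranks. The snippet below is only a conceptual sketch using plain `torch.distributed` with a hypothetical helper name; it is not Bagua's implementation.

```python
# Conceptual sketch only (hypothetical helper, not Bagua's implementation):
# broadcast scalar optimizer state (e.g. Adam's "step") from rank 0 so that
# every rank keeps identical values. Uses plain torch.distributed for clarity.
import torch.distributed as dist

def broadcast_optimizer_scalars(optimizer, src=0):
    for group in optimizer.param_groups:
        for param in group["params"]:
            state = optimizer.state[param]  # defaultdict entry; created if missing
            # pack only plain int/float entries into a one-element list
            payload = [{k: v for k, v in state.items()
                        if isinstance(v, (int, float))}]
            dist.broadcast_object_list(payload, src=src)  # src's values overwrite the rest
            state.update(payload[0])
```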