Bug Fixes
- Make compress/decompress of BaguaTensor `method` string consistent (33)
- Fix scatter and reduce_scatter implementation (40)
- Subtract overflow error for decentralized op (39)
- Autotune api conflict (131)
- Autotune pytest runs forever (132)
- Fix bagua.distributed.run --is_output_autotune_log parsing (145)
- Fix QADAM params (17)
- Fix assert precision (18)
- Fix torch version check (150)
Features
- Add native low precision decentralized operator (26)
- Add low precision decentralized algorithm (103)
- Add (scatter, gather, reduce_scatter) and in-place versions of all communication primitives (37)
- Add all communication primitives, such as send and recv, to communication module (128)
- Make full precision decentralized op stateless (126)
- Make full precision decentralized op stateless (36)
- Add communication_primitives example (12)
- Support duplicated parameters across different modules (147)
- Support nccl 2.10 ReduceOp.AVG (149)
- Support nccl 2.10 ncclAvg (45)
- Use nccl 2.10 avg op for all algorithms using averaging (46)
- Add opentelemetry to report tensor ready order (42)
- Add support for reporting tensor completion order (146)
- Add deterministic flag (15)