Megablocks

Latest version: v0.5.1


0.5.1

What's Changed
* Update dependencies and package organization. by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/52
* Remove errant "*" in README by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/54
* Update Megatron-LM scripts and integration for latest Docker container. by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/55
* Update setup.py to support multiple device capabilities by simon-mo in https://github.com/stanford-futuredata/megablocks/pull/56
* Enable argument-controlled normalization of routing weights by vchiley in https://github.com/stanford-futuredata/megablocks/pull/58
* More customizable norm for expert weights by snarayan21 in https://github.com/stanford-futuredata/megablocks/pull/60
* Update README.md by eltociear in https://github.com/stanford-futuredata/megablocks/pull/63
* enable custom activation functions by vchiley in https://github.com/stanford-futuredata/megablocks/pull/65
* Skip updating load balancing loss on eval by sedrick-keh-tri in https://github.com/stanford-futuredata/megablocks/pull/69
* Change router weight norm from in-place by sashaDoubov in https://github.com/stanford-futuredata/megablocks/pull/70
* Add memory-optimized grouped GLU by vchiley in https://github.com/stanford-futuredata/megablocks/pull/66
* Add cast to tensor for DTensor inputs for groupedmlp by eracah in https://github.com/stanford-futuredata/megablocks/pull/71
* Dtensor to all paths by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/73
* Refactor dtensor by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/74
* Memory-optimized GLU backward by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/72
* Add dmlp registry args by j316chuck in https://github.com/stanford-futuredata/megablocks/pull/75
* Fix default to be sparse by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/76
* Fix `moe_normalize_expert_weights` when `top_k=1` by 152334H in https://github.com/stanford-futuredata/megablocks/pull/87
* Update triton pin by vchiley in https://github.com/stanford-futuredata/megablocks/pull/89
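
Several entries above concern normalization of router/expert weights, including the `top_k=1` case. As a rough illustration only, here is a minimal pure-Python sketch of p-norm normalization over the top-k router scores for a single token. The function name `normalize_topk_weights` and its parameters are hypothetical and are not taken from the MegaBlocks API; the library itself operates on batched tensors.

```python
def normalize_topk_weights(scores, top_k, p=1.0):
    """Pick the top_k largest router scores for one token and p-normalize them.

    Hypothetical helper for illustration; not part of MegaBlocks.
    """
    # Rank expert indices by descending score and keep the first top_k.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    experts = ranked[:top_k]
    weights = [scores[i] for i in experts]
    # p-norm of the selected weights only (not of all scores).
    norm = sum(abs(w) ** p for w in weights) ** (1.0 / p)
    return experts, [w / norm for w in weights]


if __name__ == "__main__":
    token_scores = [0.1, 0.6, 0.3]
    print(normalize_topk_weights(token_scores, top_k=2))
    print(normalize_topk_weights(token_scores, top_k=1))
```

Under these assumptions, with `top_k=1` the single selected weight is divided by its own magnitude, so a positive score always normalizes to 1 regardless of `p`.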

New Contributors
* simon-mo made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/56
* snarayan21 made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/60
* eltociear made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/63
* sedrick-keh-tri made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/69
* eracah made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/71
* j316chuck made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/75
* 152334H made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/87

**Full Changelog**: https://github.com/stanford-futuredata/megablocks/compare/v0.5.0...v0.5.1

0.5.0

What's New

Several improvements that avoid CPU <-> GPU device synchronizations, plus GLU support and support for some new models 👀

What's Changed
* Update version by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/36
* Avoid duplicate `.cpu()` call by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/37
* Have megablocks rely on torch default precision by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/39
* Add GLU support by sashaDoubov in https://github.com/stanford-futuredata/megablocks/pull/38
* Enable generic dimensionality for input by vchiley in https://github.com/stanford-futuredata/megablocks/pull/41
* Removing an extra size call by bcui19 in https://github.com/stanford-futuredata/megablocks/pull/43
* Fix bug in topology kernel for ffn_hidden_size>4096. by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/47
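
For readers unfamiliar with the GLU support mentioned above, the following is a minimal sketch of a generic gated linear unit feed-forward layer for a single input vector. It shows the general technique, not the MegaBlocks implementation: the names `glu_ffn`, `w_gate`, `w_up`, and `w_down`, and the choice of SiLU as the default activation, are all assumptions made for illustration.

```python
import math


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def silu(z):
    """SiLU activation: z * sigmoid(z)."""
    return z / (1.0 + math.exp(-z))


def glu_ffn(x, w_gate, w_up, w_down, activation=silu):
    """Generic GLU feed-forward: w_down @ (activation(w_gate @ x) * (w_up @ x)).

    Hypothetical helper for illustration; not part of MegaBlocks.
    """
    gate = [activation(z) for z in matvec(w_gate, x)]
    up = matvec(w_up, x)
    # Element-wise gating of the "up" projection.
    hidden = [g * u for g, u in zip(gate, up)]
    return matvec(w_down, hidden)


if __name__ == "__main__":
    x = [1.0, 2.0]
    w_gate = [[1.0, 0.0], [0.0, 1.0]]
    w_up = [[1.0, 1.0], [1.0, -1.0]]
    w_down = [[1.0, 1.0]]
    print(glu_ffn(x, w_gate, w_up, w_down))
```

Compared with a plain two-layer MLP, the gated variant adds a second input projection whose activated output scales the first element-wise.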

New Contributors
* sashaDoubov made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/38
* bcui19 made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/43

**Full Changelog**: https://github.com/stanford-futuredata/megablocks/compare/v0.4.0...v0.5.0

0.4.0

What's Changed
* Unpack saved context once by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/33
* Refactoring class hierarchy for FSDP wrapping by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/34

**Full Changelog**: https://github.com/stanford-futuredata/megablocks/compare/v0.3.3...v0.4.0

0.3.3

What's Changed
* Enable running MegaBlocks MoE without bias by vchiley in https://github.com/stanford-futuredata/megablocks/pull/31

**Full Changelog**: https://github.com/stanford-futuredata/megablocks/compare/v0.3.2...v0.3.3

0.3.2

What's Changed

- Support for bfloat16
- Optimizations for top_k > 1
- Support for fully-sharded data parallelism
- Support tensor model parallelism when expert_parallel_world_size > num_experts
- Optimizations for activation memory
- Support activation quantization (thanks dblalock!)
- Optimizations for SM90 (Hopper)
- Lots of bug fixes, cleanup and small optimizations

New Contributors
* vchiley made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/9
* deepakn94 made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/16
* b-chu made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/19

**Full Changelog**: https://github.com/stanford-futuredata/megablocks/compare/v0.1...v0.3.2

0.1

Initial release documenting repository state prior to MLSys'23 camera-ready publication.
