Megablocks

Latest version: v0.8.0


0.6.0

**2. New CI/CD**

MegaBlocks has new GitHub Actions for better CI/CD! On every PR, MegaBlocks now automatically lints and formats the code (#131) and runs the tests on a GPU (#127).

**3. Remove Weight Parallelism (#137)**

Weight parallelism was not in use and so we removed it.

**4. Shared Experts (#109)**

Implement shared experts, based on the DeepSeekMoE [paper](https://arxiv.org/abs/2401.06066).

Bug Fixes
1. Better handle incompatible ffn sizes (#108)
2. Fix AMP for memory optimized options (#111)
3. Don't save MoE load-balancing-loss tensors when `moe_loss_weight` is 0 (#119)

What's Changed
* Remove turbo by dblalock in https://github.com/databricks/megablocks/pull/96
* Update README.md by dakinggg in https://github.com/databricks/megablocks/pull/98
* Fix for `ffn_hidden_size` of 128, and better error message for incompatible ffn sizes. by snarayan21 in https://github.com/databricks/megablocks/pull/108
* Add Shared Expert by vchiley in https://github.com/databricks/megablocks/pull/109
* Fix AMP for memory optimized options by mvpatel2000 in https://github.com/databricks/megablocks/pull/111
* bump and pin versions by vchiley in https://github.com/databricks/megablocks/pull/112
* dont save moe lb-loss tensors if args.moe_loss_weight=0 by michael-go in https://github.com/databricks/megablocks/pull/119
* bump by vchiley in https://github.com/databricks/megablocks/pull/116
* Minor changes to batched_load_balancing_loss function by ShashankMosaicML in https://github.com/databricks/megablocks/pull/121
* Migrate tests to pytest + add GA by eitanturok in https://github.com/databricks/megablocks/pull/127
* Change Runner in GA by eitanturok in https://github.com/databricks/megablocks/pull/129
* Clean up setup.py by eitanturok in https://github.com/databricks/megablocks/pull/128
* only run GA if repo owner is Databricks by eitanturok in https://github.com/databricks/megablocks/pull/135
* GA to Lint + Format MegaBlocks by eitanturok in https://github.com/databricks/megablocks/pull/131
* bump ci-testing to v0.1.2 by eitanturok in https://github.com/databricks/megablocks/pull/138
* remove weight parallelism by eitanturok in https://github.com/databricks/megablocks/pull/137
* refactor testing by eitanturok in https://github.com/databricks/megablocks/pull/140
* Type Checking by eitanturok in https://github.com/databricks/megablocks/pull/141
* Bump torch to <2.4.1 by eitanturok in https://github.com/databricks/megablocks/pull/145

New Contributors
* dakinggg made their first contribution in https://github.com/databricks/megablocks/pull/98
* michael-go made their first contribution in https://github.com/databricks/megablocks/pull/119
* ShashankMosaicML made their first contribution in https://github.com/databricks/megablocks/pull/121

**Full Changelog**: https://github.com/databricks/megablocks/compare/v0.5.1...v0.6.0

0.8.0

Breaking Changes
As a consequence of the `torch 2.6.0` upgrade, `sparse` support is disabled for `megablocks` (meaning that only `grouped` support is available).

For additional context, `torch 2.6.0` depends on `triton 3.2.0`, which changed how `dtype` promotion is handled when the two operands of a binary operation have different `dtypes`. As a result, the `stk` dependency of `megablocks` hits an int16 overflow that causes an illegal memory access (IMA). We will release a new version of `megablocks` once this issue is resolved. See https://github.com/databricks/megablocks/pull/168 for additional details.
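
As a rough illustration of why an int16 overflow can lead to an illegal memory access, the sketch below emulates signed 16-bit wraparound in plain Python. It is not the `stk` kernel code. The helper `to_int16` and the values of `block_size` and `block_row` are hypothetical, chosen only to show an offset exceeding the int16 range.

```python
def to_int16(value: int) -> int:
    """Wrap an integer into the signed 16-bit range [-32768, 32767]."""
    return (value + 0x8000) % 0x10000 - 0x8000


# Hypothetical values, not taken from stk.
block_size = 128
block_row = 300

# Computed in a wide integer type, the offset is what the kernel intends.
offset_wide = block_row * block_size

# If the product is kept in int16 instead, it wraps around and goes negative.
offset_int16 = to_int16(offset_wide)

print(offset_wide)   # 38400
print(offset_int16)  # -27136
```

Using a wrapped, negative offset like this to index memory would read or write outside the intended buffer, which is the kind of failure that surfaces as an IMA.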

What's Changed
* Updated pytorch and disabled sparse tests by rithwik-db in https://github.com/databricks/megablocks/pull/168

New Contributors
* rithwik-db made their first contribution in https://github.com/databricks/megablocks/pull/168

**Full Changelog**: https://github.com/databricks/megablocks/compare/v0.7.0...v0.8.0

0.7.0

What's Changed
* Bump `_version.py` to 0.7.0.dev0 by eitanturok in https://github.com/databricks/megablocks/pull/148
* Remove deprecated torch.cuda.amp custom fwd and bwd by snarayan21 in https://github.com/databricks/megablocks/pull/150
* Implement Router Z-loss by josejg in https://github.com/databricks/megablocks/pull/151
* Initialize default device lazily by janEbert in https://github.com/databricks/megablocks/pull/152
* Update router lint by mihir-db in https://github.com/databricks/megablocks/pull/158
* Bump torch 2.5.1 and upgrade to 0.8.0.dev0 by j316chuck in https://github.com/databricks/megablocks/pull/162
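
For readers unfamiliar with the router z-loss added in pull 151, here is a minimal sketch of the commonly used formulation from the ST-MoE paper: the mean over tokens of the squared log-sum-exp of the router logits, scaled by a small coefficient. This is an assumption about the general technique, not a copy of the MegaBlocks code. The function name `router_z_loss` and the `weight` default are hypothetical.

```python
import math
from typing import List, Sequence


def logsumexp(row: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(row)))."""
    m = max(row)
    return m + math.log(sum(math.exp(v - m) for v in row))


def router_z_loss(logits: List[Sequence[float]], weight: float = 1e-3) -> float:
    """Hypothetical helper: mean squared log-sum-exp of per-token router logits."""
    per_token = [logsumexp(row) ** 2 for row in logits]
    return weight * sum(per_token) / len(per_token)


# Two tokens, three experts each. Larger logits are penalized more heavily.
small = router_z_loss([[0.1, 0.0, -0.1], [0.0, 0.2, 0.1]])
large = router_z_loss([[10.0, 0.0, -10.0], [0.0, 20.0, 10.0]])
```

The penalty grows with the magnitude of the logits, which encourages the router to keep them small.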

New Contributors
* josejg made their first contribution in https://github.com/databricks/megablocks/pull/151
* janEbert made their first contribution in https://github.com/databricks/megablocks/pull/152
* mihir-db made their first contribution in https://github.com/databricks/megablocks/pull/158

**Full Changelog**: https://github.com/databricks/megablocks/compare/v0.6.1...v0.7.0

0.6.1

What's New
Patch release to remove dependencies specified via github and instead use released versions through pypi (specifically, stanford-stk and grouped-gemm). This allows for releasing megablocks itself via pypi.

What's Changed
* Remove direct dependencies, allowing for megablocks pypi release by snarayan21 in https://github.com/databricks/megablocks/pull/149

**Full Changelog**: https://github.com/databricks/megablocks/compare/v0.6.0...v0.6.1

0.6.0

What's New

**1. Torch 2.4 Compatibility (#145)**

0.5.1

What's Changed
* Update dependencies and package organization. by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/52
* Remove errant "*" in README by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/54
* Update Megatron-LM scripts and integration for latest Docker container. by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/55
* Update setup.py to support multiple device capabilities by simon-mo in https://github.com/stanford-futuredata/megablocks/pull/56
* enable arg enabled normalization of routing weights by vchiley in https://github.com/stanford-futuredata/megablocks/pull/58
* More customizable norm for expert weights by snarayan21 in https://github.com/stanford-futuredata/megablocks/pull/60
* Update README.md by eltociear in https://github.com/stanford-futuredata/megablocks/pull/63
* enable custom activation functions by vchiley in https://github.com/stanford-futuredata/megablocks/pull/65
* Skip updating load balancing loss on eval by sedrick-keh-tri in https://github.com/stanford-futuredata/megablocks/pull/69
* Change router weight norm from in-place by sashaDoubov in https://github.com/stanford-futuredata/megablocks/pull/70
* add mem optimized grouped glu by vchiley in https://github.com/stanford-futuredata/megablocks/pull/66
* Add cast to tensor for DTensor inputs for groupedmlp by eracah in https://github.com/stanford-futuredata/megablocks/pull/71
* Dtensor to all paths by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/73
* Refactor dtesnor by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/74
* Mem opt glu bkwd by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/72
* Add dmlp registry args by j316chuck in https://github.com/stanford-futuredata/megablocks/pull/75
* Fix default to be sparse by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/76
* Fix `moe_normalize_expert_weights` when `top_k=1` by 152334H in https://github.com/stanford-futuredata/megablocks/pull/87
* Updt triton pin by vchiley in https://github.com/stanford-futuredata/megablocks/pull/89

New Contributors
* simon-mo made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/56
* snarayan21 made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/60
* eltociear made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/63
* sedrick-keh-tri made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/69
* eracah made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/71
* j316chuck made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/75
* 152334H made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/87

**Full Changelog**: https://github.com/stanford-futuredata/megablocks/compare/v0.5.0...v0.5.1
