Megablocks

Latest version: v0.6.1


0.6.0 (continued)

**2. New CI/CD**

MegaBlocks has new GitHub Actions for better CI/CD! On every PR, MegaBlocks now automatically lints and formats the code (131) and runs the tests on a GPU (127).

**3. Remove Weight Parallelism (137)**

Weight parallelism was not in use and so we removed it.

**4. Shared Experts (109)**

Implemented shared experts, based on the DeepSeekMoE [paper](https://arxiv.org/abs/2401.06066).
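
As a rough illustration of the idea (not MegaBlocks' implementation): a shared expert processes every token, while the routed experts only see the tokens the router assigns to them, and the two outputs are summed. The sketch below uses plain Python scalars; the names `moe_with_shared_expert`, `top_k_indices`, and `router_scores` are hypothetical, and it assumes the routed output is a score-weighted sum over the top-k experts.

```python
from typing import Callable, List, Sequence

Expert = Callable[[float], float]


def top_k_indices(scores: Sequence[float], k: int) -> List[int]:
    """Return the indices of the k largest scores, highest first."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]


def moe_with_shared_expert(
    x: float,
    router_scores: Sequence[float],
    routed_experts: Sequence[Expert],
    shared_expert: Expert,
    top_k: int = 1,
) -> float:
    """Combine an always-on shared expert with top-k routed experts (one token)."""
    chosen = top_k_indices(router_scores, top_k)
    # Routed path: only the selected experts run, weighted by their router score.
    routed_out = sum(router_scores[i] * routed_experts[i](x) for i in chosen)
    # Shared path: runs for every token, independent of the router.
    return shared_expert(x) + routed_out


# Toy experts, purely for illustration.
experts = [lambda x: 2.0 * x, lambda x: x + 1.0, lambda x: -x]
shared = lambda x: 0.5 * x
y = moe_with_shared_expert(4.0, [0.1, 0.7, 0.2], experts, shared, top_k=2)
```

Because the shared path does not depend on the router, it contributes the same function to every token regardless of which routed experts are selected.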


Bug Fixes
1. Better handle incompatible FFN sizes (108)
2. Fix AMP for memory optimized options (111)
3. Don't save MoE load-balancing-loss tensors when `moe_loss_weight=0` (119)

What's Changed
* Remove turbo by dblalock in https://github.com/databricks/megablocks/pull/96
* Update README.md by dakinggg in https://github.com/databricks/megablocks/pull/98
* Fix for `ffn_hidden_size` of 128, and better error message for incompatible ffn sizes. by snarayan21 in https://github.com/databricks/megablocks/pull/108
* Add Shared Expert by vchiley in https://github.com/databricks/megablocks/pull/109
* Fix AMP for memory optimized options by mvpatel2000 in https://github.com/databricks/megablocks/pull/111
* bump and pin versions by vchiley in https://github.com/databricks/megablocks/pull/112
* Don't save moe lb-loss tensors if args.moe_loss_weight=0 by michael-go in https://github.com/databricks/megablocks/pull/119
* bump by vchiley in https://github.com/databricks/megablocks/pull/116
* Minor changes to batched_load_balancing_loss function by ShashankMosaicML in https://github.com/databricks/megablocks/pull/121
* Migrate tests to pytest + add GA by eitanturok in https://github.com/databricks/megablocks/pull/127
* Change Runner in GA by eitanturok in https://github.com/databricks/megablocks/pull/129
* Clean up setup.py by eitanturok in https://github.com/databricks/megablocks/pull/128
* only run GA if repo owner is Databricks by eitanturok in https://github.com/databricks/megablocks/pull/135
* GA to Lint + Format MegaBlocks by eitanturok in https://github.com/databricks/megablocks/pull/131
* bump ci-testing to v0.1.2 by eitanturok in https://github.com/databricks/megablocks/pull/138
* remove weight parallelism by eitanturok in https://github.com/databricks/megablocks/pull/137
* refactor testing by eitanturok in https://github.com/databricks/megablocks/pull/140
* Type Checking by eitanturok in https://github.com/databricks/megablocks/pull/141
* Bump torch to <2.4.1 by eitanturok in https://github.com/databricks/megablocks/pull/145
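
For readers who want to check an environment against the `<2.4.1` upper bound from the last entry, here is a minimal sketch. It handles only plain numeric releases, optionally with a local suffix such as `+cu121`; the helper names are hypothetical, and real tooling would normally rely on the `packaging` library instead.

```python
from typing import Tuple


def release_tuple(version: str) -> Tuple[int, ...]:
    """Parse the numeric release part of a version such as '2.4.0+cu121'."""
    release = version.split("+", 1)[0]  # drop a local version label, if any
    return tuple(int(part) for part in release.split("."))


def below_upper_bound(installed: str, upper: str = "2.4.1") -> bool:
    """Return True if `installed` is strictly below `upper`."""
    a, b = release_tuple(installed), release_tuple(upper)
    width = max(len(a), len(b))
    # Pad with zeros so that '2.4' and '2.4.0' compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a < b


ok = below_upper_bound("2.4.0+cu121")
too_new = below_upper_bound("2.4.1")
```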

New Contributors
* dakinggg made their first contribution in https://github.com/databricks/megablocks/pull/98
* michael-go made their first contribution in https://github.com/databricks/megablocks/pull/119
* ShashankMosaicML made their first contribution in https://github.com/databricks/megablocks/pull/121

**Full Changelog**: https://github.com/databricks/megablocks/compare/v0.5.1...v0.6.0

0.6.1

What's New
Patch release that removes the dependencies specified via GitHub and instead uses released versions from PyPI (specifically, stanford-stk and grouped-gemm). This allows MegaBlocks itself to be released on PyPI.
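
The background is that PyPI does not accept uploads whose metadata lists direct URL dependencies (the `name @ url` form). A small sketch of spotting such entries in a requirements list follows; the helper `direct_references` and the `example.com` URLs are hypothetical placeholders, not the project's actual setup.

```python
from typing import List


def direct_references(requirements: List[str]) -> List[str]:
    """Return the requirements written as direct references ('name @ url')."""
    flagged = []
    for req in requirements:
        spec = req.split(";", 1)[0]  # ignore any environment marker
        if "@" in spec:
            flagged.append(req)
    return flagged


# Illustrative requirement lists; the URLs are placeholders.
before = [
    "stanford-stk @ git+https://example.com/stk.git",
    "grouped-gemm @ git+https://example.com/grouped_gemm.git",
]
after = ["stanford-stk", "grouped-gemm"]

blocked = direct_references(before)
clean = direct_references(after)
```

Once every entry resolves through the package index, `direct_references` returns an empty list and the package metadata no longer contains anything PyPI would reject on these grounds.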


What's Changed
* Remove direct dependencies, allowing for megablocks pypi release by snarayan21 in https://github.com/databricks/megablocks/pull/149


**Full Changelog**: https://github.com/databricks/megablocks/compare/v0.6.0...v0.6.1

0.6.0

What's New

**1. Torch 2.4 Compatibility (145)**

0.5.1

What's Changed
* Update dependencies and package organization. by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/52
* Remove errant "*" in README by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/54
* Update Megatron-LM scripts and integration for latest Docker container. by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/55
* Update setup.py to support multiple device capabilities by simon-mo in https://github.com/stanford-futuredata/megablocks/pull/56
* enable arg enabled normalization of routing weights by vchiley in https://github.com/stanford-futuredata/megablocks/pull/58
* More customizable norm for expert weights by snarayan21 in https://github.com/stanford-futuredata/megablocks/pull/60
* Update README.md by eltociear in https://github.com/stanford-futuredata/megablocks/pull/63
* enable custom activation functions by vchiley in https://github.com/stanford-futuredata/megablocks/pull/65
* Skip updating load balancing loss on eval by sedrick-keh-tri in https://github.com/stanford-futuredata/megablocks/pull/69
* Change router weight norm from in-place by sashaDoubov in https://github.com/stanford-futuredata/megablocks/pull/70
* add mem optimized grouped glu by vchiley in https://github.com/stanford-futuredata/megablocks/pull/66
* Add cast to tensor for DTensor inputs for groupedmlp by eracah in https://github.com/stanford-futuredata/megablocks/pull/71
* Dtensor to all paths by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/73
* Refactor DTensor by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/74
* Mem opt glu bkwd by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/72
* Add dmlp registry args by j316chuck in https://github.com/stanford-futuredata/megablocks/pull/75
* Fix default to be sparse by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/76
* Fix `moe_normalize_expert_weights` when `top_k=1` by 152334H in https://github.com/stanford-futuredata/megablocks/pull/87
* Update triton pin by vchiley in https://github.com/stanford-futuredata/megablocks/pull/89
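
Several entries above concern normalizing the routing weights. As a hedged sketch of what that usually means, the top-k weights selected for a token are divided by their p-norm. The function below is an illustration in plain Python, assuming `p` plays the role of the `moe_normalize_expert_weights` setting; it is not the library's code.

```python
from typing import List, Optional


def normalize_expert_weights(weights: List[float], p: Optional[float]) -> List[float]:
    """Divide one token's top-k routing weights by their p-norm."""
    if p is None:
        return list(weights)  # normalization disabled
    norm = sum(abs(w) ** p for w in weights) ** (1.0 / p)
    return [w / norm for w in weights]


top2 = normalize_expert_weights([0.6, 0.2], p=1)  # weights now sum to 1
top1 = normalize_expert_weights([0.6], p=1)       # single weight becomes 1
```

With `top_k=1` any p-norm of a single positive weight is the weight itself, so the normalized value is always 1.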

New Contributors
* simon-mo made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/56
* snarayan21 made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/60
* eltociear made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/63
* sedrick-keh-tri made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/69
* eracah made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/71
* j316chuck made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/75
* 152334H made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/87

**Full Changelog**: https://github.com/stanford-futuredata/megablocks/compare/v0.5.0...v0.5.1

0.5.0

What's New

Several improvements to avoid CPU <> GPU device synchronizations, GLU support, and support for some new models 👀

What's Changed
* Update version by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/36
* Avoid duplicate `.cpu()` call by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/37
* Have megablocks rely on torch default precision by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/39
* Add GLU support by sashaDoubov in https://github.com/stanford-futuredata/megablocks/pull/38
* Enable generic dimensionality for input by vchiley in https://github.com/stanford-futuredata/megablocks/pull/41
* Removing an extra size call by bcui19 in https://github.com/stanford-futuredata/megablocks/pull/43
* Fix bug in topology kernel for ffn_hidden_size>4096. by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/47
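
For context on the GLU entry, a gated linear unit feed-forward block multiplies an activated "gate" projection elementwise with a second linear projection before the output projection. The sketch below uses plain Python lists and hypothetical names (`glu_ffn`, `w1`, `v1`, `w2`); it shows the general technique under those assumptions, not the kernels MegaBlocks uses.

```python
import math
from typing import Callable, List

Matrix = List[List[float]]


def matvec(m: Matrix, v: List[float]) -> List[float]:
    """Multiply a row-major matrix by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def gelu(x: float) -> float:
    """Exact (erf-based) GELU activation."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def glu_ffn(
    x: List[float],
    w1: Matrix,
    v1: Matrix,
    w2: Matrix,
    activation: Callable[[float], float] = gelu,
) -> List[float]:
    """Compute w2 @ (activation(w1 @ x) * (v1 @ x)) for one token."""
    gate = [activation(h) for h in matvec(w1, x)]
    up = matvec(v1, x)
    hidden = [g * u for g, u in zip(gate, up)]  # elementwise gating
    return matvec(w2, hidden)


# Identity activation keeps the arithmetic easy to follow by hand.
out = glu_ffn(
    [1.0, 2.0],
    w1=[[1.0, 0.0], [0.0, 1.0]],
    v1=[[2.0, 0.0], [0.0, 2.0]],
    w2=[[1.0, 1.0]],
    activation=lambda h: h,
)
```

Here the gate is `[1, 2]`, the second projection is `[2, 4]`, their product is `[2, 8]`, and the output projection sums them.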

New Contributors
* sashaDoubov made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/38
* bcui19 made their first contribution in https://github.com/stanford-futuredata/megablocks/pull/43

**Full Changelog**: https://github.com/stanford-futuredata/megablocks/compare/v0.4.0...v0.5.0

0.4.0

What's Changed
* Unpack saved context once by mvpatel2000 in https://github.com/stanford-futuredata/megablocks/pull/33
* Refactoring class hierarchy for FSDP wrapping by tgale96 in https://github.com/stanford-futuredata/megablocks/pull/34


**Full Changelog**: https://github.com/stanford-futuredata/megablocks/compare/v0.3.3...v0.4.0
