MMPreTrain

Latest version: v1.2.0


2.0.0

![image](https://github.com/open-mmlab/mmpretrain/assets/36138628/2fbb2ede-b226-4679-86f3-14913f600a2a)
Remark: Both FSDP and DeepSpeed were tested with default configurations and were not tuned; manually tuning the FSDP wrap policy can further reduce training time and memory usage.

New Features

- Transfer shape-bias tool from mmselfsup ([1658](https://github.com/open-mmlab/mmpretrain/pull/1658))
- Download dataset by using MIM&OpenDataLab ([1630](https://github.com/open-mmlab/mmpretrain/pull/1630))
- Support New Configs ([1639](https://github.com/open-mmlab/mmpretrain/pull/1639), [#1647](https://github.com/open-mmlab/mmpretrain/pull/1647), [#1665](https://github.com/open-mmlab/mmpretrain/pull/1665))
- Support Flickr30k Retrieval dataset ([1625](https://github.com/open-mmlab/mmpretrain/pull/1625))
- Support SparK ([1531](https://github.com/open-mmlab/mmpretrain/pull/1531))
- Support LLaVA ([1652](https://github.com/open-mmlab/mmpretrain/pull/1652))
- Support Otter ([1651](https://github.com/open-mmlab/mmpretrain/pull/1651))
- Support MiniGPT-4 ([1642](https://github.com/open-mmlab/mmpretrain/pull/1642))
- Add support for VizWiz dataset ([1636](https://github.com/open-mmlab/mmpretrain/pull/1636))
- Add support for VSR dataset ([1634](https://github.com/open-mmlab/mmpretrain/pull/1634))
- Add InternImage Classification project ([1569](https://github.com/open-mmlab/mmpretrain/pull/1569))
- Support OCR-VQA dataset ([1621](https://github.com/open-mmlab/mmpretrain/pull/1621))
- Support OK-VQA dataset ([1615](https://github.com/open-mmlab/mmpretrain/pull/1615))
- Support TextVQA dataset ([1569](https://github.com/open-mmlab/mmpretrain/pull/1569))
- Support iTPN and HiViT ([1584](https://github.com/open-mmlab/mmpretrain/pull/1584))
- Add retrieval mAP metric ([1552](https://github.com/open-mmlab/mmpretrain/pull/1552))
- Support NoCaps dataset based on BLIP. ([1582](https://github.com/open-mmlab/mmpretrain/pull/1582))
- Add GQA dataset ([1585](https://github.com/open-mmlab/mmpretrain/pull/1585))

Improvements

- Update FSDP ViT-huge and ViT-large configs ([1675](https://github.com/open-mmlab/mmpretrain/pull/1675))
- Support DeepSpeed with FlexibleRunner ([1673](https://github.com/open-mmlab/mmpretrain/pull/1673))
- Update Otter and LLaVA docs and config. ([1653](https://github.com/open-mmlab/mmpretrain/pull/1653))
- Add `image_only` parameter to ScienceQA ([1613](https://github.com/open-mmlab/mmpretrain/pull/1613))
- Support using "split" to specify the training/validation set ([1535](https://github.com/open-mmlab/mmpretrain/pull/1535))
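
A minimal config-style sketch of the new `split` usage (the dataset type, paths and batch size are illustrative assumptions; check that your dataset class supports `split`):

```python
# Illustrative config fragment: select the subset via `split` instead of
# wiring ann_file/data_prefix by hand (paths below are placeholders).
train_dataloader = dict(
    batch_size=32,
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        split='train',
        pipeline=[...],  # your training pipeline
    ),
)
val_dataloader = dict(
    batch_size=32,
    dataset=dict(
        type='ImageNet',
        data_root='data/imagenet',
        split='val',
        pipeline=[...],  # your test pipeline
    ),
)
```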

Bug Fixes

- Refactor \_prepare_pos_embed in ViT ([1656](https://github.com/open-mmlab/mmpretrain/pull/1656), [#1679](https://github.com/open-mmlab/mmpretrain/pull/1679))
- Freeze pre-norm in Vision Transformer ([1672](https://github.com/open-mmlab/mmpretrain/pull/1672))
- Fix bug when loading the IN1k dataset ([1641](https://github.com/open-mmlab/mmpretrain/pull/1641))
- Fix SAM bug ([1633](https://github.com/open-mmlab/mmpretrain/pull/1633))
- Fix circular import error for new transform ([1609](https://github.com/open-mmlab/mmpretrain/pull/1609))
- Update torchvision transform wrapper ([1595](https://github.com/open-mmlab/mmpretrain/pull/1595))
- Set default out_type in CAM visualization ([1586](https://github.com/open-mmlab/mmpretrain/pull/1586))

Docs Update

- Fix spelling ([1681](https://github.com/open-mmlab/mmpretrain/pull/1681))
- Fix doc typos ([1671](https://github.com/open-mmlab/mmpretrain/pull/1671), [#1644](https://github.com/open-mmlab/mmpretrain/pull/1644), [#1629](https://github.com/open-mmlab/mmpretrain/pull/1629))
- Add t-SNE visualization doc ([1555](https://github.com/open-mmlab/mmpretrain/pull/1555))

New Contributors
* alexwangxiang made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1555
* InvincibleWyq made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1615
* yyk-wew made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1634
* fanqiNO1 made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1673
* Ben-Louis made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1679
* Lamply made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1671
* minato-ellie made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1644
* liweiwp made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1629

1.0.1

Fix some bugs and enhance the codebase.

What's Changed
* [Fix] Fix wrong-parameter bug of RandomCrop by Ezra-Yu in https://github.com/open-mmlab/mmpretrain/pull/1706
* [Refactor] BEiT refactor by fanqiNO1 in https://github.com/open-mmlab/mmpretrain/pull/1705
* [Refactor] Fix spelling by fanqiNO1 in https://github.com/open-mmlab/mmpretrain/pull/1689
* [Fix] Freezing of cls_token in VisionTransformer by fabien-merceron in https://github.com/open-mmlab/mmpretrain/pull/1693
* [Fix] Typo fix of 'target' in vis_cam.py by bryanbocao in https://github.com/open-mmlab/mmpretrain/pull/1655
* [Feature] Support LoRA by fanqiNO1 in https://github.com/open-mmlab/mmpretrain/pull/1687
* [Fix] Fix the issue 1711 "GaussianBlur doesn't work" by liyunlong10 in https://github.com/open-mmlab/mmpretrain/pull/1722
* [Enhance] Add GPU acceleration for Apple silicon Macs by NripeshN in https://github.com/open-mmlab/mmpretrain/pull/1699
* [Enhance] Adapt test cases on Ascend NPU. by Ginray in https://github.com/open-mmlab/mmpretrain/pull/1728
* [Enhance] Nested predict by marouaneamz in https://github.com/open-mmlab/mmpretrain/pull/1716
* [Enhance] Set 'is_init' in some multimodal methods by fangyixiao18 in https://github.com/open-mmlab/mmpretrain/pull/1718
* [Enhance] Add init_cfg with type='pretrained' to downstream tasks by fangyixiao18 in https://github.com/open-mmlab/mmpretrain/pull/1717
* [Fix] Fix dict update in minigpt4 by fangyixiao18 in https://github.com/open-mmlab/mmpretrain/pull/1709
* Bump version to 1.0.1 by fangyixiao18 in https://github.com/open-mmlab/mmpretrain/pull/1731

New Contributors
* fabien-merceron made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1693
* bryanbocao made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1655
* liyunlong10 made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1722
* NripeshN made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1699

**Full Changelog**: https://github.com/open-mmlab/mmpretrain/compare/1.0.0...v1.0.1

1.0.0

MMPreTrain Release v1.0.0: Backbones, Self-Supervised Learning and Multi-Modality

Support more **multi-modal** algorithms and datasets
We are excited to announce that several advanced multi-modal methods are now supported! We integrated *huggingface/transformers* with vision backbones in MMPreTrain to run inference and training (training support is still in development); a minimal inference sketch follows the table below.

| Methods | Datasets |
|:---|:---|
|[BLIP (arxiv'2022)](https://github.com/open-mmlab/mmpretrain/blob/main/configs/blip) | COCO (caption, retrieval, vqa) |
|[BLIP-2 (arxiv'2023)](https://github.com/open-mmlab/mmpretrain/blob/main/configs/blip2) | Flickr30k (caption, retrieval) |
|[OFA (CoRR'2022)](https://github.com/open-mmlab/mmpretrain/blob/main/configs/ofa) | GQA |
|[Flamingo (NeurIPS'2022)](https://github.com/open-mmlab/mmpretrain/blob/main/configs/flamingo) | NLVR2 |
|[Chinese CLIP (arxiv'2022)](https://github.com/open-mmlab/mmpretrain/blob/main/configs/chinese_clip) | NoCaps |
|[MiniGPT-4 (arxiv'2023)](https://github.com/open-mmlab/mmpretrain/blob/main/configs/minigpt4) | OCR VQA |
|[LLaVA (arxiv'2023)](https://github.com/open-mmlab/mmpretrain/blob/main/configs/llava) | Text VQA |
|[Otter (arxiv'2023)](https://github.com/open-mmlab/mmpretrain/blob/main/configs/otter) | VG VQA|
|| VisualGenomeQA |
|| VizWiz |
|| VSR |
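
A minimal sketch of trying the multi-modal models through the unified inference API (the model name and image path below are illustrative assumptions; use `list_models` to see what your installation provides):

```python
from mmpretrain import inference_model, list_models

# Discover caption-capable BLIP models (pattern matching on model names).
print(list_models('blip'))

# Run one of them on a local image; the exact model name is an assumption,
# pick a name printed above.
result = inference_model('blip-base_3rdparty_caption', 'path/to/image.jpg')
print(result)
```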


Add **iTPN**, **SparK** self-supervised learning algorithms.
![image](https://github.com/open-mmlab/mmpretrain/assets/36138628/328e4c44-edf6-49cc-a54b-5b6b408cc92f)
![image](https://github.com/open-mmlab/mmpretrain/assets/36138628/82ed14ca-618c-414b-89af-f7476bb5c320)


Provide examples of **New Config** and **DeepSpeed/FSDP**
We tested DeepSpeed and FSDP with MMEngine. The following are the memory usage and training time measured with ViT-large, ViT-huge and an 8B multi-modal model: the left figure shows the memory data, and the right figure shows the training time data.
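
As a rough illustration, DeepSpeed can be enabled through MMEngine's `FlexibleRunner` with a config fragment along these lines (type names follow MMEngine >= 0.8; the specific fields are assumptions, so prefer the example configs shipped with this release):

```python
# Illustrative config fragment, not a tuned setup.
runner_type = 'FlexibleRunner'
strategy = dict(
    type='DeepSpeedStrategy',
    fp16=dict(enabled=True, initial_scale_power=16),
    zero_optimization=dict(stage=3, overlap_comm=True),
)
optim_wrapper = dict(
    type='DeepSpeedOptimWrapper',
    optimizer=dict(type='AdamW', lr=1e-4, weight_decay=0.05),
)
```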

1.0.0rc8

Highlights

- Support multiple multi-modal algorithms and inferencers. You can explore these features with the [gradio demo](https://github.com/open-mmlab/mmpretrain/tree/main/projects/gradio_demo)!
- Add EVA-02, DINOv2, ViT-SAM and GLIP backbones.
- Register torchvision transforms into MMPreTrain; you can now easily use torchvision's data augmentations in MMPreTrain (see the pipeline sketch after this list).
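
A minimal pipeline sketch using the registered torchvision wrappers (the `torchvision/<TransformName>` prefix and the surrounding transforms reflect my understanding of the convention; treat the exact names as assumptions):

```python
# Illustrative data pipeline mixing MMPreTrain and torchvision transforms.
train_pipeline = [
    dict(type='LoadImageFromFile'),
    # torchvision transforms are addressed with the 'torchvision/' prefix
    dict(type='torchvision/RandomResizedCrop', size=224),
    dict(type='torchvision/RandomHorizontalFlip', p=0.5),
    dict(type='PackInputs'),
]
```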

New Features

- Support Chinese CLIP. ([1576](https://github.com/open-mmlab/mmpretrain/pull/1576))
- Add ScienceQA Metrics ([1577](https://github.com/open-mmlab/mmpretrain/pull/1577))
- Support multiple multi-modal algorithms and inferencers. ([1561](https://github.com/open-mmlab/mmpretrain/pull/1561))
- Add EVA-02 backbone ([1450](https://github.com/open-mmlab/mmpretrain/pull/1450))
- Support DINOv2 backbone ([1522](https://github.com/open-mmlab/mmpretrain/pull/1522))
- Support some downstream classification datasets. ([1467](https://github.com/open-mmlab/mmpretrain/pull/1467))
- Support GLIP ([1308](https://github.com/open-mmlab/mmpretrain/pull/1308))
- Register torchvision transforms into mmpretrain ([1265](https://github.com/open-mmlab/mmpretrain/pull/1265))
- Add ViT of SAM ([1476](https://github.com/open-mmlab/mmpretrain/pull/1476))

Improvements

- [Refactor] Support freezing channel reduction and add a layer-decay function ([1490](https://github.com/open-mmlab/mmpretrain/pull/1490))
- [Refactor] Support resizing pos_embed while loading checkpoints, and format the output ([1488](https://github.com/open-mmlab/mmpretrain/pull/1488))

Bug Fixes

- Fix ScienceQA ([1581](https://github.com/open-mmlab/mmpretrain/pull/1581))
- Fix the BEiT config ([1528](https://github.com/open-mmlab/mmpretrain/pull/1528))
- Fix incorrect stage freezing in the RIFormer model ([1573](https://github.com/open-mmlab/mmpretrain/pull/1573))
- Fix ddp bugs caused by `out_type`. ([1570](https://github.com/open-mmlab/mmpretrain/pull/1570))
- Fix multi-task-head loss potential bug ([1530](https://github.com/open-mmlab/mmpretrain/pull/1530))
- Support BCE loss without batch augmentations ([1525](https://github.com/open-mmlab/mmpretrain/pull/1525))
- Fix clip generator init bug ([1518](https://github.com/open-mmlab/mmpretrain/pull/1518))
- Fix the bug in binary cross entropy loss ([1499](https://github.com/open-mmlab/mmpretrain/pull/1499))

Docs Update

- Update PoolFormer citation to CVPR version ([1505](https://github.com/open-mmlab/mmpretrain/pull/1505))
- Refine Inference Doc ([1489](https://github.com/open-mmlab/mmpretrain/pull/1489))
- Add doc for usage of confusion matrix ([1513](https://github.com/open-mmlab/mmpretrain/pull/1513))
- Update MMagic link ([1517](https://github.com/open-mmlab/mmpretrain/pull/1517))
- Fix example_project README ([1575](https://github.com/open-mmlab/mmpretrain/pull/1575))
- Add NPU support page ([1481](https://github.com/open-mmlab/mmpretrain/pull/1481))
- train cfg: Removed old description ([1473](https://github.com/open-mmlab/mmpretrain/pull/1473))
- Fix typo in MultiLabelDataset docstring ([1483](https://github.com/open-mmlab/mmpretrain/pull/1483))

Contributors
A total of 12 developers contributed to this release.

XiudingCai Ezra-Yu KeiChiTse mzr1996 bobo0810 wangbo-zhao yuweihao fangyixiao18 YuanLiuuuuuu MGAMZ okotaku zzc98

1.0.0rc7

- Highlights
- New Features
- Improvements
- Bug Fixes
- Docs Update

Highlights

We are excited to announce that MMClassification and MMSelfSup have been merged into ONE codebase, named MMPreTrain, which has the following highlights:
- Integrated self-supervised learning algorithms from **MMSelfSup**, such as **MAE**, **BEiT**, etc. You can find them in the `mmpretrain/models` directory, where a new `selfsup` folder supports 18 recent self-supervised learning algorithms.

| Contrastive learning | Masked image modeling |
| :----------------------------: | :----------------------------------: |
| MoCo series | BEiT series |
| SimCLR | MAE |
| BYOL | SimMIM |
| SwAV | MaskFeat |
| DenseCL | CAE |
| SimSiam | MILAN |
| BarlowTwins | EVA |
| DenseCL | MixMIM |

- Support **RIFormer**, which is a way to keep a vision backbone effective while removing token mixers in its basic building blocks. Equipped with our proposed optimization strategy, we are able to build an extremely simple vision backbone with encouraging performance, while enjoying high efficiency during inference.

<div>
<img src="https://user-images.githubusercontent.com/48375204/223930120-dc075c8e-0513-42eb-9830-469a45c1d941.png" width="80%"/>
</div>

- Support **LeViT**, **XCiT**, **ViG**, and **ConvNeXt-V2** backbones; we currently support 68 backbones or algorithms and 472 checkpoints.

- Add t-SNE visualization; users can visualize t-SNE embeddings to analyze the representation ability of their backbone. An example of the visualization: the left plot is from `MoCoV2_ResNet50` and the right is from `MAE_ViT-base`:

<div>
<img src="https://user-images.githubusercontent.com/36138628/207305086-91df298c-0eb7-4254-9c5b-ba711644501b.png" width="40%" />
<img src="https://user-images.githubusercontent.com/36138628/223383663-a021bb5f-1ef5-404d-87aa-c353edd4e1e1.png" width="40%" />
</div>

- Refactor dataset pipeline visualization; we can now also visualize the pipeline of masked image modeling, such as BEiT:

<div><img src="https://user-images.githubusercontent.com/26739999/226542300-74216187-e3d0-4a6e-8731-342abe719721.png" width="70%"></div>

New Features

- Support RIFormer. ([1453](https://github.com/open-mmlab/mmpretrain/pull/1453))
- Support XCiT Backbone. ([1305](https://github.com/open-mmlab/mmclassification/pull/1305))
- Support calculating and plotting the confusion matrix. ([1287](https://github.com/open-mmlab/mmclassification/pull/1287))
- Support RetrievalRecall metric & add ArcFace config ([1316](https://github.com/open-mmlab/mmclassification/pull/1316))
- Add `ImageClassificationInferencer` (usage sketch after this list). ([1261](https://github.com/open-mmlab/mmclassification/pull/1261))
- Support InShop Dataset (Image Retrieval). ([1019](https://github.com/open-mmlab/mmclassification/pull/1019))
- Support LeViT backbone. ([1238](https://github.com/open-mmlab/mmclassification/pull/1238))
- Support VIG Backbone. ([1304](https://github.com/open-mmlab/mmclassification/pull/1304))
- Support ConvNeXt-V2 backbone. ([1294](https://github.com/open-mmlab/mmclassification/pull/1294))
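
A minimal usage sketch of the new `ImageClassificationInferencer` (the model name and image path are illustrative assumptions; any name from `list_models()` should work):

```python
from mmpretrain import ImageClassificationInferencer

# Build an inferencer from a model name; the model name below is an example.
inferencer = ImageClassificationInferencer('resnet50_8xb32_in1k')
results = inferencer('path/to/image.jpg')
print(results[0])  # per-image dict with predicted label/score fields
```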

Improvements

- Use PyTorch official `scaled_dot_product_attention` to accelerate `MultiheadAttention` (see the snippet after this list). ([1434](https://github.com/open-mmlab/mmpretrain/pull/1434))
- Add LayerNorm to ViT `avg_featmap` output ([1447](https://github.com/open-mmlab/mmpretrain/pull/1447))
- Update analysis tools and documentations. ([1359](https://github.com/open-mmlab/mmclassification/pull/1359))
- Unify the `--out` and `--dump` in `tools/test.py`. ([1307](https://github.com/open-mmlab/mmclassification/pull/1307))
- Enable toggling whether GeM pooling is trainable. ([1246](https://github.com/open-mmlab/mmclassification/pull/1246))
- Update registries of mmcls. ([1306](https://github.com/open-mmlab/mmclassification/pull/1306))
- Add metafile fill and validation tools. ([1297](https://github.com/open-mmlab/mmclassification/pull/1297))
- Remove useless EfficientnetV2 config files. ([1300](https://github.com/open-mmlab/mmclassification/pull/1300))
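
For context, the PyTorch primitive this change builds on (plain PyTorch >= 2.0, not MMPreTrain-specific code):

```python
import torch
import torch.nn.functional as F

# (batch, num_heads, seq_len, head_dim)
q = torch.randn(2, 8, 197, 64)
k = torch.randn(2, 8, 197, 64)
v = torch.randn(2, 8, 197, 64)

# Dispatches to fused/flash-attention kernels when available,
# falling back to the plain math implementation otherwise.
out = F.scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 8, 197, 64])
```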

Bug Fixes

- Fix precise bn hook ([1466](https://github.com/open-mmlab/mmpretrain/pull/1466))
- Fix retrieval multi gpu bug ([1319](https://github.com/open-mmlab/mmclassification/pull/1319))
- Fix error repvgg-deploy base config path. ([1357](https://github.com/open-mmlab/mmclassification/pull/1357))
- Fix bug in test tools. ([1309](https://github.com/open-mmlab/mmclassification/pull/1309))

Docs Update

- Translate some tools tutorials to Chinese. ([1321](https://github.com/open-mmlab/mmclassification/pull/1321))
- Add Chinese translation for runtime.md. ([1313](https://github.com/open-mmlab/mmclassification/pull/1313))

Contributors
A total of 13 developers contributed to this release.
Thanks to techmonsterwang, qingtian5, mzr1996, okotaku, zzc98, aso538, szwlh-c, fangyixiao18, yukkyo, Ezra-Yu, csatsurnh, 2546025323, GhaSiKey.

New Contributors
* csatsurnh made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1309
* szwlh-c made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1304
* aso538 made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1238
* GhaSiKey made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1313
* yukkyo made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1246
* 2546025323 made their first contribution in https://github.com/open-mmlab/mmpretrain/pull/1321

**Full Changelog**: https://github.com/open-mmlab/mmpretrain/compare/v1.0.0rc5...v1.0.0rc7

1.0.0rc5

Highlights

- Support EVA, RevViT, EfficientNetV2, CLIP, TinyViT and MixMIM backbones.
- Reproduce the training accuracy of ConvNeXt and RepVGG.
- Support multi-task training and testing.
- Support Test-time Augmentation.

New Features

- [Feature] Add EfficientNetV2 backbone. ([1253](https://github.com/open-mmlab/mmclassification/pull/1253))
- [Feature] Support TTA and add `--tta` in `tools/test.py` (config sketch after this list). ([1161](https://github.com/open-mmlab/mmclassification/pull/1161))
- [Feature] Support Multi-task. ([1229](https://github.com/open-mmlab/mmclassification/pull/1229))
- [Feature] Add clip backbone. ([1258](https://github.com/open-mmlab/mmclassification/pull/1258))
- [Feature] Add mixmim backbone with checkpoints. ([1224](https://github.com/open-mmlab/mmclassification/pull/1224))
- [Feature] Add TinyViT for dev-1.x. ([1042](https://github.com/open-mmlab/mmclassification/pull/1042))
- [Feature] Add some scripts for development. ([1257](https://github.com/open-mmlab/mmclassification/pull/1257))
- [Feature] Support EVA. ([1239](https://github.com/open-mmlab/mmclassification/pull/1239))
- [Feature] Implementation of RevViT. ([1127](https://github.com/open-mmlab/mmclassification/pull/1127))
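
A rough sketch of what a test-time augmentation setup can look like (the wrapper and transform names here are assumptions based on the MMPreTrain/MMCV TTA utilities; enable it at test time with `--tta`):

```python
# Illustrative TTA config fragment.
tta_model = dict(type='AverageClsScoreTTA')  # averages class scores over views
tta_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='TestTimeAug',
        transforms=[
            [dict(type='ResizeEdge', scale=256, edge='short')],
            [dict(type='RandomFlip', prob=1.0),
             dict(type='RandomFlip', prob=0.0)],
            [dict(type='PackInputs')],
        ]),
]
```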

Improvements

- [Reproduce] Reproduce RepVGG Training Accuracy. ([1264](https://github.com/open-mmlab/mmclassification/pull/1264))
- [Enhance] Support ConvNeXt More Weights. ([1240](https://github.com/open-mmlab/mmclassification/pull/1240))
- [Reproduce] Update ConvNeXt config files. ([1256](https://github.com/open-mmlab/mmclassification/pull/1256))
- [CI] Update CI to test PyTorch 1.13.0. ([1260](https://github.com/open-mmlab/mmclassification/pull/1260))
- [Project] Add ACCV workshop 1st Solution. ([1245](https://github.com/open-mmlab/mmclassification/pull/1245))
- [Project] Add Example project. ([1254](https://github.com/open-mmlab/mmclassification/pull/1254))

Bug Fixes

- [Fix] Fix imports in transforms. ([1255](https://github.com/open-mmlab/mmclassification/pull/1255))
- [Fix] Fix CAM visualization. ([1248](https://github.com/open-mmlab/mmclassification/pull/1248))
- [Fix] Fix the requirements and lazy register mmcls models. ([1275](https://github.com/open-mmlab/mmclassification/pull/1275))

Contributors
A total of 12 developers contributed to this release.

marouaneamz piercus Ezra-Yu mzr1996 bobo0810 suibe-qingtian Scarecrow0 tonysy WINDSKY45 wangbo-zhao Francis777 okotaku
