DLRover

Latest version: v0.4.0

0.3.0

Features:
* Flash Checkpoint asynchronously persists checkpoints to storage.
* Flash Checkpoint recovers from failures using the checkpoint held in memory.
* Flash Checkpoint supports DDP, FSDP, DeepSpeed, and Megatron.
* Node detection supports NPUs.
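
The idea behind an asynchronous, in-memory checkpoint can be sketched in a few lines of standard-library Python. This is a minimal illustration of the general technique only, not DLRover's implementation or API: the class `AsyncCheckpointer`, its methods, and the pickle file layout are all hypothetical names chosen for this example. The training thread blocks only while the state is copied in memory; a background thread then writes the copy to storage, and the in-memory copy remains available for fast recovery.

```python
import copy
import pickle
import tempfile
import threading
from pathlib import Path


class AsyncCheckpointer:
    """Hypothetical helper (not a DLRover class): snapshot in memory, persist in background."""

    def __init__(self, directory):
        self.directory = Path(directory)
        self.directory.mkdir(parents=True, exist_ok=True)
        self._latest = None   # in-memory copy of the most recent snapshot
        self._writer = None   # background thread handling the pending write

    def save(self, step, state):
        # Copy the state in memory so training can keep mutating the original.
        snapshot = copy.deepcopy(state)
        # Allow at most one write in flight; this waits if the previous one is unfinished.
        self.wait()
        self._latest = (step, snapshot)
        self._writer = threading.Thread(target=self._persist, args=(step, snapshot))
        self._writer.start()

    def _persist(self, step, snapshot):
        path = self.directory / f"step_{step}.pkl"
        tmp = path.with_suffix(".tmp")
        with open(tmp, "wb") as f:
            pickle.dump(snapshot, f)
        tmp.replace(path)     # rename last so readers never see a partial file

    def wait(self):
        # Block until the pending background write (if any) has finished.
        if self._writer is not None:
            self._writer.join()

    def restore_from_memory(self):
        # Return a copy so callers cannot mutate the stored snapshot.
        return copy.deepcopy(self._latest)


if __name__ == "__main__":
    ckpt = AsyncCheckpointer(tempfile.mkdtemp())
    state = {"step": 10, "weights": [0.1, 0.2]}
    ckpt.save(10, state)
    state["weights"][0] = 9.9   # training continues and mutates the live state
    ckpt.wait()
    print(ckpt.restore_from_memory())
```

A real framework would copy tensors into shared or pinned memory rather than calling `copy.deepcopy`, and would coordinate the write across ranks; the sketch leaves both out.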

Examples:
* An example of training nanoGPT with DeepSpeed.
* An example of saving and loading a sharded FSDP checkpoint.

0.2.2

Features:
* dlrover-run can run in any distributed job that sets NODE_RANK and DLROVER_MASTER_ADDR in the environment.
* DLRover can asynchronously save checkpoints to storage, blocking training only briefly.
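
As a rough illustration of the environment requirement above, a launcher wrapper could check both variables before starting a job. The function `read_launch_env` is a hypothetical helper, not part of DLRover, and treating NODE_RANK as an integer is an assumption made for this sketch.

```python
import os

# The two variables named in the release note.
REQUIRED = ("NODE_RANK", "DLROVER_MASTER_ADDR")


def read_launch_env(environ=None):
    """Hypothetical pre-flight check: return (node_rank, master_addr) or raise."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Assumption: the rank is an integer; the address is passed through unparsed.
    return int(environ["NODE_RANK"]), environ["DLROVER_MASTER_ADDR"]
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise in isolation.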

BugFix:
* Fix a bug in loading FSDP checkpoints.

0.2.1

* Autotune the batch size without restarting the job.
* Automatically detect stragglers (slow workers).
* TFPlus: TFPlus 0.1.0 has been released; see details at https://github.com/intelligent-machine-learning/dlrover/tree/master/tfplus
