## Distributed data parallel training
Distributed data parallel training with multiple nodes and GPUs has been one of the most requested features. Now, it's finally available, and it's extremely easy to use!
Example:
```py
# train.py
from typing import Dict

import d3rlpy


def main() -> None:
    # GPU version:
    # rank = d3rlpy.distributed.init_process_group("nccl")
    rank = d3rlpy.distributed.init_process_group("gloo")
    print(f"Start running on rank={rank}.")

    # GPU version:
    # device = f"cuda:{rank}"
    device = "cpu:0"

    # setup algorithm
    cql = d3rlpy.algos.CQLConfig(
        actor_learning_rate=1e-3,
        critic_learning_rate=1e-3,
        alpha_learning_rate=1e-3,
    ).create(device=device)

    # prepare dataset
    dataset, env = d3rlpy.datasets.get_pendulum()

    # disable logging on rank != 0 workers
    logger_adapter: d3rlpy.logging.LoggerAdapterFactory
    evaluators: Dict[str, d3rlpy.metrics.EvaluatorProtocol]
    if rank == 0:
        evaluators = {"environment": d3rlpy.metrics.EnvironmentEvaluator(env)}
        logger_adapter = d3rlpy.logging.FileAdapterFactory()
    else:
        evaluators = {}
        logger_adapter = d3rlpy.logging.NoopAdapterFactory()

    # start training
    cql.fit(
        dataset,
        n_steps=10000,
        n_steps_per_epoch=1000,
        evaluators=evaluators,
        logger_adapter=logger_adapter,
        show_progress=rank == 0,
        enable_ddp=True,
    )
    d3rlpy.distributed.destroy_process_group()


if __name__ == "__main__":
    main()
```
You need to use the `torchrun` command to launch training, which is installed along with PyTorch.
```
$ torchrun \
   --nnodes=1 \
   --nproc_per_node=3 \
   --rdzv_id=100 \
   --rdzv_backend=c10d \
   --rdzv_endpoint=localhost:29400 \
   train.py
```
In this case, 3 processes will be launched to run the training loop. `DecisionTransformer`-based algorithms also support this distributed training feature, as sketched below.

The full example is also available [here](https://github.com/takuseno/d3rlpy/blob/master/examples/distributed_offline_training.py).
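As a rough sketch (not taken from the official example), a `DecisionTransformer`-based setup would look similar; the `context_size` value and the use of `show_progress` here are illustrative assumptions, while `enable_ddp=True` is the same flag used in the CQL example above.

```py
# hypothetical sketch: Decision Transformer with the same DDP setup as above
import d3rlpy

rank = d3rlpy.distributed.init_process_group("gloo")  # or "nccl" for GPUs
device = "cpu:0"  # or f"cuda:{rank}" for GPUs

dataset, env = d3rlpy.datasets.get_pendulum()

dt = d3rlpy.algos.DecisionTransformerConfig(
    context_size=20,  # illustrative value
).create(device=device)

dt.fit(
    dataset,
    n_steps=10000,
    n_steps_per_epoch=1000,
    show_progress=rank == 0,
    enable_ddp=True,  # same flag as in the CQL example above
)
d3rlpy.distributed.destroy_process_group()
```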
## Minari support (thanks, grahamannett!)
[Minari](https://github.com/Farama-Foundation/Minari) is an OSS library that provides a standard format for offline reinforcement learning datasets. Now, d3rlpy provides easy access to this library.
You can install Minari via the d3rlpy CLI.
```
$ d3rlpy install minari
```
Example:
```py
import d3rlpy

dataset, env = d3rlpy.datasets.get_minari("antmaze-umaze-v0")

iql = d3rlpy.algos.IQLConfig(
    actor_learning_rate=3e-4,
    critic_learning_rate=3e-4,
    batch_size=256,
    weight_temp=10.0,
    max_weight=100.0,
    expectile=0.9,
    reward_scaler=d3rlpy.preprocessing.ConstantShiftRewardScaler(shift=-1),
).create(device="cpu:0")

iql.fit(
    dataset,
    n_steps=1000000,
    n_steps_per_epoch=100000,
    evaluators={"environment": d3rlpy.metrics.EnvironmentEvaluator(env)},
)
```
## Minimize redundant computation
From this version, the computation in some algorithms has been optimized to remove redundant inference. As a result, algorithms with dual optimization, such as `SAC` and `CQL`, are significantly faster than in the previous version.
## Enhancements
- `GoalConcatWrapper` has been added to support goal-conditioned environments.
- `return_to_go` has been added to `Transition` and `TransitionMiniBatch`.
- `MixedReplayBuffer` has been added to sample experiences from two buffers with an arbitrary ratio (see the sketch after this list).
- `initial_temperature=0` is now supported in `DiscreteSAC`.
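For the new `MixedReplayBuffer`, a rough usage sketch is shown below; the constructor argument names (`primary_replay_buffer`, `secondary_replay_buffer`, `secondary_mix_ratio`) and the `create_fifo_replay_buffer` call are assumptions based on the description above rather than a verified signature.

```py
# hypothetical sketch of mixing an offline dataset with an online buffer;
# argument names below are assumptions for illustration, not a verified API
import d3rlpy

# offline dataset used as the primary buffer
offline_buffer, env = d3rlpy.datasets.get_pendulum()

# empty FIFO buffer for newly collected experiences (secondary)
online_buffer = d3rlpy.dataset.create_fifo_replay_buffer(limit=100000, env=env)

# draw half of each mini-batch from the secondary buffer
mixed_buffer = d3rlpy.dataset.MixedReplayBuffer(
    primary_replay_buffer=offline_buffer,
    secondary_replay_buffer=online_buffer,
    secondary_mix_ratio=0.5,
)
```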
## Bugfix
- The getting started page in the documentation has been fixed.