1. Add cuda.set_device(args.local_rank) so that card 0 no longer incurs extra memory cost.
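A minimal sketch of the idea, assuming `cuda` refers to PyTorch's `torch.cuda` and that `--local_rank` is passed in by the distributed launcher. The helper names `parse_local_rank` and `bind_device` are hypothetical and used only for illustration. Binding each process to its own device before any CUDA work is the usual way to stop every process from also creating a context on card 0.

```python
import argparse


def parse_local_rank(argv=None):
    """Parse --local_rank (hypothetical helper; ignores unrelated flags)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--local_rank", type=int, default=0)
    args, _ = parser.parse_known_args(argv)
    return args


def bind_device(local_rank):
    """Bind this process to one GPU before any other CUDA call.

    Returns the current device index, or None when PyTorch or CUDA
    is unavailable (assumption: PyTorch is the backend in use).
    """
    try:
        import torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    torch.cuda.set_device(local_rank)
    return torch.cuda.current_device()
```

In a training script this would typically be called once, right after argument parsing and before building the model: `bind_device(parse_local_rank().local_rank)`.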
0.0.15
1. Import HOOKS in hooks.__init__.py and hooks.training_hooks.py. 2. Register validation hooks earlier, which makes more sense.
0.0.13
1. Breaking change: the dataloader index now starts from 0, not 1 as before. 2. Custom hooks can now be registered by writing them in the config. The figure of train_api is also updated. 3. Use the pop(key, None) syntax.
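A short illustration of items 1 and 3, using plain Python only. The config dict, its keys, and the `batches` list are hypothetical stand-ins, not the project's actual config schema or dataloader.

```python
# Hypothetical hook config; the keys are illustrative only.
cfg = {"type": "CustomHook", "interval": 10}

# Key present: pop returns the value and removes the key.
hook_type = cfg.pop("type", None)

# Key absent: pop returns the default (None) instead of raising KeyError.
priority = cfg.pop("priority", None)

# 0-based indexing: enumerate() starts counting at 0 by default.
batches = ["batch_a", "batch_b", "batch_c"]  # stand-in for a dataloader
indices = [idx for idx, _ in enumerate(batches)]
```

Code that relied on the old 1-based index (for example, logging every N-th batch) may need its offsets adjusted.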