- `rtg.fork` accepts multiple `to_dir` arguments, so an experiment can be cloned to multiple directories at once
- Bug fix: early stopping in distributed parallel training
- Added `rtg.tool.augment` to support data augmentation
- Added attention visualization to `rtg.serve`, powered by Plotly
- `rtg.pipeline` and `rtg.fork` use relative symlinks instead of absolute paths
- `rtg.decode` shows decoding speed (segs, src_toks, hyp_toks)
- `batch_size` is now auto-adjusted based on the number of workers and `gradient_accum` (finally!); see the sketch after this list
- Fixed `batch_size` normalization in the distributed training setting; convergence is faster now
- Added support for `byte` encoding
- Validation metrics: previously BLEU was computed with teacher forcing, similar to validation loss; now BLEU is computed from autoregressive output, resembling test-time conditions (see the sketch below)
- Use bfloat16 for mixed-precision training; requires PyTorch 1.10+ (see the sketch below)
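The batch-size adjustment above amounts to simple arithmetic: the configured `batch_size` is treated as an effective size and split across distributed workers and gradient-accumulation steps. A minimal sketch under that assumption; `n_workers` and the function name are illustrative, not RTG's actual option names:

```python
def per_step_batch_size(batch_size: int, n_workers: int, gradient_accum: int) -> int:
    """Split the effective batch_size across workers and accumulation steps so that
    batch_size ~= per_step * n_workers * gradient_accum."""
    per_step = batch_size // (n_workers * gradient_accum)
    assert per_step > 0, "batch_size too small for this many workers/accumulation steps"
    return per_step

# e.g. an effective batch of 4096 tokens on 4 GPUs with gradient_accum=2
# means each forward/backward pass sees ~512 tokens per worker
print(per_step_batch_size(4096, n_workers=4, gradient_accum=2))  # 512
```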
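For the validation BLEU change, the distinction is how the decoder is conditioned: teacher-forced scoring feeds reference tokens into the decoder (as the validation loss does), whereas the metric is now computed on hypotheses from free-running decoding. A hedged sketch, where `model.greedy_decode` and `val_data` are placeholders and not RTG's actual API:

```python
import sacrebleu

def validation_bleu(model, val_data) -> float:
    """Compute BLEU from autoregressive output: references are used only for
    scoring and are never fed back into the decoder."""
    hyps, refs = [], []
    for src, ref in val_data:
        hyps.append(model.greedy_decode(src))  # free-running (autoregressive) decode
        refs.append(ref)
    return sacrebleu.corpus_bleu(hyps, [refs]).score
```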
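The bfloat16 change can be approximated with PyTorch's `torch.autocast` (available since 1.10). A minimal sketch with placeholder `model`, `loss_fn`, and batch tensors, not RTG's actual training loop:

```python
import torch

def train_step(model, src, tgt, loss_fn, optimizer):
    optimizer.zero_grad()
    # bfloat16 keeps float32's exponent range, so no GradScaler is required
    with torch.autocast(device_type='cuda', dtype=torch.bfloat16):
        out = model(src)
        loss = loss_fn(out, tgt)
    loss.backward()
    optimizer.step()
    return loss.item()
```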