Added
- Batch type option `max-word`, which sets the maximum number of words per batch including padding tokens (more predictable memory usage than `word`).
- Batching option `--batch-sentences-multiple-of` that is similar to `--round-batch-sizes-to-multiple-of` but always rounds down (more predictable memory usage).
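
As a rough illustration of why these two options make memory usage more predictable, the sketch below computes a sentence count per batch for one bucket. The helper name `sentences_per_batch_max_word` and its parameters are hypothetical and are not taken from the code base. It assumes that under `max-word` every sentence is counted at the full padded bucket length, and that the result is then rounded down to the requested multiple.

```python
def sentences_per_batch_max_word(batch_size_words: int,
                                 bucket_len: int,
                                 multiple_of: int = 1) -> int:
    """Hypothetical helper, not the library's implementation.

    Counts each sentence as `bucket_len` tokens (padding included), then
    rounds the sentence count DOWN to a multiple of `multiple_of`, so the
    padded token count never exceeds `batch_size_words`.
    """
    if batch_size_words <= 0 or bucket_len <= 0 or multiple_of <= 0:
        raise ValueError("all arguments must be positive")
    # Sentences that fit when padding is counted toward the word budget.
    num_sentences = batch_size_words // bucket_len
    # Round down (never up) to the nearest multiple.
    return (num_sentences // multiple_of) * multiple_of


# Example: a budget of 4096 words, bucket length 96, multiples of 8.
n = sentences_per_batch_max_word(4096, 96, multiple_of=8)
print(n, n * 96)  # sentence count and padded tokens actually used
```

Because the count is only ever rounded down, the padded size of a batch stays at or below the configured budget. This sketch can return 0 for very small budgets; an actual implementation would presumably enforce a minimum.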
Changed
- Default bucketing settings changed to width 8, max sequence length 95 (96 including BOS/EOS tokens), and no bucket scaling.
- Argument `--no-bucket-scaling` replaced with `--bucket-scaling`, which is False by default.
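
To make the new defaults concrete, here is a minimal sketch of the bucket boundaries that a fixed width of 8 and a maximum length of 96 (95 plus BOS/EOS) would give when no bucket scaling is applied. The function `define_buckets_sketch` is a hypothetical name used for illustration only, and the assumption that a final bucket is added when the maximum is not a multiple of the width is ours.

```python
from typing import List


def define_buckets_sketch(max_seq_len: int, width: int = 8) -> List[int]:
    """Hypothetical illustration of fixed-width bucketing without scaling.

    Returns bucket upper bounds width, 2*width, ... up to `max_seq_len`.
    If `max_seq_len` is not a multiple of `width`, a final bucket equal to
    `max_seq_len` is appended (an assumption of this sketch).
    """
    if max_seq_len <= 0 or width <= 0:
        raise ValueError("max_seq_len and width must be positive")
    buckets = list(range(width, max_seq_len + 1, width))
    if not buckets or buckets[-1] != max_seq_len:
        buckets.append(max_seq_len)
    return buckets


# Default settings from this release: width 8, length 96 including BOS/EOS.
print(define_buckets_sketch(96, width=8))
```

With these values the sketch yields twelve buckets, from 8 to 96 in steps of 8.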