Added
- New CLI `sockeye.prepare_data` for preprocessing the training data once, ahead of training,
optionally splitting large datasets into shards. At training time, only one shard is loaded into memory at a time,
which limits peak memory usage.
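
The shard-at-a-time idea can be illustrated with a minimal sketch. This is not Sockeye's implementation: the function names `write_shards` and `iter_shards`, the `shard.NNNNN` file naming, and the use of `pickle` as the on-disk format are all assumptions made for illustration only.

```python
import pickle
import tempfile
from pathlib import Path
from typing import Iterator, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)


def write_shards(pairs: List[Pair], out_dir: Path, shard_size: int) -> List[Path]:
    """Preprocess once: split sentence pairs into shard files (hypothetical format)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for start in range(0, len(pairs), shard_size):
        path = out_dir / f"shard.{start // shard_size:05d}"
        with path.open("wb") as f:
            pickle.dump(pairs[start:start + shard_size], f)
        paths.append(path)
    return paths


def iter_shards(paths: List[Path]) -> Iterator[List[Pair]]:
    """Training time: load and yield a single shard per step of the iteration."""
    for path in paths:
        with path.open("rb") as f:
            yield pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = [(f"src {i}", f"tgt {i}") for i in range(10)]
        shard_paths = write_shards(data, Path(tmp), shard_size=4)
        for shard in iter_shards(shard_paths):
            print(len(shard))
```

Because `iter_shards` is a generator, a consumer that drops its reference to each shard before requesting the next keeps memory use bounded by the largest shard rather than by the full dataset.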
Changed
- Instead of the `--source` and `--target` arguments, `sockeye.train` now accepts a
`--prepared-data` argument pointing to the folder containing the preprocessed and sharded data. Using the raw
training data is still possible and now consumes less memory.