Added:
- New multilingual full models: `large-en-xx`, `es-xx`, `large-es-xx`.
- Options to overwrite source or target language in model config.
- Generate training data with a separate command.
- More logging messages during noise generation.
- Compress noise generation intermediate files.
- Multilingual models training documentation.
- Show a warning when no GPU/TPU has been detected.
- Show a warning if some layers do not load correctly.
Changed:
- Huge improvements in the accuracy of multilingual full models.
- Updated multilingual full model `en-xx`.
- Upload multilingual full models to HF.
- Disable hardrules that need the lang parameter when using a multilingual model, unless the language has been overwritten.
- Optional `parallel_train`, `parallel_valid` and noise generation in `bicleaner_train`.
- Test classify full with HF model.
- Re-organize documentation.
- Updated TensorFlow.
- Support for Python 3.11.
- Updated Transformers to 4.36.
Fixed:
- Tokenizer always using English during noise generation.
- Noise generation failing when `freq_noise` was disabled.
- Accidentally deleting generated valid file at the end of training.