Added
- Python 3.7 compatibility
- Raise a label mismatch exception if the label kwarg to the Corpus constructor is inconsistent with the automatically determined labels.
- Test fixtures for Corpus creation
- Test coverage for Corpus and Model creation
- Callback at the end of each training epoch
- JSON output of model information
- Encoding of files is now handled explicitly
- Type annotations for most functions that were previously missing them
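
The label consistency check and the per-epoch callback listed above might look roughly like the following. This is a minimal sketch under stated assumptions, not the library's implementation: `LabelMismatchError`, `check_labels`, `train`, and `epoch_callback` are hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, Optional, Set


class LabelMismatchError(Exception):
    """Hypothetical exception: supplied labels disagree with detected ones."""


def check_labels(supplied: Iterable[str], detected: Iterable[str]) -> Set[str]:
    """Return the label set, or raise if the two sources disagree."""
    supplied_set, detected_set = set(supplied), set(detected)
    if supplied_set != detected_set:
        # Symmetric difference: labels present in exactly one of the two sets.
        differing = sorted(supplied_set ^ detected_set)
        raise LabelMismatchError(f"Labels differ on: {differing}")
    return supplied_set


def train(max_epochs: int,
          epoch_callback: Optional[Callable[[int], None]] = None) -> int:
    """Hypothetical training loop that calls back after every epoch."""
    epochs_run = 0
    for epoch in range(1, max_epochs + 1):
        # ... one epoch of training would happen here ...
        epochs_run += 1
        if epoch_callback is not None:
            epoch_callback(epoch)
    return epochs_run
```

A caller could pass, for example, a function that logs the epoch number or writes a checkpoint; the callback receives the 1-based index of the epoch that just finished.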
Changed
- Update package dependency versions.
Fixed
- `CorpusReader.train_batch_gen` now correctly handles the edge case where no data can be generated.
- Decoding from a saved model is now possible for arbitrary TensorFlow model topologies that share the same input and output structure. Named arguments specify where the model's input and output occur.
- RNN CTC model class now accepts `pathlib.Path` for the directory argument
- Max epochs for model training is now correct. Previously an off-by-one error caused the training loop to run one more epoch than the supplied maximum.
- Bug where `untranscribed_prefixes` in corpus was taking an intersection of two sets instead of a union.
- Splitting of test, train and validation data sets will no longer produce empty sets. If no valid split is possible, an exception is raised.
- Empty WAV files no longer cause a crash during feature extraction; they are now skipped instead.
- Update nltk dependency to resolve a possible security issue
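
Two of the fixes above, the union of prefix sets and the guard against empty data splits, are easy to illustrate in isolation. The sketch below is an illustration under assumptions rather than the project's code: `merge_prefixes` and `split_dataset` are hypothetical helpers, and the roughly 10% validation and test proportions are arbitrary.

```python
from typing import List, Sequence, Set, Tuple


def merge_prefixes(first: Set[str], second: Set[str]) -> Set[str]:
    """Combine two prefix sets with a union.

    An intersection (first & second) would silently drop any prefix
    that appears in only one of the two sets.
    """
    return first | second


def split_dataset(
    items: Sequence[str],
) -> Tuple[List[str], List[str], List[str]]:
    """Split items into non-empty train, validation and test lists."""
    if len(items) < 3:
        # Three non-empty splits need at least three items.
        raise ValueError("Need at least 3 items to make non-empty splits")
    n_valid = max(1, len(items) // 10)
    n_test = max(1, len(items) // 10)
    n_train = len(items) - n_valid - n_test
    train = list(items[:n_train])
    valid = list(items[n_train:n_train + n_valid])
    test = list(items[n_train + n_valid:])
    return train, valid, test
```

Raising early when a split is impossible makes the failure visible at corpus creation time instead of surfacing later as an empty batch during training.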