Added
- add helper function to `TestLearncurve` that multiple unit tests can use to assert that all outputs
were generated. Now used to make sure the bug fixed in 0.1.0a8 stays fixed.
- error checking in cli that raises `ValueError` when the cli command is `learncurve` and the option
`results_dir_made_by_main_script` is already defined in the `[OUTPUT]` section, since running
`learncurve` would overwrite it.
- `dataset` sub-package that houses `Dataset` and related classes that facilitate creating data sets for training neural networks from heterogeneous data: audio files, files of arrays containing spectrograms, different annotation types, etc.
- also includes modules for handling each data source
+ e.g. `audio.to_spect` creates spectrograms from audio files
+ `spect.from_files` creates a `Dataset` from spectrogram files
- `core` sub-package that contains / will contain functions that do heavy lifting: `learning_curve`, `train`, `predict`
+ `learning_curve` is a sub-sub-module that does both `train` and `test` of models, instead of having separate `learncurve` and `summary` functions (i.e. train and test). It will still confuse some ML/AI people that this "learning curve" has a test data step, but whatevs
+ `cli` sub-package calls / will call these functions and handles any command-line-interface-specific logic
(e.g. making changes to `config.ini` files)
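
The `learncurve` error check listed above could look roughly like the sketch below. It is not the actual `vak.cli` code: the function name `check_learncurve_config` is hypothetical, and the sketch assumes the config is read with the standard library's `configparser`.

```python
from configparser import ConfigParser


def check_learncurve_config(command: str, config: ConfigParser) -> None:
    """Raise ValueError if running `learncurve` would overwrite an existing option.

    Hypothetical helper for illustration; not the real vak implementation.
    """
    option = "results_dir_made_by_main_script"
    # has_option returns False when the section or the option is missing
    if command == "learncurve" and config.has_option("OUTPUT", option):
        raise ValueError(
            f"config already defines '{option}' in [OUTPUT]; "
            "running 'learncurve' would overwrite it"
        )
```

Other commands pass through unchanged, so the check only guards the one case where a value would be lost.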
Changed
- change name of `vak.cli.make_data` to `vak.cli.prep`
- structure of `config.ini` file
+ now specify either `audio_format` or `spect_format` in `[DATA]` section
+ and `annot_format` for annotations
- refactor `utils` sub-package
+ move several functions from `data` and `general` into a `labels` module
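
To illustrate the new `[DATA]` options described above, here is a minimal sketch that reads them with `configparser`. The helper name `get_data_formats` and the values `wav` and `notmat` are placeholders chosen for illustration, and the sketch assumes "either" means exactly one of `audio_format` or `spect_format` is given.

```python
from configparser import ConfigParser

# Placeholder config; the option values are assumptions, not documented defaults
EXAMPLE_CONFIG = """
[DATA]
audio_format = wav
annot_format = notmat
"""


def get_data_formats(config: ConfigParser) -> dict:
    """Return the format options from [DATA] (hypothetical helper)."""
    data = config["DATA"]
    audio_format = data.get("audio_format")  # None if the option is absent
    spect_format = data.get("spect_format")
    # Assumption: exactly one of the two source formats must be specified
    if (audio_format is None) == (spect_format is None):
        raise ValueError(
            "specify exactly one of 'audio_format' or 'spect_format' in [DATA]"
        )
    return {
        "audio_format": audio_format,
        "spect_format": spect_format,
        "annot_format": data.get("annot_format"),
    }


config = ConfigParser()
config.read_string(EXAMPLE_CONFIG)
formats = get_data_formats(config)
```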
Removed
- remove unused options from command-line interface: `--glob`, `--txt`, `--dataset`
- `skip_files_with_labels_not_in_labelset` option
+ files with labels not in `labelset` are now skipped whenever `labelset` is specified; if no `labelset` is given then no filtering is done
- `summary` command-line option, since `learncurve` now trains models and also tests them on a separate data set
- `silent_label_gap` option, because the `Dataset` class determines whether a label is needed for unlabeled segments between other segments and, if so, automatically assigns it a label of 0 when mapping user labels to consecutive integers
+ this way user does not have to think about it
+ and program doesn't have to keep track of a `labels_mapping` file that saves what user specified
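
The automatic handling that replaces `silent_label_gap` can be sketched as follows. This is only an illustration of the idea, not the `Dataset` class's actual code: the function names, the `"unlabeled"` key, and the gap test on onset/offset times are all assumptions.

```python
UNLABELED = "unlabeled"  # hypothetical key for the automatically added class


def has_gaps(onsets, offsets):
    """Return True if any segment starts after the previous segment ends."""
    return any(
        next_on > prev_off
        for prev_off, next_on in zip(offsets[:-1], onsets[1:])
    )


def map_labels(labelset, unlabeled_needed):
    """Map user labels to consecutive integers, reserving 0 when needed."""
    # Start user labels at 1 only if 0 is reserved for unlabeled segments
    start = 1 if unlabeled_needed else 0
    labelmap = {
        label: ind for ind, label in enumerate(sorted(labelset), start=start)
    }
    if unlabeled_needed:
        labelmap[UNLABELED] = 0
    return labelmap


# Segments 0.0-1.0 and 1.5-2.0 leave a gap, so 0 is reserved
labelmap = map_labels({"a", "b"}, has_gaps([0.0, 1.5], [1.0, 2.0]))
```

Because the mapping is derived from the data and the `labelset` alone, nothing about it needs to be saved in a separate file.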