Added
- add functions `format_from_df` and `from_df` to `vak.util.annotation`
[107](https://github.com/vocalpy/vak/pull/107)
+ `vak.util.annotation.format_from_df` returns the annotation format associated with a
dataset. Raises an error if there is more than one annotation format or if the format is none.
+ `vak.util.annotation.from_df` function returns a list of annotations
(i.e. `crowsetta.Annotation` instances), one corresponding to each row in the dataframe `df`.
- encapsulates control flow logic for getting all labels from a dataset of
annotated vocalizations represented as a Pandas DataFrame
+ handles case where each vocalization has a separate annotation file
+ and the case where all vocalizations have annotations in a single file
- `vak.util.labels.from_df` function [103](https://github.com/vocalpy/vak/pull/103)
+ checks for a single annotation type, loads all annotations, and then gets just the labels from those
+ modified to use `util.annotation.from_df` and `vak.util.annotation.format_from_df`
in [107](https://github.com/vocalpy/vak/pull/107)
- logic in `vak.cli.prep` that raises an informative error message when config.toml file specifies
a duration for training set, but durations for validation and test sets are zero or None
[108](https://github.com/vocalpy/vak/pull/108)
+ since there's no functionality for making only one dataset of a specified duration
- 3 transform classes, and `vak.transforms.util` module [112](https://github.com/vocalpy/vak/pull/112)
+ with `get_defaults` function
- encapsulates logic for building transforms, to make `train`, `predict` etc. less verbose
+ obeys DRY, avoids declaring the same utility transforms like `to_floattensor` and `add_channel` in
multiple functions
- add `labelset_from_toml_value` to converters [115](https://github.com/vocalpy/vak/pull/115)
+ casts any value for the `labelset` option in a .toml config file to a set of characters
[127](https://github.com/vocalpy/vak/pull/127)
+ uses `vak.util.general.range_str` so that user can specify
set of labels with a "range string", e.g. `range: 1-27, 29` [115](https://github.com/vocalpy/vak/pull/115)
- add logging module in `vak.util` [132](https://github.com/vocalpy/vak/pull/132)
- add converters and validators for dataset split durations [143](https://github.com/vocalpy/vak/pull/143)
- add `logger` parameters to `io` sub-package functions, so they can use logger created by `cli` functions
[145](https://github.com/vocalpy/vak/pull/145)
- add `log_or_print` function to `util.logging` that either writes message to logger,
or simply prints the message if there is no logger [147](https://github.com/vocalpy/vak/pull/147)
- add `logger` attribute to `vak.Model` class, used to log if not None
[148](https://github.com/vocalpy/vak/pull/148)
- add Tensorboard `SummaryWriter` to `vak.Model` class so there is an `events` file recording each
model's training history [149](https://github.com/vocalpy/vak/pull/149)
+ and add Tensorboard as a dependency in [162](https://github.com/vocalpy/vak/pull/162)
- add additional logging to `Model` class [153](https://github.com/vocalpy/vak/pull/153)
- add initial tutorial on using `vak` for automated annotation of vocalizations
[156](https://github.com/vocalpy/vak/pull/156)
- add `VocalDataset`, more generalized form of a dataset where the input to a network is contained in a source
file, e.g. a .npz array file with a spectrogram, and the optional target is the annotation
[165](https://github.com/vocalpy/vak/pull/165)
- add `transforms.defaults` with `ItemTransforms` that return dictionaries. Decouples logic for
what will be in returned "items" from the different dataset classes [165](https://github.com/vocalpy/vak/pull/165)
- add `eval` command to command-line interface [179](https://github.com/vocalpy/vak/pull/179)
- add `vak.core` sub-package with "core" functions that are called by corresponding functions in
`vak.cli`, e.g. `vak.cli.train` calls `vak.core.train`; de-couples high-level functionality from
command-line interface, and makes it possible for one high-level function to call another, i.e.,
`vak.core.learncurve` calls `vak.core.train` and `vak.core.eval`
[183](https://github.com/vocalpy/vak/pull/183)
- add computation of distance metrics to `Model._eval` method
[185](https://github.com/vocalpy/vak/pull/185)
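
The "range string" accepted for the `labelset` option, noted above, can be illustrated with a minimal sketch. The function name `expand_label_range` is hypothetical and this is not the code in `vak.util.general.range_str` or `labelset_from_toml_value`; it assumes that each integer in the range becomes a string label, and that any value without the `range:` prefix is split into single characters.

```python
def expand_label_range(value):
    """Expand a string such as 'range: 1-27, 29' into a set of label strings.

    Hypothetical helper for illustration only, not part of vak.
    """
    prefix = "range:"
    if not value.startswith(prefix):
        # assumption: any other string is treated as one label per character
        return set(value)
    labels = set()
    for part in value[len(prefix):].split(","):
        # "1-27" -> ["1", "27"]; "29" -> ["29"], so start == stop
        bounds = part.strip().split("-")
        start, stop = int(bounds[0]), int(bounds[-1])
        # both ends of the range are included
        labels.update(str(n) for n in range(start, stop + 1))
    return labels


labels = expand_label_range("range: 1-27, 29")
```

Under these assumptions, `labels` holds the 28 strings `"1"` through `"27"` plus `"29"`, and `"28"` is left out.
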
Changed
- rewrite `vak.util.dataset.has_unlabeled` to use `annotation.from_df`
[107](https://github.com/vocalpy/vak/pull/107)
- bump minimum version of `TweetyNet` to 0.3.1 in [120](https://github.com/vocalpy/vak/pull/120)
+ so that the `yarden2annot` function from `TweetyNet` will return annotation labels as strings, not ints
- rewrite `vak.util.annotation.source_annot_map` so that it maps annotations *to* source files, not
vice versa [130](https://github.com/vocalpy/vak/pull/130)
+ more specifically, it no longer crashes if it can't map every annotation to a source file
+ instead it crashes if it can't map every source file to an annotation
- change `vak.util.annotation.from_df` to better handle single annotation files
[131](https://github.com/vocalpy/vak/pull/131)
+ no longer crashes if the number of annotations from the file does not exactly match the number of source files
+ instead only requires that there be at least as many annotations as there are source files
- rewrite `vak.util.labels.from_df` to use `vak.util.annotation.from_df`
[131](https://github.com/vocalpy/vak/pull/131)
- rewrite `WindowDataset` to use `annotation.from_df` function [113](https://github.com/vocalpy/vak/pull/113)
- change default value for `util.general.timebin_dur_from_vec` parameter `n_decimals_trunc` from 3 to 5
[136](https://github.com/vocalpy/vak/pull/136)
- rewrite + rename `splitalgos.validate.durs` [143](https://github.com/vocalpy/vak/pull/143)
- parallelize validation of spectrogram files, so it's faster on large datasets
[144](https://github.com/vocalpy/vak/pull/144)
- bump minimum version of `TweetyNet` to 0.4.0 in [155](https://github.com/vocalpy/vak/pull/155)
+ so `TweetyNetModel.from_class` method accepts `logger` argument
- change checkpointing and validation so that they occur on specific steps, not epochs.
[161](https://github.com/vocalpy/vak/pull/161)
This way models with very large training sets that may run for only 1-2 epochs still intermittently save
checkpoints as backups and measure performance on the validation set.
- change names of `TrainConfig` attributes `val_error_step` and `checkpoint_step` to `val_step` and `ckpt_step`
for brevity + clarity. [161](https://github.com/vocalpy/vak/pull/161) Also changed the names of the
corresponding `vak.Model.fit` method parameters to match.
- change `vak.Model._eval` method to work like `vak.cli.predict` does, feeding models non-overlapping
windows from spectrograms [165](https://github.com/vocalpy/vak/pull/165)
- change `reshape_to_window` transform to `view_as_window_batch` because it was not working as intended
[165](https://github.com/vocalpy/vak/pull/165)
- bump minimum version of `TweetyNet` to 0.4.1 in [172](https://github.com/vocalpy/vak/pull/172)
+ version that changes optimizer back to `Adam`
- raise lower bound on `crowsetta` version to 2.2.0, to get fixes for `koumura2annot`
and avoid errors when `annot_file` is provided as a `pathlib.Path` instead of a `str`
[175](https://github.com/vocalpy/vak/pull/175)
- change `Model._eval` method so it returns metrics averaged across batches, in addition to
the value for each batch
[185](https://github.com/vocalpy/vak/pull/185)
- raise minimum version of `TweetyNet` to 0.4.2, which adds distance metrics to `TweetyNetModel`
[9626385](https://github.com/vocalpy/vak/commit/96263858efe880f94dc782cd8a66ec1c051f2ea1)
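
The switch to step-based validation and checkpointing, noted above, can be sketched as follows. This is a rough illustration and not the code in `vak.Model.fit`: the function `schedule_events` and its `n_epochs` and `batches_per_epoch` parameters are hypothetical, and only the names `val_step` and `ckpt_step` come from the entries above. It assumes that a global step counter increases by one for every batch and is never reset between epochs.

```python
def schedule_events(n_epochs, batches_per_epoch, val_step, ckpt_step):
    """Return (global_step, event) pairs for a step-based training loop.

    Hypothetical helper for illustration only, not part of vak.
    """
    events = []
    global_step = 0
    for _epoch in range(n_epochs):
        for _batch in range(batches_per_epoch):
            global_step += 1
            # a real loop would run the forward and backward pass here
            if val_step and global_step % val_step == 0:
                events.append((global_step, "validate"))
            if ckpt_step and global_step % ckpt_step == 0:
                events.append((global_step, "checkpoint"))
    return events


# a single epoch of 10 batches still validates and checkpoints twice each
events = schedule_events(n_epochs=1, batches_per_epoch=10, val_step=4, ckpt_step=5)
```

Because the counter tracks batches instead of epochs, a run that lasts only one epoch still reaches validation at steps 4 and 8 and saves checkpoints at steps 5 and 10.
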
Fixed
- add missing `shuffle` option to [TRAIN] and [LEARNCURVE] sections in `valid.toml`
[109](https://github.com/vocalpy/vak/pull/109)
- fix bug that prevented filtering out vocalizations from a dataset when they contain labels
that are not in the specified labelset [118](https://github.com/vocalpy/vak/pull/118)
- fix logging for `vak.prep` command [132](https://github.com/vocalpy/vak/pull/132)
- fix how dataset duration splits are validated [143](https://github.com/vocalpy/vak/pull/143),
see issue [140](https://github.com/vocalpy/vak/issues/140) for details.
- fix error due to calling a Path attribute on a string [144](https://github.com/vocalpy/vak/pull/144)
as identified in issue [123](https://github.com/vocalpy/vak/issues/123)
- fix indent error in `Model.fit` method (see issue [151](https://github.com/vocalpy/vak/issues/151))
that stopped training early [153](https://github.com/vocalpy/vak/pull/153)
- fix bug [166](https://github.com/vocalpy/vak/issues/166)
that let training continue even after `patience` number of validation steps had elapsed
without an increase in accuracy [168](https://github.com/vocalpy/vak/pull/168)
- fix `learncurve` functionality so it will work in version `0.3.0`
[183](https://github.com/vocalpy/vak/pull/183)
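
The intended `patience` behavior described in the fix above can be sketched as follows. This is a minimal illustration and not the code in `vak.Model`: the function `stop_after` is hypothetical, and it assumes that only a strict increase in validation accuracy resets the counter.

```python
def stop_after(val_accuracies, patience):
    """Return the 1-based validation step at which training should stop,
    or None if accuracy keeps improving often enough.

    Hypothetical helper for illustration only, not part of vak.
    """
    best = float("-inf")
    steps_without_increase = 0
    for n, acc in enumerate(val_accuracies, start=1):
        if acc > best:
            # new best accuracy: reset the counter
            best = acc
            steps_without_increase = 0
        else:
            steps_without_increase += 1
            if steps_without_increase >= patience:
                return n
    return None


# accuracy peaks at the 2nd validation step, so training stops at the 4th
stopped_at = stop_after([0.5, 0.6, 0.6, 0.59, 0.7], patience=2)
```

With `patience=2`, the later value of 0.7 is never reached, because two validation steps in a row passed without an increase.
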
Removed
- remove `vak.util.general.safe_truncate` function, no longer used
[137](https://github.com/vocalpy/vak/issues/137)
- remove redundant validation of split durations in `util.split`
[143](https://github.com/vocalpy/vak/pull/143)
- remove `save_only_single_checkpoint_file` option and functionality
[161](https://github.com/vocalpy/vak/pull/161).
Now saves only one checkpoint as backup, and another for best performance on the validation set if one is provided.
See discussion in pull request and the issues it fixes for more detail.