pyspacer Changelog

0.10.0

- AWS credentials can now be obtained through the following methods, in addition to spacer config values as before:

- AWS's metadata service (STS); proof of concept from michaelconnor00
- boto's auto-detection logic when neither of STS or spacer config are used (this was intended to work before, but needed fixing)

- Updates to pip-install dependencies:

- numpy: >=1.19 to >=1.21.4,<2
- boto3: nothing to >=1.26.0

0.9.0

- Python 3.8 and 3.9 support have been dropped; Python 3.11 support has been added.

- torch and torchvision accepted versions have been relaxed to accommodate Python 3.11. (torch==1.13.1 to torch>=1.13.1,<2.3; torchvision==0.14.1 to torchvision>=0.14.1,<0.18)

- `task_utils.preprocess_labels()` now has three available modes on how to split training annotations between train, ref, and val sets. Differences between the three modes - `VECTORS`, `POINTS`, and `POINTS_STRATIFIED` - are explained in the `SplitMode` Enum's comments. Additionally, all three modes now ensure that the ordering of the given training data has no effect on which data goes into train, ref, and val.

The table below compares the three modes to the splitting functionality of earlier versions of pyspacer. Note that it's still possible to split train/ref/val yourself instead of letting pyspacer do it.

| Mode | Sets split in pyspacer | Order agnostic | Vectors can be split | Stratifies by label |
|-------------------|------------------------|----------------|----------------------|---------------------|
| 0.6.1 and earlier | Train/ref | No | No | No |
| 0.7.0 - 0.8.0 | Train/ref/val | No | No | No |
| VECTORS | Train/ref/val | Yes | No | No |
| POINTS | Train/ref/val | Yes | Yes | No |
| POINTS_STRATIFIED | Train/ref/val | Yes | Yes | Yes |

- The `train_classifier` task now accepts label IDs as either integers or strings, not just integers.

- The `train_classifier` task is now able to locally cache feature vectors which were loaded from remote storage, which can greatly speed up training from epoch 2 onward. This is optional and enabled by default; the location of the cache directory is also configurable.

0.8.0

- `ImageFeatures` with `valid_rowcol=False` are no longer supported for training. For now they are still supported for classification.

- S3 downloads are now always performed in the main thread, to prevent `RuntimeError: cannot schedule new futures after interpreter shutdown`.

- `S3Storage` and `storage_factory()` now use the parameter name `bucket_name` instead of `bucketname` to be consistent with other usages in pyspacer (by yeelauren).

- `URLStorage` downloads and existence checks now have an explicit timeout of 20 seconds (this is a timeout for continuous unresponsiveness, not for the whole response).

- EfficientNet feature extraction now uses CUDA if available (by yeelauren).

- Updates to pip-install dependencies:

- Pillow: >=10.0.1 to >=10.2.0

0.7.0

- `TrainClassifierMsg` labels arguments have changed. Instead of `train_labels` and `val_labels`, it now takes a single argument `labels`, which is a `TrainingTaskLabels` object (basically a set of 3 `ImageLabels` objects: training set, reference set, and validation set).

- The new function `task_utils.preprocess_labels()` can be called in advance of building a TrainClassifierMsg, to 1) split a single ImageLabels instance into reasonably-proportioned train/ref/val sets, 2) filter labels to only a desired set of classes, and 3) run error checks.

- Removed `MIN_TRAINIMAGES` config var. Minimum number of images for training is now 1 train set, 1 ref set, and 1 val set; or 3 total if leaving the split to pyspacer.

- Added `LOG_DESTINATION` and `LOG_LEVEL` config vars, providing configurable logging for test-suite runs or quick scripts.

- Logging statements throughout pyspacer's codebase now use module-name loggers rather than the root logger, allowing end-applications to keep their logs organized.

- Fixed bug where int config vars couldn't be configured through environment vars or secrets.json.

- Updated various error cases (mainly SpacerInputErrors, asserts, and ValueErrors) with more descriptive error classes. The `SpacerInputError` class is no longer available.

0.6.1

- In 0.5.0, the hash check when loading a feature extractor was broken in two ways. First, it got an error when trying to check the hash. Second, if the hash check failed for a remote-loaded extractor file, then a second attempt at loading would still allow extraction to proceed. This release fixes both problems.

0.6.0

- Fixed `DummyExtractor` constructor so that `data_locations` defaults to an empty dict, not an empty list. This fixes serialization of an `ExtractFeaturesMsg` containing `DummyExtractor`.

- Updates to pip-install dependencies:

- Pillow: >=9.0.1 to >=10.0.1

Pyspacer

Page 1 of 2