Summary
Simplify manifest and dataset serialisation so that the metadata of their datasets and files is gathered dynamically at instantiation instead of being read from hard-coded values in the serialisation. To support this on both cloud and local platforms, metadata can now be stored locally. This also required tightening the definition of a dataset: each dataset is now either cloud-based or local, but not both.
<!--- SKIP AUTOGENERATED NOTES --->
Contents ([402](https://github.com/octue/octue-sdk-python/pull/402))
**IMPORTANT:** There are 5 breaking changes.
New features
- Store local datafile metadata in a `.octue` JSON file in the same directory as the datafile
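A minimal sketch of the idea behind this feature, assuming a hypothetical layout in which the `.octue` file maps each datafile's name to its metadata. The function names and the file's internal structure are illustrative assumptions, not octue's actual implementation.

```python
import json
import os
import tempfile


def _metadata_file_path(datafile_path):
    """Return the path of the `.octue` file in the same directory as the datafile."""
    return os.path.join(os.path.dirname(os.path.abspath(datafile_path)), ".octue")


def write_local_datafile_metadata(datafile_path, metadata):
    """Store a datafile's metadata in the `.octue` JSON file next to it (hypothetical layout)."""
    metadata_path = _metadata_file_path(datafile_path)

    # Load existing entries so metadata for other files in the same directory is preserved.
    existing = {}
    if os.path.exists(metadata_path):
        with open(metadata_path) as f:
            existing = json.load(f)

    # Assumption: entries are keyed by the datafile's name.
    existing[os.path.basename(datafile_path)] = metadata

    with open(metadata_path, "w") as f:
        json.dump(existing, f, indent=4)


def read_local_datafile_metadata(datafile_path):
    """Return the metadata stored for a datafile, or an empty dict if there is none."""
    metadata_path = _metadata_file_path(datafile_path)
    if not os.path.exists(metadata_path):
        return {}
    with open(metadata_path) as f:
        return json.load(f).get(os.path.basename(datafile_path), {})


# Example usage in a throwaway directory.
with tempfile.TemporaryDirectory() as directory:
    path = os.path.join(directory, "data.csv")
    open(path, "w").close()
    write_local_datafile_metadata(path, {"tags": {"a": 1}, "labels": ["raw"]})
    roundtrip = read_local_datafile_metadata(path)
    metadata_file_exists = os.path.exists(os.path.join(directory, ".octue"))
```

Keeping the metadata beside the data means a local datafile can recover its tags and labels at instantiation without relying on values hard-coded into a serialised manifest.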
Enhancements
Dataset:
- **BREAKING CHANGE:** Change `Dataset` serialisation to only include the paths to its files and its own metadata
- **BREAKING CHANGE:** Store local dataset metadata in a `.octue` JSON file in the dataset's root directory instead of in `datafile_metadata.json`
- Allow datasets to be instantiated from iterables of paths to files
- Remove `Pathable` mixin
- Unify `Dataset.path` and `Dataset.cloud_path`
- Raise error if trying to download files from a local dataset
- **BREAKING CHANGE:** Remove confusing way of adding files to datasets
- **BREAKING CHANGE:** Only allow one file to be added to a dataset at a time in `Dataset.add`
- Allow the path within the dataset to be specified when adding a file to a dataset
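The serialisation change can be illustrated with a deliberately simplified, hypothetical class. `SketchDataset` is not octue's `Dataset`; it only assumes that a dataset's own metadata is a name, tags, and labels, and shows that the serialised form holds file paths rather than each file's metadata.

```python
class SketchDataset:
    """Hypothetical, heavily simplified stand-in for a dataset (not octue's `Dataset`)."""

    def __init__(self, path, files=(), name=None, tags=None, labels=None):
        self.path = path
        # Accept any iterable of file paths; each file's own metadata would be
        # gathered from the file itself rather than from this serialisation.
        self.files = set(files)
        self.name = name
        self.tags = tags or {}
        self.labels = set(labels or ())

    def add(self, file_path):
        """Add a single file at a time."""
        self.files.add(file_path)

    def to_primitive(self):
        # Only the dataset's own metadata and the paths to its files are included.
        return {
            "path": self.path,
            "name": self.name,
            "tags": self.tags,
            "labels": sorted(self.labels),
            "files": sorted(self.files),
        }
```

Because only paths are serialised, the files' metadata cannot go stale inside the serialised dataset; it is re-read from wherever each file lives.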
Manifest:
- **BREAKING CHANGE:** Change `Manifest` serialisation to only include the paths to its datasets and its own metadata
- Remove `Pathable` mixin
- Remove `path` attribute
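A similar hypothetical sketch for manifests. `SketchManifest` and its mapping of dataset names to paths are illustrative assumptions. The point is that the manifest serialises references to its datasets plus its own metadata, and no longer carries a `path` of its own.

```python
import uuid


class SketchManifest:
    """Hypothetical, simplified manifest that references datasets by path only."""

    def __init__(self, dataset_paths, manifest_id=None):
        # Deliberately no `path` attribute: the manifest is identified by its id.
        self.id = manifest_id or str(uuid.uuid4())
        self.dataset_paths = dict(dataset_paths)

    def to_primitive(self):
        # Each dataset's files and metadata would be loaded from its path when the
        # manifest is instantiated, so they are not duplicated here.
        return {"id": self.id, "datasets": dict(self.dataset_paths)}
```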
Other:
- Use `Serialisable.to_primitive` as the basis for `Serialisable.serialise` to simplify and speed up conversion to primitives
- Use `Manifest.to_primitive` in `Analysis.finalise`
- Update remaining JSON metaschema references to use the latest metaschema
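Building `serialise` on top of `to_primitive` means the attribute-to-primitive conversion lives in one place. A minimal sketch of that pattern follows; `SerialisableSketch`, `_SERIALISE_FIELDS`, and `Point` are hypothetical names, and octue's `Serialisable` may differ in detail.

```python
import json


class SerialisableSketch:
    """Hypothetical mixin showing `serialise` built on top of `to_primitive`."""

    # Names of the attributes to include; subclasses override this.
    _SERIALISE_FIELDS = ()

    def to_primitive(self):
        # Build a dict of JSON-compatible primitives directly from the attributes.
        return {name: getattr(self, name) for name in self._SERIALISE_FIELDS}

    def serialise(self, **kwargs):
        # Serialising is just JSON-encoding the primitive, with no separate
        # conversion path to keep in sync.
        return json.dumps(self.to_primitive(), **kwargs)


class Point(SerialisableSketch):
    _SERIALISE_FIELDS = ("x", "y")

    def __init__(self, x, y):
        self.x = x
        self.y = y
```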
Fixes
- Stop setting arbitrary keyword arguments as attributes in `Dataset` and `Manifest`
- Ignore `.octue` files when constructing datasets
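Since metadata now lives in `.octue` files inside dataset directories, those files must be skipped when a dataset's files are collected. A minimal sketch for the local case, assuming a hypothetical helper `find_dataset_files`; octue's own file discovery may differ.

```python
import os
import tempfile

METADATA_FILENAME = ".octue"


def find_dataset_files(root):
    """Return the paths of all files under `root`, skipping `.octue` metadata files."""
    paths = []
    for directory, _, filenames in os.walk(root):
        for filename in filenames:
            if filename == METADATA_FILENAME:
                continue
            paths.append(os.path.join(directory, filename))
    return sorted(paths)


# Example: a directory with two data files and two metadata files.
with tempfile.TemporaryDirectory() as root:
    os.makedirs(os.path.join(root, "sub"))
    for name in ("a.csv", os.path.join("sub", "b.csv"), ".octue", os.path.join("sub", ".octue")):
        open(os.path.join(root, name), "w").close()
    found = [os.path.relpath(path, root) for path in find_dataset_files(root)]
```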
Dependencies
- Change minimum Python version supported from `3.7` to `3.7.1`
- Use `pandas=^1.3` to avoid `numpy` array size errors
- Use `twined=0.3.0`
Refactoring
- Reorder and rename methods in `Datafile`, `Dataset`, and `Manifest`
- Move dataset files tag checking from `twined` into `Runner._validate_dataset_file_tags`
- Move datafile instantiation in `Dataset` into separate method
<!--- END AUTOGENERATED NOTES --->