Kosh

Latest version: v3.1


Page 1 of 2

3.1

Description

This is a minor release with a few bug fixes and new features. We encourage users to upgrade.

New in this release

* When searching the store for datasets, you can request that results be returned as Kosh datasets (the default, so existing behaviour is unchanged), Sina records, or Python dictionaries, which enables faster returns in loops: `store.find([...],load_type='dictionary')`.
* Kosh stores have an `alias_feature` attribute that lets users extract features via an aliased name.
* New Auto Epsilon Algorithm for Clustering: The algorithm will find the right epsilon value to use in clustering. The user can specify the amount of allowed information loss due to removing samples from the dataset.
* Requires sina >=1.14
* When opening a mariadb backend, avoid sync errors between ranks by using: `store = kosh.connect(mariadb, execution_options={"isolation_level": "READ COMMITTED"})`
* `mv` and `cp` from the command line now have `--merge_strategy` and `mk_dirs` options
* `cp`, `mv` and `tar` are now accessible from Python at the store level: `store.cp()`, `store.mv()` and `store.tar()`
* There is a [README](tests/README.md) for the Kosh test suite, including a dedicated one for [LC users](tests/LC_README.md)
* Sina's new ingest capabilities are available in Kosh via datasets, along with a decorator that allows the use of functions operating on Sina records.
* Documentation switched to mkdocs.

Improvements

* Some internal cleanups (internal kosh attributes are being moved to their own section under the `user_defined` section of the sina record).
* Clustering now has a verbose option.
* When using MPI, the clustering can be gathered to your preferred rank (rather than 0) with `gather_to`.
* Batch clustering has a more lenient convergence option resulting in faster clustering sampling.
* A warning is now issued when a loader cannot be loaded into the store.
* Using bash rather than sh for the sbang
* `latin1` encoding of loaders seems to create issues with mariadb, switching to `windows-1252`
* Test suite gets mariadb from env variable.
* Issue a warning if trying to set an ensemble attribute from a dataset and it matches the existing value. It still produces an error if the values differ.
* KoshCluster is more consistent in what it returns. It will always return a list now even if None is returned.
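
The switch from `latin1` to `windows-1252` noted above changes how a handful of byte values are interpreted. The sketch below uses only Python's standard codecs to show where the two encodings differ; the sample bytes are made up for illustration, and it does not reproduce the mariadb issue itself.

```python
# Illustrative bytes only: 0x80 and 0x93 fall in the range where the two
# encodings disagree, while 0xE9 decodes to the same character in both.
raw = bytes([0x80, 0x93, 0xE9])

as_latin1 = raw.decode("latin1")
as_cp1252 = raw.decode("windows-1252")

# latin1 maps 0x80-0x9F to invisible C1 control characters...
print([hex(ord(c)) for c in as_latin1])
# ...while windows-1252 maps most of that range to printable characters.
print([hex(ord(c)) for c in as_cp1252])

# Caveat: latin1 can decode every byte value, but Python's windows-1252
# codec leaves a few bytes (such as 0x81) undefined and raises on them.
try:
    bytes([0x81]).decode("windows-1252")
    undefined_byte_raises = False
except UnicodeDecodeError:
    undefined_byte_raises = True
```

Both encodings agree on ASCII and on 0xA0-0xFF, so only data containing bytes in the 0x80-0x9F range is affected by the change.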

Bug fixes

* Kosh parallel clustering used to hang when sample size was too small.
* Kosh parallel clustering returned indices as a 1D array rather than a flat array.
* On BlueOS `update_json_file_with_records_and_relationships` used to fail.
* Reassociating a file linked to many datasets used to fail for other datasets if the reassociation was done at the dataset level.
* `use_lock_file` caused hanging while using mariadb.
* `mv` command now works with nested dirs
* `mv` and `cp` now preserve ensemble membership.
* KoshClustering `operate` uses inputs shape rather than original datasets sizes.

3.0.1

Description

This release is a patch release.

New in this release

* Store can open a dataset based on a Sina record (`store.open(sina_record)`).

Improvements

* Copyright for `compute_hopkins_statistic`
* Uses Sina's `exist` function rather than a `try`/`except` to decide whether to update or insert new records into the store.

Bug fixes

* None

3.0

Description

This release introduces clustering capabilities into Kosh. It also drops support for Python 2.

New in this release

* Support for Clustering (via operators).
* Dropped Python 2 support.
* Loaders can access the dataset requesting the data
* Operators now have a `describe_entries` function to help them understand what's coming to them.

Improvements

* `find` function accepts `id` as an alias for `id_pool` to restrict search to some ids
* The store can now be used within a context manager
* Curves can be added/removed to a dataset
* Ensembles can be created from the command line
* Passing `type=None` when searching the store will return Kosh-specific objects as well as regular datasets (e.g. associated file objects).
* Added a `verbose` mode to `dataset.list_features()` to let users know when a loader failed to load a URI. Mostly useful for debugging.


Bug fixes

* Fixed an issue where associating a file multiple times was not reflected in the store, so a subsequent dissociation or an async association would not be caught. Dissociation would cause the object to be removed from the store.
* Deleting a dataset attribute and re-adding it would cause a crash
* Loaders can be removed from store

2.2

Description

This is a maintenance release, with added support for Windows systems.

New in this release

* Support for Windows systems (note that `kosh cp` and `kosh mv` are not currently supported on Windows)
* While importing sina `json` files you can now skip over some sections, such as `curve_sets`.

Improvements

* `find` function accepts `id` as an alias for `id_pool` to restrict search to some ids

Bug fixes

* Fixed an issue where associating a file multiple times was not reflected in the store, so a subsequent dissociation or an async association would not be caught. Dissociation would cause the object to be removed from the store.

2.1

Description

This is mainly a maintenance release that introduces a few new features.

New in this release

* Decorators for transformers and operators:
  * Regular `def foo(...):` functions can now be converted to transformers or operators via decorators, e.g. for a numpy transformer: `kosh.numpy_transformer`. See the [transformer](examples/Example_05a_Transformers.ipynb) and [operators](examples/Example_06_Operators.ipynb) notebooks for more details.
* Introducing a loader for text and column-based data, built on top of `numpy.loadtxt`. See [this](examples/Example_column_based_text_files.ipynb) notebook.
* When cloning a dataset one can choose to preserve ensemble memberships (`preserve_ensembles_memberships=True`), simply copy over these attributes (`preserve_ensembles_memberships=False`) (**default**) or ignore information related to ensembles (`preserve_ensembles_memberships=-1`)
* Dataset objects now have an `is_ensemble_attribute()` function to tell whether an attribute belongs to an ensemble.
* Added `list_attributes()` function to ensemble objects.
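
Since the text loader above relies on `numpy.loadtxt`, it may help to see what that function does with column-based text on its own. This is a minimal sketch of plain NumPy usage, not of the Kosh loader's interface; the column names and values are invented for illustration.

```python
import io

import numpy as np

# Hypothetical column-based text: a commented header line followed by
# whitespace-separated numeric columns.
text = io.StringIO(
    "# time pressure\n"
    "0.0 1.5\n"
    "0.5 2.5\n"
    "1.0 4.0\n"
)

# Lines starting with '#' are skipped as comments by default, and
# unpack=True returns one array per column instead of one 2D array.
time, pressure = np.loadtxt(text, unpack=True)

print(time)
print(pressure)
```

The linked notebook remains the reference for how the Kosh loader exposes such columns as features.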

Improvements

* When printing a dataset, the attributes coming from ensembles the dataset belongs to are listed in a separate section.

Bug fixes

* When cloning a dataset, an artificial `id` attribute was created with the original dataset `id` in it.
* Setting an attribute to an invalid value would cause the dataset to disappear from the store.

2.0

Description

This release aligns Kosh with Sina and makes it the only backend. The Kosh and Sina (1.11) APIs have been mostly aligned.
Sina's curve and file sections can now be recognized and used by Kosh.

New in this release

* Sina alignment:
  * Sina is the only supported backend; there is no more code to potentially support other backends
  * Curves appear as associated
  * Files with `mimetype` appear as virtual associated files
  * Kosh exported json files are sina-compatible and Kosh can ingest Sina's json files
* Stores are now opened via: `kosh.connect(...)`
* `search(...)` is now `find(...)`
* Any non Kosh-reserved record type is considered a dataset
* `find` functions return generators (used to be lists)
* Support for ensembles
* Kosh stores can be associated with other Kosh stores.
* `kosh` command line:
  * Can create stores
  * Can add datasets
  * Can use htar to tar up data
* Datasets can be cloned
* While importing a dataset into a store, there are now options to handle conflicts.
* Loader for file saved by numpy (`.npy`)
* Store can fix changed/updated fast_sha
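
Because `find` functions now return generators rather than lists, code that relied on `len()`, indexing, or iterating twice needs a small adjustment. The sketch below illustrates this with `find_stub`, a hypothetical stand-in written for this example (it is not part of Kosh), and made-up records.

```python
def find_stub(records, **criteria):
    """Hypothetical stand-in for a find() that yields matches lazily."""
    for rec in records:
        if all(rec.get(key) == value for key, value in criteria.items()):
            yield rec


# Made-up records for illustration only.
records = [
    {"id": "a", "project": "demo"},
    {"id": "b", "project": "other"},
    {"id": "c", "project": "demo"},
]

matches = find_stub(records, project="demo")

# A generator has no len() and cannot be indexed; materialize it first.
found = list(matches)
ids = [rec["id"] for rec in found]
print(len(found), ids)

# The generator is now exhausted, so a second pass yields nothing.
leftover = list(matches)
print(leftover)
```

Wrapping the call in `list(...)` restores the old list behaviour at the cost of loading every match up front.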

Improvements

* Do not try to import external Python packages until needed -> some loaders might appear as valid even though Python packages are missing.
* Versioning is now pip compatible
* `import_dataset(...)` can import list of datasets
* Added `verbose` argument to transformers and operators -> this will let the user know when retrieving data from cache
* Set the user name to "default" if it cannot be obtained from the `USER` environment variable

Bug fixes

* `matplotlib` import would crash if no `DISPLAY` environment variable was set
* hdf5 leading / fix
* No more error if a known mime type points to missing file (`list_features` will not show it)
* Dissociate files from store after moving them

