Kosh

Latest version: v3.2.0

3.2.0

Description

This release includes several new defining features that improve Kosh data management and functionality.

New in this release

* Pandas read functions were added as loader types, with examples added to the notebooks.
* `examples/Example_Simulation_Workflow.ipynb` notebook shows the user how to add, update, and access data for Kosh in a generic simulation workflow.
* `examples/MPI_Kosh_data_management.ipynb` notebook shows the user how to load and operate on large datasets in a distributed way.
* Results of `.find()` methods can be converted to Pandas DataFrames.
* L1, L2 and L* operators.
* Organize datasets within ensembles with `ensemble_tags`, which also work with `ensemble.find_datasets(data=target_data, ensemble_tags=target_ensemble_tags)`.
* Lock strategies for parallel calls; see `examples/Example_ThreadSafe.ipynb`.
* Added `connection_type='append'` which allows the user to only add data but not modify or delete data.

Improvements

* `koshClustering` operator can return a non-dimensional loss and distribute data from one rank; the examples were updated accordingly.
* `dataset.list_features(use_cache=True)` allows for faster listing of features since they are now cached.
* The original `import sina` is much faster because most imports are delayed until actually needed (lazy imports).
* Printing extremely long strings stored as attributes used to overwhelm the terminal. Output is now abbreviated with the `reprlib` library.
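
As a rough illustration of the kind of truncation `reprlib` offers, the sketch below abbreviates a long string using only the standard library. The attribute value and the `maxstring` limit are made up for the example; these notes do not say how Kosh configures `reprlib` internally.

```python
import reprlib

# A hypothetical, very long attribute value (not taken from Kosh).
long_value = "x" * 5000

# With the default limits, long strings are shortened and "..." marks the cut.
short_default = reprlib.repr(long_value)

# A custom Repr instance lets you choose a different limit (assumed value).
abbreviator = reprlib.Repr()
abbreviator.maxstring = 60
short_custom = abbreviator.repr(long_value)

print(short_default)
print(short_custom)
```

Both results keep the start and end of the string, which is usually enough to recognise a value without flooding the terminal.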

Bug fixes

* Fixes to internal tests
* tar `-v` command is now accepted
* Some special sina files couldn't be imported by Kosh (e.g. files with no `data` section)

3.1

Description

This release is a minor release with a few bug fixes and new features. We encourage users to upgrade.

New in this release

* When searching for datasets from the store, you can request that they be returned as Kosh datasets (the default, so existing behaviour is unchanged), Sina records, or Python dictionaries, which enables faster returns in loops: `store.find([...],load_type='dictionary')`.
* Kosh stores have an `alias_feature` attribute that allows users to extract features via an aliased name.
* New Auto Epsilon Algorithm for Clustering: The algorithm will find the right epsilon value to use in clustering. The user can specify the amount of allowed information loss due to removing samples from the dataset.
* Requires sina >=1.14
* When opening a mariadb backend, you should use the following to avoid sync errors between ranks: `store = kosh.connect(mariadb, execution_options={"isolation_level": "READ COMMITTED"})`
* `mv` and `cp` from the command line now have `--merge_strategy` and `mk_dirs` options
* `cp`, `mv` and `tar` are now accessible from Python at the store level: `store.cp()`, `store.mv()` and `store.tar()`
* There is a [README](tests/README.md) for the Kosh test suite, including a dedicated one for [LC users](tests/LC_README.md)
* Sina's new ingest capabilities are available in Kosh via datasets, with a decorator to allow the use of functions operating on Sina records.
* Documentation switched to mkdocs.

Improvements

* Some internal cleanups (internal kosh attributes are being moved to their own section under the `user_defined` section of the sina record).
* Clustering now has a verbose option.
* When using MPI, the clustering can be gathered to your preferred rank (rather than 0) with `gather_to`
* Batch clustering has a more lenient convergence option resulting in faster clustering sampling.
* A warning is now issued when a loader cannot be loaded into the store.
* Using bash rather than sh for the shebang
* `latin1` encoding of loaders seems to create issues with mariadb, so loaders now use `windows-1252`
* Test suite gets mariadb from env variable.
* Issue a warning if trying to set an ensemble attribute from a dataset and it matches the existing value. It still produces an error if the values differ.
* KoshCluster is more consistent in what it returns. It will always return a list now even if None is returned.
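
The warning-versus-error rule for ensemble attributes described above can be sketched with plain dictionaries. This is only an illustration of the rule: `set_from_dataset`, the dictionaries, and the choice of `ValueError` are assumptions, not Kosh's actual code or exception type.

```python
import warnings


def set_from_dataset(dataset_attrs, ensemble_attrs, name, value):
    """Hypothetical helper: set an attribute from the dataset side."""
    if name in ensemble_attrs:
        if ensemble_attrs[name] == value:
            # Same value as the ensemble already holds: warn, change nothing.
            warnings.warn(f"'{name}' is an ensemble attribute and already has this value")
            return
        # Different value: a dataset may not overwrite an ensemble attribute.
        raise ValueError(f"'{name}' is an ensemble attribute and cannot be changed from a dataset")
    # Regular dataset attribute: just store it.
    dataset_attrs[name] = value


ensemble = {"project": "demo"}
dataset = {}

set_from_dataset(dataset, ensemble, "resolution", 128)  # regular attribute, stored

try:
    set_from_dataset(dataset, ensemble, "project", "other")  # differing value
except ValueError as err:
    print("error:", err)
```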

Bug fixes

* Kosh parallel clustering used to hang when sample size was too small.
* Kosh parallel clustering returned indices as a 1D array rather than a flat array.
* On BlueOS `update_json_file_with_records_and_relationships` used to fail.
* Reassociating a file linked to many datasets used to fail for other datasets if the reassociation was done at the dataset level.
* `use_lock_file` caused hanging while using mariadb.
* `mv` command now works with nested dirs
* `mv` and `cp` now preserve ensemble membership.
* KoshClustering `operate` uses inputs shape rather than original datasets sizes.

3.0.1

Description

This release is a patch release.

New in this release

* Store can open a dataset based on a Sina record (`store.open(sina_record)`).

Improvements

* Copyright for `compute_hopkins_statistic`
* Uses Sina's `exist` function rather than a `try`/`except` to decide if we update or insert new records into the store.
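
The difference between checking for existence first and relying on `try`/`except` can be sketched with a toy in-memory store. Everything here is hypothetical: `ToyRecordStore`, its `insert` and `update` methods, and the signature given to `exist` are stand-ins for illustration and are not Sina's API.

```python
class ToyRecordStore:
    """Hypothetical in-memory record store used only for this sketch."""

    def __init__(self):
        self._records = {}

    def exist(self, record_id):
        return record_id in self._records

    def insert(self, record_id, data):
        if record_id in self._records:
            raise KeyError(f"record {record_id!r} already exists")
        self._records[record_id] = dict(data)

    def update(self, record_id, data):
        self._records[record_id].update(data)

    def get(self, record_id):
        return self._records[record_id]


def upsert(store, record_id, data):
    """Check for existence up front instead of catching an insert failure."""
    if store.exist(record_id):
        store.update(record_id, data)
        return "updated"
    store.insert(record_id, data)
    return "inserted"


store = ToyRecordStore()
first = upsert(store, "run_001", {"status": "queued"})
second = upsert(store, "run_001", {"status": "done"})
print(first, second, store.get("run_001"))
```

Checking first keeps the control flow explicit and avoids using an exception to signal the common "record already there" case.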

Bug fixes

* None

3.0

Description

This release introduces clustering capabilities into Kosh. It also drops support for Python 2.

New in this release

* Support for Clustering (via operators).
* Dropped Python 2 support.
* Loaders can access the dataset requesting the data
* Operators now have a `describe_entries` function to help them understand what's coming to them.

Improvements

* `find` function accepts `id` as an alias for `id_pool` to restrict search to some ids
* The store can now be used within a context manager
* Curves can be added/removed to a dataset
* Ensembles can be created from the command line
* Passing `type=None` when searching the store will return Kosh-specific objects as well as regular datasets (e.g. associated file objects)
* Added a `verbose` mode to `dataset.list_features()` to let users know when a loader failed to load a uri. Mostly useful for debugging
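
For readers unfamiliar with the pattern, the sketch below shows what using a store within a context manager means in general Python terms: the resource is released when the `with` block ends. `ToyStore`, its `close` method, and the file name are hypothetical and do not reflect how Kosh implements this.

```python
class ToyStore:
    """Hypothetical store-like object that supports the `with` statement."""

    def __init__(self, path):
        self.path = path
        self.is_open = True

    def close(self):
        self.is_open = False

    def __enter__(self):
        # The object bound by `as` in the with statement.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # Always close, even if the block raised; do not swallow exceptions.
        self.close()
        return False


with ToyStore("demo_store.sql") as store:
    open_inside = store.is_open

open_after = store.is_open
print(open_inside, open_after)
```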

Bug fixes

* Fixed an issue where associating a file multiple times was not reflected in the store, so a subsequent dissociation or an async association would not be caught. Dissociation would cause the object to be removed from the store.
* Deleting a dataset attribute and re-adding it would cause a crash
* Loaders can be removed from store

2.2

Description

This is a maintenance release, with added support for Windows systems.

New in this release

* Support for Windows systems (note that `kosh cp` and `kosh mv` are not currently supported on Windows)
* While importing sina `json` files you can now skip over some sections, such as `curve_sets`.

Improvements

* `find` function accepts `id` as an alias for `id_pool` to restrict search to some ids

Bug fixes

* Fixed an issue where associating a file multiple times was not reflected in the store, so a subsequent dissociation or an async association would not be caught. Dissociation would cause the object to be removed from the store.

2.1

Description

This is mainly a maintenance release that introduces a few new features.

New in this release

* Decorators for transformers and operators
* Regular `def foo(...):` functions can now be converted to transformers or operators via decorators, e.g. for a numpy transformer: `kosh.numpy_transformer`. See the [transformer](examples/Example_05a_Transformers.ipynb) and [operators](examples/Example_06_Operators.ipynb) notebooks for more details.
* Introducing a loader for text and column-based data, built on top of `numpy.loadtxt`. See [this](examples/Example_column_based_text_files.ipynb) notebook.
* When cloning a dataset one can choose to preserve ensemble memberships (`preserve_ensembles_memberships=True`), simply copy over these attributes (`preserve_ensembles_memberships=False`) (**default**) or ignore information related to ensembles (`preserve_ensembles_memberships=-1`)
* Dataset objects now have an `is_ensemble_attribute()` function to know if an attribute belongs to an ensemble.
* Added `list_attributes()` function to ensemble objects.
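
The general idea of turning a regular function into a transformer-like object with a decorator can be sketched as follows. This is a generic Python pattern, not the implementation of `kosh.numpy_transformer`: `ToyTransformer`, `toy_transformer`, and the `transform` method name are hypothetical.

```python
import functools


class ToyTransformer:
    """Hypothetical wrapper that exposes a plain function via `transform`."""

    def __init__(self, func):
        # Keep the wrapped function's name and docstring on the wrapper.
        functools.update_wrapper(self, func)
        self._func = func

    def transform(self, data):
        return self._func(data)


def toy_transformer(func):
    """Hypothetical decorator: wrap `func` in a ToyTransformer instance."""
    return ToyTransformer(func)


@toy_transformer
def double(values):
    """Multiply every value by two."""
    return [2 * v for v in values]


result = double.transform([1, 2, 3])
print(double.__name__, result)  # prints: double [2, 4, 6]
```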

Improvements

* When printing a dataset, the attributes coming from ensembles the dataset belongs to are listed in a separate section.

Bug fixes

* When cloning a dataset, an artificial `id` attribute was created with the original dataset `id` in it.
* Setting an attribute to an invalid value would cause the dataset to disappear from the store.
