Pasteur

Latest version: v0.3.1

0.3.0

This pasteur release tweaks pipeline generation to better segment ingestion and synthesis.

It introduces the new commands `ingest_dataset` (or `id`) and `ingest_view` (or `iv`) which only perform the dataset and view ingest steps. This makes it easier to iterate on creating new datasets and new views by only re-running their ingest code.

By default, `pipe` now skips the view ingestion steps, which can be cumbersome for out-of-core datasets, and starts from filtering onward (`pipe --all` still runs the whole pipeline).

A new view option, `fit_global`, fits the transformers and encoders on the whole view rather than only the work set (at the cost of increased overhead). This fixes issues with rare categorical values not being recognized because they were missing from the work set.
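The rare-category problem can be illustrated with a minimal sketch in plain Python. This is not Pasteur's API: `fit_vocabulary`, `encode`, and the toy data are hypothetical names chosen for illustration, and the "work set" is assumed here to be just the first few rows of the view.

```python
# Illustrative only: these helpers are hypothetical, not part of Pasteur.

def fit_vocabulary(values):
    """Collect the sorted distinct categories seen while fitting."""
    return sorted(set(values))


def encode(values, vocabulary):
    """Map each value to its index in the vocabulary, or -1 if unseen."""
    index = {category: i for i, category in enumerate(vocabulary)}
    return [index.get(value, -1) for value in values]


# A toy view in which category "C" is rare.
full_view = ["A", "B", "A", "A", "B", "A", "C", "A"]
# Assumed work set: a subset of the view that happens to miss "C".
work_set = full_view[:4]

vocab_work = fit_vocabulary(work_set)      # fitted on the subset only
vocab_global = fit_vocabulary(full_view)   # fitted on the whole view

codes_work = encode(full_view, vocab_work)
codes_global = encode(full_view, vocab_global)

print(codes_work)    # the rare value falls back to the "unseen" code -1
print(codes_global)  # every value has a proper code
```

Fitting on the whole view costs a full pass over the data, which matches the overhead trade-off described above.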

Two bugs were also fixed: `TabularDataset` required pandas without importing it, and the mlflow default style was not packaged in the PyPI package.

0.2.0

This new release overhauls and standardizes Pasteur's API to prepare it for multi-modal data synthesis. In addition, it smooths some of its rough edges by making the fitting of Encodings, Transformations, and Metrics out-of-core through a map-reduce architecture.
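As a rough illustration of why a map-reduce structure allows out-of-core fitting, the sketch below fits a mean and standard deviation from per-chunk summaries, so no step needs the whole column in memory. It is a generic example, not Pasteur's implementation; `map_chunk`, `reduce_stats`, and `finalize` are hypothetical names.

```python
# Illustrative only: a generic map-reduce fit, not Pasteur's actual code.
import math
from functools import reduce


def map_chunk(chunk):
    """Map step: summarize one chunk as (count, sum, sum of squares)."""
    return (len(chunk), sum(chunk), sum(x * x for x in chunk))


def reduce_stats(a, b):
    """Reduce step: merge two partial summaries element-wise."""
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def finalize(stats):
    """Turn the merged summary into a mean and population std."""
    n, total, total_sq = stats
    mean = total / n
    variance = total_sq / n - mean ** 2
    return mean, math.sqrt(max(variance, 0.0))


# Each inner list stands in for a partition loaded separately from disk.
chunks = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]

partials = [map_chunk(chunk) for chunk in chunks]
mean, std = finalize(reduce(reduce_stats, partials))
print(mean, std)
```

Because the partial summaries are small and merging them is associative, the map step can run per partition (or in parallel) and only the summaries have to be combined.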

For transforming event data, a new type of Transformer, the Seq(uence) Transformer, is added. This transformer is multi-table aware and can, for example, encode inter-row references (such as a date in row 3 for patient X depending on row 2). A built-in implementation of this transformer, named SeqTransformerWrapper (accessed through the name `seq`), contains the necessary joining logic to wrap existing reference transformers so that they support this format.
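To make the idea of an inter-row reference concrete, here is a minimal sketch that encodes each event date as the number of days since the same patient's previous row. It only illustrates the concept; it does not use or reproduce SeqTransformerWrapper, and `encode_deltas` and the sample rows are hypothetical.

```python
# Illustrative only: a hand-rolled delta encoding, not Pasteur's `seq` transformer.
from datetime import date


def encode_deltas(rows):
    """Encode each (patient, day) row relative to that patient's previous row.

    Rows are assumed to be in chronological order per patient. The first
    row of each patient has no predecessor, so its delta is None.
    """
    last_seen = {}
    encoded = []
    for patient, day in rows:
        previous = last_seen.get(patient)
        delta = None if previous is None else (day - previous).days
        encoded.append((patient, delta))
        last_seen[patient] = day
    return encoded


rows = [
    ("X", date(2020, 1, 1)),
    ("X", date(2020, 1, 4)),   # depends on the previous row for X
    ("Y", date(2020, 2, 10)),
    ("X", date(2020, 1, 9)),   # depends on the second row for X
]

encoded = encode_deltas(rows)
print(encoded)
```

Encoding relative to the previous row keeps the values small and makes the dependency between rows explicit, which is the kind of relationship a sequence-aware transformer has to track.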

The new `mimic_core` view in extras is provided as a proof of concept for this new transformation format, which contains the three core tables of mimic (patients, admissions, and transfers).

0.1.1

This version correctly packages `.yml` and template files in the PyPI package.

0.1.0

Initial release for pasteur. Pasteur can now be installed with `pip` and offers a working template for data synthesis.
