Exhibit

Latest version: v0.9.8

0.9.8

Enhancements
- Experimental support for using SQL to generate anonymising sets of values. This feature is available for all column types except numerical.
- `make_distinct` custom action now works on date columns.
- You can now easily add a column with current date and time by using `'sysdate'` as a derived column.
- Pseudo-CHI numbers can now be generated by passing `pseudo_chi` as the anonymising set for UUID columns.
- Numerical column weights for categorical values are now optional. This should speed up the process of manually composing a specification.
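
To illustrate the SQL anonymising-set idea, the sketch below uses Python's built-in `sqlite3` module: a query selects the replacement values, and each distinct original value is mapped to a value drawn from that set. The `towns` table, its contents and the query are hypothetical, and the snippet does not show how Exhibit itself accepts SQL in a specification.

```python
import random
import sqlite3

# Hypothetical lookup table standing in for the database that holds
# replacement values; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE towns (name TEXT, region TEXT)")
conn.executemany(
    "INSERT INTO towns VALUES (?, ?)",
    [("Ayr", "West"), ("Perth", "East"), ("Oban", "West"), ("Elgin", "North")],
)

# The SQL query defines the anonymising set: here, all towns in one region.
anon_set = [
    name
    for (name,) in conn.execute(
        "SELECT DISTINCT name FROM towns WHERE region = ? ORDER BY name", ("West",)
    )
]
conn.close()

# Map each distinct original value to a value drawn from the SQL-derived set,
# so repeated originals receive the same replacement.
rng = random.Random(0)
original = ["Glasgow", "Paisley", "Glasgow", "Greenock"]
mapping = {value: rng.choice(anon_set) for value in dict.fromkeys(original)}
anonymised = [mapping[value] for value in original]
print(anonymised)
```

Mapping distinct values (rather than drawing per row) keeps repeated originals consistent in the output.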

Bug fixes
- Minor bugs fixed in `shift_distribution`, `make_outlier` and `make_distinct`.
- Fixed a bug in regex distribution where the target number of uniques wasn't respected.
- Fixed date columns not being recognized when the source data had missing values.

Package version upgrades
- Python version changed to 3.10.
- pandas updated to 2.x.

0.9.7

Enhancements
- Using Exhibit as an importable library is now easier. Please see the scripting recipe for more details and examples.
- `anon.db` is now called `exhibit.db`. You can also now use third-party databases to store associated specification / aliasing data, as long as you have the required SQLAlchemy dialect installed. Set the `EXHIBIT_DB_SCHEMA` and `EXHIBIT_DB_URL` environment variables and Exhibit will use those instead of the local `exhibit.db`.
- New custom action & filter pairs: `shift_distribution_right / left` and `COLUMN_NAME with_high / low_frequency`
- You can now save probabilities for columns you marked as linked in the CLI.
- UUID columns can now be generated using incrementing integer values by setting `anonymising_set` to `range`. You can also set different seeds for each UUID column.
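
A minimal sketch of pointing Exhibit at an external database from a Python script, assuming Exhibit reads these variables when it runs. The connection URL and schema name are placeholders, and `configured_db_backend` is a hypothetical helper that only mirrors the rule stated above (both variables set means an external database, otherwise the local `exhibit.db`); it is not part of Exhibit.

```python
import os

def configured_db_backend(env):
    """Hypothetical helper: report which store the rule above would select."""
    url = env.get("EXHIBIT_DB_URL")
    schema = env.get("EXHIBIT_DB_SCHEMA")
    if url and schema:
        # The part before "://" names the SQLAlchemy dialect (and driver),
        # which must be installed for the connection to work.
        return url.split("://", 1)[0]
    return "local exhibit.db"

# Placeholder connection details: replace with your own database.
os.environ["EXHIBIT_DB_URL"] = "postgresql+psycopg2://user:password@db-host:5432/analytics"
os.environ["EXHIBIT_DB_SCHEMA"] = "exhibit"

print(configured_db_backend(os.environ))
```

Setting the variables in the shell before running the CLI achieves the same thing as setting them at the top of a script.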

Bug fixes
- Improved the calculation of weights for numerical columns.
- Various other minor bug fixes and improvements to error messages.

Package version upgrades
- Added `scipy` and `sqlalchemy` as dependencies.

0.9.6

Enhancements
- Added experimental support for using pickled machine learning models as plug-ins. See the `Create Exhibit-compatible ML model.ipynb` recipe for details.
- Added an option to save probabilities of values in columns that are put into the DB.
- Added performance and memory benchmarking.
- You can now reference custom lookups you added to the DB directly in the specification as long as specification columns and DB columns match.
- Added a `make_almost_same` custom action.
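
The recipe notebook describes the plug-in interface Exhibit actually expects; the sketch below only shows the underlying serialise-and-reload step using the standard `pickle` module. `MeanModel` is a hypothetical toy stand-in for a trained model. `dill`, added as a dependency in this release, exposes the same `dumps` / `loads` functions and can serialise some objects that `pickle` cannot.

```python
import pickle

class MeanModel:
    """Hypothetical toy model: predicts the mean of the values it was fitted on."""

    def __init__(self, values):
        self.mean = sum(values) / len(values)

    def predict(self, rows):
        # One prediction per input row, ignoring the row contents.
        return [self.mean for _ in rows]

# Serialise the fitted model, as you would before saving it to disk.
payload = pickle.dumps(MeanModel([10, 20, 30]))

# Later (in any process that can import MeanModel), restore and use it.
model = pickle.loads(payload)
predictions = model.predict([{"age": 40}, {"age": 50}])
print(predictions)  # [20.0, 20.0]
```

Unpickling can execute arbitrary code, so only load model files from trusted sources.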

Bug fixes
- Fixed a bug where generating a dataset without any categorical columns would give an error.
- Fixed a DB bug that gave missing data an equal chance to appear in columns where the number of unique values exceeded the in-line limit.

Package version upgrades
- Added `dill` as a dependency.

0.9.5

Enhancements
- Added 4 new custom actions to manipulate timeseries: given a numerical column and a timeseries column, create an artificial skew (left or right) or add a peak / valley.
- `generate_as_sequence` custom action has a new variant that lets you generate repeated sequences of values in the order that they appear in the spec, regardless of the probability vector.
- You can now apply a single custom action to multiple columns by providing them as a comma-separated target string. Likewise, multiple actions can be provided as a comma-separated string. Custom constraints are processed in the order in which the column names / actions were specified.
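
The repeated-sequence variant of `generate_as_sequence` can be pictured as cycling through the values in specification order until the column is full. The sketch assumes the values and their probabilities come from a dictionary like the ones in a specification; the probabilities are deliberately ignored. `repeated_sequence` and the dose labels are hypothetical.

```python
from itertools import cycle, islice

def repeated_sequence(spec_values, n_rows):
    """Repeat the values in their specification order, ignoring probabilities."""
    # Iterating over a dict yields its keys in insertion order.
    return list(islice(cycle(spec_values), n_rows))

# Hypothetical values with a probability vector that this variant does not use.
spec_values = {"first dose": 0.5, "second dose": 0.3, "booster": 0.2}
column = repeated_sequence(spec_values, 7)
print(column)
```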

Bug fixes
- Fixed an issue where custom constraints wouldn't always respect original column types (float or Int64).
- Fixed an issue where column values generated from a regular expression pattern were inadvertently repeated under certain conditions.
- Fixed a bug with missing values in user linked columns.
- Fixed a bug that could result in linked column groups being in different order when re-running the generation of the same specification.

Package version upgrades
- numpy bumped to 1.22.

0.9.4

Enhancements
- Added experimental support for generating geospatial data. You can now generate point geometry with latitude and longitude coordinates sampled from H3 hexagons.
- Additionally, you can create random, but geographically-valid regions to match the partitions in the data. This is done using a new custom action called `geo_make_regions`.
- You can now add noise to user linked column groups. Rather than mirroring the original relationships exactly, links can be formed between random column values based on a specified probability.
- Added a new custom action: `generate_as_sequence`. This action is useful when generated values must follow a specific order in a partition, like vaccine doses administered to an individual: "full schedule" before "booster". This is different from sorting because sorting happens after the data has been generated, whereas `generate_as_sequence` ensures that "booster" is never generated by itself, only when preceded by "full schedule".
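
The ordering guarantee can be sketched as giving each partition a prefix of the sequence: every individual receives the first step, and later steps only appear after all earlier ones. This minimal illustration assumes the number of steps per partition is drawn uniformly; `sample_sequence_prefixes` is a hypothetical stand-alone function, not Exhibit's implementation.

```python
import random

def sample_sequence_prefixes(partition_ids, sequence, seed=0):
    """Give each partition an ordered prefix of `sequence` (at least one step)."""
    rng = random.Random(seed)
    rows = []
    for pid in partition_ids:
        # How far along the sequence this partition gets.
        steps = rng.randint(1, len(sequence))
        for value in sequence[:steps]:
            rows.append((pid, value))
    return rows

rows = sample_sequence_prefixes(["p1", "p2", "p3", "p4"], ["full schedule", "booster"])
for pid, value in rows:
    print(pid, value)
```

Because each partition's values are built in order, "booster" never appears for an individual without a preceding "full schedule", which sorting after generation cannot guarantee.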

Bug fixes
- Fixed an issue where, when generating missing data, missing values could be generated in the same rows across different columns rather than independently.
- Fixed a number of issues around nullable integer type.

Package version upgrades
- pandas bumped to 1.4.2.
- numpy bumped to 1.21.5.
- PyYAML bumped to 6.0.

0.9.3

Bug fixes
- When asking Exhibit to generate a specification from a dataset that didn't contain any numerical columns, the resulting specification was missing probability information for categorical columns below the in-line limit.

Enhancements
- Revised the specification of custom constraints (previously called conditional constraints). Now you can specify the subset (filter) of the data, the partition, and one or more columns to be affected by the constraints. In addition to `make_null`, `make_not_null` and `make_outlier`, there are 4 new constraints available: `make_same`, `make_distinct`, `sort_ascending` and `sort_descending`.
- Added an option to generate UUID columns. If your source dataset includes record-level data with unique identifiers, you can exclude them from processing and generate them separately. UUID columns work differently from normal categorical columns: instead of the probability of each value, you specify the probability of each unique value appearing a given number of times in your synthetic dataset. See the `uuid_demo.yml` and `uuid_anon.csv` files for examples.
- Added an option to designate numerical columns such as age or dose number as categorical. List such columns after the `--discrete_columns` flag in the CLI.
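
The frequency-based approach for UUID columns can be sketched as follows: for each new identifier, draw how many times it should appear from the frequency probabilities, and stop once the column is long enough. Identifiers here are incrementing integers, similar in spirit to the `range` anonymising set described for 0.9.7. The function name and example probabilities are hypothetical; `uuid_demo.yml` shows the format Exhibit actually uses.

```python
import random
from collections import Counter

def generate_uuid_column(n_rows, frequency_probs, seed=0):
    """Build an ID column where each ID's repeat count follows `frequency_probs`.

    `frequency_probs` maps "number of times an ID appears" to its probability.
    """
    rng = random.Random(seed)
    frequencies = list(frequency_probs)
    weights = list(frequency_probs.values())
    column = []
    next_id = 1
    while len(column) < n_rows:
        times = rng.choices(frequencies, weights=weights)[0]
        column.extend([next_id] * times)
        next_id += 1
    # Trim the overshoot from the last ID, then shuffle the row order.
    column = column[:n_rows]
    rng.shuffle(column)
    return column

ids = generate_uuid_column(20, {1: 0.6, 2: 0.3, 3: 0.1})
# How many IDs appear once, twice, three times:
print(Counter(Counter(ids).values()))
```

Trimming to exactly `n_rows` can shorten the last identifier's run, so the tail of the realised frequency distribution may differ slightly from the specified one.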
