Enhancements
- Added experimental support for generating geospatial data. You can now generate point geometry with latitude and longitude coordinates sampled from H3 hexagons.
- You can also create random but geographically valid regions that match the partitions in the data, using a new custom action called geo_make_regions.
- You can now add noise to user-linked column groups. Rather than mirroring the original relationships exactly, links can be formed between random column values with a specified probability.
- Added a new custom action: generate_as_sequence. This action is useful when generated values must follow a specific order within a partition, such as vaccine doses administered to an individual: "full schedule" before "booster". This differs from sorting, which happens after the data has been generated: generate_as_sequence ensures that "booster" is never generated by itself, only when preceded by "full schedule".
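
The difference between sorting and sequence generation can be illustrated with a minimal, standalone Python sketch. It does not use the package's API: the names SEQUENCE, sample_then_sort, and generate_in_sequence are hypothetical and exist only for this illustration, and the values are taken from the vaccine example above.

```python
import random

# Hypothetical ordered sequence, taken from the vaccine example.
SEQUENCE = ["full schedule", "booster"]


def sample_then_sort(n_rows, rng):
    """Draw values independently, then sort them into sequence order.

    The order comes out right, but a partition can still end up
    containing only "booster", because sorting happens after generation.
    """
    values = [rng.choice(SEQUENCE) for _ in range(n_rows)]
    return sorted(values, key=SEQUENCE.index)


def generate_in_sequence(n_rows):
    """Give each row of a partition the next step of the sequence.

    A later step can never appear without the earlier ones. In this
    sketch, partitions longer than the sequence are simply truncated.
    """
    return SEQUENCE[:n_rows]


rng = random.Random(0)
print(sample_then_sort(1, rng))   # may be a lone "booster"
print(generate_in_sequence(1))    # always starts at "full schedule"
print(generate_in_sequence(2))
```

With a single-row partition, the sort-based approach can return a lone "booster", whereas the sequence-based approach always starts from "full schedule".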
Bug fixes
- Fixed an issue where, when generating missing data, missing values could be generated in the same rows for different columns rather than independently.
- Fixed a number of issues around the nullable integer type.
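
As a rough illustration of what independent missingness means, the following standalone Python sketch blanks out each cell with its own random draw, so the missing rows of one column do not determine those of another. It is not the package's implementation: add_missing and its parameters are hypothetical names used only here.

```python
import random


def add_missing(columns, rate, rng):
    """Replace each cell with None independently with probability `rate`.

    `columns` maps a column name to a list of values. Every cell gets its
    own random draw, so missingness in one column is independent of the
    others.
    """
    return {
        name: [None if rng.random() < rate else value for value in values]
        for name, values in columns.items()
    }


data = add_missing(
    {"a": list(range(1000)), "b": list(range(1000))},
    rate=0.2,
    rng=random.Random(0),
)

missing_a = {i for i, v in enumerate(data["a"]) if v is None}
missing_b = {i for i, v in enumerate(data["b"]) if v is None}

# With independent draws, only a fraction of the missing rows overlap.
print(len(missing_a), len(missing_b), len(missing_a & missing_b))
```

If the missing rows were shared across columns, the overlap would equal the number of missing rows in each column. With independent draws it is much smaller.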
Package version upgrades
- pandas bumped to 1.4.2.
- numpy bumped to 1.21.5.
- PyYAML bumped to 6.0.