Aws-insurancelake-etl

Latest version: v4.1.3

4.0

InsuranceLake Version 4.0 includes:
- Iceberg support in entity matching Glue job
- Partition override support via Collect bucket folder path
- Evolve schema control option fully implemented
- Quarantine partitions and catalog entries are now created and cleared on each job execution. This makes it possible to use quarantine tables as targets before any rows have been quarantined, and to rerun executions to correct quarantined data
- Transforms can be run more than once using an optional suffix (thank you to Paolo Rezzano for the suggestion and code snippet)
- Merge transform supports handling empty strings as nulls
- Lambda function runtime bumped to Python 3.12, and aws-cdk-lib version constraint removed
- Custom Lineage module performance improvements
- Introduce data lineage to entity match Glue job
- Enable unit testing with Iceberg in entity match Glue job, and provide documentation
- 80% unit test coverage
- Documentation updates include:
  - Glue Data Quality AI recommendations
  - Guidance for multi-file/nested folder Parquet files
  - Guidance for encoding issues in Fixed Width files
  - Guidance on writing performant Spark transforms
  - Schema evolution support details
  - Guidance on loading data for the first time
  - Improved Code Security information
- FIX: Clear partitions correctly (reverts the previous release's change and improves on it)
- FIX: Increase allowed Lambda function execution time
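
The merge transform change listed above (empty strings handled as nulls) can be illustrated with a minimal pure-Python sketch. It assumes the merge behaves like a coalesce across candidate columns, taking the first usable value. The function name `merge_columns` and the `empty_string_is_null` parameter are hypothetical and are not InsuranceLake's API; the actual transform runs in Spark inside the Glue job.

```python
from typing import Optional, Sequence


def merge_columns(
    values: Sequence[Optional[str]],
    empty_string_is_null: bool = False,
) -> Optional[str]:
    """Return the first usable value from the candidate columns, in order.

    Hypothetical helper for illustration only; not part of InsuranceLake.
    """
    for value in values:
        if value is None:
            continue  # nulls are always skipped
        if empty_string_is_null and value == "":
            continue  # optionally treat empty strings like nulls
        return value
    return None  # every candidate was null (or empty, when enabled)


# One row with three candidate source columns for a policy number
row = [None, "", "POL-1001"]

default_result = merge_columns(row)
nulls_result = merge_columns(row, empty_string_is_null=True)
print(repr(default_result), repr(nulls_result))
```

Without the option, the empty string in the second column wins because it is not null; with the option enabled, the merge falls through to the third column.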

3.3.1

Documentation updates

3.3

New v3.3 includes:
- New "Using SQL" documentation
- Documentation on using data freshness checks to manage workflow dependencies (part of How to Manage Data Quality)
- FIX: Overwrite cleanse bucket partitions more effectively
- FIX: After-SparkSQL quarantine rules properly quarantine rows

3.2

New v3.2 release includes:
- JSON format support and structured data transforms
- Schema mapping documentation
- File format documentation
- Other documentation and code improvements
- FIX: Consistent use of job_last_updated field in job audit table
- FIX: Data security transforms halt on missing field
- FIX: Schema mapping recommendation file has a header
- FIX: Clean columns supports removing carriage returns

3.1

New v3.1 includes:
- Improved performance of filldown transform
- Introduced rownumber transform
- Improved lookup transform to support global lookups
- Improved currency transform to handle non-string columns
- Improved etl_cleanup script to support more options
- FIX: Documentation of changetype transform
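
As a rough illustration of what the filldown and rownumber transforms named above do conceptually, here is a small pure-Python sketch. It assumes filldown replaces each null with the most recent non-null value above it, and that rownumber adds a 1-based sequence number to each row. The function names `fill_down` and `add_row_numbers` and the `row_number` field name are hypothetical; the real transforms operate on Spark DataFrames and are configured differently.

```python
from typing import Any, Dict, List, Optional


def fill_down(values: List[Optional[Any]]) -> List[Optional[Any]]:
    """Replace each None with the most recent non-None value above it.

    Hypothetical helper for illustration only; not part of InsuranceLake.
    """
    filled = []
    last_seen = None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


def add_row_numbers(
    rows: List[Dict[str, Any]], field: str = "row_number"
) -> List[Dict[str, Any]]:
    """Return copies of the rows with a 1-based sequence number added."""
    return [{**row, field: index} for index, row in enumerate(rows, start=1)]


# A spreadsheet-style column where the group label appears only on the first row
line_of_business = ["Auto", None, None, "Home", None]
filled = fill_down(line_of_business)
numbered = add_row_numbers([{"lob": lob} for lob in filled])
print(filled)
print(numbered[0], numbered[-1])
```

Leading nulls stay null in this sketch because there is no earlier value to carry down.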

3.0

New v3.0 includes:
- New Documentation: Transform reference, Data Quality reference
- Glue Data Quality available at 3 stages of the data pipeline
- New transforms: columnreplace, multiplycolumns, filldown, changetype
- Codestar connection support
- Entity matching and Hudi table support
- FIX: CDK-nag Python 3.12 error
- FIX: Glue Catalog integration handling of schema evolution
