InsuranceLake Version 4.0 includes:
- Iceberg support in the entity match Glue job
- Partition override support via Collect bucket folder path
- Evolve schema control option fully implemented
- Quarantine partitions and catalog entries are now created and cleared on every job execution. This makes it possible to use quarantine tables as targets before any rows have been quarantined, and to rerun executions to correct quarantined data
- Transforms can be run more than once using an optional suffix (thank you to Paolo Rezzano for the suggestion and code snippet)
- Merge transform can treat empty strings as nulls
- Lambda function runtime bumped to Python 3.12, and the aws-cdk-lib version constraint removed
- Custom Lineage module performance improvements
- Data lineage added to the entity match Glue job
- Unit testing with Iceberg enabled in the entity match Glue job, with accompanying documentation
- 80% unit test coverage
- Documentation updates, including:
  - Glue Data Quality AI recommendations
  - Guidance for multi-file/nested folder Parquet files
  - Guidance for encoding issues in fixed width files
  - Guidance on writing performant Spark transforms
  - Schema evolution support details
  - Guidance on loading data for the first time
  - Improved code security information
- FIX: Clear partitions correctly (reverts the change from the previous release and improves on it)
- FIX: Increase allowed Lambda function execution time