Added
- `pw.io.iceberg.read` method for reading Apache Iceberg tables into Pathway.
- `pw.io.postgres.write` and `pw.io.postgres.write_snapshot` now accept an additional `init_mode` argument, which allows initializing the table before writing.
- `pw.io.deltalake.read` now supports serialization and deserialization for all Pathway data types.
- New parser `pathway.xpacks.llm.parsers.DoclingParser` that supports parsing PDFs with tables and images.
- Output connectors now include an optional `name` parameter. If provided, this name will appear in logs and monitoring dashboards.
- Automatic naming for input and output connectors has been enhanced.
Changed
- **BREAKING**: `pw.io.deltalake.read` now requires explicit specification of primary key fields.
- **BREAKING**: `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` now returns a dictionary from the `pw_ai_answer` endpoint.
- `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer` can optionally return context documents from the `pw_ai_answer` endpoint.
- **BREAKING**: When a delay is used in temporal behavior, the current time is now updated immediately rather than in the next batch.
- **BREAKING**: The `Pointer` type is now serialized to Delta Tables as raw bytes.
- `pw.io.kafka.write` now allows specifying `key` and `headers` for the JSON and CSV data formats.
- The `persistent_id` parameter of connectors has been renamed to `name`. The name assigned through this parameter appears in logs and monitoring dashboards.
- Parser names have been made more consistent: `ParseUnstructured` -> `UnstructuredParser`, `ParseUtf8` -> `Utf8Parser`. `ParseUnstructured` and `ParseUtf8` are now deprecated.
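The parser and `persistent_id` renames are mechanical, so existing code can usually be migrated with a text substitution. The following is a minimal sketch, not a Pathway tool: `migrate_source`, `IDENTIFIER_RENAMES` and `PERSISTENT_ID_KWARG` are hypothetical names, the deprecated class is assumed to be spelled `ParseUnstructured` in user code, and `persistent_id` is only rewritten where it appears as a keyword argument.

```python
import re

# Hypothetical rename table derived from the entries above (old name -> new name).
# Assumes the deprecated parser class is spelled `ParseUnstructured` in user code.
IDENTIFIER_RENAMES = {
    "ParseUnstructured": "UnstructuredParser",
    "ParseUtf8": "Utf8Parser",
}

# Matches `persistent_id=` / `persistent_id =` used as a keyword argument.
# The lookbehind skips attribute access such as `cfg.persistent_id = 1`,
# and the lookahead skips the comparison operator `==`.
PERSISTENT_ID_KWARG = re.compile(r"(?<![\w.])persistent_id(\s*=)(?!=)")


def migrate_source(source: str) -> str:
    """Rewrite deprecated parser class names and the `persistent_id=` keyword."""
    for old, new in IDENTIFIER_RENAMES.items():
        # Word boundaries keep longer identifiers such as `MyParseUtf8` untouched.
        source = re.sub(rf"\b{re.escape(old)}\b", new, source)
    # Re-emit the captured group so the original spacing before `=` is kept.
    return PERSISTENT_ID_KWARG.sub(r"name\1", source)


if __name__ == "__main__":
    before = 'parser = ParseUtf8()\nsrc = pw.io.csv.read("in/", schema=S, persistent_id="src")'
    print(migrate_source(before))
```

Regex rewriting is deliberately conservative here: it only touches exact identifiers and keyword-argument positions, so the diff it produces should still be reviewed before committing.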
Fixed
- `generate_class` method in `Schema` now correctly renders columns of `UnionType` and `None` types.
- A bug in delay in temporal behavior that could, in a specific situation, emit a single entry twice.
- `pw.io.postgres.write_snapshot` now correctly handles tables that only have primary key columns.
Removed
- **BREAKING**: `pw.indexing.build_sorted_index`, `pw.indexing.retrieve_prev_next_values`, `pw.indexing.sort_from_index` and `pw.indexing.SortedIndex` are removed. Sorting is now done with `pw.Table.sort`.
- **BREAKING**: Removed deprecated methods `pw.Table.unsafe_promise_same_universe_as`, `pw.Table.unsafe_promise_universes_are_pairwise_disjoint`, `pw.Table.unsafe_promise_universe_is_subset_of`, `pw.Table.left_join`, `pw.Table.right_join`, `pw.Table.outer_join`, `pw.stdlib.utils.AsyncTransformer.result`.
- **BREAKING**: Removed the deprecated `_pw_shard` column from the result of `windowby`.
- **BREAKING**: Removed deprecated functions `pw.debug.parse_to_table`, `pw.udf_async`, `pw.reducers.npsum`, `pw.reducers.int_sum`, `pw.stdlib.utils.col.flatten_column`.
- **BREAKING**: Removed deprecated module `pw.asynchronous`.
- **BREAKING**: Removed deprecated access to `pw.io` functions directly from the top-level `pw` namespace.
- **BREAKING**: Removed deprecated classes `pw.UDFSync`, `pw.UDFAsync`.
- **BREAKING**: Removed class `pw.xpacks.llm.parsers.OpenParse`. Its functionality has been replaced by `pw.xpacks.llm.parsers.DoclingParser`.
- **BREAKING**: Removed deprecated arguments from input connectors: `value_columns`, `primary_key`, `types`, `default_values`. Schema should be used instead.
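Because most of these removals are breaking, it can help to scan a codebase for the affected names before upgrading. The sketch below is a hypothetical helper, not part of Pathway: `find_removed_apis`, `REMOVED_NAMES` and `REMOVED_METHODS` are illustrative names, and the patterns assume the conventional `import pathway as pw` alias. A replacement is suggested only where the entries above name one. `AsyncTransformer.result` and the removed connector arguments (such as `primary_key`) are left out, because those names can appear legitimately in other calls.

```python
import re
from typing import List, Optional, Tuple

# Hypothetical table of names removed in this release, written with the usual `pw` alias.
# The value is the replacement named in the changelog, or None when none is named.
REMOVED_NAMES = {
    "pw.indexing.build_sorted_index": "pw.Table.sort",
    "pw.indexing.retrieve_prev_next_values": "pw.Table.sort",
    "pw.indexing.sort_from_index": "pw.Table.sort",
    "pw.indexing.SortedIndex": "pw.Table.sort",
    "pw.debug.parse_to_table": None,
    "pw.udf_async": None,
    "pw.reducers.npsum": None,
    "pw.reducers.int_sum": None,
    "pw.stdlib.utils.col.flatten_column": None,
    "pw.asynchronous": None,
    "pw.UDFSync": None,
    "pw.UDFAsync": None,
    "OpenParse": "pw.xpacks.llm.parsers.DoclingParser",
    "_pw_shard": None,
}

# Removed `pw.Table` methods, matched as `.method(` calls on any object.
REMOVED_METHODS = (
    "unsafe_promise_same_universe_as",
    "unsafe_promise_universes_are_pairwise_disjoint",
    "unsafe_promise_universe_is_subset_of",
    "left_join",
    "right_join",
    "outer_join",
)


def find_removed_apis(source: str) -> List[Tuple[int, str, Optional[str]]]:
    """Return (line number, removed name, suggested replacement) for every hit."""
    hits: List[Tuple[int, str, Optional[str]]] = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for name, replacement in REMOVED_NAMES.items():
            # Word boundaries avoid matching inside longer identifiers.
            if re.search(rf"\b{re.escape(name)}\b", line):
                hits.append((lineno, name, replacement))
        for method in REMOVED_METHODS:
            if re.search(rf"\.{re.escape(method)}\s*\(", line):
                hits.append((lineno, method, None))
    return hits
```

Running it over each `.py` file in a project (for example, files collected with `pathlib.Path.rglob("*.py")`) gives a quick list of call sites to revisit. Because it matches text rather than resolving imports, it can miss names imported under a different alias.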