Pathway

Latest version: v0.21.1

0.21.1

Changed
- Input connectors now throttle parsing error messages if their share is more than 10% of the parsing attempts.
- New flag `return_status` for `inputs_query` method in `pw.xpacks.llm.DocumentStore`. If set to True, DocumentStore returns the status of indexing for each file.

0.21.0

Added
- All Pathway types can now be serialized to CSV using `pw.io.csv.write` and deserialized back using `pw.io.csv.read`.
- `pw.io.csv.read` now parses null-values in data when it can be done unambiguously.

Changed
- **BREAKING**: Updated endpoints in `pw.xpacks.llm.question_answering.BaseRAGQuestionAnswerer`:
  - Deprecated: `/v1/pw_list_documents`, `/v1/pw_ai_answer`
  - New: `/v2/list_documents`, `/v2/answer`
- RAG methods under `pw.xpacks.llm.question_answering.RAGClient` have been renamed and now use the new endpoints. The old methods are deprecated and will be removed in the future.
  - `pw_ai_summary` -> `summarize`
  - `pw_ai_answer` -> `answer`
  - `pw_list_documents` -> `list_documents`
- When `pw.io.deltalake.write` creates a table, it also stores its metadata in the columns of the created Delta table. This metadata can be used by Pathway when reading the table with `pw.io.deltalake.read` if no `schema` is specified.
- The `schema` parameter is now optional for `pw.io.deltalake.read`. If the table was created by Pathway and no `schema` is specified by the user, the schema is read from the table metadata.
- `pw.io.deltalake.write` now aligns the output metadata with the existing table's metadata, preserving any custom metadata in the sink.
- **BREAKING**: The `Bytes` type is now serialized and deserialized with base64 encoding and decoding when the CSV format is used.
- **BREAKING**: The `Duration` type is now serialized and deserialized as a number of nanoseconds when the CSV format is used.
- **BREAKING**: The `tuple` and `np.ndarray` types are now serialized and deserialized as their JSON representations when the CSV format is used.
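
For readers checking existing CSV files against the three breaking changes above, here is a minimal sketch of the described encodings using only the Python standard library. It does not call Pathway; the helper name `encode_cell` is hypothetical, and the exact output Pathway produces (for example, whitespace inside the JSON text) is an assumption.

```python
import base64
import csv
import io
import json
from datetime import timedelta


def encode_cell(value):
    """Encode one value following the conventions described above.

    Hypothetical helper for illustration; not part of Pathway.
    """
    if isinstance(value, bytes):
        # Bytes: base64 text
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, timedelta):
        # Duration: integer number of nanoseconds (integer arithmetic only)
        whole_seconds = value.days * 86_400 + value.seconds
        return str(whole_seconds * 1_000_000_000 + value.microseconds * 1_000)
    if isinstance(value, tuple):
        # tuple: JSON representation (a JSON array)
        return json.dumps(list(value))
    return str(value)


row = (b"\x00\x01", timedelta(seconds=1, microseconds=500), (1, "a"))

buffer = io.StringIO()
csv.writer(buffer).writerow([encode_cell(v) for v in row])
line = buffer.getvalue()

# Reading the line back yields the encoded strings unchanged,
# because the CSV writer escapes the quotes inside the JSON text.
decoded = next(csv.reader(io.StringIO(line)))
```

Decoding reverses each step: `base64.b64decode` for bytes, `int(...)` nanoseconds for durations, and `json.loads` for tuples.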

Fixed
- `pw.io.csv.write` now correctly escapes quote characters.
- `table_parsing_strategy="llm"` in `DoclingParser` now works correctly.

0.20.1

Added
- Added `RecursiveSplitter`.
- `pw.io.deltalake.write` now checks that the schema of the target Delta table corresponds to the schema of the Pathway table being written. If the schemas differ, a human-readable error message is produced.

0.20.0

Added
- Added structure-aware chunking for `DoclingParser`.
- Added `table_parsing_strategy` for `DoclingParser`.
- Column expressions `as_int()`, `as_float()`, `as_str()`, and `as_bool()` now accept additional arguments, `unwrap` and `default`, to simplify null handling.
- Support for python tuples in expressions.

Changed
- **BREAKING**: Changed the argument in `DoclingParser` from `parse_images` (bool) into `image_parsing_strategy` (Literal["llm"] | None).
- **BREAKING**: The `doc_post_processors` argument in `pw.xpacks.llm.document_store.DocumentStore` no longer accepts `pw.UDF`.
- Better error messages when using `pathway spawn` with multiple workers. Now error messages are printed only from the worker experiencing the error directly.

Fixed
- `doc_post_processors` argument in the `pw.xpacks.llm.document_store.DocumentStore` had no effect. This is now fixed.

0.19.0

Added
- `LLMReranker` now supports custom prompts as well as custom response parsers allowing for other ranking scales apart from default 1-5.
- `pw.io.kafka.write` and `pw.io.nats.write` now support `ColumnReference` as a topic name. When a `ColumnReference` is provided, each message's topic is determined by the corresponding column value.
- `pw.io.python.write` accepting `ConnectorObserver` as an alternative to `pw.io.subscribe`.
- `pw.io.iceberg.read` and `pw.io.iceberg.write` now support S3 as data backend and AWS Glue catalog implementations.
- All output connectors now support the `sort_by` field for ordering output within a single minibatch.
- A new UDF executor, `pw.udfs.fully_async_executor`. It allows the creation of non-blocking asynchronous UDFs whose results can be returned at a future processing time.
- A Future data type to represent results of fully asynchronous UDFs.
- `pw.Table.await_futures` method to wait for results of fully asynchronous UDFs.
- `pw.io.deltalake.write` now supports partition columns specification.

Changed
- **BREAKING**: Changed the interface of `LLMReranker`: the `use_logit_bias`, `cache_strategy`, `retry_strategy` and `kwargs` arguments are no longer supported.
- **BREAKING**: `LLMReranker` no longer inherits from `pw.UDF`.
- **BREAKING**: `pw.stdlib.utils.AsyncTransformer.output_table` now returns a table with columns with Future data type.
- `pw.io.deltalake.read` can now read append-only tables without requiring explicit specification of primary key fields.

0.18.0

Added
- `pw.io.postgres.write` and `pw.io.postgres.write_snapshot` now handle serialization of `PyObjectWrapper` and `Timedelta` properly.
- New chunking options in `pathway.xpacks.llm.parsers.UnstructuredParser`.
- Now all Pathway types can be serialized into JSON and consistently deserialized back.
- `table.col.dt.to_duration` converting an integer into a `pw.Duration`.
- `pw.Json` now supports storing datetime and duration type values in ISO format.

Changed
- **BREAKING**: Changed the interface of `UnstructuredParser`.
- **BREAKING**: The `Pointer` type is now serialized and deserialized as a string field in Iceberg and Delta Lake.
- **BREAKING**: The `Bytes` type is now serialized and deserialized with base64 encoding and decoding when the JSON format is used. A string field is used to store the encoded contents.
- **BREAKING**: The `Array` type is now serialized and deserialized as an object with two fields: `shape` denoting the shape of the stored multi-dimensional array and `elements` denoting the elements of the flattened array.
- **BREAKING**: Marked package as **py.typed** to indicate support for type hints.
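
To make the JSON layout of the `Array` and `Bytes` changes above concrete, the following sketch builds such objects from plain Python values. It does not use Pathway; the helper names are hypothetical, and row-major flattening order is an assumption, since the notes above only say the array is flattened.

```python
import base64
import json


def shape_of(nested):
    """Return the shape of a rectangular nested list, e.g. [2, 3]."""
    shape = []
    while isinstance(nested, list):
        shape.append(len(nested))
        if not nested:
            break
        nested = nested[0]
    return shape


def flatten(nested):
    """Flatten a nested list depth-first (row-major order, an assumption)."""
    if not isinstance(nested, list):
        return [nested]
    flat = []
    for item in nested:
        flat.extend(flatten(item))
    return flat


def encode_array(nested):
    # Object with the two fields named above: `shape` and `elements`
    return {"shape": shape_of(nested), "elements": flatten(nested)}


def encode_bytes(raw):
    # Bytes are stored in a string field holding base64 text
    return base64.b64encode(raw).decode("ascii")


record = {
    "matrix": encode_array([[1, 2, 3], [4, 5, 6]]),
    "payload": encode_bytes(b"hi"),
}
serialized = json.dumps(record)
```

A consumer can rebuild the array by reshaping `elements` according to `shape`.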

Removed
- **BREAKING**: Removed undocumented `license_key` argument from `pw.run` and `pw.run_all` methods. Instead, `pw.set_license_key` should be used.
