Bytewax

Latest version: v0.20.1

0.18.0

- Support for schema registries, through
`bytewax.connectors.kafka.registry.RedpandaSchemaRegistry` and
`bytewax.connectors.kafka.registry.ConfluentSchemaRegistry`.

- Custom Kafka operators in `bytewax.connectors.kafka.operators`:
`input`, `output`, `deserialize_key`, `deserialize_value`,
`deserialize`, `serialize_key`, `serialize_value` and `serialize`.

- *Breaking change* `KafkaSource` now emits a special
`KafkaSourceMessage` to allow access to all data on consumed
messages. `KafkaSink` now consumes `KafkaSinkMessage` to allow
setting additional fields on produced messages.
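
Here is a hedged sketch of working with the new message types, assuming `KafkaSource`, `KafkaSink`, and `KafkaSinkMessage` are importable from `bytewax.connectors.kafka` as described; the broker address and topic names are placeholders.

```python
import bytewax.operators as op
from bytewax.connectors.kafka import KafkaSink, KafkaSinkMessage, KafkaSource
from bytewax.dataflow import Dataflow

flow = Dataflow("kafka_passthrough")
msgs = op.input("in", flow, KafkaSource(["localhost:9092"], ["in_topic"]))

# Each consumed item is a KafkaSourceMessage; re-wrap its key/value in a
# KafkaSinkMessage for the producer side.
out = op.map("rewrap", msgs, lambda msg: KafkaSinkMessage(msg.key, msg.value))
op.output("out", out, KafkaSink(["localhost:9092"], "out_topic"))
```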

- Non-linear dataflows are now possible. Each operator method returns
a handle to the `Stream`s it produces; add further steps by calling
operator functions on those returned handles, not on the root
`Dataflow`. See the sketch below and the migration guide for more
info.
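
A minimal sketch of a branching dataflow in this style; the step names and test data are made up for illustration.

```python
import bytewax.operators as op
from bytewax.dataflow import Dataflow
from bytewax.testing import TestingSource

flow = Dataflow("branching_example")
nums = op.input("nums", flow, TestingSource([1, 2, 3, 4]))

# Each operator returns a Stream handle; later steps hang off those
# handles rather than the root Dataflow, so the flow can branch.
doubled = op.map("double", nums, lambda x: x * 2)
squared = op.map("square", nums, lambda x: x * x)

# The two branches are merged back into a single stream and printed via
# the default inspector.
merged = op.merge("merge", doubled, squared)
op.inspect("show", merged)
```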

- Auto-complete and type hinting on operators, inputs, outputs,
streams, and logic functions now works.

- A ton of new operators: `collect_final`, `count_final`,
`count_window`, `flatten`, `inspect_debug`, `join`, `join_named`,
`max_final`, `max_window`, `merge`, `min_final`, `min_window`,
`key_on`, `key_assert`, `key_split`, and `unary`. Documentation for
all operators is now in `bytewax.operators`; a short sketch using
`count_final` follows.
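
A short, hedged sketch using one of the new operators, `count_final`; the word list, step names, and sink choice are placeholders.

```python
import bytewax.operators as op
from bytewax.connectors.stdio import StdOutSink
from bytewax.dataflow import Dataflow
from bytewax.testing import TestingSource

flow = Dataflow("word_count")
words = op.input("words", flow, TestingSource(["a", "b", "a", "c", "a"]))

# count_final counts occurrences per key and only emits at end-of-stream,
# so it is suited to bounded inputs.
counts = op.count_final("count", words, lambda w: w)
op.output("out", counts, StdOutSink())
```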

- New operators can be added in Python, made by grouping existing
operators. See `bytewax.dataflow` module docstring for more info.

- *Breaking change* Operators are now stand-alone functions; `import
bytewax.operators as op` and use e.g. `op.map("step_id", upstream,
lambda x: x + 1)`.

- *Breaking change* All operators must take a `step_id` argument now.

- *Breaking change* `fold` and `reduce` operators have been renamed to
`fold_final` and `reduce_final`. They now only emit on EOF and are
only for use in batch contexts.

- *Breaking change* The `batch` operator has been renamed to `collect`
so as not to be confused with runtime batching. Behavior is unchanged.

- *Breaking change* The `output` operator no longer forwards its items
downstream. Add further operators onto the upstream handle instead.

- `next_batch` on input partitions can now return any `Iterable`, not
just a `List`.

- `inspect` operator now has a default inspector that prints out items
with the step ID.

- `collect_window` operator now can collect into `set`s and `dict`s.

- Adds a `get_fs_id` argument to `{Dir,File}Source` to allow handling
non-identical files per worker.

- Adds `TestingSource.EOF` and `TestingSource.ABORT` sentinel values
you can use to test recovery.

- *Breaking change* Adds a `datetime` argument to
`FixedPartitionSource.build_part`, `DynamicSource.build_part`,
`StatefulSourcePartition.next_batch`, and
`StatelessSourcePartition.next_batch`. You can now use this to
update your `next_awake` time easily.

- *Breaking change* Window operators now emit `WindowMetadata` objects
downstream. These objects can be used to introspect the `open_time`
and `close_time` of windows. This changes the output type of windowing
operators from `(key, values)` to `(key, (metadata, values))`.
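
A small sketch of unpacking the new output shape downstream of a window operator; `windowed` here is a placeholder for the stream such an operator returns, and the formatting is illustrative only.

```python
# Window operators now emit (key, (metadata, values)) pairs.
def describe(item):
    key, (meta, values) = item
    return f"{key}: {len(values)} items, open={meta.open_time}, close={meta.close_time}"

# described = op.map("describe", windowed, describe)
```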

- *Breaking change* IO classes and connectors have been renamed to
better reflect their semantics and match up with documentation.

- Moves the ability to start multiple Python processes with the `-p`
or `--processes` flag to the `bytewax.testing` module.

- *Breaking change* `SimplePollingSource` moved from
`bytewax.connectors.periodic` to `bytewax.inputs` since it is an
input helper.

- `SimplePollingSource`'s `align_to` argument now works.

0.17.1

- Adds the `batch` operator to Dataflows. Calling `Dataflow.batch`
will batch incoming items until either a batch size has been reached
or a timeout has passed.

- Adds the `SimplePollingInput` source. Subclass this input source to
periodically source new input for a dataflow.
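
A hedged sketch of such a subclass, assuming the constructor takes an `interval` argument and that `next_item` is the method to override, as described above; the polled file path is a placeholder.

```python
from datetime import timedelta

from bytewax.connectors.periodic import SimplePollingInput


class FilePollingInput(SimplePollingInput):
    def __init__(self):
        # Poll once a minute.
        super().__init__(interval=timedelta(seconds=60))

    def next_item(self):
        # Whatever this returns is emitted as the next item downstream.
        with open("/tmp/status.txt") as f:
            return f.read()
```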

- Re-adds GLIBC 2.27 builds to support older Linux distributions.

0.17.0

Changed

- *Breaking change* Recovery system re-worked. Kafka-based recovery
has been removed. The SQLite recovery file format has changed, so
existing recovery DB files cannot be used. See the module docstring
for `bytewax.recovery` for how to use the new recovery system.

- Dataflow execution supports rescaling over resumes. You can now
change the number of workers and still get proper execution and
recovery.

- `epoch-interval` has been renamed to `snapshot-interval`.

- The `list_parts` method of `PartitionedInput` has been changed to
return a `List[str]` and should only reflect the available
inputs that a given worker has access to. You no longer need
to return the complete set of partitions for all workers.

- The `next` method of `StatefulSource` and `StatelessSource` has
been changed to `next_batch` and should return a `List` of elements,
or the empty list if there are no elements to return.

Added

- Added a new CLI parameter, `backup-interval`, to configure the length
of time to wait before "garbage collecting" older recovery snapshots.

- Added `next_awake` to input classes, which can be used to schedule
when the next call to `next_batch` should occur. Use `next_awake`
instead of `time.sleep`.
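
A hedged sketch of a source that wakes once per second instead of sleeping; the `DynamicInput`/`StatelessSource` class and method names follow this changelog's description of the 0.17 input API and may differ slightly from the released signatures.

```python
from datetime import datetime, timedelta, timezone

from bytewax.inputs import DynamicInput, StatelessSource


class _TickSource(StatelessSource):
    def __init__(self):
        self._awake_at = datetime.now(timezone.utc)

    def next_batch(self):
        # Emit one item per wake-up; an empty list means "nothing yet".
        self._awake_at += timedelta(seconds=1)
        return [self._awake_at]

    def next_awake(self):
        # The runtime will not call next_batch again before this time, so
        # the worker neither busy-waits nor blocks in time.sleep.
        return self._awake_at

    def close(self):
        pass


class TickInput(DynamicInput):
    def build(self, worker_index, worker_count):
        return _TickSource()
```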

- Added `bytewax.inputs.batcher_async` to bridge async Python libraries
in Bytewax input sources.

- Added support for linux/aarch64 and linux/armv7 platforms.

Removed

- `KafkaRecoveryConfig` has been removed as a recovery store.

0.16.2

- Add support for Windows builds - thanks zzl221000!
- Adds a `CSVInput` subclass of `FileInput`.

0.16.1

- Add a cooldown for activating workers to reduce CPU consumption.
- Add support for Python 3.11.

0.16.0

- *Breaking change* Reworked the execution model. `run_main` and
`cluster_main` have been moved to `bytewax.testing` as they are only
meant to be used when testing or prototyping. Production dataflows
should be run by calling the `bytewax.run` module with
`python -m bytewax.run <dataflow-path>:<dataflow-name>`. See
`python -m bytewax.run -h` for all the possible options. The
functionality offered by `spawn_cluster` is now only provided by the
`bytewax.run` script, so `spawn_cluster` has been removed.

- *Breaking change* `{Sliding,Tumbling}Window.start_at` has been
renamed to `align_to` and both now require that argument. It's not
possible to recover windowing operators without it.
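
A hedged sketch of constructing a window with the now-required `align_to` argument, assuming the 0.16-era `bytewax.window` module layout; the window length and anchor datetime are placeholders.

```python
from datetime import datetime, timedelta, timezone

from bytewax.window import SystemClockConfig, TumblingWindow

clock_config = SystemClockConfig()
# align_to pins window boundaries to a fixed instant so they stay
# consistent across restarts and recovery.
window_config = TumblingWindow(
    length=timedelta(minutes=5),
    align_to=datetime(2023, 1, 1, tzinfo=timezone.utc),
)
```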

- Fixes bugs with windows not closing properly.

- Fixes an issue with SQLite-based recovery. Previously you'd always
get an "interleaved executions" panic whenever you resumed a cluster
after the first time.

- Add `SessionWindow` for windowing operators.

- Add `SlidingWindow` for windowing operators.

- *Breaking change* Rename `TumblingWindowConfig` to `TumblingWindow`.

- Add `filter_map` operator.

- *Breaking change* New partition-based input and output API. This
removes `ManualInputConfig` and `ManualOutputConfig`. See
`bytewax.inputs` and `bytewax.outputs` for more info.

- *Breaking change* `Dataflow.capture` operator is renamed to
`Dataflow.output`.

- *Breaking change* `KafkaInputConfig` and `KafkaOutputConfig` have
been moved to `bytewax.connectors.kafka.KafkaInput` and
`bytewax.connectors.kafka.KafkaOutput`.

- *Deprecation warning* The `KafkaRecovery` store is being deprecated
in favor of `SqliteRecoveryConfig`, and will be removed in a future
release.
