Bytewax

Latest version: v0.21.0

Safety actively analyzes 682387 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 1 of 4

0.21.0

- {py:obj}`~bytewax.inputs.SimplePollingSource` now allows you to
retain state to support at-least-once delivery.

- Fixes a bug when using
{py:obj}`~bytewax.operators.windowing.SlidingWindower` where values
would be assigned to an extra window if their timestamps were near
the end of the correct window.

- Upstream type hints on
{py:obj}`bytewax.connectors.kafka.operators.serialize_key`,
{py:obj}`~bytewax.connectors.kafka.operators.serialize_value`, and
{py:obj}`~bytewax.connectors.kafka.operators.serialize` have been
made more broad to support all Kafka serializers, like
{py:obj}`confluent_kafka.serialization.StringSerializer`.

- *Breaking change* - Fixes a bug which caused two of the same types
of windowing operators in a dataflow to spuriously result in a
`ValueError`. This fix invalidates any recovery data for all
windowing operators; it is recommended to delete and re-create
the recovery store if you are using windowing operators.

- Fixes a performance issue where
{py:obj}`bytewax.operators.StatefulBatchLogic.notify_at` (and thus
many of the other stateful operators' `notify_at` derived from it)
was being called superfluously.

0.20.1

- Fixes a bug when using
{py:obj}`~bytewax.operators.windowing.EventClock` where in-order but
"slow" data results in watermark assertion errors.

0.20.0

- Adds a dataflow structure visualizer. Run `python -m
bytewax.visualize`.

- *Breaking change* The internal format of recovery databases has been
changed from using `JsonPickle` to Python's built-in {py:obj}`pickle`.
Recovery stores that used the old format will not be usable after
upgrading.

- *Breaking change* The `unary` operator and `UnaryLogic` have been
renamed to `stateful` and `StatefulLogic` respectively.

- Adds a `stateful_batch` operator to allow for lower-level batch
control while managing state.

- `StatefulLogic.on_notify`, `StatefulLogic.on_eof`, and
`StatefulLogic.notify_at` are now optional overrides. The defaults
retain the state and emit nothing.

- *Breaking change* Windowing operators have been moved from
`bytewax.operators.window` into `bytewax.operators.windowing`.

- *Breaking change* `ClockConfig`s have had `Config` dropped from
their name and are just `Clock`s. E.g. If you previously `from
bytewax.operators.window import SystemClockConfig` now `from
bytewax.operators.windowing import SystemClock`.

- *Breaking change* `WindowConfig`s have been renamed to `Windower`s.
E.g. If you previously `from bytewax.operators.window import
SessionWindow` now `from bytewax.operators.windowing import
SessionWindower`.

- *Breaking change* All windowing operators now return a set of
streams {py:obj}`~bytewax.operators.windowing.WindowOut`.
{py:obj}`~bytewax.operators.windowing.WindowMetadata` now is
branched into its own stream and is no longer part of the single
downstream. All window operator emitted items are labeled with the
unique window ID they came from to facilitate joining the data
later.

- *Breaking change* {py:obj}`~bytewax.operators.windowing.fold_window`
now requires a `merge` argument. This handles whenever the session
windower determines that two windows must be merged because a new
item bridged a gap.

- *Breaking change* The `join_named` and `join_window_named` operators
have been removed because they did not support returning proper type
information. Use {py:obj}`~bytewax.operators.join` or
{py:obj}`~bytewax.operators.windowing.join_window` instead, which
have been enhanced to properly type their downstream values.

- *Breaking change* {py:obj}`~bytewax.operators.join` and
{py:obj}`~bytewax.operators.windowing.join_window` have had their
`product` argument replaced with `insert_mode`. You now can specify
more nuanced kinds of join modes.

- Python interfaces are now provided for custom clocks and windowers.
Subclass {py:obj}`~bytewax.operators.windowing.Clock` (and a
corresponding {py:obj}`~bytewax.operators.windowing.ClockLogic`) or
{py:obj}`~bytewax.operators.windowing.Windower` (and a corresponding
{py:obj}`~bytewax.operators.windowing.WindowerLogic`) to define your
own senses of time and window definitions.

- Adds a {py:obj}`~bytewax.operators.windowing.window` operator to
allow you to write more flexible custom windowing operators.

- Session windows now work correctly with out-of-order data and joins.

- {py:obj}`~bytewax.operators.windowing.WindowMetadata` now contains a
{py:obj}`~bytewax.operators.windowing.WindowMetadata.merged_ids`
field with any window IDs that were merged into this window.

- All windowing operators now process items in timestamp order. The
most visible change that this results in is that the
{py:obj}`~bytewax.operators.windowing.collect_window` operator now
emits collections with values in timestamp order.

- Adds a {py:obj}`~bytewax.operators.filter_map_value` operator.

- Adds a {py:obj}`~bytewax.operators.enrich_cached` operator for
easier joining with an external data source.

- Adds a {py:obj}`~bytewax.operators.key_rm` convenience operator to
remove keys from a {py:obj}`~bytewax.operators.KeyedStream`.

0.19.1

- Fixes a bug where using a system clock on certain architectures
causes items to be dropped from windows.

0.19.0

- Multiple operators have been reworked to avoid taking and releasing
Python's global interpreter lock while iterating over multiple items.
Windowing operators, stateful operators and operators like `branch`
will see significant performance improvements.

Thanks to damiondoesthings for helping us track this down!

- *Breaking change* `FixedPartitionedSource.build_part`,
`DynamicSource.build`, `FixedPartitionedSink.build_part` and `DynamicSink.build`
now take an additional `step_id` argument. This argument can be used when
labeling custom Python metrics.

- Custom Python metrics can now be collected using the `prometheus-client`
library.

- *Breaking change* The schema registry interface has been removed.
You can still use schema registries, but you need to instantiate
the (de)serializers on your own. This allows for more flexibility.
See the `confluent_serde` and `redpanda_serde` examples for how
to use the new interface.

- Fixes bug where items would be incorrectly marked as late in sliding
and tumbling windows in cases where the timestamps are very far from
the `align_to` parameter of the windower.
- Adds `stateful_flat_map` operator.

- *Breaking change* Removes `builder` argument from `stateful_map`.
Instead, the initial state value is always `None` and you can call
your previous builder by hand in the `mapper`.

- *Breaking change* Improves performance by removing the `now:
datetime` argument from `FixedPartitionedSource.build_part`,
`DynamicSource.build`, and `UnaryLogic.on_item`. If you need the
current time, use:

python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)


- *Breaking change* Improves performance by removing the `sched:
datetime` argument from `StatefulSourcePartition.next_batch`,
`StatelessSourcePartition.next_batch`, `UnaryLogic.on_notify`. You
should already have the scheduled next awake time in whatever
instance variable you returned in
`{Stateful,Stateless}SourcePartition.next_awake` or
`UnaryLogic.notify_at`.

0.18.2

- Fixes a bug that prevented the deletion of old state in recovery stores.

- Better error messages on invalid epoch and backup interval
parameters.

- Fixes bug where dataflow will hang if a source's `next_awake` is set
far in the future.

Page 1 of 4

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.