Bytewax

Latest version: v0.20.0

Safety actively analyzes 632420 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 1 of 4

0.20.0

- Adds a dataflow structure visualizer. Run `python -m
bytewax.visualize`.

- *Breaking change* The internal format of recovery databases has been
changed from using `JsonPickle` to Python's built-in {py:obj}`pickle`.
Recovery stores that used the old format will not be usable after
upgrading.

- *Breaking change* The `unary` operator and `UnaryLogic` have been
renamed to `stateful` and `StatefulLogic` respectively.

- Adds a `stateful_batch` operator to allow for lower-level batch
control while managing state.

- `StatefulLogic.on_notify`, `StatefulLogic.on_eof`, and
`StatefulLogic.notify_at` are now optional overrides. The defaults
retain the state and emit nothing.

- *Breaking change* Windowing operators have been moved from
`bytewax.operators.window` into `bytewax.operators.windowing`.

- *Breaking change* `ClockConfig`s have had `Config` dropped from
their name and are just `Clock`s. E.g. If you previously `from
bytewax.operators.window import SystemClockConfig` now `from
bytewax.operators.windowing import SystemClock`.

- *Breaking change* `WindowConfig`s have been renamed to `Windower`s.
E.g. If you previously `from bytewax.operators.window import
SessionWindow` now `from bytewax.operators.windowing import
SessionWindower`.

- *Breaking change* All windowing operators now return a set of
streams {py:obj}`~bytewax.operators.windowing.WindowOut`.
{py:obj}`~bytewax.operators.windowing.WindowMetadata` now is
branched into its own stream and is no longer part of the single
downstream. All window operator emitted items are labeled with the
unique window ID they came from to facilitate joining the data
later.

- *Breaking change* {py:obj}`~bytewax.operators.windowing.fold_window`
now requires a `merge` argument. This handles whenever the session
windower determines that two windows must be merged because a new
item bridged a gap.

- *Breaking change* The `join_named` and `join_window_named` operators
have been removed because they did not support returning proper type
information. Use {py:obj}`~bytewax.operators.join` or
{py:obj}`~bytewax.operators.windowing.join_window` instead, which
have been enhanced to properly type their downstream values.

- *Breaking change* {py:obj}`~bytewax.operators.join` and
{py:obj}`~bytewax.operators.windowing.join_window` have had their
`product` argument replaced with `insert_mode`. You now can specify
more nuanced kinds of join modes.

- Python interfaces are now provided for custom clocks and windowers.
Subclass {py:obj}`~bytewax.operators.windowing.Clock` (and a
corresponding {py:obj}`~bytewax.operators.windowing.ClockLogic`) or
{py:obj}`~bytewax.operators.windowing.Windower` (and a corresponding
{py:obj}`~bytewax.operators.windowing.WindowerLogic`) to define your
own senses of time and window definitions.

- Adds a {py:obj}`~bytewax.operators.windowing.window` operator to
allow you to write more flexible custom windowing operators.

- Session windows now work correctly with out-of-order data and joins.

- {py:obj}`~bytewax.operators.windowing.WindowMetadata` now contains a
{py:obj}`~bytewax.operators.windowing.WindowMetadata.merged_ids`
field with any window IDs that were merged into this window.

- All windowing operators now process items in timestamp order. The
most visible change that this results in is that the
{py:obj}`~bytewax.operators.windowing.collect_window` operator now
emits collections with values in timestamp order.

- Adds a {py:obj}`~bytewax.operators.filter_map_value` operator.

- Adds a {py:obj}`~bytewax.operators.enrich_cached` operator for
easier joining with an external data source.

- Adds a {py:obj}`~bytewax.operators.key_rm` convenience operator to
remove keys from a {py:obj}`~bytewax.operators.KeyedStream`.

0.19.1

- Fixes a bug where using a system clock on certain architectures
causes items to be dropped from windows.

0.19.0

- Multiple operators have been reworked to avoid taking and releasing
Python's global interpreter lock while iterating over multiple items.
Windowing operators, stateful operators and operators like `branch`
will see significant performance improvements.

Thanks to damiondoesthings for helping us track this down!

- *Breaking change* `FixedPartitionedSource.build_part`,
`DynamicSource.build`, `FixedPartitionedSink.build_part` and `DynamicSink.build`
now take an additional `step_id` argument. This argument can be used when
labeling custom Python metrics.

- Custom Python metrics can now be collected using the `prometheus-client`
library.

- *Breaking change* The schema registry interface has been removed.
You can still use schema registries, but you need to instantiate
the (de)serializers on your own. This allows for more flexibility.
See the `confluent_serde` and `redpanda_serde` examples for how
to use the new interface.

- Fixes bug where items would be incorrectly marked as late in sliding
and tumbling windows in cases where the timestamps are very far from
the `align_to` parameter of the windower.
- Adds `stateful_flat_map` operator.

- *Breaking change* Removes `builder` argument from `stateful_map`.
Instead, the initial state value is always `None` and you can call
your previous builder by hand in the `mapper`.

- *Breaking change* Improves performance by removing the `now:
datetime` argument from `FixedPartitionedSource.build_part`,
`DynamicSource.build`, and `UnaryLogic.on_item`. If you need the
current time, use:

python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)


- *Breaking change* Improves performance by removing the `sched:
datetime` argument from `StatefulSourcePartition.next_batch`,
`StatelessSourcePartition.next_batch`, `UnaryLogic.on_notify`. You
should already have the scheduled next awake time in whatever
instance variable you returned in
`{Stateful,Stateless}SourcePartition.next_awake` or
`UnaryLogic.notify_at`.

0.18.2

- Fixes a bug that prevented the deletion of old state in recovery stores.

- Better error messages on invalid epoch and backup interval
parameters.

- Fixes bug where dataflow will hang if a source's `next_awake` is set
far in the future.

0.18.1

- Changes the default batch size for `KafkaSource` from 1 to 1000 to match
the Kafka input operator.

- Fixes an issue with the `count_window` operator:
https://github.com/bytewax/bytewax/issues/364.

0.18.0

- Support for schema registries, through
`bytewax.connectors.kafka.registry.RedpandaSchemaRegistry` and
`bytewax.connectors.kafka.registry.ConfluentSchemaRegistry`.

- Custom Kafka operators in `bytewax.connectors.kafka.operators`:
`input`, `output`, `deserialize_key`, `deserialize_value`,
`deserialize`, `serialize_key`, `serialize_value` and `serialize`.

- *Breaking change* `KafkaSource` now emits a special
`KafkaSourceMessage` to allow access to all data on consumed
messages. `KafkaSink` now consumes `KafkaSinkMessage` to allow
setting additional fields on produced messages.

- Non-linear dataflows are now possible. Each operator method returns
a handle to the `Stream`s it produces; add further steps via calling
operator functions on those returned handles, not the root
`Dataflow`. See the migration guide for more info.

- Auto-complete and type hinting on operators, inputs, outputs,
streams, and logic functions now works.

- A ton of new operators: `collect_final`, `count_final`,
`count_window`, `flatten`, `inspect_debug`, `join`, `join_named`,
`max_final`, `max_window`, `merge`, `min_final`, `min_window`,
`key_on`, `key_assert`, `key_split`, `merge`, `unary`. Documentation
for all operators are in `bytewax.operators` now.

- New operators can be added in Python, made by grouping existing
operators. See `bytewax.dataflow` module docstring for more info.

- *Breaking change* Operators are now stand-alone functions; `import
bytewax.operators as op` and use e.g. `op.map("step_id", upstream,
lambda x: x + 1)`.

- *Breaking change* All operators must take a `step_id` argument now.

- *Breaking change* `fold` and `reduce` operators have been renamed to
`fold_final` and `reduce_final`. They now only emit on EOF and are
only for use in batch contexts.

- *Breaking change* `batch` operator renamed to `collect`, so as to
not be confused with runtime batching. Behavior is unchanged.

- *Breaking change* `output` operator does not forward downstream its
items. Add operators on the upstream handle instead.

- `next_batch` on input partitions can now return any `Iterable`, not
just a `List`.

- `inspect` operator now has a default inspector that prints out items
with the step ID.

- `collect_window` operator now can collect into `set`s and `dict`s.

- Adds a `get_fs_id` argument to `{Dir,File}Source` to allow handling
non-identical files per worker.

- Adds a `TestingSource.EOF` and `TestingSource.ABORT` sentinel values
you can use to test recovery.

- *Breaking change* Adds a `datetime` argument to
`FixedPartitionSource.build_part`, `DynamicSource.build_part`,
`StatefulSourcePartition.next_batch`, and
`StatelessSourcePartition.next_batch`. You can now use this to
update your `next_awake` time easily.

- *Breaking change* Window operators now emit `WindowMetadata` objects
downstream. These objects can be used to introspect the open_time
and close_time of windows. This changes the output type of windowing
operators from: `(key, values)` to `(key, (metadata, values))`.

- *Breaking change* IO classes and connectors have been renamed to
better reflect their semantics and match up with documentation.

- Moves the ability to start multiple Python processes with the
`-p` or `--processes` to the `bytewax.testing` module.

- *Breaking change* `SimplePollingSource` moved from
`bytewax.connectors.periodic` to `bytewax.inputs` since it is an
input helper.

- `SimplePollingSource`'s `align_to` argument now works.

Page 1 of 4

© 2024 Safety CLI Cybersecurity Inc. All Rights Reserved.