Bytewax

Latest version: v0.19.1

0.19.1

- Fixes a bug where using a system clock on certain architectures causes items to be dropped from windows.

0.19.0

- Multiple operators have been reworked to avoid taking and releasing
Python's global interpreter lock while iterating over multiple items.
Windowing operators, stateful operators and operators like `branch`
will see significant performance improvements.

Thanks to damiondoesthings for helping us track this down!

- *Breaking change* `FixedPartitionedSource.build_part`,
`DynamicSource.build`, `FixedPartitionedSink.build_part` and `DynamicSink.build`
now take an additional `step_id` argument. This argument can be used when
labeling custom Python metrics.

- Custom Python metrics can now be collected using the `prometheus-client`
library.

- *Breaking change* The schema registry interface has been removed.
You can still use schema registries, but you need to instantiate
the (de)serializers on your own. This allows for more flexibility.
See the `confluent_serde` and `redpanda_serde` examples for how
to use the new interface.

- Fixes a bug where items would be incorrectly marked as late in sliding
and tumbling windows in cases where the timestamps are very far from
the `align_to` parameter of the windower.

- Adds the `stateful_flat_map` operator.

- *Breaking change* Removes `builder` argument from `stateful_map`.
Instead, the initial state value is always `None` and you can call
your previous builder by hand in the `mapper`.

- *Breaking change* Improves performance by removing the `now:
datetime` argument from `FixedPartitionedSource.build_part`,
`DynamicSource.build`, and `UnaryLogic.on_item`. If you need the
current time, use:

```python
from datetime import datetime, timezone

now = datetime.now(timezone.utc)
```

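For the `stateful_map` change listed above, one way to keep an old builder is to call it when the state arrives as `None`. The following is a minimal sketch in plain Python: `build_state` and `running_total` are hypothetical names, and the loop at the bottom only stands in for the runtime calling the mapper once per value for a single key.

```python
from typing import Optional, Tuple


def build_state() -> int:
    # Hypothetical former `builder`: produces the initial state.
    return 0


def running_total(state: Optional[int], value: int) -> Tuple[Optional[int], int]:
    # The state is `None` the first time a key is seen, so call the
    # old builder by hand.
    if state is None:
        state = build_state()
    state += value
    # Return the updated state and the value to emit downstream.
    return state, state


# Stand-in for the runtime driving the mapper for one key.
state: Optional[int] = None
emitted = []
for value in [3, 4, 5]:
    state, out = running_total(state, value)
    emitted.append(out)
```

A function shaped like `running_total` is what would be passed as the `mapper`; the stand-in loop is not part of Bytewax.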
- *Breaking change* Improves performance by removing the `sched:
datetime` argument from `StatefulSourcePartition.next_batch`,
`StatelessSourcePartition.next_batch`, `UnaryLogic.on_notify`. You
should already have the scheduled next awake time in whatever
instance variable you returned in
`{Stateful,Stateless}SourcePartition.next_awake` or
`UnaryLogic.notify_at`.
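The pattern described in the last item can be sketched as follows. This is an illustration only: `TickerPartition` is a hypothetical class that mirrors the `next_batch` and `next_awake` method names but does not subclass any Bytewax type, so it runs without the library installed.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional, Tuple


class TickerPartition:
    """Hypothetical partition that emits one item per interval."""

    def __init__(self, interval: timedelta, start: datetime) -> None:
        self._interval = interval
        # The scheduled awake time lives in an instance variable.
        self._next_awake = start
        self._count = 0

    def next_batch(self) -> List[Tuple[datetime, int]]:
        # No `sched` argument: read the scheduled time from the
        # instance variable instead.
        scheduled = self._next_awake
        self._next_awake = scheduled + self._interval
        self._count += 1
        return [(scheduled, self._count)]

    def next_awake(self) -> Optional[datetime]:
        return self._next_awake


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
part = TickerPartition(timedelta(seconds=10), start)
first = part.next_batch()
second = part.next_batch()
```

Advancing from the stored scheduled time, rather than from the current wall-clock time, keeps the emitted timestamps evenly spaced even if the partition is woken late.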

0.18.2

- Fixes a bug that prevented the deletion of old state in recovery stores.

- Better error messages on invalid epoch and backup interval
parameters.

- Fixes a bug where a dataflow would hang if a source's `next_awake` was
set far in the future.

0.18.1

- Changes the default batch size for `KafkaSource` from 1 to 1000 to match
the Kafka input operator.

- Fixes an issue with the `count_window` operator:
https://github.com/bytewax/bytewax/issues/364.

0.18.0

- Adds support for schema registries, through
`bytewax.connectors.kafka.registry.RedpandaSchemaRegistry` and
`bytewax.connectors.kafka.registry.ConfluentSchemaRegistry`.

- Adds custom Kafka operators in `bytewax.connectors.kafka.operators`:
`input`, `output`, `deserialize_key`, `deserialize_value`,
`deserialize`, `serialize_key`, `serialize_value` and `serialize`.

- *Breaking change* `KafkaSource` now emits a special
`KafkaSourceMessage` to allow access to all data on consumed
messages. `KafkaSink` now consumes `KafkaSinkMessage` to allow
setting additional fields on produced messages.

- Non-linear dataflows are now possible. Each operator function returns
a handle to the `Stream`s it produces; add further steps by calling
operator functions on those returned handles, not on the root
`Dataflow`. See the migration guide for more info.

- Auto-complete and type hinting on operators, inputs, outputs,
streams, and logic functions now work.

- A ton of new operators: `collect_final`, `count_final`,
`count_window`, `flatten`, `inspect_debug`, `join`, `join_named`,
`max_final`, `max_window`, `merge`, `min_final`, `min_window`,
`key_on`, `key_assert`, `key_split`, `unary`. Documentation
for all operators is in `bytewax.operators` now.

- New operators can be defined in Python by grouping existing
operators. See the `bytewax.dataflow` module docstring for more info.

- *Breaking change* Operators are now stand-alone functions; `import
bytewax.operators as op` and use e.g. `op.map("step_id", upstream,
lambda x: x + 1)`.

- *Breaking change* All operators must take a `step_id` argument now.

- *Breaking change* `fold` and `reduce` operators have been renamed to
`fold_final` and `reduce_final`. They now only emit on EOF and are
only for use in batch contexts.

- *Breaking change* `batch` operator renamed to `collect`, so as to
not be confused with runtime batching. Behavior is unchanged.

- *Breaking change* The `output` operator no longer forwards its items
downstream. Add operators on the upstream handle instead.

- `next_batch` on input partitions can now return any `Iterable`, not
just a `List`.

- `inspect` operator now has a default inspector that prints out items
with the step ID.

- `collect_window` operator now can collect into `set`s and `dict`s.

- Adds a `get_fs_id` argument to `{Dir,File}Source` to allow handling
non-identical files per worker.

- Adds `TestingSource.EOF` and `TestingSource.ABORT` sentinel values
you can use to test recovery.

- *Breaking change* Adds a `datetime` argument to
`FixedPartitionedSource.build_part`, `DynamicSource.build`,
`StatefulSourcePartition.next_batch`, and
`StatelessSourcePartition.next_batch`. You can now use this to
update your `next_awake` time easily.

- *Breaking change* Window operators now emit `WindowMetadata` objects
downstream. These objects can be used to introspect the `open_time`
and `close_time` of windows. This changes the output type of windowing
operators from `(key, values)` to `(key, (metadata, values))`.

- *Breaking change* IO classes and connectors have been renamed to
better reflect their semantics and match up with documentation.

- Moves the ability to start multiple Python processes with the
`-p` or `--processes` flag to the `bytewax.testing` module.

- *Breaking change* `SimplePollingSource` moved from
`bytewax.connectors.periodic` to `bytewax.inputs` since it is an
input helper.

- `SimplePollingSource`'s `align_to` argument now works.
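To illustrate the new window output shape `(key, (metadata, values))` described in the `WindowMetadata` item above, here is a minimal sketch of a downstream function that unpacks it. `WindowMetadataStandIn` is a hypothetical stand-in carrying only the `open_time` and `close_time` fields named in the notes, so the example runs without Bytewax installed; `summarize` is likewise an illustrative name.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import List, Tuple


@dataclass
class WindowMetadataStandIn:
    # Hypothetical stand-in for the real metadata object; only the two
    # fields mentioned in the release notes are modeled here.
    open_time: datetime
    close_time: datetime


def summarize(
    item: Tuple[str, Tuple[WindowMetadataStandIn, List[int]]],
) -> Tuple[str, float, int]:
    # Old shape was (key, values); the new shape nests the metadata.
    key, (metadata, values) = item
    duration = metadata.close_time - metadata.open_time
    return key, duration.total_seconds(), len(values)


open_time = datetime(2024, 1, 1, tzinfo=timezone.utc)
metadata = WindowMetadataStandIn(open_time, open_time + timedelta(minutes=1))
summary = summarize(("user-1", (metadata, [1, 2, 3])))
```

Downstream steps that previously unpacked `(key, values)` need the extra level of tuple unpacking shown in `summarize`.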

0.17.1

- Adds the `batch` operator to Dataflows. Calling `Dataflow.batch`
will batch incoming items until either a batch size has been reached
or a timeout has passed.

- Adds the `SimplePollingInput` source. Subclass this input source to
periodically source new input for a dataflow.

- Re-adds GLIBC 2.27 builds to support older Linux distributions.
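The size-or-timeout rule described for the `batch` operator above can be sketched offline in plain Python. This is not the Bytewax implementation: `batch_by_size_or_timeout` is a hypothetical helper that replays items with known arrival times, and it only notices an elapsed timeout when the next item arrives, whereas a live runtime would emit the batch as soon as the timeout passes.

```python
from datetime import datetime, timedelta, timezone
from typing import Iterable, List, Optional, Tuple, TypeVar

T = TypeVar("T")


def batch_by_size_or_timeout(
    items: Iterable[Tuple[datetime, T]],
    max_size: int,
    timeout: timedelta,
) -> List[List[T]]:
    batches: List[List[T]] = []
    current: List[T] = []
    deadline: Optional[datetime] = None
    for arrived_at, item in items:
        # Close the open batch if its timeout elapsed before this arrival.
        if current and deadline is not None and arrived_at >= deadline:
            batches.append(current)
            current = []
        # The first item of a batch starts the timeout clock.
        if not current:
            deadline = arrived_at + timeout
        current.append(item)
        # Close the batch as soon as it reaches the size limit.
        if len(current) >= max_size:
            batches.append(current)
            current = []
    # Flush whatever is left at end of input.
    if current:
        batches.append(current)
    return batches


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
arrivals = [(start + timedelta(seconds=s), s) for s in (0, 1, 7, 8, 9, 20)]
batches = batch_by_size_or_timeout(
    arrivals, max_size=3, timeout=timedelta(seconds=5)
)
```

With these sample arrivals, the first batch closes on the timeout, the second on the size limit, and the last at end of input.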
