apache-beam

Latest version: v2.56.0

2.46.0

Highlights

* Java SDK containers migrated to [Eclipse Temurin](https://hub.docker.com/_/eclipse-temurin)
as a base. This change migrates away from the deprecated [OpenJDK](https://hub.docker.com/_/openjdk)
container. Eclipse Temurin is currently based upon Ubuntu 22.04 while the OpenJDK
container was based upon Debian 11.
* RunInference PTransform will accept model paths as SideInputs in Python SDK. ([24042](https://github.com/apache/beam/issues/24042))
* RunInference supports ONNX runtime in Python SDK ([22972](https://github.com/apache/beam/issues/22972))
* Tensorflow Model Handler for RunInference in Python SDK ([25366](https://github.com/apache/beam/issues/25366))
* Java SDK modules migrated to use `:sdks:java:extensions:avro` ([24748](https://github.com/apache/beam/issues/24748))

I/Os

* Added a retry policy for failed publications in JmsIO (Java) ([24971](https://github.com/apache/beam/issues/24971)).
* Support for `LZMA` compression/decompression of text files added to the Python SDK ([25316](https://github.com/apache/beam/issues/25316)).
* Added ReadFrom/WriteTo Csv/Json as top-level transforms to the Python SDK (this and the LZMA support are sketched after this list).
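
A minimal sketch of the two Python additions above, assuming `ReadFromCsv`/`WriteToCsv` are exposed under `apache_beam.io` and that `CompressionTypes` gained an `LZMA` member per [25316](https://github.com/apache/beam/issues/25316); paths are hypothetical:

```python
import apache_beam as beam
from apache_beam.io.filesystem import CompressionTypes

with beam.Pipeline() as p:
    # New top-level CSV transforms (paths are placeholders).
    rows = p | "ReadCsv" >> beam.io.ReadFromCsv("gs://my-bucket/in*.csv")
    rows | "WriteCsv" >> beam.io.WriteToCsv("gs://my-bucket/out")

    # LZMA-compressed text files can now be read directly.
    lines = p | "ReadLzma" >> beam.io.ReadFromText(
        "gs://my-bucket/data.txt.lzma",
        compression_type=CompressionTypes.LZMA)
```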

New Features / Improvements

* Add UDF metrics support for Samza portable mode.
* Option for SparkRunner to avoid the need of SDF output to fit in memory ([23852](https://github.com/apache/beam/issues/23852)).
This helps e.g. with ParquetIO reads. Turn the feature on by adding experiment `use_bounded_concurrent_output_for_sdf`.
* Add `WatchFilePattern` transform, which can be used as a side input to the RunInference PTransform to watch for model updates using a file pattern. ([24042](https://github.com/apache/beam/issues/24042))
* Add support for loading TorchScript models with `PytorchModelHandler`. The TorchScript model path can be
passed to PytorchModelHandler using `torch_script_model_path=<path_to_model>` (see the sketch after this list). ([25321](https://github.com/apache/beam/pull/25321))
* The Go SDK now requires Go 1.19 to build. ([25545](https://github.com/apache/beam/pull/25545))
* The Go SDK now has an initial native Go implementation of a portable Beam Runner called Prism. ([24789](https://github.com/apache/beam/pull/24789))
* For more details and current state see https://github.com/apache/beam/tree/master/sdks/go/pkg/beam/runners/prism.
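
A hedged sketch combining the TorchScript and `WatchFilePattern` items above. The `model_metadata_pcoll` parameter name, the `WatchFilePattern` arguments, and its import location are assumptions based on the linked issues, and the paths are hypothetical:

```python
import apache_beam as beam
from apache_beam.ml.inference.base import RunInference
from apache_beam.ml.inference.pytorch_inference import PytorchModelHandlerTensor
from apache_beam.ml.inference.utils import WatchFilePattern  # assumed location

# Load a TorchScript model directly, per the changelog entry above.
model_handler = PytorchModelHandlerTensor(
    torch_script_model_path="gs://my-bucket/model.pt")

with beam.Pipeline() as p:
    # Assumed usage: emit model updates whenever new files match the pattern,
    # then feed them into RunInference as a side input.
    model_updates = p | WatchFilePattern(
        file_pattern="gs://my-bucket/models/*.pt", interval=600)
    # examples = p | ...  # a PCollection of input tensors
    # predictions = examples | RunInference(
    #     model_handler, model_metadata_pcoll=model_updates)
```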

Breaking Changes

* The deprecated SparkRunner for Spark 2 (see 2.41.0 below) was removed ([25263](https://github.com/apache/beam/pull/25263)).
* Python's BatchElements performs more aggressive batching in some cases,
capping at 10-second rather than 1-second batches by default and excluding
fixed cost from this computation to better handle cases where the fixed cost
is larger than a single second. To get the old behavior, pass
`target_batch_duration_secs_including_fixed_cost=1` to BatchElements (see the sketch after this list).
* Dataflow runner enables sibling SDK protocol for Python pipelines using custom containers on Beam 2.46.0 and newer SDKs.
If your Python pipeline starts to stall after you switch to 2.46.0 and you use a custom container, please verify
that your custom container does not include artifacts from older Beam SDK releases. In particular, check in your `Dockerfile`
that the Beam container entrypoint and/or Beam base image version match the Beam SDK version used at job submission.
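
A minimal sketch of restoring the pre-2.46.0 batching behavior described above; the parameter name is taken directly from the changelog entry:

```python
import apache_beam as beam

with beam.Pipeline() as p:
    batches = (
        p
        | beam.Create(range(100))
        # Restore the old 1-second cap that includes fixed cost.
        | beam.BatchElements(
            target_batch_duration_secs_including_fixed_cost=1))
```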

Deprecations

* Avro-related classes are deprecated in the module `beam-sdks-java-core` and will eventually be removed. Please migrate to the new module `beam-sdks-java-extensions-avro` by importing the classes from the `org.apache.beam.sdk.extensions.avro` package.
For the sake of migration simplicity, the relative package path and the whole class hierarchy of the Avro-related classes in the new module are preserved as they were before.
For example, import the `org.apache.beam.sdk.extensions.avro.coders.AvroCoder` class instead of `org.apache.beam.sdk.coders.AvroCoder`. ([24749](https://github.com/apache/beam/issues/24749)).

2.45.0

I/Os

* MongoDB IO connector added (Go) ([24575](https://github.com/apache/beam/issues/24575)).

New Features / Improvements

* RunInference Wrapper with Sklearn Model Handler support added in Go SDK ([24497](https://github.com/apache/beam/issues/23382)).
* Added an override of allowed TLS algorithms (Java), now maintaining the disabled/legacy algorithms
present in 2.43.0 (up to 1.8.0_342, 11.0.16, 17.0.2 for the respective Java versions). This is accompanied
by an explicit re-enabling of TLSv1 and TLSv1.1 for Java 8 and Java 11.
* Add UDF metrics support for Samza portable mode.

Breaking Changes

* Portable Java pipelines, Go pipelines, Python streaming pipelines, and portable Python batch
pipelines on Dataflow are required to use Runner V2. The `disable_runner_v2`,
`disable_runner_v2_until_2023`, `disable_prime_runner_v2` experiments will raise an error during
pipeline construction. You can no longer specify the Dataflow worker jar override. Note that
non-portable Java jobs and non-portable Python batch jobs are not impacted. ([24515](https://github.com/apache/beam/issues/24515)).
* Beam now requires `pyarrow>=3` and `pandas>=1.4.3` since older versions are not compatible with `numpy==1.24.0`.

Bugfixes

* Fixed a Cassandra syntax error when a user-defined query has no WHERE clause in it (Java) ([24829](https://github.com/apache/beam/issues/24829)).
* Fixed JDBC connection failures (Java) during handshake due to the deprecated TLSv1(.1) protocol for the JDK. ([24623](https://github.com/apache/beam/issues/24623))
* Fixed a bug where Python BigQuery batch load writes could truncate valid data when the write disposition is set to WRITE_TRUNCATE and the incoming data is large (Python) ([24535](https://github.com/apache/beam/issues/24535)).

2.44.0

I/Os

* Support for Bigtable sink (Write and WriteBatch) added (Go) ([23324](https://github.com/apache/beam/issues/23324)).
* S3 implementation of the Beam filesystem (Go) ([23991](https://github.com/apache/beam/issues/23991)).
* Support for SingleStoreDB source and sink added (Java) ([22617](https://github.com/apache/beam/issues/22617)).
* Added support for DefaultAzureCredential authentication in Azure Filesystem (Python) ([24210](https://github.com/apache/beam/issues/24210)).

New Features / Improvements

* Beam now provides a portable "runner" that can render pipeline graphs with
graphviz. See `python -m apache_beam.runners.render --help` for more details.
* Local packages can now be used as dependencies in the requirements.txt file, rather
than requiring them to be passed separately via the `--extra_package` option
(Python) ([23684](https://github.com/apache/beam/pull/23684)); see the sketch after this list.
* Pipeline Resource Hints now supported via `--resource_hints` flag (Go) ([23990](https://github.com/apache/beam/pull/23990)).
* Make Python SDK containers reusable on portable runners by installing dependencies to temporary venvs ([BEAM-12792](https://issues.apache.org/jira/browse/BEAM-12792), [#16658](https://github.com/apache/beam/pull/16658)).
* RunInference model handlers now support the specification of a custom inference function in Python ([22572](https://github.com/apache/beam/issues/22572)).
* Support for `map_windows` urn added to Go SDK ([24307](https://github.com/apache/beam/pull/24307)).
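
A short sketch of the requirements.txt change above. The `--requirements_file` flag is existing Beam behavior; the file contents and package names are hypothetical:

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# requirements.txt may now reference a local package directly, e.g.:
#   ./libs/my_local_pkg
#   requests==2.31.0
options = PipelineOptions(["--requirements_file=requirements.txt"])
with beam.Pipeline(options=options) as p:
    p | beam.Create([1, 2, 3])  # placeholder pipeline
```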

Breaking Changes

* `ParquetIO.withSplit` was removed since splittable reading has been the default behavior since 2.35.0. The effect of
this change is to drop support for non-splittable reading (Java) ([23832](https://github.com/apache/beam/issues/23832)).
* `beam-sdks-java-extensions-google-cloud-platform-core` is no longer a
dependency of the Java SDK Harness. Some users of a portable runner (such as Dataflow Runner v2)
may have an undeclared dependency on this package (for example using GCS with
TextIO) and will now need to declare the dependency.
* `beam-sdks-java-core` is no longer a dependency of the Java SDK Harness. Users of a portable
runner (such as Dataflow Runner v2) will need to provide this package and its dependencies.
* Slices now use the Beam Iterable Coder. This enables cross-language use, but breaks pipeline updates
if a Slice type is used as a PCollection element or State API element. (Go) ([24339](https://github.com/apache/beam/issues/24339))
* If you activated a virtual environment in your custom container image, this environment might no longer be activated, since a new environment will be created (see the note about [BEAM-12792](https://issues.apache.org/jira/browse/BEAM-12792) above).
To work around this, install dependencies into the default (global) Python environment. When using poetry, you may need to run `poetry config virtualenvs.create false` before installing dependencies; see an example in [25085](https://github.com/apache/beam/issues/25085).
If you were negatively impacted by this change and cannot find a workaround, feel free to chime in on [16658](https://github.com/apache/beam/pull/16658).
To disable this behavior, you can upgrade to Beam 2.48.0 and add
`ENV RUN_PYTHON_SDK_IN_DEFAULT_ENVIRONMENT=1` to your Dockerfile.

Bugfixes

* Fixed JmsIO acknowledgment issue (Java) ([20814](https://github.com/apache/beam/issues/20814))
* Fixed a bug where Beam SQL CalciteUtils (Java) and cross-language JdbcIO (Python) did not support JDBC CHAR/VARCHAR and BINARY/VARBINARY logical types ([23747](https://github.com/apache/beam/issues/23747), [#23526](https://github.com/apache/beam/issues/23526)).
* Ensure iterated and emitted types used with the generic register package are registered with the type and schema registries. (Go) ([23889](https://github.com/apache/beam/pull/23889))

2.43.0

Highlights

* Python 3.10 support in Apache Beam ([21458](https://github.com/apache/beam/issues/21458)).
* An initial implementation of a runner that allows us to run Beam pipelines on Dask. Try it out and give us feedback! (Python) ([18962](https://github.com/apache/beam/issues/18962)).

I/Os

* Decreased TextSource CPU utilization by 2.3x (Java) ([23193](https://github.com/apache/beam/issues/23193)).
* Fixed bug when using SpannerIO with RuntimeValueProvider options (Java) ([22146](https://github.com/apache/beam/issues/22146)).
* Fixed a unicode rendering issue in WriteToBigQuery ([22312](https://github.com/apache/beam/issues/22312)).
* Removed obsolete variants of BigQuery Read and Write, always using the Beam-native variant
([23564](https://github.com/apache/beam/issues/23564) and [#23559](https://github.com/apache/beam/issues/23559)).
* Bumped google-cloud-spanner dependency version to 3.x for Python SDK ([21198](https://github.com/apache/beam/issues/21198)).

New Features / Improvements

* Dataframe wrapper added in Go SDK via Cross-Language (with automatic expansion service). (Go) ([23384](https://github.com/apache/beam/issues/23384)).
* Name all Java threads to aid in debugging ([23049](https://github.com/apache/beam/issues/23049)).
* An initial implementation of a runner that allows us to run Beam pipelines on Dask. (Python) ([18962](https://github.com/apache/beam/issues/18962)).
* Allow configuring GCP OAuth scopes via pipeline options. This unblocks usages of Beam IOs that require additional scopes.
For example, this feature makes it possible to access Google Drive backed tables in BigQuery ([23290](https://github.com/apache/beam/issues/23290)).
* Added an example of using Python RunInference from Java ([23619](https://github.com/apache/beam/pull/23619)).
* Data can now be read from BigQuery and directly plumbed into a DeferredDataframe in the Dataframe API. Users no longer have to re-specify the schema in this case (sketched below) ([22907](https://github.com/apache/beam/pull/22907)).
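
A hedged sketch of the BigQuery-to-DataFrame path above, assuming the `read_gbq` entry point in `apache_beam.dataframe.io`; the table and project names are hypothetical:

```python
import apache_beam as beam
from apache_beam.dataframe.io import read_gbq

with beam.Pipeline() as p:
    # The deferred DataFrame picks up the table's schema automatically;
    # no need to re-specify it.
    df = p | read_gbq(table="my_dataset.my_table", project_id="my-project")
```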

Breaking Changes

* The CoGroupByKey transform in the Python SDK has changed its output typehint. The typehint component representing grouped values changed from List to Iterable,
which more accurately reflects the nature of the arbitrarily large output collection ([21556](https://github.com/apache/beam/issues/21556)). Beam users may see an error on transforms downstream from CoGroupByKey. Users must change methods expecting a List to expect an Iterable going forward (see the sketch after this list). See [document](https://docs.google.com/document/d/1RIzm8-g-0CyVsPb6yasjwokJQFoKHG4NjRUcKHKINu0) for information and fixes.
* The PortableRunner for Spark assumes Spark 3 as default Spark major version unless configured otherwise using `--spark_version`.
Spark 2 support is deprecated and will be removed soon ([23728](https://github.com/apache/beam/issues/23728)).
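
A minimal sketch of the required change: downstream code should consume grouped values as an Iterable rather than a List. The names and data here are hypothetical:

```python
from typing import Dict, Iterable, Tuple

import apache_beam as beam

def count_joined(kv: Tuple[str, Dict[str, Iterable[int]]]) -> Tuple[str, int]:
    key, grouped = kv
    # Iterate rather than index; len() or slicing may no longer be safe.
    return key, sum(1 for values in grouped.values() for _ in values)

with beam.Pipeline() as p:
    emails = p | "emails" >> beam.Create([("alice", 1), ("bob", 2)])
    scores = p | "scores" >> beam.Create([("alice", 3)])
    ({"emails": emails, "scores": scores}
     | beam.CoGroupByKey()
     | beam.Map(count_joined)
     | beam.Map(print))
```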

Bugfixes

* Fixed a bug where the Python cross-language JDBC IO connector could not read or write rows containing Numeric/Decimal type values ([19817](https://github.com/apache/beam/issues/19817)).

2.42.0

Highlights

* Added support for stateful DoFns to the Go SDK.
* Added support for [Batched
DoFns](https://beam.apache.org/documentation/programming-guide/#batched-dofns)
to the Python SDK (see the sketch below).
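
A sketch of a Batched DoFn following the programming-guide example linked above, using numpy arrays as the batch type:

```python
from typing import Iterator

import numpy as np
import apache_beam as beam

class MultiplyByTwo(beam.DoFn):
    # Beam infers the batch conversion from these type annotations.
    def process_batch(self, batch: np.ndarray) -> Iterator[np.ndarray]:
        yield batch * 2

with beam.Pipeline() as p:
    (p
     | beam.Create(np.array([1, 2, 3], dtype=np.int64))
     | beam.ParDo(MultiplyByTwo())
     | beam.Map(print))
```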

New Features / Improvements

* Added support for Zstd compression to the Python SDK.
* Added support for Google Cloud Profiler to the Go SDK.
* Added support for stateful DoFns to the Go SDK.

Breaking Changes

* The Go SDK's Row Coder now uses a different single-precision float encoding for float32 types to match Java's behavior ([22629](https://github.com/apache/beam/issues/22629)).

Bugfixes

* Fixed a bug where the Python cross-language JDBC IO connector could not read or write rows containing Timestamp type values ([19817](https://github.com/apache/beam/issues/19817)).
* Fixed `AfterProcessingTime` behavior in Python's `DirectRunner` to match Java ([23071](https://github.com/apache/beam/issues/23071))

Known Issues

* Go SDK doesn't yet support Slowly Changing Side Input pattern ([23106](https://github.com/apache/beam/issues/23106))

2.41.0

I/Os

* Projection Pushdown optimizer is now on by default for streaming, matching the behavior of batch pipelines since 2.38.0. If you encounter a bug with the optimizer, please file an issue and disable the optimizer using pipeline option `--experiments=disable_projection_pushdown`.

New Features / Improvements

* The Python SDK now supports per-module logging level overrides, previously available only in the Java SDK ([18222](https://github.com/apache/beam/issues/18222)).
* Added support for accessing GCP PubSub Message ordering keys (Java) ([BEAM-13592](https://issues.apache.org/jira/browse/BEAM-13592))

Breaking Changes

* Projection Pushdown optimizer may break Dataflow upgrade compatibility for optimized pipelines when it removes unused fields. If you need to upgrade and encounter a compatibility issue, disable the optimizer using pipeline option `--experiments=disable_projection_pushdown`.

Deprecations

* Support for Spark 2.4.x is deprecated and will be dropped with the release of Beam 2.44.0 or soon after (Spark runner) ([22094](https://github.com/apache/beam/issues/22094)).
* The modules [amazon-web-services](https://github.com/apache/beam/tree/master/sdks/java/io/amazon-web-services) and
[kinesis](https://github.com/apache/beam/tree/master/sdks/java/io/kinesis) for AWS Java SDK v1 are deprecated
in favor of [amazon-web-services2](https://github.com/apache/beam/tree/master/sdks/java/io/amazon-web-services2)
and will be eventually removed after a few Beam releases (Java) ([21249](https://github.com/apache/beam/issues/21249)).

Bugfixes

* Fixed a condition where retrying queries would yield an incorrect cursor in the Java SDK Firestore Connector ([22089](https://github.com/apache/beam/issues/22089)).
* Fixed plumbing of allowed lateness in the Go SDK; the user-set value was previously ignored and always set to 0. ([22474](https://github.com/apache/beam/issues/22474)).
