Highlights
* Spark 3.2.2 is now the default version for the Spark runner ([23804](https://github.com/apache/beam/issues/23804)).
* The Go SDK has a new default local runner, called Prism ([24789](https://github.com/apache/beam/issues/24789)).
* All Beam released container images are now [multi-arch images](https://cloud.google.com/kubernetes-engine/docs/how-to/build-multi-arch-for-arm#what_is_a_multi-arch_image) that support both x86 and ARM CPU architectures.
I/Os
* Java KafkaIO now supports picking up topics via `topicPattern` ([26948](https://github.com/apache/beam/pull/26948)).
* Added support for reading from the Cosmos DB Core SQL API ([23604](https://github.com/apache/beam/issues/23604)).
* Upgraded HBaseIO to HBase 2.5.5 (Java) ([27711](https://github.com/apache/beam/issues/19554)).
* Added support for GoogleAdsIO source (Java) ([27681](https://github.com/apache/beam/pull/27681)).
New Features / Improvements
* The Go SDK now requires Go 1.20 to build. ([27558](https://github.com/apache/beam/issues/27558))
* The Go SDK has a new default local runner, Prism ([24789](https://github.com/apache/beam/issues/24789)).
  * Prism is a portable runner that executes each transform independently, ensuring coders.
  * At this point it supersedes the Go direct runner in functionality. The Go direct runner is now deprecated.
  * See https://github.com/apache/beam/blob/master/sdks/go/pkg/beam/runners/prism/README.md for the goals and features of Prism.
* Hugging Face Model Handler for RunInference added to Python SDK. ([26632](https://github.com/apache/beam/pull/26632))
* Hugging Face Pipelines support for RunInference added to Python SDK. ([27399](https://github.com/apache/beam/pull/27399))
* Vertex AI Model Handler for RunInference now supports private endpoints ([27696](https://github.com/apache/beam/pull/27696))
* MLTransform transform added with support for common ML pre/postprocessing operations ([26795](https://github.com/apache/beam/pull/26795))
* Upgraded the Kryo extension for the Java SDK to Kryo 5.5.0. This brings in bug fixes, performance improvements, and serialization of Java 14 records. ([27635](https://github.com/apache/beam/issues/27635))
* All Beam released container images are now [multi-arch images](https://cloud.google.com/kubernetes-engine/docs/how-to/build-multi-arch-for-arm#what_is_a_multi-arch_image) that support both x86 and ARM CPU architectures. ([27674](https://github.com/apache/beam/issues/27674)). The multi-arch container images include:
  * All versions of Go, Python, Java and Typescript SDK containers.
  * All versions of Flink job server containers.
  * Java and Python expansion service containers.
  * Transform service controller container.
  * Spark3 job server container.
* Added support for batched writes to AWS SQS for improved throughput (Java, AWS 2). ([21429](https://github.com/apache/beam/issues/21429))
Breaking Changes
* Python SDK: Legacy runner support has been removed from Dataflow; all pipelines must use Runner v2.
* Python SDK: Dataflow Runner will no longer stage the Beam SDK from PyPI in the `--staging_location` at pipeline submission. Custom container images that are not based on Beam's default image must include an Apache Beam installation. ([26996](https://github.com/apache/beam/issues/26996))
Deprecations
* The Go direct runner is now deprecated. It remains available to reduce migration churn.
  * Tests can be set back to the direct runner by overriding TestMain: `func TestMain(m *testing.M) { ptest.MainWithDefault(m, "direct") }`
  * It's recommended to fix issues seen in tests using Prism, as they can also happen on any portable runner.
  * Use the generic register package for your pipeline DoFns to ensure pipelines function on portable runners, like Prism.
  * Do not rely on closures or package globals for DoFn configuration. They don't function on portable runners.
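
The advice against closures and package globals can be illustrated without Beam itself. The following is a minimal sketch, assuming that DoFn configuration reaches a portable runner's workers only in serialized form. It imitates that boundary with the standard library's `encoding/json` and calls no Beam API; the names `prefixFn`, `globalPrefix`, `shipToWorker`, and `canShip` are hypothetical and exist only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// prefixFn is a hypothetical DoFn-like struct (not a Beam type). Its
// configuration lives in an exported field, so it survives serialization.
type prefixFn struct {
	Prefix string `json:"prefix"`
}

// ProcessElement applies the configured prefix to a single element.
func (f *prefixFn) ProcessElement(s string) string {
	return f.Prefix + s
}

// globalPrefix stands in for a package global assigned in the launcher.
// A separate worker process would only ever see its zero value.
var globalPrefix string

// canShip reports whether a value can be encoded at all.
// Function values, including closures, cannot.
func canShip(v any) bool {
	_, err := json.Marshal(v)
	return err == nil
}

// shipToWorker imitates the launcher/worker boundary (an assumption of this
// sketch): only the encoded bytes cross it, and globals are reset.
func shipToWorker(fn *prefixFn) (*prefixFn, error) {
	data, err := json.Marshal(fn)
	if err != nil {
		return nil, err
	}
	globalPrefix = "" // simulate a fresh worker process
	out := &prefixFn{}
	if err := json.Unmarshal(data, out); err != nil {
		return nil, err
	}
	return out, nil
}

func main() {
	globalPrefix = "id-"
	closure := func(s string) string { return globalPrefix + s }
	fmt.Println("closure can be shipped:", canShip(closure))

	worker, err := shipToWorker(&prefixFn{Prefix: "id-"})
	if err != nil {
		panic(err)
	}
	fmt.Println("struct field on worker:", worker.ProcessElement("7"))
	fmt.Printf("package global on worker: %q\n", globalPrefix)
}
```

In a real pipeline the struct would be a DoFn registered through the generic register package mentioned above; the point of the sketch is only that configuration held in exported fields travels with the DoFn, while captured variables and globals do not.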
Bugfixes
* Fixed a DirectRunner bug in the Python SDK where GroupByKey receives an empty PCollection and fails when the pipeline option `direct_num_workers` is not 1. ([27373](https://github.com/apache/beam/pull/27373))
* Fixed BigQuery I/O bug when estimating size on queries that utilize row-level security ([27474](https://github.com/apache/beam/pull/27474))
Known Issues
* Long-running Python pipelines might experience a memory leak: [28246](https://github.com/apache/beam/issues/28246).
* Python pipelines using BigQuery IO or the `orjson` dependency might experience segmentation faults or get stuck: [28318](https://github.com/apache/beam/issues/28318).
* Beam Python containers rely on a version of Debian/aom that has several security vulnerabilities: [CVE-2021-30474](https://nvd.nist.gov/vuln/detail/CVE-2021-30474), [CVE-2021-30475](https://nvd.nist.gov/vuln/detail/CVE-2021-30475), [CVE-2021-30473](https://nvd.nist.gov/vuln/detail/CVE-2021-30473), [CVE-2020-36133](https://nvd.nist.gov/vuln/detail/CVE-2020-36133), [CVE-2020-36131](https://nvd.nist.gov/vuln/detail/CVE-2020-36131), [CVE-2020-36130](https://nvd.nist.gov/vuln/detail/CVE-2020-36130), and [CVE-2020-36135](https://nvd.nist.gov/vuln/detail/CVE-2020-36135).
* Python SDK's cross-language Bigtable sink mishandles records that don't have an explicit timestamp set: [28632](https://github.com/apache/beam/issues/28632). To avoid this issue, set explicit timestamps for all records before writing to Bigtable.
* Python SDK worker start-up logs that are not logged at warning level or higher, particularly pip dependency installation output, are suppressed. This suppression is reverted in 2.51.0.
* MLTransform drops identical elements in the output PCollection: for any set of duplicate elements, only a single element is emitted downstream ([29600](https://github.com/apache/beam/issues/29600)).