Intel-extension-for-openxla

Latest version: v0.3.0


0.3.0

Major Features

Intel® Extension for OpenXLA* is an Intel-optimized PyPI package that extends the official [OpenXLA](https://github.com/openxla/xla) on Intel GPUs. It is based on the [PJRT](https://opensource.googleblog.com/2023/05/pjrt-simplifying-ml-hardware-and-framework-integration.html) plugin mechanism, which allows [JAX](https://jax.readthedocs.io/en/latest/index.html) models to run seamlessly on [Intel® Data Center GPU Max Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/max-series.html) and [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/details/discrete-gpus/data-center-gpu/flex-series.html). This release contains the following major features:

- **JAX Upgrade:** Upgrade version to **v0.4.24**.
- **Feature Support:**
  - Supports the custom call registration mechanism via the new OpenXLA C API. This feature provides the ability to interact with third-party software, such as [mpi4jax](https://github.com/mpi4jax/mpi4jax).
  - Continues to improve JAX native distributed scale-up collectives. Any number of devices **less than 16** is now supported in a single node (see the sketch after this list).
  - Experimental support for Intel® Data Center GPU Flex Series.
- **Bug Fix:**
  - Fix accuracy issues in the GEMM kernel when it is optimized by Intel® Xe Templates for Linear Algebra (XeTLA).
  - Fix a crash when the input batch size is greater than **65535**.
- **Toolkit Support:** Support [Intel® oneAPI Base Toolkit 2024.1](https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-toolkit-release-notes.html).
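
As an illustration of the improved collectives, here is a minimal sketch in plain JAX. It assumes the plugin's devices are visible to JAX and uses whatever local device count is available, for example an odd count such as 3, which is now allowed:

```python
# Minimal sketch: a JAX-native scale-up collective via pmap + psum.
# Any device count below 16 in a single node is now supported,
# including odd counts that were previously rejected.
import jax
import jax.numpy as jnp

print(jax.devices())            # devices exposed through the PJRT plugin

n = jax.local_device_count()    # e.g. 3, 5, ... up to 15 on one node
x = jnp.arange(n * 4.).reshape(n, 4)

# Each device receives one row and computes the cross-device sum.
summed = jax.pmap(lambda v: jax.lax.psum(v, axis_name='i'), axis_name='i')(x)
print(summed)                   # every row now holds the same total
```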


Known Caveats

- The extension will crash when Binary operations (e.g. `Mul`, `MatMul`) and the SPMD multi-device parallelism API [`psum_scatter`](https://jax.readthedocs.io/en/latest/notebooks/shard_map.html#psum-scatter) are used under the same `partial` annotation. Please refer to the JAX unit test [test_matmul_reduce_scatter](https://github.com/google/jax/blob/jaxlib-v0.4.24/tests/shard_map_test.py#L153-L159) to understand the error scenario better; a sketch of the failing pattern is shown after these caveats.
- JAX collectives fall into deadlock and hang the extension when working with **Toolkit 2024.1**. **Toolkit 2024.0** is recommended if collectives are needed.
- The `clear_backends` API doesn't work and may cause an OOM exception as below when working with **Toolkit 2024.0**:

  ```
  terminate called after throwing an instance of 'sycl::_V1::runtime_error'
  what(): Native API failed. Native API returns: -5 (PI_ERROR_OUT_OF_RESOURCES) -5 (PI_ERROR_OUT_OF_RESOURCES)
  Fatal Python error: Aborted
  ```


**Note**: The `clear_backends` API will be deprecated by JAX soon.
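
For reference, a minimal sketch of the crashing pattern described in the first caveat, adapted from the linked JAX unit test. It assumes 4 visible devices and is expected to fail on the affected versions:

```python
# Sketch of the known-bad pattern: MatMul plus psum_scatter under the
# same `partial` (shard_map) annotation. Expected to crash on Intel GPU.
from functools import partial

import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, PartitionSpec as P
from jax.experimental.shard_map import shard_map

mesh = Mesh(np.array(jax.devices()[:4]), ('x',))

@partial(shard_map, mesh=mesh,
         in_specs=(P(None, 'x'), P('x', None)), out_specs=P('x', None))
def matmul_reduce_scatter(a, b):
    partial_products = a @ b                 # binary op (MatMul) ...
    return jax.lax.psum_scatter(             # ... combined with psum_scatter
        partial_products, 'x', scatter_dimension=0, tiled=True)

a = jnp.arange(64.).reshape(8, 8)
b = jnp.arange(64.).reshape(8, 8)
out = matmul_reduce_scatter(a, b)            # crashes on affected versions
```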


Breaking changes

- Previous JAX **v0.4.20** is no longer supported. Please follow the [JAX change log](https://jax.readthedocs.io/en/latest/changelog.html) to update applications if version errors occur (a quick version check is sketched below).
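
For convenience, a minimal check that the installed JAX and jaxlib match the version this release targets:

```python
# Quick sanity check: this release targets JAX v0.4.24.
import jax
import jaxlib

print(jax.__version__, jaxlib.__version__)  # both expected to report 0.4.24
```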


Documents

- [Introduction to Intel® Extension for OpenXLA*](https://github.com/intel/intel-extension-for-openxla/blob/r0.2.0/README.md#intel-extension-for-openxla)
- [Accelerate JAX models on Intel GPUs via PJRT](https://opensource.googleblog.com/2023/06/accelerate-jax-models-on-intel-gpus-via-pjrt.html)
- [Accelerate Stable Diffusion on Intel GPUs with Intel® Extension for OpenXLA*](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-stable-diffusion-on-intel-gpus-openxla.html)

0.2.1

Bug Fixes and Other Changes
* Fix the **known caveat** related to `XLA_ENABLE_MULTIPLE_STREAM=1`. The accuracy issue is fixed, and this environment variable no longer needs to be set.
* Fix the **known caveat** related to `MHA=0`. The crash is fixed, and this environment variable no longer needs to be set.
* Fix a compatibility issue with the upgraded driver [LTS release 2350.29](https://dgpu-docs.intel.com/releases/LTS_803.29_20240131.html).
* Fix a random accuracy issue caused by the `AllToAll` collective.
* Upgrade [transformers](https://github.com/huggingface/transformers) used by the [examples](https://github.com/intel-innersource/frameworks.ai.intel-extension-for-openxla.intel-extension-for-openxla/tree/r0.2.1/example) to 4.36 to fix an open CVE.

Known Caveats
* The device number is restricted to **2/4/6/8/10/12** for the experimental collectives support in a single node.
* Do not use collectives (e.g. `AllReduce`) inside nested `pjit`; this may cause random accuracy issues. Please refer to the JAX unit test [`testAutodiff`](https://github.com/google/jax/blob/jaxlib-v0.4.20/tests/pjit_test.py#L646-L661) to understand the error scenario better; a sketch of the pattern to avoid follows below.
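
For illustration, a minimal sketch of the pattern to avoid; the shapes are hypothetical, and the sharded reduction inside the nested `pjit` lowers to an `AllReduce`:

```python
# Sketch of the known-bad pattern: a collective produced inside nested pjit.
import numpy as np
import jax
import jax.numpy as jnp
from jax.experimental.pjit import pjit
from jax.sharding import Mesh, PartitionSpec as P

mesh = Mesh(np.array(jax.devices()[:2]), ('x',))

# The inner reduction over a sharded input lowers to an AllReduce.
inner = pjit(lambda x: jnp.sum(x * x), in_shardings=P('x'))
outer = pjit(lambda x: inner(x) + 1.0, in_shardings=P('x'))  # nested pjit

with mesh:
    y = outer(jnp.arange(8.))   # may produce random accuracy issues
```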

---
**Full Changelog**: https://github.com/intel/intel-extension-for-openxla/compare/0.2.0...0.2.1

0.2.0

Major Features

Intel® Extension for OpenXLA* is an Intel-optimized PyPI package that extends the official [OpenXLA](https://github.com/openxla/xla) on Intel GPUs. It is based on the [PJRT](https://opensource.googleblog.com/2023/05/pjrt-simplifying-ml-hardware-and-framework-integration.html) plugin mechanism, which allows [JAX](https://jax.readthedocs.io/en/latest/index.html) models to run seamlessly on Intel® Data Center GPU Max Series. This release contains the following major features:

- Upgrade JAX version to **v0.4.20**.

- Experimental support for JAX native distributed scale-up collectives based on [JAX pmap](https://jax.readthedocs.io/en/latest/_autosummary/jax.pmap.html).

- Continuously optimize common kernels, and optimize GEMM kernels with [Intel® Xe Templates for Linear Algebra](https://github.com/intel/xetla). Three inference models (Stable Diffusion, GPT-J, FLAN-T5) are verified on a single Intel® Data Center GPU Max Series device and added to the [examples](https://github.com/intel/intel-extension-for-openxla/tree/r0.2.0/example).


Known Caveats

- The device number is restricted to **2/4/6/8/10/12** for the experimental collectives support in a single node.

- `XLA_ENABLE_MULTIPLE_STREAM=1` should be set when using [JAX parallelization](https://jax.readthedocs.io/en/latest/notebooks/Distributed_arrays_and_automatic_parallelization.html#) on multiple devices without collectives. It adds synchronization between devices to avoid possible accuracy issues (a sketch applying both workarounds follows after this list).

- `MHA=0` should be set to disable MHA fusion in training. MHA fusion is not yet supported in training and causes a runtime error as below:

  ```
  ir_emission_utils.cc:109] Check failed: lhs_shape.dimensions(dim_numbers.lhs_contracting_dimensions(0)) == rhs_shape.dimensions(dim_numbers.rhs_contracting_dimensions(0))
  ```
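
For reference, a minimal sketch of applying both workarounds above; the variables must be set before JAX initializes its backend:

```python
# Both workaround variables must be in the environment before the
# XPU backend initializes, i.e. before the first JAX import/use.
import os
os.environ["XLA_ENABLE_MULTIPLE_STREAM"] = "1"  # multi-device runs without collectives
os.environ["MHA"] = "0"                          # disable MHA fusion for training

import jax  # imported after the flags are set so the backend sees them
```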

Breaking changes

- Previous JAX **v0.4.13** is no longer supported. Please follow the [JAX change log](https://jax.readthedocs.io/en/latest/changelog.html) to update applications if version errors occur.

- GCC **10.0.0** or newer is required when building from source. Please refer to the [installation guide](https://github.com/intel/intel-extension-for-openxla/blob/r0.2.0/README.md#3-install) for more details.


Documents

- [Introduction to Intel® Extension for OpenXLA*](https://github.com/intel/intel-extension-for-openxla/blob/r0.2.0/README.md#intel-extension-for-openxla)
- [Accelerate JAX models on Intel GPUs via PJRT](https://opensource.googleblog.com/2023/06/accelerate-jax-models-on-intel-gpus-via-pjrt.html)
- [Accelerate Stable Diffusion on Intel GPUs with Intel® Extension for OpenXLA*](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-stable-diffusion-on-intel-gpus-openxla.html)

0.1.0

Major Features

Intel® Extension for OpenXLA* is an Intel-optimized Python package that extends the official [OpenXLA](https://github.com/openxla/xla) on Intel GPUs. It is based on the [PJRT](https://opensource.googleblog.com/2023/05/pjrt-simplifying-ml-hardware-and-framework-integration.html) plugin mechanism, which allows [JAX](https://jax.readthedocs.io/en/latest/index.html) models to run seamlessly on Intel® Data Center GPU Max Series. The PJRT API simplifies integration, allowing the Intel XPU plugin to be developed separately and quickly integrated into JAX. This release contains the following major features:

- **Kernel enabling and optimization**

  Common kernels are enabled with the LLVM/SPIR-V software stack. Convolution and GEMM are enabled with oneDNN, and [Stable Diffusion](https://www.intel.com/content/www/us/en/developer/articles/technical/accelerate-stable-diffusion-on-intel-gpus-openxla.html) is verified.
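
As a quick illustration, a minimal sketch that exercises the GEMM path described above; shapes and dtype are arbitrary:

```python
# Minimal sketch: a jitted matmul exercising the oneDNN-backed GEMM path.
import jax
import jax.numpy as jnp

a = jnp.ones((1024, 1024), dtype=jnp.bfloat16)
b = jnp.ones((1024, 1024), dtype=jnp.bfloat16)

c = jax.jit(jnp.matmul)(a, b)   # GEMM dispatched to the Intel GPU
c.block_until_ready()           # wait for the device computation to finish
```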

Known Issues

* Limited support for collective ops due to limitations of oneCCL.


Related Blog
* [Accelerate JAX models on Intel GPUs via PJRT](https://opensource.googleblog.com/2023/06/accelerate-jax-models-on-intel-gpus-via-pjrt.html)
