Intel-extension-for-tensorflow

Latest version: v2.15.0.0

2.15.0.0

Major Features and Improvements

Intel® Extension for TensorFlow* extends the official [TensorFlow](https://github.com/tensorflow/tensorflow) capabilities, allowing TensorFlow workloads to run on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Xeon® Scalable Processors. This release includes the following major features and improvements:

- **Updated Support:** The Intel® Extension for TensorFlow* has been upgraded to support [TensorFlow 2.15](https://github.com/tensorflow/tensorflow/tree/v2.15.1), the version released by Google and required for this release.
- **Toolkit Support:** Supports [Intel® oneAPI Base Toolkit 2024.1](https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-toolkit-release-notes.html).

- **NextPluggableDevice integration:** Integrates NextPluggableDevice (an advanced generation of the PluggableDevice mechanism) as a new device type to enable seamless integration of new accelerator plugins. For more details, see the [NextPluggableDevice Overview](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.15.0.0/docs/guide/next_pluggable_device.md).

- **Experimental support:** Provides experimental support for the Intel GPU backend for OpenXLA, enabling the OpenXLA GPU backend in Intel® Extension for TensorFlow* via a PJRT plugin. For more details, see the [OpenXLA guide](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.15.0.0/docs/guide/OpenXLA.md).

- **Compiler enablement:** Enables the Clang compiler to build Intel® Extension for TensorFlow* CPU wheels starting with this release. The currently supported version is LLVM/Clang 17. The official wheels published on PyPI will be based on Clang; however, users can choose to build wheels with the GCC compiler by following the steps in the [Configure For CPU guide](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.15.0.0/docs/install/how_to_build.md#configure-for-cpu).

- **Performance optimization:** Enables weight pre-pack support for the Intel® Extension for TensorFlow* CPU backend to improve performance and reduce the memory footprint of `_ITEXMatMul` and `_ITEXFusedMatMul`. For more details, see the [Weight Pre-Pack guide](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.15.0.0/docs/guide/weight_prepack.md).

- **Package redefinition:** Redefines the XPU package to support the GPU backend only, starting with this release. The official XPU wheels published on PyPI will support only the GPU backend, and the separate GPU wheels will be deprecated. A quick device check appears after this list.

- **New Operations:** Supports new OPs to cover the majority of TensorFlow 2.15 OPs.

- **Experimental Support:** Continues to provide experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and on native Ubuntu Linux.
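
Since the XPU package now targets the GPU backend only, a quick sanity check is to confirm that the plugin registers a device. This is a minimal sketch, assuming Intel GPUs are exposed under the `XPU` device type as in the project's getting-started documentation:

```python
import tensorflow as tf
import intel_extension_for_tensorflow as itex  # ensures the plugin is loaded

# Assumption: plugin devices are registered under the "XPU" device type.
devices = tf.config.list_physical_devices("XPU")
print(f"Found {len(devices)} XPU device(s):", devices)
print("Intel Extension for TensorFlow version:", itex.__version__)
```

If no device is listed, the environment check script mentioned under Known Issues below is a good next step.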


Known Issues
- **TensorList limitation:** TensorFlow 2.15 does not support TensorList with NextPluggableDevice.
- **Allocation limitation on WSL2:** Windows Subsystem for Linux 2 (WSL2) caps the maximum size of a single allocation on a device, which may cause an Out-of-Memory error. Users can remove the limitation with the environment variable `UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS=1`; see the sketch after this list.
- **FP64 support:** FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with the exception `'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.`
- **GLIBC++ mismatch:** A `GLIBC++` version mismatch may cause a workload to exit with the exception `Can not find any devices. To check runtime environment on your host, please run itex/tools/python/env_check.py.` Try running the [env_check.py](https://github.com/intel/intel-extension-for-tensorflow/blob/r2.15/tools/python/env_check.py) script to confirm.
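
For the WSL2 allocation limitation above, the variable must be present before the GPU runtime initializes. A minimal sketch, assuming it is set before the first TensorFlow import (exporting it in the shell before launching the workload works equally well):

```python
import os

# Lift the single-allocation size cap on WSL2; must be set before
# TensorFlow initializes the GPU runtime.
os.environ["UR_L0_ENABLE_RELAXED_ALLOCATION_LIMITS"] = "1"

import tensorflow as tf  # noqa: E402

# A large single allocation that might otherwise hit the WSL2 cap.
x = tf.zeros([8192, 8192], dtype=tf.float32)
print(x.device, x.shape)
```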

Other Information
- **Performance Data:** Provides a [Performance Data](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.15.0.0/docs/guide/performance.md) document to demonstrate the training and inference performance as well as accuracy results on several popular AI workloads with Intel® Extension for TensorFlow* benchmarked on Intel GPUs.


Documentation

- [Welcome to Intel® Extension for TensorFlow* documentation](https://intel.github.io/intel-extension-for-tensorflow/v2.15.0.0/get_started.html)
- [TensorFlow Serving Installation Guide](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.15.0.0/docs/guide/tf_serving_install.md)
- Distributed training supported by [Intel® Optimization for Horovod*](https://github.com/intel/intel-optimization-for-horovod/blob/v0.28.1.4/xpu_docs/tensorflow_example.md)
- [Intel® Extension for TensorFlow* Installation guide](https://intel.github.io/intel-extension-for-tensorflow/v2.15.0.0/get_started.html#install)
- [Frequently Asked Questions](https://intel.github.io/intel-extension-for-tensorflow/v2.15.0.0/docs/guide/FAQ.html)

2.14.0.1

Major Features and Improvements

Intel® Extension for TensorFlow* extends the official [TensorFlow](https://github.com/tensorflow/tensorflow) capabilities to run TensorFlow workloads on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Xeon® Scalable Processors. This release contains the following major features and improvements:

- Upgrades the supported TensorFlow version to [TensorFlow 2.14](https://github.com/tensorflow/tensorflow/tree/v2.14.1), the version released by Google and required for this release.
- Supports [Intel® oneAPI Base Toolkit 2024.0](https://www.intel.com/content/www/us/en/developer/articles/release-notes/intel-oneapi-toolkit-release-notes.html).
- Provides experimental support for selecting CPU thread pools using either OpenMP thread pool (default) or Eigen thread pool. You can select the more efficient thread pool based on the workload and hardware configuration. Refer to [Selecting Thread Pool in Intel® Extension for TensorFlow* CPU](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.14.0.1/docs/guide/threadpool.md) for more details.
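
A minimal sketch of switching to the Eigen thread pool. The `ITEX_OMP_THREADPOOL` variable and its values below are an assumption taken from the linked guide; verify the exact name and values there for your release:

```python
import os

# Assumption: ITEX_OMP_THREADPOOL selects the CPU thread pool
# ("1" = OpenMP, the default; "0" = Eigen); check the linked guide.
os.environ["ITEX_OMP_THREADPOOL"] = "0"  # opt into the Eigen thread pool

import tensorflow as tf  # noqa: E402  (set the variable before importing)
```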

- Enables FP8 functionality support for Transformer-like training models. Refer to [FP8 BERT-Large Fine-tuning for Classifying Text on Intel GPU](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.14.0.1/examples/train_bert_fp8/README.md) for more details.
- Provides experimental support for quantization front-end python API, based on [Intel® Neural Compressor](https://github.com/intel/neural-compressor).
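
The experimental front end builds on Intel® Neural Compressor, so a plain INC post-training quantization flow illustrates the underlying mechanics. A hedged sketch using the INC 2.x API, with a placeholder model path and synthetic calibration data (this is not the extension's own front-end API):

```python
import numpy as np
from neural_compressor import PostTrainingQuantConfig, quantization
from neural_compressor.data import DataLoader

# Hypothetical calibration set: a few (input, label) samples.
dataset = [(np.random.rand(224, 224, 3).astype(np.float32), 0)
           for _ in range(10)]
calib_dataloader = DataLoader(framework="tensorflow", dataset=dataset)

q_model = quantization.fit(
    model="saved_model_dir",         # placeholder path to an FP32 SavedModel
    conf=PostTrainingQuantConfig(),  # default INT8 post-training settings
    calib_dataloader=calib_dataloader,
)
q_model.save("quantized_model_dir")
```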

- Adds OP performance optimizations:
  - Optimizes the `GroupNorm` and `Unique` operators.
  - Optimizes `Einsum` and `ScaledDotProductAttention` with XeTLA enabled.

- Supports new OPs to cover the majority of TensorFlow 2.14 OPs.
- Continues to provide experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
- Moves the experimental support for Intel GPU backend for OpenXLA from the Intel® Extension for TensorFlow repository to the [Intel® Extension for OpenXLA*](https://github.com/intel/intel-extension-for-openxla) repository. Refer to [Intel® Extension for OpenXLA*](https://github.com/intel/intel-extension-for-openxla) for more details.

Known Issues

- FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with the exception `'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.`
- A `GLIBC++` version mismatch may cause a workload to exit with the exception `Can not find any devices. To check runtime environment on your host, please run itex/tools/env_check.sh.` Try running the [env_check.sh](https://github.com/intel/intel-extension-for-tensorflow/blob/r2.14/tools/env_check.sh) script to confirm.

Documents

- [Welcome to Intel® Extension for TensorFlow* documentation](https://intel.github.io/intel-extension-for-tensorflow/v2.14.0.1/get_started.html)
- [TensorFlow Serving Installation Guide](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.14.0.1/docs/guide/tf_serving_install.md)
- Distributed training supported by [Intel® Optimization for Horovod*](https://github.com/intel/intel-optimization-for-horovod/blob/v0.28.1.2/xpu_docs/tensorflow_example.md)
- [Intel® Extension for TensorFlow* Installation guide](https://intel.github.io/intel-extension-for-tensorflow/v2.14.0.1/get_started.html#install)
- [Frequently Asked Questions](https://intel.github.io/intel-extension-for-tensorflow/v2.14.0.1/docs/guide/FAQ.html)

2.13.0.0

Major Features and Improvements

Intel® Extension for TensorFlow* extended the official [TensorFlow](https://github.com/tensorflow/tensorflow) capabilities to run TensorFlow workloads on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Xeon® Scalable Processors. This release contains the following major features and improvements:
- The supported TensorFlow version was upgraded to Google's latest release, [TensorFlow 2.13](https://github.com/tensorflow/tensorflow/tree/v2.13.0), the only TensorFlow version supported in this release.
- Refined the Intel® Extension for TensorFlow* version to a four-digit format (v2.13.0.0): the first three digits track stock TensorFlow (v2.13.0) and the last digit increments with each extension release, making the version mapping between Intel® Extension for TensorFlow* and stock TensorFlow easier to understand.
- Unified the release into one XPU package that supports both the CPU and GPU backends, providing flexibility for users on different CPU or GPU hardware platforms.
- Supported TensorFlow Serving running on top of Intel® Extension for TensorFlow* to provide a serving service in production environments. Learn more in the [TensorFlow Serving Installation Guide](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.13.0.0/docs/guide/tf_serving_install.md).
- Enabled INT8 quantization through the [oneDNN Graph API](https://spec.oneapi.io/onednn-graph/latest/introduction.html) as the default solution on GPU in Intel® Extension for TensorFlow* to provide a better INT8 user experience together with [Intel® Neural Compressor](https://github.com/intel/neural-compressor/tags) >= 2.2.
- Added OP performance optimizations:
  - Enabled SYCL native BFloat16 data type support.
  - `SpaceToBatchND`/`BatchToSpaceND`: 1.1x ~ 1.8x improvement compared with the last release.
  - `Select` OP: 1.3x ~ 1.7x improvement compared with the last release.
  - `LstmEltwiseKernel`: 1.28x ~ 1.7x improvement compared with the last release.
  - `BucketizeOp`: 4x improvement compared with the last release.
- Supported new OPs to cover the majority of TensorFlow 2.13.0 OPs.
- Dynamically loaded Intel® Advanced Vector Extensions (AVX2 and AVX512) instructions, adapting to the user's hardware to maximize CPU performance.
- Supported the FP16 data type with AMX simulation on 4th Gen Intel® Xeon® Scalable processors (code-named Sapphire Rapids).
- This release started to provide product support for second generation Intel® Xeon® Scalable Processors and newer (such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids).
- This release continued to provide experimental support for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.

Known Issues

- FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with the exception `'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.`
- A `GLIBC++` version mismatch may cause a workload to exit with the exception `Can not found any devices. To check runtime environment on your host, please run itex/tools/env_check.sh.` Try running the [env_check.sh](https://github.com/intel/intel-extension-for-tensorflow/blob/r2.13/tools/env_check.sh) script to confirm.

Documents

- [Welcome to Intel® Extension for TensorFlow* documentation](https://intel.github.io/intel-extension-for-tensorflow/v2.13.0.0/get_started.html)
- [TensorFlow Serving Installation Guide](https://github.com/intel/intel-extension-for-tensorflow/blob/v2.13.0.0/docs/guide/tf_serving_install.md)
- Distributed training supported by [Intel® Optimization for Horovod*](https://github.com/intel/intel-optimization-for-horovod/blob/v0.28.1.0/xpu_docs/tensorflow_example.md)
- [Intel® Extension for TensorFlow* Installation guide](https://intel.github.io/intel-extension-for-tensorflow/v2.13.0.0/get_started.html#install)
- [Frequently Asked Questions](https://intel.github.io/intel-extension-for-tensorflow/v2.13.0.0/docs/guide/FAQ.html)

1.2.0

Major Features and Improvements

Intel® Extension for TensorFlow* extended the official [TensorFlow](https://github.com/tensorflow/tensorflow) capabilities to run TensorFlow workloads on Intel® Data Center GPU Max Series and Intel® Data Center GPU Flex Series. This release contains the following major features and improvements:

- The TensorFlow version supported by Intel® Extension for TensorFlow* v1.2.0 was upgraded to Google's latest release, [TensorFlow 2.12](https://github.com/tensorflow/tensorflow/tree/v2.12.0). Due to a TensorFlow 2.12 [breaking change in protobuf](https://github.com/tensorflow/tensorflow/commit/84f40925e929d05e72ab9234e53c729224e3af38), Intel® Extension for TensorFlow* is binary-compatible only with TensorFlow 2.12 in this release.
- Adopted the uniform device API PJRT as the supported device-plugin mechanism to implement experimental Intel GPU backend support for OpenXLA. Users can build Intel® Extension for TensorFlow* from source and run JAX front-end APIs with OpenXLA. Refer to [OpenXLA Support GPU](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.2/docs/guide/OpenXLA_Support_on_GPU.md) for more details.
- Updated the oneDNN version to [v3.1](https://github.com/oneapi-src/oneDNN/releases/tag/v3.1), which includes multiple functional and performance improvements for CPU and GPU implementations.
- Supported the generative AI model [Stable Diffusion](https://github.com/keras-team/keras-cv) and optimized the model for better performance. Get started with [Stable Diffusion Inference for Text2Image on Intel GPU](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.2/examples/stable_diffussion_inference).
- Supported XPUAutoShard in Intel® Extension for TensorFlow* as an experimental feature. Given a set of homogeneous XPU devices (e.g., 2 GPU tiles), XPUAutoShard automatically shards the input data and the TensorFlow graph, placing the shards on different GPU devices to maximize hardware usage. Refer to [XPUAutoShard on GPU](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.2/docs/guide/XPUAutoShard.md) for more details.
- Provided the Python API `itex.experimental_ops_override()` to automatically override some TensorFlow operators with [Customized Operators](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.2/docs/guide/itex_ops.md) under the `itex.ops` namespace while remaining compatible with existing trained parameters. More in the [usage details](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.2/docs/guide/itex_ops_override.md) and the sketch below.
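
A minimal sketch of the override in use; after the call, layers created through the standard Keras API pick up the customized implementations where available:

```python
import tensorflow as tf
import intel_extension_for_tensorflow as itex

# Replace compatible TensorFlow/Keras operators with the customized
# versions under itex.ops; existing trained parameters remain usable.
itex.experimental_ops_override()

layer = tf.keras.layers.LayerNormalization()
y = layer(tf.random.normal([2, 8]))
print(y.shape)
```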

- Added operator performance optimizations:
  - Optimized the `ResizeNearestNeighborGrad`/`All`/`Any`/`Slice`/`SpaceToBatchND`/`BatchToSpaceND`/`BiasAddGrad` operators.
  - Optimized math functions (e.g., `tanh`, `rsqrt`) with small shapes (e.g., size=8192) on Intel® Data Center GPU Flex Series through vectorization.
  - Optimized reduction-series OPs by improving thread and memory utilization for column and row reductions separately.

- Supported AOT ([ahead-of-time compilation](https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compilation/ahead-of-time-compilation.html)) on Intel® Data Center GPU Max Series, Intel® Data Center GPU Flex Series, and Intel® Arc™ A-Series GPUs in the Intel® Extension for TensorFlow* package on the PyPI channel. You can also specify the hardware platform type when configuring your system for a source code build.
- This release continued to provide experimental support for second generation Intel® Xeon® Scalable Processors and newer (such as Cascade Lake, Cooper Lake, Ice Lake, and Sapphire Rapids), and for Intel® Arc™ A-Series GPUs on Windows Subsystem for Linux 2 with Ubuntu Linux installed and on native Ubuntu Linux.

Bug Fixes and Other Changes

- Upgraded the pybind11 version to support Python 3.11 source builds.
- Initialized environment variables for the Intel® oneAPI Base Toolkit in the Docker container by default.

Known Issues

- FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with the exception `'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.`
- TensorBoard cannot work with stock TensorFlow 2.12 due to two issues: https://github.com/tensorflow/tensorflow/issues/60262 and https://github.com/tensorflow/profiler/issues/602.
- A `GLIBC++` version mismatch may cause a workload to exit with the exception `Can not found any devices. To check runtime environment on your host, please run itex/tools/env_check.sh.` Try running the [env_check.sh](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.2/tools/env_check.sh) script for assistance.

Documents

- [Welcome to Intel® Extension for TensorFlow* documentation](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.2/docs)

- Provided new guide documentation for developers: [How to write custom op](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.2/docs/design/how_to_write_custom_op.md).

- Distributed training supported by [Intel® Optimization for Horovod*](https://github.com/intel/intel-optimization-for-horovod/tree/r0.5).

- [Intel® Extension for TensorFlow* Installation guide](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.2/docs/install/installation_guide.rst).

- [Frequently Asked Questions](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.2/docs/guide/FAQ.md).

1.1.0

Major Features and Improvements

Intel® Extension for TensorFlow* extended the official [TensorFlow](https://github.com/tensorflow/tensorflow) capabilities to run TensorFlow workloads on Intel® Data Center GPU Max Series and Intel® Data Center GPU Flex Series. This release contains the following major features and improvements:
- The supported TensorFlow version was upgraded to Google's latest release, TensorFlow 2.11, so in this release Intel® Extension for TensorFlow* works seamlessly with both TensorFlow 2.11 and TensorFlow 2.10 binaries.
- Added [Intel® Optimization for Horovod*](https://github.com/intel/intel-optimization-for-horovod/tree/v0.4) to the Intel® Extension for TensorFlow* Intel® Data Center GPU Max Series Docker container. Users only need to install the GPU driver on the host machine and launch the Docker container directly to run TensorFlow + Horovod distributed workloads. Get started with the [Docker Container Guide](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.1/docs/install/install_for_gpu.md#install-via-docker-container) and the [Horovod ResNet50 example](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.1/examples/train_horovod/resnet50/README.md); a minimal sketch follows this list.
- Enhanced unit tests to cover the majority of TensorFlow OPs.
- Added new OP support and performance optimizations:
  - Added double data type support for `MatMul`/`BatchMatMul`/`BatchMatMulV2`.
  - Enabled Eigen vectorized RNE conversion between packed BF16 and FP32 for element-wise OPs.
  - Enabled the vectorization pass for the Sigmoid OP.
  - Optimized the `ItexLSTM`/`NMS`/`ResizeNearestNeighbor` OPs.
  - Added more fusion pattern support (Conv + BiasAdd + Relu + Add fusion, Conv + Mish fusion).
- Enabled INT8 quantization through the [oneDNN Graph API](https://spec.oneapi.io/onednn-graph/latest/introduction.html) as the default solution on CPU in Intel® Extension for TensorFlow* to provide a better INT8 user experience together with [Intel® Neural Compressor](https://github.com/intel/neural-compressor) >= 2.0.
- Added an [environment check script](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.1/tools/env_check.sh) for users to check the software stack installation status, including the OS version, GPU driver, TensorFlow version, and the versions of other dependencies in the Intel® oneAPI Base Toolkit.
- This release continued to provide experimental support for second generation [Intel® Xeon® Scalable Processors](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.1/docs/install/experimental/install_for_cpu.md) and newer (such as Cascade Lake, Cooper Lake, Ice Lake and Sapphire Rapids) and [Intel® Arc™ A-Series GPUs](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.1/docs/install/experimental/install_for_arc_gpu.md) on Windows Subsystem for Linux 2 with Ubuntu Linux installed and native Ubuntu Linux.
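
As a companion to the Horovod container support above, here is a minimal hedged sketch of per-process device pinning. It assumes the plugin GPUs (or tiles) appear under the `XPU` device type:

```python
import tensorflow as tf
import horovod.tensorflow as hvd

hvd.init()

# Assumption: plugin GPUs/tiles are exposed under the "XPU" device type;
# pin one device per Horovod process.
xpus = tf.config.list_physical_devices("XPU")
if xpus:
    tf.config.set_visible_devices(xpus[hvd.local_rank()], "XPU")

print(f"rank {hvd.rank()} of {hvd.size()} ready")
```

Such a script would be launched with, for example, `horovodrun -np 2 python train.py`.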

Bug Fixes and Other Changes

- Fixed several kernel bugs, including a NaN issue in the LogSoftmax OP and segmentation faults in the Unique and ParallelConcat OPs.
- Added cast from INT64 to BF16.

Known Issues

- FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) platform. If you run any AI workload with an FP64 kernel on that platform, the workload will exit with the exception `'XXX' Op uses fp64 data type, while fp64 instructions are not supported on the platform.`

Documents

- [Intel® Extension for TensorFlow* Installation guide](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.1/docs/install/installation_guide.rst)
- [Intel® Extension for TensorFlow* Docker Container Guide](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.1/docker/README.md)
- [INT8 Quantization](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.1/docs/guide/INT8_quantization.md)

1.0.0

Major Features

Intel® Extension for TensorFlow\* is an Intel-optimized Python package that extends the official [TensorFlow](https://github.com/tensorflow/tensorflow) capability of running TensorFlow workloads on Intel GPUs, and brings the first Intel GPU product, Intel® Data Center GPU Flex Series 170, into the TensorFlow open source community for AI workload acceleration. It is based on the TensorFlow PluggableDevice interface and provides full support starting with TensorFlow 2.10.

This release contains the following major features:

- **AOT ([Ahead-of-time compilation](https://software.intel.com/content/www/us/en/develop/documentation/oneapi-dpcpp-cpp-compiler-dev-guide-and-reference/top/compilation/ahead-of-time-compilation.html))**

AOT compilation is a performance feature that removes just-in-time (JIT) compilation overhead during application launch. It can be enabled when configuring your system for a [source code build](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.0/docs/install/how_to_build.md#sample-configuration-session). The Intel® Extension for TensorFlow\* package on the PyPI channel is built with AOT enabled.

- **Graph Optimization**

Advanced Automatic Mixed Precision

Advanced Automatic Mixed Precision uses low-precision data types (`float16` or `bfloat16`) to boost performance and reduce memory consumption. Get started with [how to enable](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.0/docs/guide/advanced_auto_mixed_precision.md).
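
A minimal sketch of enabling the feature through environment variables. The variable names below follow the linked enabling guide but are an assumption to be verified there:

```python
import os

# Assumption: these switches match the linked enabling guide; verify the
# exact names and values for your release.
os.environ["ITEX_AUTO_MIXED_PRECISION"] = "1"
os.environ["ITEX_AUTO_MIXED_PRECISION_DATA_TYPE"] = "BFLOAT16"  # or "FLOAT16"

import tensorflow as tf  # noqa: E402  (set before TensorFlow initializes)
# Eligible FP32 operations in models run from here on may execute in bfloat16.
```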

Graph fusion

Intel® Extension for TensorFlow\* provides graph optimization that fuses specified operator patterns into a single new operator for better performance, such as `Conv2D+ReLU` and `Linear+ReLU`. Refer to the supported fusion list in [Graph fusion](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.0/docs/guide/itex_fusion.md).

- **Python API**

Public APIs that extend XPU operators for better performance are available in the `itex.ops` namespace, including `AdamWithWeightDecayOptimizer`/`gelu`/`LayerNormalization`/`ItexLSTM`. Find more details in [Intel® Extension for TensorFlow\* ops](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.0/docs/guide/itex_ops.md).
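
A minimal sketch using two of the listed names; the exact signatures are in the linked ops document:

```python
import tensorflow as tf
import intel_extension_for_tensorflow as itex

x = tf.random.normal([4, 16])

y = itex.ops.gelu(x)                  # customized GELU activation
norm = itex.ops.LayerNormalization()  # Keras-style customized layer
z = norm(x)
print(y.shape, z.shape)
```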

- **Intel® Extension for TensorFlow\* Profiler**

Intel® Extension for TensorFlow\* supports the TensorFlow\* [Profiler](https://www.tensorflow.org/guide/profiler) for tracing the performance of TensorFlow\* models on Intel GPUs. Refer to [how to enable profiler](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.0/docs/guide/how_to_enable_profiler.md) for more details.

- **Docker Container Support**

The Intel® Extension for TensorFlow\* Docker container includes the Intel® oneAPI Base Toolkit and the rest of the software stack except the Intel GPU drivers. Users only need to install the GPU driver on the host machine before pulling and launching the Docker container. Get started with the [Docker Container Guide](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.0/docker/README.md).

- **FP32 Math Mode**

The `ITEX_FP32_MATH_MODE` setting controls the math mode used for `float32` computation. Users can set `ITEX_FP32_MATH_MODE` (default `FP32`) to either supported value (GPU: `TF32`/`FP32`); `TF32` allows TensorFloat-32 execution for faster FP32 math. More details in [ITEX_FP32_MATH_MODE](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.0/docs/guide/environment_variables.md).
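
For example, to allow TensorFloat-32 execution on GPU (a sketch; the variable must be set before TensorFlow initializes):

```python
import os

# Allow TF32 execution for float32 math on GPU; the default mode is "FP32".
os.environ["ITEX_FP32_MATH_MODE"] = "TF32"

import tensorflow as tf  # noqa: E402
```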

- **Intel® Extension for TensorFlow\* Verbose**

`ITEX_VERBOSE` helps users get more Intel® Extension for TensorFlow\* log messages at different log levels. More details in the [ITEX_VERBOSE level introduction](https://github.com/intel/intel-extension-for-tensorflow/tree/r1.0/docs/guide/environment_variables.md#itexdebugoptions).
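
For example (a sketch; the available levels are listed in the linked introduction):

```python
import os

# Assumption: level "1" enables basic extension logging; higher levels add
# more detail (see the linked introduction for the exact levels).
os.environ["ITEX_VERBOSE"] = "1"

import tensorflow as tf  # noqa: E402
```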

- **INT8 Quantization**

Intel® Extension for TensorFlow\* works with [Intel® Neural Compressor](https://github.com/intel/neural-compressor) >= 1.14.1 to provide a TensorFlow-compatible INT8 quantization solution with the same user experience.

- **Experimental Support**

This release provides experimental support for [Intel® Arc™ A-Series GPUs](https://github.com/intel/intel-extension-for-tensorflow/blob/r1.0/docs/install/experimental/install_for_arc_gpu.md) on Windows Subsystem for Linux 2 with Ubuntu Linux installed and on native Ubuntu Linux, and for second generation Intel® Xeon® Scalable Processors and newer, such as Cascade Lake, Cooper Lake, Ice Lake, and Sapphire Rapids.

Known Issues

- FP64 is not natively supported by the [Intel® Data Center GPU Flex Series](https://www.intel.com/content/www/us/en/products/docs/discrete-gpus/data-center-gpu/flex-series/overview.html) platform. If you run an AI workload on that platform and receive the error message "[CRITICAL ERROR] Kernel 'XXX' removed due to usage of FP64 instructions unsupported by the targeted hardware", a kernel requiring FP64 instructions was removed and not executed, so the accuracy of the whole workload is wrong.

Documentation to get started

- [Welcome to Intel® Extension for TensorFlow\* documentation](https://intel.github.io/intel-extension-for-tensorflow/latest/get_started.html)
- [How to build Intel® Extension for TensorFlow\*](https://intel.github.io/intel-extension-for-tensorflow/latest/docs/install/how_to_build.html)
