Intel-optimization-for-horovod

Latest version: v0.28.1.4


Page 1 of 5

0.28.1.4

Added

- Added support for asynchronous wait on ccl::event.
- Added support for TensorFlow NextPluggableDevice on Intel GPU devices.
- Enabled TensorFlow AllReduceXLAOp on Intel GPU devices.

Changed

- Skipped the PyTorch AllReduce bf16 gradient unit test when rank > 2 because of an accuracy issue.

0.28.1.3

Changed

- Updated driver version to LTS-803.

0.28.1.2

Added

- Added support for an empty input buffer in the AlltoAll primitive.
- Enabled TorusAllreduce on Intel GPU devices.
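
The AlltoAll item above concerns ranks that contribute no data to the exchange. As a rough illustration of what that means for the collective's semantics, here is a pure-Python simulation. It is not Horovod or oneCCL code: `simulate_alltoall` and its `splits` argument are hypothetical names chosen for this sketch, and the real primitive operates on device buffers rather than Python lists.

```python
def simulate_alltoall(send_buffers, splits):
    """Simulate an all-to-all exchange between len(send_buffers) ranks.

    send_buffers[src] is the flat list of elements held by rank `src`.
    splits[src][dst] is how many consecutive elements rank `src` sends to `dst`.
    Returns recv, where recv[dst] is what rank `dst` ends up with.
    """
    world = len(send_buffers)
    recv = [[] for _ in range(world)]
    for src in range(world):
        if sum(splits[src]) != len(send_buffers[src]):
            raise ValueError(f"splits for rank {src} do not cover its buffer")
        offset = 0
        for dst in range(world):
            count = splits[src][dst]
            # An empty send buffer simply contributes zero elements everywhere.
            recv[dst].extend(send_buffers[src][offset:offset + count])
            offset += count
    return recv


# Rank 1 has an empty input buffer, so all of its split sizes are zero.
buffers = [[1, 2, 3], [], [7, 8]]
splits = [[1, 1, 1], [0, 0, 0], [1, 0, 1]]
result = simulate_alltoall(buffers, splits)
```

In this sketch the rank with the empty buffer still takes part in the exchange and still receives data from its peers; it only sends nothing.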

Changed

- Updated the PyTorch ResNet50 example with Intel GPU support.
- Added ReduceScatter support to the PyTorch unit tests.

Deprecated

Removed

Fixed

- Set a special value range for half and bf16 tensors in multi-card unit tests.
- Fixed GCC 13 CPU build issue.
- Fixed the oneCCL link path in CMakeLists.txt.
- Replaced '/gpu' with '/xpu' in TensorFlow unit tests.
- Fixed a bug in the GPU check condition in PyTorch sync_batch_norm.

0.28.1

Fixed

- Fixed build with gcc 12. ([3925](https://github.com/horovod/horovod/pull/3925))
- PyTorch: Fixed build on ROCm. ([3928](https://github.com/horovod/horovod/pull/3928))
- TensorFlow: Fixed local_rank_op. ([3940](https://github.com/horovod/horovod/pull/3940))

0.28.1.0

Added

- Added support for torch conv3d with channels_last_3d format.

Changed

- Refined the batch memory copy kernel, added padding support to align with the public logic, and updated the corresponding cases.
- Rebased code to the public v0.28.1 release.
- Aligned the installation method with public Horovod (HVD).
- Refined BroadcastInplaceOp for TensorFlow.
- Enabled the public Horovod TensorFlow examples for Intel Optimization for Horovod (IOH).
- Temporarily skipped the bf16/fp16 accuracy check on ranks > 2, because it is not yet clear how the threshold should change as the rank count increases.

Fixed

- Fixed SDL warning.
- Fixed hvd.join with allreduce.
- Fixed scale factor related accuracy issue for bf16/fp16.
- Fixed cpu_operation by switching it from CCL to MPI when Intel GPU is enabled.
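
The bf16/fp16 accuracy items in this release (the skipped check on ranks > 2 and the scale factor fix) both come down to how rounding error behaves when more ranks contribute to a reduction. The following is a minimal pure-Python sketch of that effect, not the project's test code: `to_bf16`, `bf16_allreduce_sum`, and `relative_error` are hypothetical helpers, the sum is accumulated sequentially with rounding after every addition (real allreduce algorithms may order and accumulate differently), and NaN/infinity are not handled specially.

```python
import struct


def to_bf16(x):
    """Round a float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # bfloat16 keeps the upper 16 bits of a float32; add a rounding bias first.
    bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (((bits + bias) >> 16) << 16) & 0xFFFFFFFF
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def bf16_allreduce_sum(values):
    """Sum one value per rank, rounding to bf16 after every addition."""
    total = 0.0
    for v in values:
        total = to_bf16(total + to_bf16(v))
    return total


def relative_error(num_ranks, value=0.1):
    """Relative error of the bf16 sum against the exact product."""
    exact = num_ranks * value
    approx = bf16_allreduce_sum([value] * num_ranks)
    return abs(approx - exact) / exact


if __name__ == "__main__":
    for ranks in (1, 2, 4, 8, 16):
        print(ranks, relative_error(ranks))
```

Printing the error for several rank counts gives a feel for why a single fixed tolerance may not suit every world size, which is the open question the changelog entry describes.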

0.28.0

Added

- TensorFlow: Added new `get_local_and_global_gradients` to PartialDistributedGradientTape to retrieve local and non-local gradients separately. ([3859](https://github.com/horovod/horovod/pull/3859))

Changed

- Improved reducescatter performance by allocating output tensors before enqueuing the operation. ([3824](https://github.com/horovod/horovod/pull/3824))
- TensorFlow: Ensured that `tf.logical_and` within allreduce `tf.cond` runs on CPU. ([3885](https://github.com/horovod/horovod/pull/3885))
- TensorFlow: Added support for Keras 2.11+ optimizers. ([3860](https://github.com/horovod/horovod/pull/3860))
- `CUDA_VISIBLE_DEVICES` environment variable is no longer passed to remote nodes. ([3865](https://github.com/horovod/horovod/pull/3865))
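
Allocating reducescatter outputs before enqueuing is possible because each rank's output size follows from the input shape and the number of ranks alone. The sketch below illustrates that idea in pure Python; it is not Horovod's implementation. `reducescatter_output_rows` and `simulate_reducescatter` are hypothetical names, and the splitting rule (the first dimension is divided as evenly as possible, with the leading ranks taking one extra row when it does not divide evenly) is an assumption made for this example.

```python
def reducescatter_output_rows(first_dim, world_size):
    """Rows each rank receives under the assumed even-split rule."""
    base, extra = divmod(first_dim, world_size)
    return [base + (1 if rank < extra else 0) for rank in range(world_size)]


def simulate_reducescatter(inputs):
    """Sum equal-length per-rank inputs, then scatter slices of the sum.

    inputs[rank] is that rank's input; all inputs must have the same length.
    """
    world_size = len(inputs)
    rows = reducescatter_output_rows(len(inputs[0]), world_size)
    # Output sizes are known up front, so buffers can be allocated
    # before any reduction work is performed.
    outputs = [[0] * n for n in rows]
    start = 0
    for rank, n in enumerate(rows):
        for i in range(n):
            outputs[rank][i] = sum(buf[start + i] for buf in inputs)
        start += n
    return outputs


sizes = reducescatter_output_rows(5, 2)
scattered = simulate_reducescatter([[1, 2, 3, 4, 5], [10, 20, 30, 40, 50]])
```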

Fixed

- Fixed build with ROCm. ([3839](https://github.com/horovod/horovod/pull/3839), [3848](https://github.com/horovod/horovod/pull/3848))
- Fixed build of Docker image horovod-nvtabular. ([3851](https://github.com/horovod/horovod/pull/3851))
- Fixed linking recent NCCL by defaulting CUDA runtime library linkage to static and ensuring that weak symbols are overridden. ([3867](https://github.com/horovod/horovod/pull/3867), [3846](https://github.com/horovod/horovod/pull/3846))
- Fixed compatibility with TensorFlow 2.12 and recent nightly versions. ([3864](https://github.com/horovod/horovod/pull/3864), [3894](https://github.com/horovod/horovod/pull/3894), [3906](https://github.com/horovod/horovod/pull/3906), [3907](https://github.com/horovod/horovod/pull/3907))
- Fixed missing arguments of Keras allreduce function. ([3905](https://github.com/horovod/horovod/pull/3905))
- Updated with_device functions in MXNet and PyTorch to skip unnecessary cudaSetDevice calls. ([3912](https://github.com/horovod/horovod/pull/3912))
