Dpctl

Latest version: v0.19.0

Safety actively analyzes 724206 Python packages for vulnerabilities to keep your Python projects secure.

Scan your dependencies

Page 1 of 7

0.19.0

This release features official, out-of-the-box support for compiling `dpctl` for specified AMD GPU architectures, the addition of new function `tensor.top_k`, a radix-sort-based implementation of sorting functions, and improvements to interoperability with DLPack through `tensor.dldevice_to_sycl_device` and `tensor.sycl_device_to_dldevice`.

A number of adjustments were also made to improve performance of `dpctl` reductions (i.e., `sum`, `min`, `max`, etc.), accumulators (i.e., `cumulative_sum`, `cumulative_logsumexp`), and copy-and-cast operations.

Added

* Support for compiling `dpctl` for specified AMD GPU architecture with use of [CodePlay oneAPI plug-in](https://developer.codeplay.com/products/oneapi/amd/home/) [gh-1731](https://github.com/IntelPython/dpctl/pull/1731)
* Added `tensor.top_k` per Python Array API specification [gh-1921](https://github.com/IntelPython/dpctl/pull/1921)
* Added functions `tensor.dldevice_to_sycl_device` and `tensor.sycl_device_to_dldevice` for converting between DLPack and sycl devices, and a method `get_device_id` to `dpctl.SyclDevice` to improve interoperability with DLPack protocol [gh-1953](https://github.com/IntelPython/dpctl/pull/1953)
* Added `DPCTL_OFFLOAD_COMPRESS` cmake option (set to `OFF` by default) to toggle [--offload-compress](https://www.intel.com/content/www/us/en/developer/articles/technical/sycl-compilation-device-image-compression.html) linker option when building `dpctl` [gh-1961](https://github.com/IntelPython/dpctl/pull/1961)

Changed

* Improved performance of copy-and-cast operations from `numpy.ndarray` to `tensor.usm_ndarray` for contiguous inputs [gh-1829](https://github.com/IntelPython/dpctl/pull/1829)
* `py_sort` and `py_argsort` now throw `py::value_error` if inputs are not C-contiguous [gh-1838](https://github.com/IntelPython/dpctl/pull/1838)
* Improved performance of copying operation to C-/F-contig array, with optimization for batch of square matrices [gh-1850](https://github.com/IntelPython/dpctl/pull/1850)
* Improved performance of `tensor.argsort` function for all types [gh-1859](https://github.com/IntelPython/dpctl/pull/1859)
* Improved performance of `tensor.sort` and `tensor.argsort` for short arrays in the range [16, 64] elements [gh-1866](https://github.com/IntelPython/dpctl/pull/1866)
* Implemented radix sort algorithm to be used in `dpt.sort` and `dpt.argsort` [gh-1867](https://github.com/IntelPython/dpctl/pull/1867), [gh-1883](https://github.com/IntelPython/dpctl/pull/1883)
* Extended `dpctl.SyclTimer` with `device_timer` keyword, implementing different methods of collecting device times [gh-1872](https://github.com/IntelPython/dpctl/pull/1872)
* `dpctl` changed to see GPU devices out of the box in virtual environment on Windows [gh-1922](https://github.com/IntelPython/dpctl/pull/1922)
* Improved performance of `tensor.cumulative_sum`, `tensor.cumulative_prod`, `tensor.cumulative_logsumexp` as well as performance of boolean indexing [gh-1923](https://github.com/IntelPython/dpctl/pull/1923), [gh-1942](https://github.com/IntelPython/dpctl/pull/1942)
* Improved performance of `tensor.min`, `tensor.max`, `tensor.logsumexp`, `tensor.reduce_hypot` for floating point type arrays by at least 2x [gh-1932](https://github.com/IntelPython/dpctl/pull/1932), [gh-1937](https://github.com/IntelPython/dpctl/pull/1937)
* Updated Cython examples to use scikit-build [gh-1935](https://github.com/IntelPython/dpctl/pull/1935)
* Reduced binary size of `_tensor_accumulation_impl` by 13 MB [gh-1957](https://github.com/IntelPython/dpctl/pull/1957)
* Extended `tensor.asarray` to support objects that implement `__usm_ndarray__` property to be interpreted as `usm_ndarray` objects [gh-1959](https://github.com/IntelPython/dpctl/pull/1959)
* `tensor.usm_ndarray` object disallows implicit conversions to NumPy array [gh-1964](https://github.com/IntelPython/dpctl/pull/1964)
* `stream` arguments in `tensor.usm_ndarray` methods now raise an error if `stream` is not a `tensor.SyclQueue` [gh-1969](https://github.com/IntelPython/dpctl/pull/1969)
* `dpctl` initialization sets subprocess to use SPAWN method on Linux to enable `gdb-oneapi` to debug kernels submitted from Python applications [gh-1971](https://github.com/IntelPython/dpctl/pull/1971)
* Reduced binary size of `_tensor_elementwise_impl` [gh-1976](https://github.com/IntelPython/dpctl/pull/1976)
* Allow `dpctl.SyclQueue.memcpy` to and from multi-dimensional buffers [gh-1985](https://github.com/IntelPython/dpctl/pull/1985)

Fixed

* Fixed a bug in `tensor.roll` for very large values of `shift` [gh-1869](https://github.com/IntelPython/dpctl/pull/1869)
* Fix for `tensor.result_type` when all inputs are Python built-in scalars [gh-1877](https://github.com/IntelPython/dpctl/pull/1877)
* Improved error in constructors `tensor.full` and `tensor.full_like` when provided a non-numeric fill value [gh-1878](https://github.com/IntelPython/dpctl/pull/1878)
* Added a check for pointer alignment when copying to C-contiguous memory [gh-1890](https://github.com/IntelPython/dpctl/pull/1890), [gh-1891](https://github.com/IntelPython/dpctl/pull/1891)
* Fixed `dpctl` installed into virtual environment not finding DPC++ runtime libraries by adding `DPCTL_WITH_REDIST` cmake option (set to `OFF` by default) [gh-1893](https://github.com/IntelPython/dpctl/pull/1893)
* Fixed incorrect result (issue [gh-1901](https://github.com/IntelPython/dpctl/issues/1901)) in `tensor.cumulative_sum` and in advanced indexing [gh-1902](https://github.com/IntelPython/dpctl/pull/1902)
* Fixed `__setitem__()` for `tensor.usm_ndarray` when passed an empty boolean mask [gh-1915](https://github.com/IntelPython/dpctl/pull/1915)
* `tensor.from_dlpack` docstring now shows that return type can be NumPy array and stipulates when this will be the case [gh-1919](https://github.com/IntelPython/dpctl/pull/1919)
* Fixed docstring in helper class in DLPack tests [gh-1920](https://github.com/IntelPython/dpctl/pull/1920)
* Fixed a bug in `tensor.astype` where `copy=False` would not be respected for 1d arrays when order keyword is specified [gh-1928](https://github.com/IntelPython/dpctl/pull/1928)
* Replaced deprecated `CL/sycl.hpp` with recommended `sycl/sycl.hpp` in examples [gh-1933](https://github.com/IntelPython/dpctl/pull/1933)
* Fixed `tensor.take_along_axis` and `tensor.put_along_axis` raising an error for `tensor.uint64` indices when given an array of dimension greater than 1 [gh-1934](https://github.com/IntelPython/dpctl/pull/1934)
* Fixed unexpected results of `tensor.sum` with a requested output type of `bool` [gh-1958](https://github.com/IntelPython/dpctl/pull/1958)
* Use `std::move` to avoid unnecessary copying of temporary in `triul_ctor.cpp` [gh-1960](https://github.com/IntelPython/dpctl/pull/1960)
* Make `stream` a keyword-only argument in `tensor.usm_ndarray.to_device` per requirement by array API specification [gh-1966](https://github.com/IntelPython/dpctl/pull/1966)
* Improve efficiency of copy implementation and avoid an unnecessary kernel invocation in `tensor.argsort` for 1d input [gh-1967](https://github.com/IntelPython/dpctl/pull/1967)
* Corrected uses of NumPy constructors with `tensor.usm_ndarray` inputs in test suite [gh-1968](https://github.com/IntelPython/dpctl/pull/1968)
* Fixed array API namespace inspection utilities showing `complex128` as a valid dtype on devices without double precision and `device` keywords not working with `dpctl.SyclQueue` or filter strings [gh-1979](https://github.com/IntelPython/dpctl/pull/1979)
* Fixed a bug in `test_sycl_device_interface.cpp` which would cause compilation to fail with Clang version 20.0 [gh-1989](https://github.com/IntelPython/dpctl/pull/1989)
* Fixed memory leaks in smart-pointer-managed USM temporaries in synchronizing kernel calls [gh-2002](https://github.com/IntelPython/dpctl/pull/2002)
* `UsmNDArray_MakeSimpleFromPtr` and `UsmNDArray_MakeFromPtr` now raise an error when provided an invalid `typenum` before attempting to create the array [gh-2003](https://github.com/IntelPython/dpctl/pull/2003)
* Fixed typos in `tensor.from_numpy` and `tensor.astype` [gh-2006](https://github.com/IntelPython/dpctl/pull/2006)

Maintenance

* Revert pinning of cmake to 3.26 on Windows [gh-1823](https://github.com/IntelPython/dpctl/pull/1823)
* Update black version used in Python code style workflow [gh-1828](https://github.com/IntelPython/dpctl/pull/1828)
* Fixed CI/CD workflow for building conda packages on Windows [gh-1831](https://github.com/IntelPython/dpctl/pull/1831)
* Revert work-around in `test_sycl_kernel_submit.py` for problem in MKL 2024.2.0 [gh-1836](https://github.com/IntelPython/dpctl/pull/1836)
* Do not use Mambaforge variant of miniforge as deprecated [gh-1844](https://github.com/IntelPython/dpctl/pull/1844)
* Use pybind11=2.13.6 [gh-1845](https://github.com/IntelPython/dpctl/pull/1845)
* Remove unnecessary include in C++ header file [gh-1846](https://github.com/IntelPython/dpctl/pull/1846)
* Build translation unit "simplify_iteration_space.cpp" compiled multiple times as a static library [gh-1847](https://github.com/IntelPython/dpctl/pull/1847)
* Add instructions for installing `dpctl` from Intel PyPi channel [gh-1860](https://github.com/IntelPython/dpctl/pull/1860)
* Fix warnings when generating docs [gh-1855](https://github.com/IntelPython/dpctl/pull/1855), [gh-1861](https://github.com/IntelPython/dpctl/pull/1861)
* Align conda recipe with conda-forge's `{{ stdlib("c") }}` migration [gh-1868](https://github.com/IntelPython/dpctl/pull/1868)
* Add missing include of SYCL header to "math_utils.hpp" [gh-1899](https://github.com/IntelPython/dpctl/pull/1899)
* Add support of CV-qualifiers in `is_complex<T>` helper [gh-1900](https://github.com/IntelPython/dpctl/pull/1900)
* Tuning work for elementwise functions with modest performance gains (under 10%) [gh-1889](https://github.com/IntelPython/dpctl/pull/1889)
* Reduce binary size of accumulators by saving repeated expressions to a temporary [gh-1896](https://github.com/IntelPython/dpctl/pull/1896)
* Added workflow to run nightly tests of `dpctl` [gh-1903](https://github.com/IntelPython/dpctl/pull/1903), [gh-1905](https://github.com/IntelPython/dpctl/pull/1905)
* Support and testing for Python 3.13 for `dpctl` [gh-1941](https://github.com/IntelPython/dpctl/pull/1941), [gh-1943](https://github.com/IntelPython/dpctl/pull/1943)
* Change libtensor to use `std::size_t` and `dpctl::tensor::ssize_t` throughout and fix missing includes for `std::size_t` and `size_t` [gh-1950](https://github.com/IntelPython/dpctl/pull/1950)
* Fixed some unqualified `size_t` and fixed-width integral types in `libtensor` [gh-1955](https://github.com/IntelPython/dpctl/pull/1955)
* Add versioneer as a build requirement in documentation on building `dpctl` from source [gh-1972](https://github.com/IntelPython/dpctl/pull/1972)
* Remove const qualifiers for class and struct members [gh-1974](https://github.com/IntelPython/dpctl/pull/1974), [gh-1975](https://github.com/IntelPython/dpctl/pull/1975)
* Various code quality improvements to `test_sycl_queue_submit_local_accessor_arg.cpp` [gh-1990](https://github.com/IntelPython/dpctl/pull/1990)
* Added Python 3.12 to package metadata [gh-2005](https://github.com/IntelPython/dpctl/pull/2005)
* Miscellaneous changes to continuous integration/delivery (CI/CD) supporting scripts:
[gh-1837](https://github.com/IntelPython/dpctl/pull/1837),
[gh-1839](https://github.com/IntelPython/dpctl/pull/1839),
[gh-1848](https://github.com/IntelPython/dpctl/pull/1848),
[gh-1853](https://github.com/IntelPython/dpctl/pull/1853),
[gh-1854](https://github.com/IntelPython/dpctl/pull/1854),
[gh-1856](https://github.com/IntelPython/dpctl/pull/1856),
[gh-1858](https://github.com/IntelPython/dpctl/pull/1858),
[gh-1863](https://github.com/IntelPython/dpctl/pull/1863),
[gh-1864](https://github.com/IntelPython/dpctl/pull/1864),
[gh-1865](https://github.com/IntelPython/dpctl/pull/1865),
[gh-1881](https://github.com/IntelPython/dpctl/pull/1881),
[gh-1882](https://github.com/IntelPython/dpctl/pull/1882),
[gh-1884](https://github.com/IntelPython/dpctl/pull/1884),
[gh-1886](https://github.com/IntelPython/dpctl/pull/1886),
[gh-1888](https://github.com/IntelPython/dpctl/pull/1888),
[gh-1897](https://github.com/IntelPython/dpctl/pull/1897),
[gh-1898](https://github.com/IntelPython/dpctl/pull/1898),
[gh-1909](https://github.com/IntelPython/dpctl/pull/1909),
[gh-1916](https://github.com/IntelPython/dpctl/pull/1916),
[gh-1927](https://github.com/IntelPython/dpctl/pull/1927),
[gh-1940](https://github.com/IntelPython/dpctl/pull/1940),
[gh-1948](https://github.com/IntelPython/dpctl/pull/1948),
[gh-1949](https://github.com/IntelPython/dpctl/pull/1949),
[gh-1952](https://github.com/IntelPython/dpctl/pull/1952),
[gh-1962](https://github.com/IntelPython/dpctl/pull/1962),
[gh-1963](https://github.com/IntelPython/dpctl/pull/1963),
[gh-1973](https://github.com/IntelPython/dpctl/pull/1973),
[gh-1980](https://github.com/IntelPython/dpctl/pull/1980),
[gh-1981](https://github.com/IntelPython/dpctl/pull/1981),
[gh-1983](https://github.com/IntelPython/dpctl/pull/1983),
[gh-1988](https://github.com/IntelPython/dpctl/pull/1988),

0.18.3

Fixed

* Enabled `dpctl` in virtual environment on Windows platform (issue [gh-1745](https://github.com/IntelPython/dpctl/issues/1745)) [gh-1924](https://github.com/IntelPython/dpctl/pull/1924)

0.18.2

Maintenance

* Add missing include of SYCL header to "math_utils.hpp" [gh-1899](https://github.com/IntelPython/dpctl/pull/1899)

Fixed

* Fix for `tensor.result_type` when all inputs are Python built-in scalars [gh-1904](https://github.com/IntelPython/dpctl/pull/1904)

0.18.1

Changed

* Updated installation instructions [gh-1862](https://github.com/IntelPython/dpctl/pull/1862)

0.18.0

This release reaches an important milestone by making offloading fully asynchronous.
Calls to `dpctl.tensor` submit tasks for execution to DPC++ runtime and return without waiting for execution of these tasks to finish.
The sequential semantics a user comes to expect from execution of Python script is preserved though.

The full list of changes that went into this release are:

Added

* Implement `tensor.take_along_axis` per Python Array API specification [gh-1778](https://github.com/IntelPython/dpctl/pull/1778)
* Implement `tensor.put_along_axis` to complement `tensor.take_along_axis` [gh-1798](https://github.com/IntelPython/dpctl/pull/1798)
* Support for 'device=tensor.kDLCPU' in `tensor.from_dlpack` function and `tensor.usm_ndarray.__dlpack__` method [gh-1781](https://github.com/IntelPython/dpctl/pull/1781)
* Support DLPack on Windows [gh-1746](https://github.com/IntelPython/dpctl/pull/1746)
* Implement `tensor.nextafter` function per Python Array API specification [gh-1730](https://github.com/IntelPython/dpctl/pull/1730)
* Implement `tensor.count_nonzero` and `tensor.diff` functions from Python array API specification [gh-1732](https://github.com/IntelPython/dpctl/pull/1732), [gh-1780](https://github.com/IntelPython/dpctl/pull/1780)
* Add support for `order="K"` to `*_like` array creation functions, and change default `order` keyword value from `'C'` to `'K'` [gh-1808](https://github.com/IntelPython/dpctl/pull/1808)
* Support for 'max dimensions' in Array API capabilities info data [gh-1774](https://github.com/IntelPython/dpctl/pull/1774)
* Add support for device aspect 'emulated' [gh-1691](https://github.com/IntelPython/dpctl/pull/1691)
* `dpctl::tensor::usm_memory` class defined in `dpctl4pybind11.hpp` adds constructor to create Python USM memory objects viewing into existing USM allocations, which can be made by an external library [gh-1782](https://github.com/IntelPython/dpctl/pull/1782)
* Add support for COVERAGE build type in project's CMake script [gh-1692](https://github.com/IntelPython/dpctl/pull/1692)

Changed

* Change ownership of USM allocation by `dpctl.memory` objects, make executions of `dpctl.tensor` operations asynchronous [gh-1705](https://github.com/IntelPython/dpctl/pull/1705)
* Add support for Python scalars by `tensor.where` function [gh-1719](https://github.com/IntelPython/dpctl/pull/1719)
* Optimize division by Python scalar in statistical functions `tensor.mean`, `tensor.std`, `tensor.var` [gh-1820](https://github.com/IntelPython/dpctl/pull/1820)
* Use transcendental functions from `sycl` namespace instead of `std` namespace [gh-1707](https://github.com/IntelPython/dpctl/pull/1707)
* Changes for compatibility with recent NumPy in runtime environment [gh-1735](https://github.com/IntelPython/dpctl/pull/1735), [gh-1772](https://github.com/IntelPython/dpctl/pull/1772), [gh-1804](https://github.com/IntelPython/dpctl/pull/1804)
* Array creation function `tensor.zeros` to use asynchronous `memset` operation [gh-1806](https://github.com/IntelPython/dpctl/pull/1806)
* The setter of `tensor.usm_ndarray.shape` property now supports Python scalar value [gh-1786](https://github.com/IntelPython/dpctl/pull/1786)
* Use 'pyproject.toml' instead of 'setup.py' aligning with current packaging best practices [gh-1660](https://github.com/IntelPython/dpctl/pull/1660)
* No longer set SOVERSION property in DPCTLSyclInterface library on Linux [gh-1773](https://github.com/IntelPython/dpctl/pull/1773)
* Update version of 'pybind11' used [gh-1758](https://github.com/IntelPython/dpctl/pull/1758), [gh-1812](https://github.com/IntelPython/dpctl/pull/1812)
* Handle possible exceptions by `usm_host_allocator` used with `std::vector` [gh-1791](https://github.com/IntelPython/dpctl/pull/1791)
* Use `dpctl::tensor::alloc_utils::sycl_free_noexcept` instead of `sycl::free` in `host_task` tasks associated with life-time management of temporary USM allocations [gh-1797](https://github.com/IntelPython/dpctl/pull/1797)
* Add `"same_kind"`-style casting for in-place mathematical operators of `tensor.usm_ndarray` [gh-1827](https://github.com/IntelPython/dpctl/pull/1827), [gh-1830](https://github.com/IntelPython/dpctl/pull/1830)

Fixed

* Fix setting of release variable Sphinx config file [gh-1685](https://github.com/IntelPython/dpctl/pull/1685)
* Handle possible NULL return value from device aspect queries `DPCTLDevice_GetMaxWorkGroupSize1d` and `DPCTLDevice_GetMaxWorkGroupSize2d` [gh-1690](https://github.com/IntelPython/dpctl/pull/1690)
* Add license header to conda script files [gh-1695](https://github.com/IntelPython/dpctl/pull/1695)
* Fix `tensor.round` behavior on CUDA devices [gh-1700](https://github.com/IntelPython/dpctl/pull/1700)
* Add missing `include <sstream>` [gh-1701](https://github.com/IntelPython/dpctl/pull/1701)
* Fix for issue 1724 [gh-1728](https://github.com/IntelPython/dpctl/pull/1728)
* Correct USM type for return array of `tensor.extract` function [gh-1727](https://github.com/IntelPython/dpctl/pull/1727)
* Fix for `tensor.unique_all` and `tensor.unique_inverse` to always return index arrays with default indexing data type [gh-1741](https://github.com/IntelPython/dpctl/pull/1741)
* Propagate read-only flag from `__sycl_usm_array_interface__` in `tensor.asarray` function [gh-1756](https://github.com/IntelPython/dpctl/pull/1756)
* `tensor.clip` to handle Python scalars which are out of bound for the data type of integral array [gh-1759](https://github.com/IntelPython/dpctl/pull/1759)
* Avoid dead-locking by releasing GIL around blocking operations in libtensor [gh-1753](https://github.com/IntelPython/dpctl/pull/1753)
* Element-wise `tensor.divide` and comparison operations allow greater range of Python integer and integer array combinations [gh-1771](https://github.com/IntelPython/dpctl/pull/1771)
* Fix for unexpected behavior when using floating point types for array indexing [gh-1792](https://github.com/IntelPython/dpctl/pull/1792)
* Enable `pytest --pyargs dpctl.tests` [gh-1833](https://github.com/IntelPython/dpctl/pull/1833)
* Fix for undefined behavior in indexing using integer arrays [gh-1894](https://github.com/IntelPython/dpctl/pull/1894)

Maintenance

* Improve performance of `test_sort_complex_fp_nan` [gh-1704](https://github.com/IntelPython/dpctl/pull/1704)
* Improve exception wording raised by `tensor.broadcast_arrays()` [gh-1720](https://github.com/IntelPython/dpctl/pull/1720)
* Remove `template` keyword in method call of `sycl::kernel_bundle` [gh-1726](https://github.com/IntelPython/dpctl/pull/1726)
* Backport changelog edits from maintenance/0.17.x [gh-1736](https://github.com/IntelPython/dpctl/pull/1736)
* Replace uses of 'intel' channels in docs and readme file [gh-1737](https://github.com/IntelPython/dpctl/pull/1737)
* Update references to deprecated environment variable `SYCL_DEVICE_FILTER` [gh-1740](https://github.com/IntelPython/dpctl/pull/1740)
* Correction for installation instruction steps [gh-1754](https://github.com/IntelPython/dpctl/pull/1754)
* Fix for crash during testing with open source SYCL bundle by updating CPU RT library used [gh-1762](https://github.com/IntelPython/dpctl/pull/1762)
* Add missing include to fix build break with newer LLVM [gh-1776](https://github.com/IntelPython/dpctl/pull/1776)
* Add `include <utility>` for definition of `std::move` used [gh-1787](https://github.com/IntelPython/dpctl/pull/1787)
* Change to CMake script to accomodate DPC++ transition from PI to UR architecture [gh-1788](https://github.com/IntelPython/dpctl/pull/1788)
* Document `tensor._flags.Flags` class [gh-1794](https://github.com/IntelPython/dpctl/pull/1794)
* Fix for unreferenced unreleased bug in copy-and-cast code logic [gh-1799](https://github.com/IntelPython/dpctl/pull/1799)
* Explicitly include headers used in C++ translation units implementing reduction operations [gh-1802](https://github.com/IntelPython/dpctl/pull/1802)
* Clean-up uses of `Strided1DIndexer` class [gh-1805](https://github.com/IntelPython/dpctl/pull/1805)
* Tweak to readability of C++ code implementing matrix-matrix multiplication [gh-1810](https://github.com/IntelPython/dpctl/pull/1810)
* Do not add `sycl::event` associated with compute task to vector of events representing execution of `host_task` [gh-1807](https://github.com/IntelPython/dpctl/pull/1807)
* Remove 'level-zero' conda package from run-time dependencies of 'dpctl' since Intel GPU driver stack now explicitly depends on `libze1` package which provides Level-Zero loader library [gh-1801](https://github.com/IntelPython/dpctl/pull/1801), [gh-1840](https://github.com/IntelPython/dpctl/pull/1840)
* Use dedicated type-support matrices for in-place element-wise binary operations [gh-1816](https://github.com/IntelPython/dpctl/pull/1816)
* Remove recommendation to install wheels from Anaconda PyPI index [gh-1819](https://github.com/IntelPython/dpctl/pull/1819)
* Removed use of post-link and pre-unlink conda scripts in `dpctl` [gh-1821](https://github.com/IntelPython/dpctl/pull/1821)
* Pin compiler used to build 0.18.0 version to 2025.0.0 [gh-1822](https://github.com/IntelPython/dpctl/pull/1822)
* A varienty of changes to continuous integration/delivery (CI/CD) supporting scripts to keep CI running smoothly:
[gh-1686](https://github.com/IntelPython/dpctl/pull/1686),
[gh-1688](https://github.com/IntelPython/dpctl/pull/1688),
[gh-1697](https://github.com/IntelPython/dpctl/pull/1697),
[gh-1698](https://github.com/IntelPython/dpctl/pull/1698),
[gh-1703](https://github.com/IntelPython/dpctl/pull/1703),
[gh-1702](https://github.com/IntelPython/dpctl/pull/1702),
[gh-1709](https://github.com/IntelPython/dpctl/pull/1709),
[gh-1712](https://github.com/IntelPython/dpctl/pull/1712),
[gh-1713](https://github.com/IntelPython/dpctl/pull/1713),
[gh-1722](https://github.com/IntelPython/dpctl/pull/1722),
[gh-1725](https://github.com/IntelPython/dpctl/pull/1725),
[gh-1729](https://github.com/IntelPython/dpctl/pull/1729),
[gh-1733](https://github.com/IntelPython/dpctl/pull/1733),
[gh-1721](https://github.com/IntelPython/dpctl/pull/1721),
[gh-1743](https://github.com/IntelPython/dpctl/pull/1743),
[gh-1739](https://github.com/IntelPython/dpctl/pull/1739),
[gh-1747](https://github.com/IntelPython/dpctl/pull/1747),
[gh-1748](https://github.com/IntelPython/dpctl/pull/1748),
[gh-1750](https://github.com/IntelPython/dpctl/pull/1750),
[gh-1752](https://github.com/IntelPython/dpctl/pull/1752),
[gh-1767](https://github.com/IntelPython/dpctl/pull/1767),
[gh-1768](https://github.com/IntelPython/dpctl/pull/1768),
[gh-1775](https://github.com/IntelPython/dpctl/pull/1775),
[gh-1783](https://github.com/IntelPython/dpctl/pull/1783),
[gh-1790](https://github.com/IntelPython/dpctl/pull/1790),
[gh-1795](https://github.com/IntelPython/dpctl/pull/1795),
[gh-1796](https://github.com/IntelPython/dpctl/pull/1796),
[gh-1800](https://github.com/IntelPython/dpctl/pull/1800),
[gh-1760](https://github.com/IntelPython/dpctl/pull/1760),
[gh-1803](https://github.com/IntelPython/dpctl/pull/1803),
[gh-1777](https://github.com/IntelPython/dpctl/pull/1777),
[gh-1813](https://github.com/IntelPython/dpctl/pull/1813),
[gh-1817](https://github.com/IntelPython/dpctl/pull/1817),
[gh-1818](https://github.com/IntelPython/dpctl/pull/1818)

0.17.0

This release features updated documentation web-page https://intelpython.github.io/dpctl/latest/index.html, adds cumulative reductions,
and complies with revision [2023.12](https://data-apis.org/array-api/2023.12/) of Python Array API specification.

Added

* Added pybind11 caster for ``sycl::half`` to map to/from Python `float` to ``"dpctl4pybind11.hpp"`` header: [gh-1655](https://github.com/IntelPython/dpctl/pull/1655)
* Added support for DLPack data interchange per Python Array API 2023.12 specification: [gh-1667](https://github.com/IntelPython/dpctl/pull/1667)
* Implemented `tensor.cumulative_sum`, `tensor.cumulative_prod` and `tensor.cumulative_logsumexp`: [gh-1602](https://github.com/IntelPython/dpctl/pull/1602)

Changed

* Expanded documentation for `dpctl`: [gh-1619](https://github.com/IntelPython/dpctl/pull/1619)
* Expanded `utils.intel_device_info` functionality: [gh-1656](https://github.com/IntelPython/dpctl/pull/1656)
* Improved performance of elementwise operations: [gh-1651](https://github.com/IntelPython/dpctl/pull/1651)
* Efficiency improvement by avoiding unnecessary copying of ``sycl::queue``: [gh-1645](https://github.com/IntelPython/dpctl/pull/1645)
* `dpctl` uses pybind11 2.12.0: [gh-1640](https://github.com/IntelPython/dpctl/pull/1640)
* Improved performance of `tensor.reshape` operation with `order="F"` when copying is needed, or requested: [gh-1677](https://github.com/IntelPython/dpctl/pull/1677)

Fixed

* Fixed initialization of byte type constants in `dpctl_capi` Python/C API loader class in `"dpctl4pybind11.hpp"`: [gh-1665](https://github.com/IntelPython/dpctl/pull/1665)
* Fixed crash in `tensor.sort` reported for a CPU device and a CUDA device: [gh-1676](https://github.com/IntelPython/dpctl/pull/1676)
* Fixed race condition in accumulation kernel for custom operations that caused test failures with AMD CPUs: [gh-1624](https://github.com/IntelPython/dpctl/pull/1624)
* Fixed comparison operators for mixed signed and unsigned integral types: [gh-1650](https://github.com/IntelPython/dpctl/pull/1650)
* Support use of index arrays of different integral types in indexing operations: [gh-47](https://github.com/IntelPython/dpctl/pull/1647)
* Fixed source code to compile for NVidia(TM) GPUs with DPC++ 2024.1: [gh-1630](https://github.com/IntelPython/dpctl/pull/1630)
* Corrected `tensor.tile` for scalar inputs and empty repetitions: [gh-1628](https://github.com/IntelPython/dpctl/pull/1628)
* Fixed support for `out` keyword in `tensor.matmul`: [gh-1610](https://github.com/IntelPython/dpctl/pull/1610)
* Fixed bug in basic slicing of empty arrays: [gh-1680](https://github.com/IntelPython/dpctl/pull/1680)
* Fixed bug in `tensor.bitwise_invert` for boolean input array: [gh-1681](https://github.com/IntelPython/dpctl/pull/1681)
* Fixed bug in `tensor.repeat` on zero-size input arrays: [gh-1682](https://github.com/IntelPython/dpctl/pull/1682)
* Fixed bug in `tensor.searchsorted` for 0d needle vector and strided hay: [gh-1694](https://github.com/IntelPython/dpctl/pull/1694)

Page 1 of 7

© 2025 Safety CLI Cybersecurity Inc. All Rights Reserved.