This is the release note of v8.0.0rc1. See [here](https://github.com/cupy/cupy/milestone/78?closed=1) for the complete list of solved issues and merged PRs.
We are planning to release the final v8.0.0 on October 1st. Please start testing your workload with this release. See the [Upgrade Guide](https://docs.cupy.dev/en/v8.0.0rc1/upgrade.html#cupy-v8) for the list of possible breaking changes.
Highlights
* This release adds support for CUDA 11, NumPy 1.19, and SciPy 1.5.
* Several performance improvements for cuTENSOR, sparse matrix indexing, and matrix multiplication using TF32 on CUDA 11.
* Compatibility with `numpy.poly` has been improved thanks to our GSoC student Dahlia-Chehata!
* Added an interface (3126) to support using external memory allocators such as the PyTorch one (https://github.com/pytorch/pytorch/pull/33860).
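
As a rough sketch of how the external-allocator interface might be wired up: the helper names `make_allocator_callbacks` and `use_external_allocator` below are hypothetical. The sketch assumes `PythonFunctionAllocator` takes a malloc callback called as `(size, device_id)` and a free callback called as `(ptr, device_id)`; please check this against the CuPy documentation before relying on it.

```python
def make_allocator_callbacks(alloc, delete):
    """Adapt an external allocator to the callback shapes assumed above.

    `alloc(size, device_id)` must return a device pointer as an int;
    `delete(ptr)` must release it. Both are supplied by the caller.
    """
    def malloc_func(size, device_id):
        return alloc(size, device_id)

    def free_func(ptr, device_id):
        # The external `delete` is assumed to need only the pointer.
        delete(ptr)

    return malloc_func, free_func


def use_external_allocator(alloc, delete):
    """Route CuPy allocations through the given external allocator."""
    import cupy  # imported lazily so the adapter above works without CuPy

    malloc_func, free_func = make_allocator_callbacks(alloc, delete)
    allocator = cupy.cuda.memory.PythonFunctionAllocator(malloc_func, free_func)
    cupy.cuda.set_allocator(allocator.malloc)
    # Keep a reference to the returned object for as long as it is in use.
    return allocator
```

With PyTorch, `torch.cuda.caching_allocator_alloc` and `torch.cuda.caching_allocator_delete` would be the natural candidates for `alloc` and `delete`, although stream handling is left out of this sketch.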
Notes on Wheel Packages
* Update on 2020-09-23: `cupy-cuda110` package is now available on PyPI! ~CuPy for CUDA 11.0 (`cupy-cuda110`) wheel packages are currently available only for Windows. We are going to publish Linux wheels once we get [approval](https://github.com/pypa/pypi-support/issues/553) from the PyPI team. (Meanwhile, Linux wheels can be downloaded from the Assets section below (or `pip install cupy-cuda110 -f https://github.com/cupy/cupy/releases/tag/v8.0.0rc1`). Those wheels will be removed once we publish the package on PyPI.)~
* CuPy for CUDA 10.1 (`cupy-cuda101`), 10.2 (`cupy-cuda102`), and 11.0 (`cupy-cuda110`) packages are built with cuDNN v8 support but without bundled cuDNN shared libraries (see 3724 for the discussion). To use cuDNN features, you need to download the cuDNN library using the following command: `python -m cupyx.tools.install_library --library cudnn --cuda X.X`.
It is also possible to install cuDNN v8.0.x via the system package manager (e.g., `apt install libcudnn8` or `yum install libcudnn8`), or to install it manually and set the `LD_LIBRARY_PATH` environment variable.
Changes without compatibility
Deprecate `cupy.sparse` package (3839, 3856)
CuPy's sparse matrix support was initially implemented in the `cupy.sparse` package. It was moved to the `cupyx.scipy.sparse` namespace in CuPy v5, while keeping the `cupy.sparse` one for backward compatibility.
Since NumPy has no equivalent package, `cupy.sparse` is now deprecated and will eventually be removed.
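
For code that still imports `cupy.sparse`, a migration could look roughly like the sketch below. The helper name `load_sparse` is hypothetical. The fallback to the deprecated alias is only meant for releases older than CuPy v5, and the function returns `None` when CuPy is not installed at all.

```python
import importlib


def load_sparse():
    """Return CuPy's sparse namespace, or None when CuPy is not importable."""
    # Try the preferred location first; the deprecated alias is only a
    # fallback for releases that predate the move to `cupyx.scipy.sparse`.
    for name in ("cupyx.scipy.sparse", "cupy.sparse"):
        try:
            return importlib.import_module(name)
        except ImportError:
            continue
    return None


sparse = load_sparse()
if sparse is None:
    print("CuPy is not installed; nothing to migrate here")
else:
    print("using", sparse.__name__)
```

In most projects the change is simply replacing `import cupy.sparse` with `import cupyx.scipy.sparse`; the helper only matters when several CuPy versions must be supported at once.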
Deprecate `*_enabled` flags under `cupy.cuda` (3732)
Previously, flags such as `cupy.cuda.nccl_enabled` were used to detect whether NCCL, cuTENSOR, or other optional CUDA libraries were available. These flags are now deprecated in favor of per-module flags (`cupy.cuda.nccl.available`, `cupy.cuda.cutensor.available`) that provide the same information.
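
A minimal sketch of checking the per-module flag defensively is shown below. The helper name `library_available` is hypothetical, and treating a missing attribute as "not available" is an assumption made here for the sake of older releases, not documented behavior.

```python
import importlib


def library_available(name):
    """Report whether an optional CUDA library (e.g. "nccl") is usable."""
    try:
        cuda = importlib.import_module("cupy.cuda")
    except ImportError:
        return False  # CuPy itself is not installed
    # Read `cupy.cuda.<name>.available`; a missing module or a missing
    # flag is treated as "not available" instead of raising.
    module = getattr(cuda, name, None)
    return bool(getattr(module, "available", False))


for lib in ("nccl", "cutensor"):
    print(lib, "available:", library_available(lib))
```

Code that only targets v8 can read `cupy.cuda.nccl.available` directly; the wrapper is useful mainly when CuPy itself is an optional dependency.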
Bump version in Docker images (3733)
The current base Docker images have been updated from Ubuntu 16.04, CUDA 9.2, and Python 3.5 to Ubuntu 18.04, CUDA 10.2, and Python 3.6.
New Features
- Add `cupy.ndim` (3060)
- Add `PythonFunctionAllocator` (3126)
- Compressed Sparse Inner Indexing (3486)
- Add `cupy.polyadd` (3548)
- Add `cupy.polymul` (3590)
- Add `cupy.polysub` (3593)
- Add most of `scipy.linalg.special_matrices` (3641)
- Add `scipy.signal` functions that are simple wrappers of `ndimage` functions (3645)
- Add `cupyx.scipy.ndimage.fourier_shift`, `fourier_gaussian`, `fourier_uniform` (3654)
- Add 2D Sparse Slicing (3657)
- Add 2D Sparse Slicing + Row Indexing (3658)
- Add 2D Sparse Slicing + Row & Column Indexing (3659)
- Add `cupy.roots` for Hermitian or symmetric matrix (3703)
- Add `cupy.polyval` (3725)
- Support `__cuda_array_interface__` in `cupy.poly1d` (3729)
- Implement library preloading for wheels (3731)
- Add `cupy.poly1d.__pow__` (3734)
- Add `scipy.signal.convolve` and `correlate` functions (3748)
- Add `trimcoef` (3793)
Enhancements
- Avoid disk I/O in compiler (3164)
- Add check for method in `RandomState` seed (3282)
- Support negative `axis` in sparse `min`/`max`/`argmin`/`argmax` (3497)
- Mark `nonzero` parameters experimental in sparse `min`/`max` (3583)
- Add a `compile` method for `RawKernel` and `RawModule` (3644)
- Handle `__cuda_array_interface__` in `asnumpy` (3718)
- Use `cublasGemmEx` in `tensordot_core` on CUDA 11 (3719)
- Deprecate `*_enabled` flags under `cupy.cuda` (3732)
- Fix handle types to `intptr_t` (3746)
- Support TF32 (3810)
- Deprecate `cupy.sparse` package (3839)
- Add `path` and `readonly` options to `cupyx.optimizing.optimize` (3845)
- Add a workaround for even-length inputs to `scipy.signal.sepfir2d` (3750)
- Add multi-axis support to `cupy.flip` (3742)
Performance Improvements
- Speed up `cupy.vdot` (3678)
- Improve `cupy.cutensor` (3700)
- More improvement of `cupy.cutensor` (3744)
- Improve 2D sparse row slicing (3782)
- Improve `median_filter`, `rank_filter` and `percentile_filter` (3813)
- Improve CSR matrix `getrow`, `getcol` and some slicing (3851)
Bug Fixes
- Fix `float16` `ndarray` input in `histogram` with CUB (3617)
- Support order argument in `cupy.ones`, `cupy.full` and `cupy.eye` (3655)
- Work around a known CUB SpMV bug (3679)
- Fix broken message format (3691)
- Fix `can_use_device_segmented_reduce()` for incompatible axes (3740)
- Fix circular imports (3743)
- Skip FFT input checks for some CUDA >= 10.1 cases (3763)
- Fix CUDA 11 multi-GPU FFT bug (3775)
- Temporary fixes for cuDNN v8 (3790)
- Fix `cupy.correlate` (3801)
- Copy input by default for C2R transform (3848)
- Fix `cupy.sparse.*` deprecation (3856)
- Fix CUB not bundled in wheels (3879)
- Fix wheel not loading bundled cuDNN on Windows (3880)
- Add option to include wheel metadata (3881)
- Fix not to use `cupy.cuda.*` from CuPy codebase (3883)
Code Fixes
- Add `cupy_backends/cuda/libs/cutensor.pxd` (3595)
- Refactor `_make_decorator` in helper.py (3697)
- Refactor `cupy.poly1d` tests (3704)
- Remove unnecessary imports in `cupy._sorting` (3706)
- Rename `cupy.binary` submodule to `cupy._binary` (3707)
- Rename `cupy.creation` submodule to `cupy._creation` (3708)
- Rename `cupy.functional` submodule to `cupy._functional` (3710)
- Rename `cupy.indexing` submodule to `cupy._indexing` (3711)
- Remove unnecessary imports of `cupy.linalg` (3714)
- Rename `cupy.misc` submodule to `cupy._misc` (3726)
- Rename `cupy.padding` submodule to `cupy._padding` (3727)
- Rename submodules under `cupy.random` package (3772)
- Refactor logical routines from `core.pyx` (3804)
- Refactor binary-op routines from `core.pyx` (3816)
- Fix typo (3850)
- Resolve circular imports between `cupy` and `cupyx.scipy` (3854)
Documentation
- Correct format of docstrings in creation routines (3752)
- Update docs for v8 (3802)
- Fix a broken document (3807)
- Add `cupy-cuda110` package to README (3817)
- Fix documents to reflect `CUPY_ACCELERATORS` (3818)
- Support Optuna v2 (install docs) (3842)
- Add upgrade guide for v8 (3863)
- Fix broken link in the installation guide (3864)
Installation
- Bump version in Docker images (3733)
- Update `classifiers` in `setup.py` (3814)
- Install SciPy and Optuna to Docker image (3844)
Tests
- Fix wrong test file name (3722)
- Fix test to run without NCCL (3735)
- Avoid mutation of `os.environ` (3749)
- Relax tolerance in `TestArrayElementwiseOp::test_doubly_broadcasted_pow` (3758)
- More on using `unittest.mock` (3791)
- Fix test to run without cuDNN (3846)
Others
- Bump version to v8.0.0rc1 (3882)
- Make nvrtc `getPTX` use `bytes` instead of `unicode` (3237)
- Add hiprtc support (3238)
- Fix build and import errors for ROCm (3786)
Contributors
The CuPy Team would like to thank all those who contributed to this release!
anaruse, cjnolet, coderforlife, Dahlia-Chehata, jakirkham, leofang, niteya-shah, pentschev