Deepsparse

Latest version: v1.8.0

Safety actively analyzes 715032 Python packages for vulnerabilities to keep your Python projects secure.

Page 5 of 7

0.11.0

New Features:
* High-performance sparse quantized convolutional neural networks supported on AVX2 systems.
* CCX detection added to the DeepSparse Engine for AMD systems.
* [`deepsparse.server`](https://github.com/neuralmagic/deepsparse/blob/main/src/deepsparse/server/main.py) integration and CLIs added with Hugging Face transformers pipelines support.

Changes:
Performance improvements made for
* FP32 sparse BERT models
* batch size 1 networks
* quantized sparse BERT models
* Pooling operations

Resolved Issues:
* When hyperthreads are disabled in the BIOS, core/socket information on certain systems can now be detected.
* Hugging Face transformers validation flows for QQP now giving correct accuracy metrics.
* PyTorch downloaded for YOLO model stubs now supported.

Known Issues:
* When running NanoDet-Plus-m, the DeepSparse Engine will fail with an assertion (See 279). A hotfix is being pursued.

0.10.0

New Features:
* Quantization support enabled on AVX2 instruction set for GEMM and elementwise operations.
* `NM_SPOOF_ARCH` environment variable added for testing different architectural configurations.
* [Elastic scheduler implemented](https://github.com/neuralmagic/deepsparse/blob/main/docs/source/scheduler.md) as an alternative to the single-stream or multi-stream schedulers.
* [`deepsparse.benchmark`](https://github.com/neuralmagic/deepsparse/blob/main/src/deepsparse/benchmark_model/README.md) application is now usable from the command-line after installing deepsparse to simplify benchmarking.
* ` deepsparse.server` CLI and API added with transformers support to make serving models like BERT with pipelines easy.

Changes:
* More robust architecture detection added to help resolve CPU topology, such as when running inside a virtual machine.
* Tensor columns improved, leading to significant speedups from 5 to 20% in pruned YOLO (larger batch size), BERT (smaller batch size), MobileNet, and ResNet models.
* Sparse quantized network performance improved on machines that do not support VNNI instructions.
* Performance improved for dense BERT with large batch sizes.

Resolved Issues:
* Possible crashes eliminated for:
* Pooling operations with small image sizes
* Rarely, networks containing convolution or GEMM operations
* Some models with many residual connections

Known Issues:
* None

0.9.1

This is a patch release for 0.9.0 that contains the following changes:

1. YOLACT models and other models with constant outputs no longer fail with a mismatched shape error on multi-socket systems with batch sizes greater than 1. However, a corner case exists where a model with a constant output whose first dimension is equal to the (nonunit) batch size will fail.
2. GEMM operations where the number of columns of the output matrix is not divisible by 16 will no longer fail with an assertion error.
3. Broadcasted inputs to elementwise operators no longer fail with an assertion error.
4. Int64 multiplications no longer fail with an illegal instruction on AVX2.

0.9.0

New Features:
* Support optimized for resize operators with coordinate transformations of pytorch_half_pixel and align_corners.
* Up-to-date version check implemented for DeepSparse.
* YOLACT and DeepSparse integration added in [examples/dbolya-yolact](https://github.com/neuralmagic/deepsparse/tree/main/examples/dbolya-yolact).

Changes:
* The parameter for the number of sockets to use has been removed -- the Python interface now only takes only the number of cores as a parameter.
* Tensor columns have been optimized. Users will see performance improvements specifically for pruned quantized BERT models:
- The softmax operator can now take advantage of tensor columns.
- Inference batch sizes that are not divisible by 16 are now supported.
* Various performance improvements made to:
- certain networks, such as YOLOv5, on AVX2 systems.
- dense convolutions on some AVX-512 systems.
* API docs recompiled.

Resolved Issues:
* In rare circumstances, users could have experienced an assertion error when executing networks with depthwise convolutions.

Known Issues:

* YOLACT models fail with a mismatched shape error on multi-socket systems with batch size greater than 1. This issue applies to any model with a constant output.
* In some circumstances, GEMM operations where the number of columns of the output matrix is not divisible by 16 may fail with an assertion error.

0.8.0

New Features:

* Tensor columns have been optimized, improving the performance of some networks.
- This includes but is not limited to pruned and quantized YOLOv5s and BERT.
- For networks with subgraphs comprised of low-compute operations.
- Batch size must be a multiple of 16.
* Reduce operators have been further optimized in the Engine.
* [C++ API support is available](https://github.com/neuralmagic/deepsparse/blob/main/docs/source/c%2B%2Bapi-overview.md) for the DeepSparse Engine.

Changes:

* Performance improvements made for low-precision (8 and 16-bit) datatypes on AVX2.

Resolved Issues:

* Rarely, when several data arrangement operators were in a row, e.g., Reshape, Transpose, or Slice, assertion errors occurred.
* When Pad operators were not followed by convolution or pooling, assertion errors occurred.
* CPU threads migrated between cores when running benchmarks.

Known Issues:

* None

0.7.0

New Features:

* Operators optimized for Engine support:
* Where*
* Cast*
* IntegerMatMul*
* QLinearMatMul*
* Gather (for scalar indices)
*_optimized only for AVX-512 support_
* Flag created to disable any batch size overrides, setting the environment variable "NM_DISABLE_BATCH_OVERRIDE=1".
* Warnings display when emulating quantized operations on machines without VNNI instructions.
* Support added for Python 3.9.
* Support added for ONNX versions 1.8 - 1.10.

Changes:

* Performance improvements made for sparse quantized transformer models.
* Documentation updates made for examples/ultralytics-yolo to include YOLOv5.

Resolved Issues:

* A crash could result with an uninitialized memory read. A check is now in place before trying to access it.
* Engine output_shape functions corrected on multi-socket systems when the output dimensions are not statically known.

Known Issues:

* BERT models with quantized embeds currently segfault on AVX2 machines. Workaround is to run on a VNNI-compatible machine.

Page 5 of 7

Releases

Has known vulnerabilities

Previous Next

Deepsparse

Page 5 of 7

0.11.0

0.10.0

0.9.1

0.9.0

0.8.0

0.7.0

Page 5 of 7

Links

Releases