DeepSparse

Latest version: v1.8.0

1.0.0

New Features:
* Support added for running multiple models with the same engine when using the Elastic Scheduler.
* When using the Elastic Scheduler, the caller can now use the `num_streams` argument to tune the number of requests that are processed in parallel (see the sketch after this list).
* Pipeline and annotation support added and generalized for transformers, yolov5, and torchvision.
* Documentation additions made for transformers, yolov5, torchvision, and serving that focus on model deployment for the given integrations.
* AWS SageMaker example created.
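
For illustration, tuning `num_streams` with the Elastic Scheduler might look like the following minimal sketch. It assumes the `deepsparse.compile_model` convenience API; the SparseZoo stub is hypothetical, and any local ONNX model path works the same way.

```python
from deepsparse import compile_model

# Hypothetical SparseZoo stub; substitute any local ONNX model path.
model_path = "zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned90_quant-none"

# The Elastic Scheduler processes requests in parallel; num_streams tunes
# how many requests are in flight at once.
engine = compile_model(
    model_path,
    batch_size=1,
    num_streams=4,  # keep >= the number of NUMA nodes (see Known Issues below)
    scheduler="elastic",
)

# inputs: a list of numpy arrays matching the model's input shapes
# outputs = engine.run(inputs)
```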

Changes:
* Click added as a root dependency; it is now the preferred route for CLI invocation and argument management.

Performance:
* Inference performance has been improved for unstructured sparse quantized models on AVX2 and AVX-512 systems that do not support VNNI instructions, including improvements of up to 20% on BERT and 45% on ResNet-50.

Resolved Issues:
* Potential crashes no longer occur when a layer operates on data larger than 2GB.
* Assertion error addressed for Reduce operations where the reduction axis is of length 1.
* Rare assertion failure addressed related to Tensor Columns.
* When running the DeepSparse Engine on a system with a non-uniform system topology, model compilation now properly terminates.

Known Issues:
* In rare cases, the engine may crash with an assertion failure during model compilation for a convolution with a 1x1 kernel with 2x2 convolution strides; hotfix forthcoming.
* The engine will crash with an assertion failure when setting the `num_streams` parameter to fewer than the number of NUMA nodes; hotfix forthcoming.
* In rare cases, the engine may enter an infinite loop when an operation has multiple inputs coming from the same source; hotfix forthcoming.

0.12.2

This is a patch release for 0.12.0 that contains the following changes:

- Protobuf is restricted to version < 4.0 as the newer version breaks ONNX.

0.12.1

This is a patch release for 0.12.0 that contains the following changes:

- Improper label mapping no longer causes crashes in validation flows within DeepSparse transformers.
- DeepSparse Server now exposes proper routes for SageMaker.
- A DeepSparse Server dependency issue that installed an old version of a library, causing crashes in some use cases, has been resolved.

0.12.0

New Features:
**Documentation:**
* [SparseServer.UI](https://github.com/neuralmagic/deepsparse/tree/main/examples/sparseserver-ui): a Streamlit app for deploying the DeepSparse Server for exploring the inference performance of BERT on the question answering task.
* [DeepSparse Server README](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/server): `deepsparse.server` capabilities, including single-model and multi-model inferencing (a minimal client sketch follows this list).
* [Twitter NLP Inference Examples](https://github.com/neuralmagic/deepsparse/tree/main/examples/twitter-nlp) added.
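
As a usage illustration, after launching a server per the linked README (for example, `deepsparse.server --task question_answering --model_path <model>`), it can be queried over HTTP. A minimal client sketch follows; the port 5543 and `/predict` route are assumptions to verify against the README for your version.

```python
import requests

# Assumed default port and single-model route; check the DeepSparse
# Server README for the version you are running.
url = "http://localhost:5543/predict"

payload = {
    "question": "What does DeepSparse deliver?",
    "context": "DeepSparse delivers GPU-class performance on commodity CPUs.",
}

response = requests.post(url, json=payload)
print(response.json())  # model output as JSON
```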

Changes:
**Performance:**
* Speedup for large batch sizes when using sync mode on AMD EPYC processors.
* AVX2 improvements:
  * Up to 40% speedup out of the box for dense quantized models.
  * Up to 20% speedup for pruned quantized BERT, ResNet-50, and MobileNet.
* Speedup from sparsity realized for ConvInteger operators.
* Model compilation time decreased on systems with many cores.
* Multi-stream Scheduler: certain computations that were executed during runtime are now precomputed.
* Hugging Face Transformers integration updated to latest state from upstream main branch.

**Documentation:**
* [DeepSparse README](https://github.com/neuralmagic/deepsparse): references to `deepsparse.server`, `deepsparse.benchmark`, and Transformer pipelines.
* [DeepSparse Benchmark README](https://github.com/neuralmagic/deepsparse/tree/main/src/deepsparse/benchmark_model): highlights of `deepsparse.benchmark` CLI command.
* [Transformers 🤗 Inference Pipelines](https://github.com/neuralmagic/deepsparse/tree/main/examples/huggingface-transformers): examples included on how to run inference via Python for several NLP tasks.
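
For example, the linked pipeline examples boil down to a few lines of Python. A minimal question-answering sketch, assuming the 0.12-era `deepsparse.transformers.pipeline` helper and a hypothetical SparseZoo stub:

```python
from deepsparse.transformers import pipeline

# Hypothetical SparseZoo stub; any compatible ONNX BERT model path works.
qa_pipeline = pipeline(
    task="question-answering",
    model_path="zoo:nlp/question_answering/bert-base/pytorch/huggingface/squad/pruned-aggressive_98",
)

inference = qa_pipeline(
    question="What is DeepSparse?",
    context="DeepSparse is a CPU inference runtime for sparse models.",
)
print(inference)  # answer text with score and span offsets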

Resolved Issues:
* When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine no longer disables optimizations, which previously resulted in very poor performance.
* Users executing `arch.bin` now receive a correct architecture profile of their system.

Known Issues:
* When running the DeepSparse Engine on a system with a non-uniform system topology, for example, an AMD EPYC processor where some cores per core-complex (CCX) have been disabled, model compilation will never terminate. A workaround is to set the environment variable `NM_SERIAL_UNIT_GENERATION=1`, as shown below.
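
The workaround can also be applied from Python, provided the variable is set before DeepSparse compiles the model; a minimal sketch:

```python
import os

# Apply the workaround before importing/compiling with DeepSparse so the
# engine sees it at model compilation time.
os.environ["NM_SERIAL_UNIT_GENERATION"] = "1"

from deepsparse import compile_model

engine = compile_model("model.onnx", batch_size=1)
```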

0.11.2

This is a patch release for 0.11.0 that contains the following changes:

- Fixed an assertion error that would occur when using `deepsparse.benchmark` on AMD machines with the argument `-pin none`.

Known Issues:
- When running quantized BERT with a sequence length not divisible by 4, the DeepSparse Engine will disable optimizations and see very poor performance.

0.11.1

This is a patch release for 0.11.0 that contains the following changes:

* When running [NanoDet-Plus-m](https://github.com/RangiLyu/nanodet), the DeepSparse Engine will no longer fail with an assertion (See #279).
* The DeepSparse Engine now respects the CPU affinity set by the calling thread. This is essential for the new [command-line (CLI) tool](https://github.com/neuralmagic/deepsparse/blob/main/examples/amd-azure/README.md) `multi-process-benchmark.py` to function correctly. This script allows users to measure performance using multiple separate processes in parallel.
* Fixed a performance regression for BERT models at batch size 1, sequence length 128.
