DeepSparse

Latest version: v1.8.0

0.3.0

New Features:

* Multi-stream scheduler added as a configurable option to the engine.
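
Conceptually, a multi-stream scheduler lets multiple request threads share one engine instead of serializing them on a single stream. A minimal stand-in sketch of the client side (the `infer` function below is a hypothetical placeholder, not the DeepSparse API):

```python
from concurrent.futures import ThreadPoolExecutor

def infer(request_id):
    # Hypothetical stand-in for one engine invocation; with a
    # multi-stream scheduler, concurrent requests like these can be
    # serviced in parallel rather than queued behind each other.
    return request_id * 2

# Submit several requests concurrently, as a multi-stream client would.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(infer, range(8)))

print(results)  # [0, 2, 4, 6, 8, 10, 12, 14]
```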

Changes:

* Errors related to setting the NUMA memory policy are now issued as warnings.
* Improved compilation times for sparse networks.
* Performance improvements made for networks with large outputs, for multi-socket machines, and for quantized ResNet-50 v1 and kernel-sparsity GEMMs.
* Optimized copy operations and the placement of quantization operations within the network.
* Version is now loaded from the version.py file; the default build on branches is now nightly.
* The cpu.py file and related APIs now live in the DeepSparse repo instead of being copied over from the backend.
* Added informative install errors for end users running on unsupported, non-Linux systems.
* YOLOv3 batch 64 quantized now has a speedup of 16% in the DeepSparse Engine.

Resolved Issues:

* An assertion is no longer triggered when more sockets or threads than available are requested.
* Resolved assertion when performing Concat operations on constant buffers.
* Engine no longer crashes when the output of a QLinearMatMul operation has a dimension not divisible by 4.
* The engine now starts without crashing on Windows Subsystem for Linux and Docker for Windows or Docker for Mac.

Known Issues:

* None

0.2.0

New Features:

* None

Changes:

* Dense convolutions on AVX2 systems were optimized, improving performance for many non-pruned networks. In particular, this results in a speed improvement for batch size 64 ResNet-50 of up to 28% on Intel AVX2 systems and up to 39% on AMD AVX2 systems.
* Optimized activation-shuffle operations in the engine, resulting in up to a 14% speed improvement for batch size 64 pruned quantized MobileNetV1.
* Performance improvements made for networks with large output arrays.

Resolved Issues:

* Engine no longer fails with an assert when running some quantized networks.
* Resize operators with an ROI input are now optimized.
* Memory leak addressed on multi-socket systems when batch size > 1.
* Docs and readme corrections made for minor issues and broken links.
* Makefile no longer deletes files during docs compilation and cleaning.

Known Issues:

* In rare cases where a tensor, used as the input or output to an operation, is larger than 2GB, the engine can segfault. Users should decrease the batch size as a workaround.
* In some cases, models running complicated pre- or post-processing steps could diminish the DeepSparse Engine performance by up to a factor of 10x due to hyperthreading, as two engine threads can run on the same physical core. Address the performance issue by trying the following recommended solutions in order of preference:

1. [Enable thread binding](https://docs.neuralmagic.com/deepsparse/debugging-optimizing/diagnostics-debugging.html#performance-tuning)

If that does not give performance benefit or you want to try additional options:

2. [Use the **numactl** utility](https://docs.neuralmagic.com/deepsparse/debugging-optimizing/numactl-utility.html) to prevent the process from running on hyperthreads.

3. Manually set the thread affinity in Python as follows:

```python
import os
from deepsparse.cpu import cpu_architecture

ARCH = cpu_architecture()

if ARCH.vendor == "GenuineIntel":
    # Bind the process to the first num_physical_cores CPUs.
    os.sched_setaffinity(0, range(ARCH.num_physical_cores()))
elif ARCH.vendor == "AuthenticAMD":
    # Bind to every other CPU to avoid hyperthread siblings.
    os.sched_setaffinity(0, range(0, 2 * ARCH.num_physical_cores(), 2))
else:
    raise RuntimeError(f"Unknown CPU vendor {ARCH.vendor}")
```
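
Returning to the first known issue above, the decrease-the-batch-size workaround amounts to splitting a large batch into sub-batches so that no single tensor crosses the limit. A sketch of that idea (the 2 GB threshold is from the note above; the per-item size here is illustrative):

```python
def split_batch(batch, max_items):
    """Yield sub-batches of at most max_items items, e.g. to keep any
    single input/output tensor under a size limit such as 2 GB."""
    for start in range(0, len(batch), max_items):
        yield batch[start:start + max_items]

# Illustrative numbers: if each item contributes 40 MB to a tensor,
# cap sub-batches at 2 GB // 40 MB = 51 items to stay under the limit.
item_bytes = 40 * 1024**2
limit_bytes = 2 * 1024**3
max_items = limit_bytes // item_bytes  # 51

batch = list(range(64))
sub_batches = list(split_batch(batch, max_items))
print([len(b) for b in sub_batches])  # [51, 13]
```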

0.1.1

This is a patch release for 0.1.0 that contains the following changes:

- Docs updates: tagline and overview revised; verbiage updated to use "sparsification"
- Examples updated to use new ResNet-50 pruned_quant moderate model from the SparseZoo
- Nightly build dependencies now match on major.minor and not full version
- Benchmarking script added for reproducing ResNet-50 numbers
- Small (3-5%) performance improvement for pruned quantized ResNet-50 models, for batch size greater than 16
- Reduced memory footprint for networks with sparse fully connected layers
- Improved performance on multi-socket systems when batch size is larger than 1

0.1.0

Welcome to our initial release on GitHub! Older release notes can be [found here](https://neuralmagic.com/blog/release-notes/).

New Features:

* Operator support enabled:
  * QLinearAdd
  * 2D QLinearMatMul when the second operand is constant
* Multi-stream support added for concurrent requests.
* Examples for benchmarking, classification flows, detection flows, and Flask servers added.
* Jupyter Notebooks for classification and detection flows added.
* Makefile flows and utilities implemented for the GitHub repo structure.

Changes:

* Software packaging updated to reflect new GitHub distribution channel, from file naming conventions to license enforcement removal.
* Initial startup message updated with improved language.
* Distribution now manylinux2014 compliant; support for Ubuntu 16.04 deprecated.
* QuantizeLinear operations now use division instead of scaling by reciprocal for small quantization scales.
* Small performance improvements made on some quantized networks with nontrivial activation zero points.
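
The rationale for the QuantizeLinear change can be seen with plain floating-point arithmetic: a precomputed reciprocal is itself rounded, so multiplying by it can round differently than dividing by the scale directly, and division matches the exactly rounded result. A small self-contained demonstration (the scale value is illustrative, not taken from the engine):

```python
# Dividing by a small scale vs. multiplying by its precomputed
# reciprocal: the reciprocal is rounded once when computed and the
# product is rounded again, so the two paths can disagree.
scale = 0.0007            # illustrative small quantization scale
reciprocal = 1.0 / scale  # rounded here, then again on each multiply

mismatches = [
    x for x in map(float, range(1, 10000))
    if x / scale != x * reciprocal
]

print(f"{len(mismatches)} of 9999 inputs quantize differently")
```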

Resolved Issues:

* Networks with sparse quantized convolutions and nontrivial activation zero points now produce consistent, correct results.
* Crash no longer occurs for some models where a quantized depthwise convolution follows a non-depthwise quantized convolution.

Known Issues:

* None
