New Features:
* ONNX evaluation pipeline for OpenPifPaf (915)
* YOLOv8 segmentation pipelines and validation (924)
* `deepsparse.benchmark_sweep` CLI to enable sweeps of benchmarks across different settings such as cores and batch sizes (860)
* `Engine.generate_random_inputs()` API (966)
* Example data logging configurations for pipelines/server (867)
* Expanded built-in functions for NLP and CV pipeline logging to enable better monitoring (865) (862)
* Product usage analytics tracking in DeepSparse Community edition ([documentation](https://docs.neuralmagic.com/products/deepsparse/community#product-usage-analytics))
Performance Improvements:
* Inference latency for unstructured sparse-quantized CNNs has been improved by up to 2x.
* Inference throughput and latency for dense CNNs have been improved by up to 20%.
* Inference throughput and latency for dense transformers have been improved by up to 30%.
* The following operators are now supported for performance:
  * Neg, Unsqueeze with non-constant inputs
  * MatMulInteger with two non-constant inputs
  * GEMM with constant weights and 4D or 5D inputs
Changes:
* The Transformers and YOLOv5 integrations have migrated from auto install to installation from PyPI packages. Going forward, install them with `pip install deepsparse[transformers]` and `pip install deepsparse[yolov5]`.
* DeepSparse now uses hwloc to determine CPU topology. This fixes a bug where DeepSparse could not be used performantly inside of a Kubernetes cluster with a static CPU manager policy.
* When users pass in a `num_streams` parameter that is smaller than the number of cores, multi-stream and elastic scheduler behaviors have been improved. Previously, DeepSparse would divide the system into `num_streams` chunks and fill each chunk until it ran out of threads. Now, each stream will use a number of threads equal to `num_cores` divided by `num_streams`, with the remainder distributed in a round-robin fashion.
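
The new thread split described in the last item can be illustrated with a short sketch. This is not DeepSparse's implementation: the function name `threads_per_stream` is hypothetical, and giving the extra threads to the lowest-numbered streams first is an assumption about how the round-robin remainder is handed out.

```python
from typing import List


def threads_per_stream(num_cores: int, num_streams: int) -> List[int]:
    """Illustrative only: split num_cores threads across num_streams streams.

    Each stream gets num_cores // num_streams threads. The remainder is
    handed out one thread at a time, round-robin, starting from stream 0
    (the starting stream is an assumption, not documented behavior).
    """
    if num_cores <= 0 or num_streams <= 0:
        raise ValueError("num_cores and num_streams must be positive")

    base, remainder = divmod(num_cores, num_streams)
    counts = [base] * num_streams

    # Round-robin distribution of the leftover threads.
    for i in range(remainder):
        counts[i % num_streams] += 1
    return counts


if __name__ == "__main__":
    # 10 cores over 4 streams: base of 2 threads each, 2 left over.
    print(threads_per_stream(10, 4))
```

Under these assumptions, every thread is assigned to a stream and no two streams differ by more than one thread.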
Resolved Issues:
* In networks with a Clip operator where min is not equal to zero, performance bugs no longer occur.
* Crashing eliminated:
  * Pipeline conll eval using `ignore_labels`. (903)
  * YOLOv8 pipelines handling models with dynamic inputs. (967)
  * QA pipelines with sequence lengths equal to or less than 128. (889)
  * Image classification pipelines handling PNG images. (870)
  * ONNX overriding of shapes if a list was not passed in; this now automatically wraps in a list. (914)
* Assertion errors/failures removed:
  * Networks with both Convolutions and GEMM operations.
  * YOLOv8 model compilation.
  * Slice and Unsqueeze operators with a negative axis.
  * OPT models involving a constant tensor that is broadcast in two different ways.
Known Issues:
* None