New Features:
* Support added for running multiple models with the same engine when using the Elastic Scheduler.
* When using the Elastic Scheduler, the caller can now use the `num_streams` argument to tune the number of requests that are processed in parallel.
* Pipeline and annotation support added and generalized for transformers, yolov5, and torchvision.
* Documentation added for transformers, yolov5, torchvision, and serving, focused on model deployment for each of these integrations.
* AWS SageMaker example created.
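To illustrate what capping the number of requests processed in parallel means, here is a minimal toy sketch in plain Python. It is not DeepSparse's Elastic Scheduler and does not call any DeepSparse API. `run_requests` and `ConcurrencyProbe` are hypothetical names, and a thread pool sized to `num_streams` stands in for the scheduler's stream limit.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_requests(requests, num_streams, handler):
    """Run handler over requests with at most `num_streams` in flight.

    Toy model of a stream cap (hypothetical helper); this is not
    DeepSparse's Elastic Scheduler.
    """
    if num_streams < 1:
        raise ValueError("num_streams must be at least 1")
    # The pool size is the cap on concurrently processed requests.
    with ThreadPoolExecutor(max_workers=num_streams) as pool:
        # map() returns results in input order.
        return list(pool.map(handler, requests))


class ConcurrencyProbe:
    """Hypothetical stand-in for a model that records peak concurrency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def __call__(self, request):
        with self._lock:
            self._active += 1
            self.peak = max(self.peak, self._active)
        time.sleep(0.01)  # simulate inference latency
        with self._lock:
            self._active -= 1
        return request * 2


if __name__ == "__main__":
    probe = ConcurrencyProbe()
    outputs = run_requests(range(8), num_streams=2, handler=probe)
    print(outputs, "peak in flight:", probe.peak)
```

Raising `num_streams` lets more requests overlap, which typically trades per-request latency for throughput; the right value depends on the workload and hardware.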
Changes:
* Click added as a root dependency; it is now the preferred route for CLI invocation and argument management.
Performance:
* Inference performance has been improved for unstructured sparse quantized models on AVX2 and AVX-512 systems that do not support VNNI instructions, with improvements of up to 20% on BERT and 45% on ResNet-50.
Resolved Issues:
* Potential crashes no longer occur when a layer operates on a dataset larger than 2 GB.
* Assertion error addressed for Reduce operations where the reduction axis is of length 1.
* Rare assertion failure related to Tensor Columns addressed.
* Model compilation now terminates properly when running the DeepSparse Engine on a machine with a non-uniform system topology.
Known Issues:
* In rare cases, the engine may crash with an assertion failure during model compilation for a convolution with a 1x1 kernel and 2x2 strides; hotfix forthcoming.
* The engine will crash with an assertion failure when the `num_streams` parameter is set to a value lower than the number of NUMA nodes; hotfix forthcoming.
* In rare cases, the engine may enter an infinite loop when an operation has multiple inputs coming from the same source; hotfix forthcoming.
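Until the `num_streams` hotfix lands, one possible caller-side precaution is to never request fewer streams than there are NUMA nodes. The sketch below is a hypothetical helper, not part of DeepSparse: `numa_node_count` and `safe_num_streams` are illustrative names, and the node count assumes Linux sysfs (`/sys/devices/system/node/node*`), falling back to 1 on other platforms.

```python
import glob
import os


def numa_node_count(sysfs_root="/sys/devices/system/node"):
    """Count NUMA nodes via Linux sysfs; fall back to 1 elsewhere.

    Hypothetical helper; assumes the standard Linux sysfs layout.
    """
    nodes = [
        path
        for path in glob.glob(os.path.join(sysfs_root, "node[0-9]*"))
        if os.path.isdir(path)
    ]
    return max(len(nodes), 1)


def safe_num_streams(requested, numa_nodes):
    """Raise a requested stream count to at least one per NUMA node.

    Hypothetical workaround for the known issue; drop once hotfixed.
    """
    if requested < 1:
        raise ValueError("num_streams must be at least 1")
    return max(requested, numa_nodes)


if __name__ == "__main__":
    nodes = numa_node_count()
    print("NUMA nodes:", nodes, "-> num_streams:", safe_num_streams(1, nodes))
```

The adjusted value would then be passed wherever the caller sets `num_streams`; this sketch does not assume any particular engine call signature.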