New Features:
* DeepSparse Engine now optimized for sparse FP32 BERT models.
* Optimized BERT model collection now in the [SparseZoo](https://sparsezoo.neuralmagic.com/?domain=nlp&sub_domain=question_answering&page=1).
* Example performance improvement: 5x throughput increase on [PruneBERT](https://arxiv.org/abs/2005.07683) (281 seq/sec) compared to dense BERT (53 seq/sec) at batch size 32 and sequence length 128 on an AWS c5.12xlarge instance; see the benchmarking sketch after this list.
* Optimized support added for the Tanh operator.
* Hugging Face transformers pipeline [APIs added for NLP models](https://github.com/neuralmagic/deepsparse/tree/main/examples/huggingface-transformers); a usage sketch follows this list.
* Hugging Face transformers [examples added for benchmarking, deployment, and a sample application](https://docs.neuralmagic.com/main/source/model-pages/nlp-bert.html#sparse-inference).
* Ultralytics YOLOv5 [example support added](https://docs.neuralmagic.com/main/source/model-pages/cv-detection-yolov5.html#sparse-inference).
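
As a rough illustration of how a throughput number like the one above can be measured, the following sketch compiles a sparse BERT ONNX file with the DeepSparse Engine and times batched inference. The `model.onnx` path, input shapes, and iteration count are placeholders rather than part of this release; `compile_model` and `Engine.run` are the engine's standard Python entry points.

```python
# Minimal throughput sketch for a BERT-style ONNX model on the DeepSparse Engine.
# Assumes "model.onnx" is a local sparse BERT export; path and shapes are placeholders.
import time

import numpy as np
from deepsparse import compile_model

batch_size, seq_len = 32, 128
engine = compile_model("model.onnx", batch_size=batch_size)

# BERT-style inputs: token ids, attention mask, segment ids (int64, [batch, seq_len]).
inputs = [
    np.random.randint(0, 30000, (batch_size, seq_len), dtype=np.int64),  # input_ids
    np.ones((batch_size, seq_len), dtype=np.int64),                      # attention_mask
    np.zeros((batch_size, seq_len), dtype=np.int64),                     # token_type_ids
]

engine.run(inputs)  # warmup run before timing
iters = 20
start = time.perf_counter()
for _ in range(iters):
    engine.run(inputs)
elapsed = time.perf_counter() - start
print(f"throughput: {iters * batch_size / elapsed:.1f} sequences/sec")
```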
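
The pipeline APIs in the linked example mirror the Hugging Face transformers `pipeline` interface. The sketch below shows the intended usage pattern; the import path and `model_path` value are assumptions for illustration, so consult the example directory for the exact interface.

```python
# Hedged sketch of Hugging Face-style pipeline usage backed by the DeepSparse Engine.
# The import path and model_path below are assumptions; see
# examples/huggingface-transformers in the deepsparse repository for specifics.
from deepsparse.transformers import pipeline  # assumed import path

qa = pipeline(
    "question-answering",
    model_path="path/to/sparse-bert-qa/model.onnx",  # placeholder model path
)

answer = qa(
    question="What does the engine accelerate?",
    context="The DeepSparse Engine accelerates sparse FP32 BERT models on CPUs.",
)
print(answer)  # question-answering pipelines return the answer span and a score
```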
Changes:
* Performance improvements made for:
  - all networks running on multi-socket machines, especially networks with large outputs.
  - batched Softmax and Reduce operators when many threads are available.
- Reshape operators when multiple dimensions are combined into one or one dimension is split into multiple.
- stacked matrix multiplications by supporting more input layouts.
* YOLOv3 example integration was generalized to ultralytics-yolo to support both YOLOv3 and YOLOv5.
Resolved Issues:
* Engine now runs on architectures with more than one NUMA node per socket.
Known Issues:
* None