This is the release of TorchServe v0.8.0.
New Features
1. **Supported [large model inference](https://github.com/pytorch/serve/blob/614bfc0a382d809d5fd59d0dbc130c57e67c3332/docs/large_model_inference.md?plain=1#L1) in a distributed environment #2193 #2320 #2209 #2215 #2310 #2218 lxning HamidShojanazeri**
TorchServe added deep integration to support large model inference. It provides a PyTorch-native large model inference solution by integrating [PiPPy](https://github.com/pytorch/tau/tree/main/pippy). It also provides the flexibility and extensibility to support other popular libraries such as Microsoft DeepSpeed and HuggingFace Accelerate.
2. **Supported streaming response for gRPC #2186 and HTTP #2233 lxning**
To improve UX in Generative AI inference, TorchServe allows sending intermediate token responses to the client side by supporting [gRPC server side streaming](https://github.com/pytorch/serve/blob/614bfc0a382d809d5fd59d0dbc130c57e67c3332/docs/grpc_api.md?plain=1#L74) and [HTTP 1.1 chunked encoding](https://github.com/pytorch/serve/blob/614bfc0a382d809d5fd59d0dbc130c57e67c3332/docs/inference_api.md?plain=1#L103). A sketch of a streaming handler appears after this list.
3. **Supported PyTorch/XLA on GPU and TPU #2182 morgandu**
By leveraging `torch.compile`, it is now possible to run TorchServe with XLA, which is optimized for both GPU and TPU deployments.
4. **Implemented [New Metrics platform](https://github.com/pytorch/serve/issues/1492) #2199 #2190 #2165 namannandan lxning**
TorchServe fully supports [metrics](https://github.com/pytorch/serve/blob/master/docs/metrics.md#introduction) in Prometheus mode or Log mode. Both frontend and backend metrics can be configured in a [central metrics YAML file](https://github.com/pytorch/serve/blob/master/docs/metrics.md#central-metrics-yaml-file-definition).
5. **Supported map-based model config YAML file #2193 lxning**
Added a [config-file](https://github.com/pytorch/serve/blob/2f1f52f553e83703b5c380c2570a36708ee5cafa/model-archiver/README.md?plain=1#L119) option for model config to the model archiver tool. Users are able to flexibly define customized parameters in this YAML file and easily access them in the backend handler via the variable [context.model_yaml_config](https://github.com/pytorch/serve/blob/2f1f52f553e83703b5c380c2570a36708ee5cafa/docs/configuration.md?plain=1#L267). This new feature also underpins several of the other new features and enhancements in this release (see the illustrative `model_config.yaml` after this list).
6. **Refactored PyTorch 2.0 support #2222 msaroufim**
We've refactored our model optimization utilities and improved logging to help debug compilation issues. We've also deprecated `compile.json` in favor of the new YAML config format; follow our guide here to learn more: https://github.com/pytorch/serve/blob/master/examples/pt2/README.md. The main difference is that when archiving a model, instead of passing in `compile.json` via `--extra-files`, we now pass in `--config-file model_config.yaml`.
7. **Supported user-specified GPU deviceIds for a model #2193 lxning**
By default, TorchServe uses a round-robin algorithm to assign GPUs to a worker on a host. Starting from v0.8.0, TorchServe allows users to define [deviceIds](https://github.com/pytorch/serve/blob/614bfc0a382d809d5fd59d0dbc130c57e67c3332/model-archiver/README.md?plain=1#L175) in [model_config.yaml](https://github.com/pytorch/serve/blob/614bfc0a382d809d5fd59d0dbc130c57e67c3332/model-archiver/README.md?plain=1#L162) to assign GPUs to a model.
8. **Supported CPU model on a GPU host #2193 lxning**
TorchServe supports hybrid mode on a GPU host. Users are able to define [deviceType](https://github.com/pytorch/serve/blob/614bfc0a382d809d5fd59d0dbc130c57e67c3332/model-archiver/README.md?plain=1#L174) in the model config YAML file to deploy a model on the CPU of a GPU host.
9. **Supported Client Timeout #2267 lxning**
TorchServe allows users to define [clientTimeoutInMills](https://github.com/pytorch/serve/blob/614bfc0a382d809d5fd59d0dbc130c57e67c3332/frontend/archive/src/main/java/org/pytorch/serve/archive/model/ModelConfig.java#L40) in a model config YAML file. If [clientTimeoutInMills](https://github.com/pytorch/serve/blob/614bfc0a382d809d5fd59d0dbc130c57e67c3332/frontend/archive/src/main/java/org/pytorch/serve/archive/model/ModelConfig.java#L40) is set, TorchServe calculates the expiry timestamp of each incoming inference request and drops the request once it expires.
10. **Updated ping endpoint default behavior #2254 lxning**
Supported [maxRetryTimeoutInSec](https://github.com/pytorch/serve/blob/2f1f52f553e83703b5c380c2570a36708ee5cafa/frontend/archive/src/main/java/org/pytorch/serve/archive/model/ModelConfig.java#L35), which defines the maximum time window for recovering a dead backend worker of a model, in the model config YAML file. The default value is 5 minutes; users are able to adjust it in the model config YAML file. The [ping endpoint](https://github.com/pytorch/serve/blob/2f1f52f553e83703b5c380c2570a36708ee5cafa/docs/inference_api.md?plain=1#L26) returns 200 if all models have enough healthy workers (i.e., equal to or greater than minWorkers); otherwise it returns 500.
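For orientation, the sketch below gathers several of the new per-model options into a single illustrative `model_config.yaml`. The key names follow the linked model-archiver README and ModelConfig.java; the values are placeholders, and the `pt2` backend and `handler` parameters should be checked against the PT2 guide and adapted to your own handler.

```yaml
# model_config.yaml -- illustrative values only
minWorkers: 1
maxWorkers: 4
# Maximum time window for recovering a dead backend worker (default: 5 minutes)
maxRetryTimeoutInSec: 300
# Requests older than this are considered expired and dropped by the frontend
clientTimeoutInMills: 3000
# "cpu" deploys this model on the CPU even on a GPU host
deviceType: "gpu"
# GPUs assigned to this model instead of the default round-robin assignment
deviceIds: [0, 1]
# torch.compile backend, replacing the deprecated compile.json
pt2: "inductor"
handler:
  # Free-form, user-defined parameters surfaced via context.model_yaml_config
  max_new_tokens: 128
```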
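And a minimal handler sketch showing how the custom `handler` parameters above can be read via `context.model_yaml_config`, and how intermediate results can be pushed to the client for the new streaming responses. It assumes the `send_intermediate_predict_response` helper from `ts.protocol.otf_message_handler` referenced in the streaming docs; the generation loop is a stand-in for real token-by-token decoding.

```python
# streaming_handler.py -- a minimal sketch, not a drop-in handler
from ts.protocol.otf_message_handler import send_intermediate_predict_response
from ts.torch_handler.base_handler import BaseHandler


class StreamingHandler(BaseHandler):
    def initialize(self, ctx):
        self.context = ctx
        # Custom parameters defined under "handler:" in model_config.yaml
        handler_cfg = ctx.model_yaml_config.get("handler", {})
        self.max_new_tokens = handler_cfg.get("max_new_tokens", 128)
        self.initialized = True

    def inference(self, data, *args, **kwargs):
        # Push each generated token back to the client as an intermediate chunk
        # (HTTP 1.1 chunked encoding or gRPC server-side streaming).
        for token in self._generate(data):
            send_intermediate_predict_response(
                [token], self.context.request_ids,
                "Intermediate Prediction success", 200, self.context,
            )
        # The returned value becomes the final chunk of the response.
        return [""]

    def _generate(self, data):
        # Placeholder for a real token-by-token generation loop.
        yield from ["hello", " ", "world"]
```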
New Examples
+ **[Example of PiPPy](https://github.com/pytorch/serve/tree/master/examples/large_models/Huggingface_pippy) onboarding the open platform framework for distributed model inference #2215 HamidShojanazeri**
+ **[Example of DeepSpeed](https://github.com/pytorch/serve/tree/master/examples/large_models/deepspeed/opt) onboarding the open platform framework for distributed model inference #2218 lxning**
+ **[Example of Stable diffusion v2](https://github.com/pytorch/serve/tree/master/examples/diffusers) #2009 jagadeeshi2i**
Improvements
+ Upgraded to PyTorch 2.0 2194 agunapal
+ Enabled Core pinning in CPU nightly benchmark 2166 2237 min-jean-cho
TorchServe can be used with [Intel® Extension for PyTorch*](https://github.com/intel/intel-extension-for-pytorch) to give a performance boost on Intel hardware. Intel® Extension for PyTorch* is a Python package extending PyTorch with up-to-date features and optimizations that take advantage of AVX-512 Vector Neural Network Instructions (AVX512 VNNI), Intel® Advanced Matrix Extensions (Intel® AMX), and more.
![dashboard](https://github.com/pytorch/serve/assets/93151422/164b5d16-4edc-4575-b0b4-e2b2306076c3)
Enabling core pinning in the TorchServe CPU nightly benchmark shows a significant performance speedup. This feature is implemented via a script under the PyTorch Xeon backend, initiated from Intel® Extension for PyTorch*. To try out core pinning on your workload, add `cpu_launcher_enable=true` in `config.properties`.
To try out more optimizations with Intel® Extension for PyTorch*, install it and add `ipex_enable=true` in `config.properties` (see the `config.properties` snippet at the end of this list).
+ Added Neuron nightly benchmark dashboard 2171 2167 namannandan
+ Enabled torch.compile support for torch 2.0.0 pre-release 2256 morgandu
+ Fixed torch.compile mac regression test 2250 msaroufim
+ Added configuration option to disable system metrics 2104 namannandan
+ Added regression test cases for SageMaker MME contract 2200 agunapal
In case of OOM, error code 507 is returned instead of the generic code 503.
+ Fixed error thrown in KServe while loading multiple models 2235 jagadeeshi2i
+ Added Docker CI for TorchServe 2226 fabridamicelli
+ Changed Docker image release from dev to production 2227 agunapal
+ Supported building docker images with specified Python version 2154 agunapal
+ Model archiver optimizations:
a). Added wildcard file search in model archiver `--extra-files` 2142 gustavhartz
b). Added zip-store option to model archiver tool 2196 mreso
c). Made model archiver tests runnable from any directory 2191 mreso
d). Supported tgz format model decompression in TorchServe frontend 2214 lxning
+ Enabled batch processing in example scripted tokenizer 2130 mreso
+ Made handler tests callable with pytest 2173 mreso
+ Refactored sanity tests 2219 mreso
+ Improved benchmark tool 2228 and added auto-validation 2144 2157 agunapal
Automatically flags deviations of metrics from the average of the last 30 runs
+ Added notifications for CI job failures (benchmark, regression test) agunapal
+ Updated CI to run on Ubuntu 20.04 2153 agunapal
+ Added GitHub code scanning (codeql.yml) 2149 msaroufim
+ Froze pynvml version to avoid crash in nvgpu 2138 mreso
+ Made pre-commit usage clearer in error message 2241 and upgraded isort version 2132 msaroufim
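As referenced in the core pinning item above, the two CPU optimizations (core pinning and Intel® Extension for PyTorch*) are enabled with the `config.properties` entries quoted in those bullet points; a minimal sketch:

```properties
# config.properties
# Core pinning via the launcher script under the PyTorch Xeon backend
cpu_launcher_enable=true
# Additional Intel® Extension for PyTorch* optimizations (requires IPEX to be installed)
ipex_enable=true
```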
Dependency Upgrades
Documentation
+ Nvidia MPS integration study 2205 mreso
This study compares TPS between TorchServe with Nvidia MPS enabled and TorchServe without it on P3 and G4 instances. It can help you decide whether or not to enable MPS for your deployment.
+ Updated TorchServe page on pytorch.org 2243 agunapal
+ Fixed broken Windows Conda link flagged by lint 2240 msaroufim
+ Corrected example PT2 doc 2244 samils7
+ Fixed regex error in Configuration.md 2172 mpoemsl
+ Fixed dead Kubectl links 2160 msaroufim
+ Updated model file docs in example doc 2148 tmc
+ Added an example for serving TorchServe using Docker 2118 agunapal
+ Updated Walmart blog link 2117 agunapal
Platform Support
Ubuntu 16.04, Ubuntu 18.04, Ubuntu 20.04, MacOS 10.14+, Windows 10 Pro, Windows Server 2019, Windows Subsystem for Linux (Windows Server 2019, WSLv1, Ubuntu 18.04). TorchServe now requires Python 3.8 and above, and JDK17.
GPU Support