Redesign Release
**Notice: This release is not backwards compatible**
* Easy to deploy & configure
  * Support Machine Learning Models (Scikit-learn, XGBoost, LightGBM)
  * Support Deep Learning Models (TensorFlow, PyTorch, ONNX)
  * Customizable REST API for serving (i.e. allow per-model pre/post-processing for easy integration)
* Flexible
  * On-line model deployment
  * On-line endpoint model/version deployment (i.e. no need to take the service down)
  * Per-model standalone preprocessing and postprocessing Python code
* Scalable
  * Multiple models per container
  * Multiple models per serving service
  * Multi-service support (fully separated serving services running independently)
  * Multi-cluster support
  * Out-of-the-box node auto-scaling based on load/usage
* Efficient
  * Multi-container resource utilization
  * Support for CPU & GPU nodes
  * Auto-batching for DL models
* Automatic deployment
  * Automatic model upgrades with canary support
  * Programmable API for model deployment
* Canary A/B deployment
  * Online canary updates
* Model Monitoring
  * Usage metric reporting
  * Metric dashboard
  * Model performance metrics
  * Model performance dashboard
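
The per-model preprocessing and postprocessing code listed above can be pictured as a small standalone Python module that sits between the REST request and the model. The sketch below is illustrative only: the class name `Preprocess`, the method names, their signatures, and the `x0`/`x1` request fields are assumptions made for this example, not a confirmed interface, so check the project's examples for the exact contract.

```python
from typing import Any, Dict, List


class Preprocess:
    """Hypothetical per-model hook: maps a JSON request to model input and back."""

    def preprocess(self, body: Dict[str, Any]) -> List[List[float]]:
        # Assumed request shape: {"x0": <number>, "x1": <number>}
        # Returns a single-row batch in the order the model expects.
        return [[float(body["x0"]), float(body["x1"])]]

    def postprocess(self, data: Any) -> Dict[str, Any]:
        # Convert array-like model output (e.g. a NumPy array) into
        # a JSON-serializable response.
        values = data.tolist() if hasattr(data, "tolist") else list(data)
        return {"y": values}
```

Keeping this logic in its own file per model is what would let it be swapped without rebuilding the serving container.
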
Features:
- [x] FastAPI integration for inference service
- [x] Multi-process Gunicorn for inference service
- [x] Dynamic loading of preprocessing Python code (no need for container/process restart)
- [x] Model files download/caching (http/s3/gs/azure)
- [x] Scikit-learn, XGBoost, LightGBM integration
- [x] Custom inference, including dynamic code loading
- [x] Manual model upload/registration to model repository (http/s3/gs/azure)
- [x] Canary load balancing
- [x] Auto model endpoint deployment based on model repository state
- [x] Machine/Node health metrics
- [x] Dynamic online configuration
- [x] CLI configuration tool
- [x] Nvidia Triton integration
- [x] GZip request compression
- [x] TorchServe engine integration
- [x] Prebuilt Docker containers (Docker Hub)
- [x] Docker-compose deployment (CPU/GPU)
- [x] Scikit-Learn example
- [x] XGBoost example
- [x] LightGBM example
- [x] PyTorch example
- [x] TensorFlow/Keras example
- [x] Model ensemble example
- [x] Model pipeline example
- [x] Statistics Service
- [x] Kafka install instructions
- [x] Prometheus install instructions
- [x] Grafana install instructions
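
To make the canary load balancing item more concrete, here is a minimal sketch of weighted routing between two versions of an endpoint. It is not the project's implementation: the function `pick_endpoint`, the endpoint names, and the 90/10 split are all hypothetical and chosen only for illustration.

```python
import random
from typing import Dict, Optional


def pick_endpoint(weights: Dict[str, float],
                  rng: Optional[random.Random] = None) -> str:
    """Pick an endpoint name with probability proportional to its weight."""
    if not weights or any(w < 0 for w in weights.values()):
        raise ValueError("weights must be a non-empty mapping of non-negative values")
    if sum(weights.values()) <= 0:
        raise ValueError("at least one weight must be positive")
    rng = rng or random.Random()
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]


# Hypothetical split: 90% of requests go to the stable version, 10% to the canary.
canary_weights = {"my_model/1": 0.9, "my_model/2": 0.1}
```

Each incoming request would call `pick_endpoint(canary_weights)` to decide which version serves it; shifting more traffic to the canary then only requires changing the weights.
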