New Features
- The Cognitive Services on Spark: A simple and scalable integration between the Microsoft Cognitive Services and SparkML
  - Bing Image Search
  - Computer Vision: OCR, Recognize Text, Recognize Domain Specific Content, Analyze Image, Generate Thumbnails
  - Text Analytics: Language Detector, Entity Detector, Key Phrase Extractor, Sentiment Detector, Named Entity Recognition
  - Face: Detect, Find Similar, Identify, Group, Verify
- Added distributed model interpretability with LIME on Spark
- **100x** lower latencies (\<1ms) with Spark Serving
- Expanded Spark Serving to cover the full HTTP protocol
- Added the `SuperpixelTransformer` for segmenting images
- Added a Fluent API, `mlTransform` and `mlFit`, for composing pipelines more elegantly
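
The fluent style named in the last item can be illustrated with a toy, library-free sketch. Everything below (`ToyFrame`, `AddOne`, `Scale`, `MeanCenterer`, `Shift`, and the snake_case method names) is a hypothetical stand-in written for illustration, not MMLSpark's implementation. It only shows the idea of chaining stages directly from the data object instead of assembling a pipeline separately.

```python
class ToyFrame:
    """Hypothetical stand-in for a DataFrame; holds a list of numbers."""

    def __init__(self, values):
        self.values = list(values)

    def ml_transform(self, *stages):
        # Apply each transformer in order, feeding one stage's output to the next.
        frame = self
        for stage in stages:
            frame = stage.transform(frame)
        return frame

    def ml_fit(self, estimator):
        # Fit an estimator on this frame and return the fitted model.
        return estimator.fit(self)


class AddOne:
    def transform(self, frame):
        return ToyFrame(v + 1 for v in frame.values)


class Scale:
    def __init__(self, factor):
        self.factor = factor

    def transform(self, frame):
        return ToyFrame(v * self.factor for v in frame.values)


class Shift:
    def __init__(self, offset):
        self.offset = offset

    def transform(self, frame):
        return ToyFrame(v + self.offset for v in frame.values)


class MeanCenterer:
    """Toy estimator: learns the mean and returns a transformer that subtracts it."""

    def fit(self, frame):
        mean = sum(frame.values) / len(frame.values)
        return Shift(-mean)


data = ToyFrame([1, 2, 3])
chained = data.ml_transform(AddOne(), Scale(10))  # values: [20, 30, 40]
model = chained.ml_fit(MeanCenterer())            # learns mean 30.0
centered = chained.ml_transform(model)            # values: [-10.0, 0.0, 10.0]
```

The design point is that transformers and fitted models share one `transform` interface, so a fitted model can be passed back into the same chaining call.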
New Examples
- Chain together cognitive services to understand the feelings of your favorite celebrities with `CognitiveServices - Celebrity Quote Analysis.ipynb`
- Explore how you can use Bing Image Search and distributed model interpretability to build an object detection system without labeling any data, in `ModelInterpretation - Snow Leopard Detection.ipynb`
- See how to deploy *any* Spark computation as a web service on *any* Spark platform with the `SparkServing - Deploying a Classifier.ipynb` notebook
Updates and Improvements
LightGBM
- More APIs for loading LightGBM Native Models
- LightGBM training checkpointing and continuation
- Added Tweedie variance power to LightGBM
- Added early stopping to LightGBM
- Added feature importances to LightGBM
- Added a PMML exporter for LightGBM on Spark
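
For readers unfamiliar with early stopping, the general idea can be sketched in plain Python. This is a minimal, generic illustration and not LightGBM's or MMLSpark's code: the function name `best_iteration` and the `patience` parameter are assumptions made for this example. Training halts once the validation loss has failed to improve for `patience` consecutive rounds, and the best round seen so far is kept.

```python
def best_iteration(validation_losses, patience):
    """Return (index, loss) of the best round, stopping once `patience`
    consecutive rounds fail to improve on the best loss seen so far."""
    best_index, best_loss = None, float("inf")
    for index, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_index, best_loss = index, loss
        elif best_index is not None and index - best_index >= patience:
            break  # no improvement for `patience` rounds: stop training
    return best_index, best_loss


# Hypothetical per-round validation losses, made up for illustration.
losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
print(best_iteration(losses, patience=3))  # (2, 0.6)
```

Note that the later loss of 0.5 is never reached: the trade-off of early stopping is that it saves training rounds at the risk of missing a late improvement.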
HTTP on Spark
- Added the `VectorizableParam` for creating column parameterizable inputs
- Added a `handler` parameter to HTTP services
- HTTP on Spark now propagates nulls robustly
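
As a rough illustration of what null propagation means here, the sketch below assumes the intended behavior is that a null request row produces a null response row instead of failing the whole job. The helper names (`call_with_null_propagation`, `fake_send`) are hypothetical and are not part of HTTP on Spark.

```python
def call_with_null_propagation(requests, send):
    """Call `send` on each request; a None request yields a None response."""
    responses = []
    for request in requests:
        if request is None:
            responses.append(None)  # skip the call but keep rows aligned
        else:
            responses.append(send(request))
    return responses


def fake_send(request):
    # Hypothetical stand-in for an HTTP call; it just echoes the payload.
    return {"status": 200, "echo": request}


rows = ["a", None, "b"]
results = call_with_null_propagation(rows, fake_send)
```

Keeping one output per input row, including the nulls, is what lets the responses be attached back to the original rows as a new column.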
Version Bumps
- Updated to Spark 2.3.1
- Updated LightGBM to 2.1.250
Misc
- Added a Vagrantfile for easy Windows developer setup
- Improved Image Reader fault tolerance
- Reorganized Examples into Topics
- Generalized Image Featurizer and other image-based code to handle binary files as well as Spark images
- Added `ModelDownloader` R wrapper
- Added `getBestModel` and `getBestModelInfo` to `TuneHyperparameters`
- Expanded Binary File Reading APIs
- Added `Explode` and `Lambda` transformers
- Added `SparkBindings` trait for automating Spark binding creation
- Added retries and timeouts to `ModelDownloader`
- Added `ResizeImageTransformer` to remove `ImageFeaturizer` dependence on OpenCV
Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark (in alphabetical order):
- Abhiram Eswaran, Anand Raman, Ari Green, Arvind Krishnaa Jagannathan, Ben Brodsky, Casey Hong, Courtney Cochrane, Henrik Frystyk Nielsen, Ilya Matiach, Janhavi Suresh Mahajan, Jaya Susan Mathew, Karthik Rajendran, Mario Inchiosa, Minsoo Thigpen, Soundar Srinivasan, Sudarshan Raghunathan, terrytangyuan