Synapseml

Latest version: v1.0.10


0.9.5

Not secure
<img width="100%" src="https://mmlspark.blob.core.windows.net/graphics/emails/email_header_synapseml.jpg" alt="SynapseML" href="https://github.com/Azure/mmlspark">

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML v0.9.5 (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java.



Highlights

| <img width="400" src="https://mmlspark.blob.core.windows.net/graphics/emails/azure_maps.svg"> |<img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/anomaly_detector.svg"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/scales.svg"> | <img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/tts.svg"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/healcare.svg"> |
|:--:|:--:|:--:|:--:|:--:|
| **Geospatial Intelligence** |**Multivariate Anomaly Detection** | **Responsible AI at Scale** | **Text To Speech** | **Healthcare Analytics** |
| Large-scale map and geocoding operations | Build custom time series anomaly detection systems | Distributed Conditional Expectation and Partial Dependence Analysis | Easy-to-use Neural Text to Speech for large datasets | Quickly understand entities and relationships in corpora of medical text. |

New Features

Geospatial Intelligence 🗺️
- Added support for distributed geospatial queries backed by the [Azure Maps API](https://azure.microsoft.com/en-us/services/azure-maps/)
- Added the [geospatial usage overview](https://microsoft.github.io/SynapseML/docs/features/geospatial_services/GeospatialServices%20-%20Overview/) ([#1339](https://github.com/Microsoft/SynapseML/issues/1339))
- Explore how to use the geospatial intelligence services to [analyze flood risks](https://microsoft.github.io/SynapseML/docs/features/cognitive_services/GeospatialServices%20-%20Flooding%20Risk/). ([#1339](https://github.com/Microsoft/SynapseML/issues/1339))
- Added the `AddressGeocoder` transformer to map informal addresses to standardized addresses with latitude and longitude ([1294](https://github.com/Microsoft/SynapseML/issues/1294))
- Added the `ReverseGeocoder` transformer to map latitude and longitude measurements to standardized addresses. ([1339](https://github.com/Microsoft/SynapseML/issues/1339))
- Added the `CheckPointInPolygon` transformer to detect whether latitude and longitude queries lie inside regions of interest ([1339](https://github.com/Microsoft/SynapseML/issues/1339))
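
To make the point-in-polygon check concrete, here is a minimal local sketch in plain Python using the standard ray-casting test. It is not the Azure Maps service or the `CheckPointInPolygon` transformer API: the function name `point_in_polygon` and the sample region are hypothetical, and coordinates are treated as planar, which is only a reasonable approximation for small regions.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (longitude, latitude), treated as planar x/y


def point_in_polygon(point: Point, polygon: List[Point]) -> bool:
    """Ray-casting test: count the polygon edges crossed by a ray going right."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing flips inside/outside
    return inside


# Hypothetical rectangular region of interest, given as (lon, lat) corners
region = [(-122.5, 47.5), (-122.0, 47.5), (-122.0, 47.8), (-122.5, 47.8)]
queries = [(-122.3, 47.6), (-121.0, 47.6)]
results = [point_in_polygon(q, region) for q in queries]
print(results)
```

The first query falls inside the rectangle and the second lies to its east, so the two results differ.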


Azure Cognitive Services for Big Data 🧠
- Added the Healthcare Analytics Transformer for extracting medical information, entities, and relationships from text. [[Example Usage]](https://microsoft.github.io/SynapseML/docs/features/cognitive_services/CognitiveServices%20-%20Overview/#healthcare-analytics-sample) ([1329](https://github.com/Microsoft/SynapseML/issues/1329))
- Added the `FitMultivariateAnomaly` estimator for training custom anomaly detection models on DataFrames of multivariate time series data ([1272](https://github.com/Microsoft/SynapseML/issues/1272))
- Added [example notebook](https://microsoft.github.io/SynapseML/docs/next/features/cognitive_services/CognitiveServices%20-%20Multivariate%20Anomaly%20Detection/) for Multivariate Anomaly Detector
- See how to train a custom Multivariate Anomaly detector in the [Estimators reference docs](https://microsoft.github.io/SynapseML/docs/documentation/estimators/estimators_cognitive/#fitmultivariateanomaly) ([1323](https://github.com/Microsoft/SynapseML/issues/1323))
- Added simplified Text Analytics transformers that support auto-batching ([1329](https://github.com/Microsoft/SynapseML/issues/1329))
- Added the `TextToSpeech` Transformer for transforming DataFrames of text to audio files with neural voice synthesis ([1320](https://github.com/Microsoft/SynapseML/issues/1320))
- Added the `TextAnalyze` transformer to support executing multiple text analytics workloads within a single API call ([1267](https://github.com/Microsoft/SynapseML/issues/1267), [#1312](https://github.com/Microsoft/SynapseML/issues/1312))
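
The auto-batching idea mentioned above can be sketched in plain Python: group inputs into fixed-size batches, make one service call per batch, then flatten the results back to one per input. This is an illustration of the pattern only, not the SynapseML implementation; `batched`, `call_in_batches`, and `fake_sentiment_service` are hypothetical names, and the fake service stands in for a remote text analytics call.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def call_in_batches(
    texts: Sequence[str],
    service: Callable[[List[str]], List[str]],
    batch_size: int,
) -> List[str]:
    """Call the service once per batch and flatten to one result per input."""
    results: List[str] = []
    for batch in batched(texts, batch_size):
        results.extend(service(batch))
    return results


def fake_sentiment_service(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for a remote call that accepts a list of texts."""
    return ["positive" if "good" in text else "neutral" for text in batch]


docs = ["good food", "it rained", "good service", "the bus left", "good day"]
labels = call_in_batches(docs, fake_sentiment_service, batch_size=2)
print(labels)
```

With five documents and a batch size of two, the service is called three times, and the output keeps the input order.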

Responsible AI at Scale 😇
- Added Individual Conditional Expectation explanations and Partial Dependence Plots with the `ICETransformer`. This tool gives detailed explanations of how features in opaque-box models affect the model prediction. ([1284](https://github.com/Microsoft/SynapseML/issues/1284))
- Learn about how to use the ICETransformer through [an example with the Adult Census dataset](https://microsoft.github.io/SynapseML/docs/next/features/responsible_ai/Interpretability%20-%20PDP%20and%20ICE%20explainer/)
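
As a rough illustration of what these explanations compute, the sketch below builds Individual Conditional Expectation (ICE) curves and a partial dependence curve in plain Python. It does not use the `ICETransformer` API; `ice_curves`, `partial_dependence`, and `toy_model` are hypothetical names, and the toy model stands in for any opaque-box predictor.

```python
from typing import Callable, Dict, List

Row = Dict[str, float]


def ice_curves(
    model: Callable[[Row], float],
    rows: List[Row],
    feature: str,
    grid: List[float],
) -> List[List[float]]:
    """For each row, sweep one feature over the grid and record predictions."""
    curves = []
    for row in rows:
        curve = []
        for value in grid:
            modified = dict(row)       # copy so the original row is untouched
            modified[feature] = value  # overwrite only the feature of interest
            curve.append(model(modified))
        curves.append(curve)
    return curves


def partial_dependence(curves: List[List[float]]) -> List[float]:
    """The partial dependence curve is the pointwise mean of the ICE curves."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]


def toy_model(row: Row) -> float:
    """Hypothetical model with an interaction between age and hours."""
    return 2.0 * row["age"] + row["age"] * row["hours"]


rows = [{"age": 1.0, "hours": 0.0}, {"age": 1.0, "hours": 2.0}]
grid = [0.0, 1.0, 2.0]
curves = ice_curves(toy_model, rows, "age", grid)
pdp = partial_dependence(curves)
print(curves, pdp)
```

The two ICE curves have different slopes because of the interaction term, which the averaged partial dependence curve alone would hide.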


MLFlow 🔃
- Add [MLFlow](https://mlflow.org/) support for saving and loading SynapseML models ([#1277](https://github.com/Microsoft/SynapseML/issues/1277))

LightGBM on Spark 🌳
- Improved LightGBM training performance 4x-10x by setting `num_threads` to cores-1 ([1282](https://github.com/Microsoft/SynapseML/issues/1282))
- Added the `predict_disable_shape_check` parameter in LightGBM ([1273](https://github.com/Microsoft/SynapseML/issues/1273))
- Reduced temporary file bloat by creating the LightGBM native temp directory lazily ([1326](https://github.com/Microsoft/SynapseML/issues/1326))
- Added logging for number of columns and rows when creating datasets, set useSingleDatasetMode=True by default ([1222](https://github.com/Microsoft/SynapseML/issues/1222))
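
The "cores-1" rule above leaves one core free for other work on the machine. A minimal sketch of that rule in plain Python follows, assuming the core count comes from `os.cpu_count()`; how SynapseML determines the core count on a Spark executor is not shown here, and `default_num_threads` is a hypothetical helper.

```python
import os
from typing import Optional


def default_num_threads(available_cores: Optional[int] = None) -> int:
    """Return cores - 1, but never fewer than one thread."""
    if available_cores is None:
        # os.cpu_count() can return None, so fall back to a single core.
        available_cores = os.cpu_count() or 1
    return max(1, available_cores - 1)


print(default_num_threads(8))  # an 8-core machine
print(default_num_threads(1))  # a single-core machine still gets one thread
```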

Infrastructure 🏭

- SynapseML is now installable from Maven Central!
- SynapseML now supports Spark v3.2.x

Additional Updates

Bug Fixes 🐞
- Allowed FlattenBatch to propagate non-array values ([1286](https://github.com/Microsoft/SynapseML/issues/1286))
- Fixed flaky tests ([1342](https://github.com/Microsoft/SynapseML/issues/1342))
- Fixed website bugs and migrated docSearch ([1331](https://github.com/Microsoft/SynapseML/issues/1331))
- Fixed issue where IsolationForestModel does not properly exchange params with the inner model ([1330](https://github.com/Microsoft/SynapseML/issues/1330))
- Corrected the objective param when using fobj ([1292](https://github.com/Microsoft/SynapseML/issues/1292))
- Fixed issue where broadcasted sum in breeze 1.0 breaks in Spark 3.2.0 ([1299](https://github.com/Microsoft/SynapseML/issues/1299))
- Hotfixes for R test runners ([1283](https://github.com/Microsoft/SynapseML/issues/1283))
- Fixed installation instructions ([1268](https://github.com/Microsoft/SynapseML/issues/1268))
- Removed broadcast hint ([1255](https://github.com/Microsoft/SynapseML/issues/1255))
- Fixed install instructions ([1259](https://github.com/Microsoft/SynapseML/issues/1259))

Build 🏭
- Bumped algoliasearch-helper from 3.6.1 to 3.6.2 in /website ([1270](https://github.com/Microsoft/SynapseML/issues/1270))
- Removed some dependencies that cause security issues ([1264](https://github.com/Microsoft/SynapseML/issues/1264))


Documentation 📘
- Fixed broken link to CyberML notebook ([1322](https://github.com/Microsoft/SynapseML/issues/1322))
- Added website announcement bar ([1263](https://github.com/Microsoft/SynapseML/issues/1263))
- Updated and improved the README ([1262](https://github.com/Microsoft/SynapseML/issues/1262))
- Removed references to runme in contributing.md
- Supported Math expressions in website markdown ([1278](https://github.com/Microsoft/SynapseML/issues/1278))
- Corrected Synapse typo in website ([1335](https://github.com/Microsoft/SynapseML/issues/1335))


Maintenance 🔧
- Stopped lightGBM tests from timing out ([1315](https://github.com/Microsoft/SynapseML/issues/1315))
- Fixed r test flakiness ([1314](https://github.com/Microsoft/SynapseML/issues/1314))
- Updated VerifyLightGBMClassifier.scala ([1313](https://github.com/Microsoft/SynapseML/issues/1313))
- Update speech SDK test results
- Add in missing tests in build ([1300](https://github.com/Microsoft/SynapseML/issues/1300))
- Fix flaky build steps ([1298](https://github.com/Microsoft/SynapseML/issues/1298))
- Fix website telemetry ([1261](https://github.com/Microsoft/SynapseML/issues/1261))
- Add website telemetry ([1260](https://github.com/Microsoft/SynapseML/issues/1260))
- Added missing test classes to pipeline



Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:

| <img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/serena.jpg"> |<img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/ilya%20(2).jpg"> | <img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/sudhindra.jpg"> |
|:--:|:--:|:--:|
| **Serena Ruan** | **Ilya Matiach** | **Sudhindra Kovalam** |
| Serena is an engineer on the Azure Synapse team in Beijing. In this release, Serena has continued her unbelievable speed of contributions with support for Multivariate Anomaly Detection, MLFlow, and installation from Maven Central. These contributions are just a few of the many projects Serena has contributed since she joined just a few months ago! | Ilya is a prolific engineer on the Azure Machine Learning Boston team working on responsible AI. Ilya contributed LightGBM on Spark and worked tirelessly to improve and support this feature. Ilya has been an active contributor to the SynapseML project for 5 years and has built many of the tools in the library. | Sudhindra is an engineer on the Microsoft Maps team and has contributed intelligent geospatial APIs to SynapseML v0.9.5. Sudhindra developed new ways to automate generation of Spark code from swagger files allowing him to contribute a large suite of features rapidly. |
| <img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/elena_2.jpg"> |<img width="200" src="https://mmlspark.blob.core.windows.net/graphics/people/ta_interns%20(2).png"> | <img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/stuart%20(2).jpg"> |
| **Elena Zherdeva** | **The Text Analytics Explorer Interns** | **Stuart Leeks** |
| Elena is an engineer on the CSX Data team working on building scalable responsible AI tools. In Elena's first contribution to SynapseML she added Individual Conditional Expectation plots at scale. She also contributed [a detailed sample notebook](https://microsoft.github.io/SynapseML/docs/next/features/responsible_ai/Interpretability%20-%20PDP%20and%20ICE%20explainer/) that does a fantastic job of explaining key concepts in Responsible AI. | Samantha Konigsberg (top left), Preeti Pidatala (top right), and Victoria Johnston (bottom) were summer explorer interns on the text analytics team. They collaborated to build new simplified APIs for the text analytics service using the Java SDK layer. One of these contributions was the new Healthcare Analytics API in Spark. This was the interns' first Scala project, making this contribution all the more impressive! | Stuart is an engineer on the Commercial Software Engineering team. Stuart not only uses SynapseML to power customer engagements, but also directly contributes features needed to make his customers succeed. Stuart contributed support for the new Analyze Text API which allows users to perform multiple intelligent text tasks with a single API call. Stuart also added features to SynapseML’s Mini-batchers to improve their generality. |


Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Jason Wang memoryz, Serena Ruan serena-ruan, Ilya Matiach imatiach-msft, Stuart Leeks stuartleeks, Sudhindra Kovalam SudhindraKovalam, Elena Zherdeva ezherdeva, Preeti Pidatala preetipidatala, Samantha Konigsberg skonigs, Victoria Johnston victoriajmicrosoft, Markus Cozowicz eisber, Yazeed Alaudah yalaudah, Suhas Mehta suhas92, Kashyap Patel ms-kashyap, Wenqing Xu xuwq1993, Markus Weimer, Jeff Zheng, James Verbus jverbus, Misha Desai, Nellie Gustafsson, Ruixin Xu, Eric Dettinger, Martha Laguna, Louise Han jr-MS, Rashid Monin, Ali Emami, Clemens Schotte, Edward Un, Johannes Kebeck, Han Li, Assaf Israel assafi, Tom Finley, Tomas Talius, Mitrabhanu Mohanty, Anand Raman, William T. Freeman, Ryan Hurey, Jarno Ensio, Brian Mouncer, Sharath Chandra, Beverly Kodhek, Nisheet Jain, Akshaya Annavajhala (AK), Euan Garden, Lev Novik, Guolin Ke, Tara Grumm, Ismaël Mejía, Keunhyun Oh, martin0258, sinnfashen, Dung Nguyen nhymxu, elswork, ONNX Team, Azure Global, Vowpal Wabbit Team, Light GBM Team, MSFT Garage Team, MSR Outreach Team, Speech SDK Team

Learn More

| <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/synapseml_website.jpg"> |<img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/philosphy.svg"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/General%20Availablity%20Card-01.jpg"> |
|:--:|:--:|:--:|
| Visit [our new website](https://aka.ms/spark) for the latest docs, demos, and examples | Read more about SynapseML's GA release in the [Microsoft Research Blog](https://www.microsoft.com/en-us/research/blog/synapseml-a-simple-multilingual-and-massively-parallel-machine-learning-library) | SynapseML is now generally available on Azure Synapse! [Get started here.](https://docs.microsoft.com/en-us/azure/synapse-analytics/machine-learning/overview-cognitive-services) |
| <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/Multivariate%20Detector%20Card-01.jpg"> |<img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/large_scale_paper.jpg"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/Explainable%20Boosting%20Card-01.jpg"> |
| Learn more about [Multivariate Anomaly Detection in SynapseML](https://techcommunity.microsoft.com/t5/azure-ai-blog/announcing-multivariate-anomaly-detector-in-synapseml/ba-p/3122486) | Read our [Paper from IEEE Big Data '21](https://arxiv.org/pdf/2009.08044.pdf) | [Sign up for the Private Preview](https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR3nswihwe8JLvwovyYerymVUNU5CMjdOWDZJN1VUVVFXRDdBOE45MlU4Ui4u) of Explainable Boosting Machines in SynapseML |

0.9.4

Not secure
<img width="100%" src="https://mmlspark.blob.core.windows.net/graphics/emails/email_header_synapseml.jpg" alt="SynapseML" href="https://github.com/Azure/mmlspark">

Building production-ready distributed machine learning pipelines can be a challenge for even the most seasoned researcher or engineer. We are excited to announce the release of SynapseML (previously MMLSpark), an open-source library that aims to simplify the creation of massively scalable machine learning pipelines. SynapseML unifies several existing ML frameworks and new MSFT algorithms in a single, scalable API that’s usable across Python, R, Scala, and Java.

Highlights

| <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/synapse_recolor.svg"> |<img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/onnxai-ar21_crop.svg"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/scales.svg"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/forms_and_translate.svg"> | <img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/vw-blue-dark-orange.svg"> |
|:--:|:--:|:--:|:--:|:--:|
| **General Availability on Synapse** |**ONNX on Spark** | **Responsible AI** | **Form Recognition and Translation** | **Reinforcement Learning** |
| We are ready to help you productionalize on Azure Synapse Analytics | Distributed and hardware accelerated model inference on Spark | Understand opaque-box models, measure dataset biases, Explainable Boosting Machines | Parse PDFs and translate dataframes between over 100 languages | Contextual Bandit Reinforcement Learning with Vowpal Wabbit |

New Features

General ✨
- Renamed and rebranded! Microsoft ML for Apache Spark is now SynapseML
- New modular library sub-packages for standalone install of each major set of features
- Support Spark 3.1.2 and Scala 2.12
- Support `pip install synapseml` for python bindings


ONNX on Spark 🕸
- ONNX model inference on Spark ([1152](https://github.com/Azure/mmlspark/issues/1152))
- Add [documentation](https://microsoft.github.io/SynapseML/docs/features/onnx/about/) and [notebooks](https://microsoft.github.io/SynapseML/docs/features/onnx/ONNX%20-%20Inference%20on%20Spark/) for ONNXModel evaluation ([#1164](https://github.com/Azure/mmlspark/issues/1164))

Cognitive Services for Big Data 🧠
- Added Multilingual Translation APIs ([1108](https://github.com/Azure/mmlspark/issues/1108)) ([Tutorial](https://docs.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-translator-use-mmlspark))
- Added FormRecognition APIs (Invoice, IDs, BusinessCards, Layouts, Custom Models) ([1099](https://github.com/Azure/mmlspark/issues/1099)) ([Tutorial](https://docs.microsoft.com/en-us/azure/synapse-analytics/machine-learning/tutorial-form-recognizer-use-mmlspark))
- Added the FormOntologyLearner to extract meaningful "ontologies" of objects from collections of forms
- Add notebook to [Create a Multilingual Search Engine from Forms](https://github.com/microsoft/SynapseML/blob/master/notebooks/features/cognitive_services/CognitiveServices%20-%20Create%20a%20Multilingual%20Search%20Engine%20from%20Forms.ipynb)
- Updated Text Analytics API to V3.1 ([1193](https://github.com/Azure/mmlspark/issues/1193))
- Add redactedText to PIIV3 ([1247](https://github.com/Microsoft/SynapseML/issues/1247))
- Added Personally Identifying Information (PII) identification
- Added Read API
- Added Conversation Transcription API
- Cognitive Services now support data exfiltration protected (DEP) VNETs, allowing for individualized security solutions on Synapse Analytics ([Learn More](https://docs.microsoft.com/en-us/azure/synapse-analytics/machine-learning/overview-cognitive-services#available-cognitive-services-apis))
- Added support for the m4a codec in Speech to Text models
- Added predictive maintenance notebook
- Added Cognitive Service overview notebook
- Added support for linked service authentication in Synapse Analytics
- Simple no-code support in Synapse Analytics


Responsible AI at Scale 😇
- Added Additive Shapley Explanations (SHAP) for understanding the predictions of opaque-box models ([1077](https://github.com/Azure/mmlspark/issues/1077))
- New API for Locally Interpretable Model-Agnostic Explanations (LIME), which now supports background distributions and text models, and has the same API as SHAP ([1077](https://github.com/Azure/mmlspark/issues/1077))
- Added Measure transformers for Data Balance Analysis ([1218](https://github.com/microsoft/SynapseML/pull/1218))
- Add more notebook samples for documentation ([1043](https://github.com/Azure/mmlspark/issues/1043))
- Documentation and notebooks for Interpretability on Spark
- Introduce Responsible AI section on website (Interpretability + DataBalanceAnalysis) ([1241](https://github.com/Microsoft/SynapseML/issues/1241))
- Added documentation and a [notebook](https://github.com/microsoft/SynapseML/blob/master/notebooks/features/responsible_ai/DataBalanceAnalysis%20-%20Adult%20Census%20Income.ipynb) for Data Balance Analysis ([#1226](https://github.com/Microsoft/SynapseML/issues/1226))
- Explainable Boosting Machines for performant and interpretable ML ([Private preview on Synapse Analytics only](https://forms.office.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR3nswihwe8JLvwovyYerymVUNU5CMjdOWDZJN1VUVVFXRDdBOE45MlU4Ui4u))
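
To give a feel for what a data balance measure looks like, the sketch below computes a statistical parity difference between two groups directly from labelled records, with no model involved. This illustrates the concept only and is not the API of the SynapseML measure transformers; the function names, column names, and sample records are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def positive_rates(
    records: List[dict], group_key: str, label_key: str
) -> Dict[str, float]:
    """Fraction of records with a positive label, computed per group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for record in records:
        entry = counts[record[group_key]]
        entry[0] += 1 if record[label_key] == 1 else 0
        entry[1] += 1
    return {group: pos / total for group, (pos, total) in counts.items()}


def statistical_parity_difference(
    records: List[dict], group_key: str, label_key: str, a: str, b: str
) -> float:
    """P(label = 1 | group a) - P(label = 1 | group b); 0 means balanced."""
    rates = positive_rates(records, group_key, label_key)
    return rates[a] - rates[b]


# Hypothetical dataset: four records per group
records = (
    [{"group": "A", "label": 1}] * 2 + [{"group": "A", "label": 0}] * 2
    + [{"group": "B", "label": 1}] * 1 + [{"group": "B", "label": 0}] * 3
)
gap = statistical_parity_difference(records, "group", "label", "A", "B")
print(gap)
```

A value near zero suggests the two groups receive positive labels at similar rates in the dataset; a large gap is a signal to investigate before training.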

Vowpal Wabbit 🐇
- Added ContextualBandit reinforcement learning ([896](https://github.com/Azure/mmlspark/issues/896))
- Added Vowpal Wabbit Overview Notebook

LightGBM 🌳
- Added matrix type parameter and improved logic to automatically infer dataset sparsity ([1052](https://github.com/Azure/mmlspark/issues/1052))
- Added several parameters related to dart boosting type ([1045](https://github.com/Azure/mmlspark/issues/1045))
- Added chunk size parameter for copying java data to native ([1041](https://github.com/Azure/mmlspark/issues/1041))
- Added number of threads parameter ([1055](https://github.com/Azure/mmlspark/issues/1055))
- Added custom objective function to LightGBM learners ([1054](https://github.com/Azure/mmlspark/issues/1054))
- Added singleton dataset mode for faster performance and reduced memory usage ([1066](https://github.com/Azure/mmlspark/issues/1066))
- Add num iteration and start iteration parameters to LightGBM model ([1024](https://github.com/Azure/mmlspark/issues/1024))
- Added the average precision metric ([1034](https://github.com/Azure/mmlspark/issues/1034))
- Added overview notebook for LightGBM
- Moved to new streaming API for dense data to reduce memory usage
- Tuned chunking code for faster performance
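
For readers unfamiliar with the average precision metric listed above, here is a minimal sketch of the usual definition for binary labels: rank examples by score and average the precision at each rank where a positive appears. This is a plain-Python illustration, not LightGBM's implementation, which may differ in details such as tie handling or weighting; `average_precision` is a hypothetical helper.

```python
from typing import List


def average_precision(labels: List[int], scores: List[float]) -> float:
    """Mean of precision@k over the ranks k at which a positive label occurs."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / hits if hits else 0.0


labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.1]
ap = average_precision(labels, scores)
print(round(ap, 4))
```

Here the positives land at ranks 1 and 3, so the result is the mean of 1/1 and 2/3.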

Build and Infrastructure Improvements 🏭

- New Docusaurus website generation system
- E2E Tests on Synapse Analytics ([1014](https://github.com/Azure/mmlspark/issues/1014))
- Split library into separately installable subprojects ([1073](https://github.com/Azure/mmlspark/issues/1073))
- Added a unified logging and telemetry system ([1019](https://github.com/Azure/mmlspark/issues/1019))
- Modernized R wrapper generation
- New Automated Python test generation ([998](https://github.com/Azure/mmlspark/issues/998))
- New extensible code generation system
- New two-tiered security for build secrets
- Update ubuntu version to 18.04
- Automated back-up ACR images

Additional Updates

Bug Fixes 🐞
- Enable backwards compatibility for `mmlspark` python namespace imports ([1244](https://github.com/Microsoft/SynapseML/issues/1244))
- Fix publishing to maven and pypi ([1242](https://github.com/Microsoft/SynapseML/issues/1242))
- Fix broken link to notebook in Data Balance Analysis doc ([1240](https://github.com/Microsoft/SynapseML/issues/1240))
- `min_data_in_leaf` missing from dataset parameters in lightgbm ([1239](https://github.com/Microsoft/SynapseML/issues/1239))
- Fix performance issue in interpretability notebooks ([1238](https://github.com/Microsoft/SynapseML/issues/1238))
- Fixed cognitive service errors ([1176](https://github.com/Azure/mmlspark/issues/1176))
- Fixed flaky tests
- Rename NERPii to PII
- Fixed cog service test flakes
- Fixed setLinkedService issues in Synapse ([1177](https://github.com/Azure/mmlspark/issues/1177))
- Improved LGBM error message for invalid slot names ([1160](https://github.com/Azure/mmlspark/issues/1160))
- Fixed generated python code ([1121](https://github.com/Azure/mmlspark/issues/1121))
- Updated notebookUtils class path ([1118](https://github.com/Azure/mmlspark/issues/1118))
- Fixed LIME NaN weight output ([1117](https://github.com/Azure/mmlspark/issues/1117), [#1112](https://github.com/Azure/mmlspark/issues/1112))
- Fixed Guava version issue in Azure Synapse and Databricks ([1103](https://github.com/Azure/mmlspark/issues/1103))
- Fixed flakiness in spark session stopping
- Fixed result parsing for forms
- Fixed explainers returning wrong results when `targetClassesCol` is specified
- Fixed CNTKModel issue due to catalyst bug on databricks ([1076](https://github.com/Azure/mmlspark/issues/1076))
- Fixed null handling in bing image response ([1067](https://github.com/Azure/mmlspark/issues/1067))
- Avoided strange issue with databricks json parser
- Fixed dependency exclusions and build secret querying
- Fixed issue in tabular lime sampler ([1058](https://github.com/Azure/mmlspark/issues/1058))
- Updated Bing search URLs ([1048](https://github.com/Azure/mmlspark/issues/1048))
- Refactored python wrappers to use common class ([758](https://github.com/Azure/mmlspark/issues/758))
- Updated java params patch ([1027](https://github.com/Azure/mmlspark/issues/1027))
- Added missing returns in new python lightGBM model methods
- Stop R binding generation from failing silently
- Fixed conversation transcription participant column functionality
- Reduce verbosity to prevent RPC disassociated errors
- Fixed performance slip in Featurize
- Added timeout logic for speech to text
- Added ffmpeg time limit enforcing for flaky streams ([1001](https://github.com/Azure/mmlspark/issues/1001))
- Fixed upload python whl file to blob ([1000](https://github.com/Azure/mmlspark/issues/1000))
- Cleaned up python tests ([994](https://github.com/Azure/mmlspark/issues/994))
- Fixed read schemas ([988](https://github.com/Azure/mmlspark/issues/988))
- Made HTTP default concurrent timeout infinite
- Made HTTP rate limiting retry indefinitely
- Recommender Patch for Spark 3 Update ([982](https://github.com/Azure/mmlspark/issues/982))
- Fix typo in text sentiment schema
- Changed ints to longs for offset and duration in STT
- Fixed processing sparse vector size
- Fixed Double User agent setting bug
- Fixed build warnings ([1080](https://github.com/Azure/mmlspark/issues/1080))
- Fixed build for new intellij
- Fixed livy dependency resolution
- Fixed pom for sbt dependencies ([1202](https://github.com/Azure/mmlspark/issues/1202))
- Fixed bug in `testGen` parallelism
- Auto-update packages in docker
- remove unused code
- Fix codecov logging of wrapper generation ([1098](https://github.com/Azure/mmlspark/issues/1098))
- Fix badge publishing
- Remove issue in scalastyle file for new IJ


Documentation 📘

- Add explicit pointer to HDI install
- fix typo ([990](https://github.com/Azure/mmlspark/issues/990))
- Bump python install to top to make it clearer
- Add example CyberML notebook ([958](https://github.com/Azure/mmlspark/issues/958))
- Add CyberML link to README.md ([989](https://github.com/Azure/mmlspark/issues/989))


New Contributor Spotlight

We are excited to welcome several new developers to the SynapseML project.

| <img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/serena.jpg"> |<img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/jason.jpg"> | <img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/wenqing.jpg"> |
|:--:|:--:|:--:|
| **Serena Ruan** | **Jason Wang** | **Wenqing Xu** |
| Serena is an Engineer on the Azure Synapse team in Beijing. Within her first months working on SynapseML, Serena contributed Forms and Translator cognitive services, a unified logging and telemetry system, notebooks and documentation for every transformer and estimator, and a new docusaurus-based website. | Jason is a Principal Engineer on Microsoft's DSP team and is focused on large-scale responsible AI. Jason started his contribution streak with a new API for model explainability that unifies both SHAP and LIME. Jason has also contributed ONNX on Spark which dramatically broadens the scope of models that can be used in SynapseML. | Wenqing is a software engineer on the Azure Synapse team in Beijing. Wenqing has been instrumental in preparing SynapseML for General Availability. In particular, Wenqing added support for linked service authentication of cognitive services, extended E2E testing to Synapse Analytics, and added the PII identification service. |
| <img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/kashyap.jpg"> |<img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/rohit.jpg"> | <img width="200" style="border-radius:50%" src="https://mmlspark.blob.core.windows.net/graphics/people/jack.jpg"> |
| **Kashyap Patel** | **Rohit Agrawal** | **Jack Gerrits** |
| Kashyap is an Engineer on Microsoft's DSP team working on improving the fairness of machine learning models. Kashyap contributed tools for assessing dataset bias without requiring a labelled dataset or model. | Rohit is a Senior Engineer on Microsoft's Cognitive Service team working on large-scale orchestration of intelligent services. Rohit modernized our Text Analytics Stack by updating to v3.0 and laid the groundwork for E2E testing on Synapse Analytics.| Jack is a Senior Engineer on the decision service and reinforcement learning team at Microsoft Research NYC. Jack contributed support for contextual bandit reinforcement learning with Vowpal Wabbit. |


Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.

Jason Wang, Serena Ruan, Ilya Matiach, Jack Gerrits, Kashyap Patel, Wenqing Xu, Markus Weimer, Jeff Zheng, Nellie Gustafsson, Ruixin Xu, Martha Laguna, Markus Cozowicz, Rohit Agrawal, Daniel Ciborowski, Jako Tinkus, Tom Finley, Tomas Talius, Mitrabhanu Mohanty, Roy Levin, Anand Raman, William T. Freeman, Ryan Hurey, Sharath Chandra, Beverly Kodhek, Assaf Israel, Nisheet Jain, Miguel Fierro, Dotan Patrich, Akshaya Annavajhala (AK), Euan Garden, Lev Novik, Guolin Ke, Tara Grumm, Keunhyun Oh, Vanunts Arsenii, Alexandr Severinov, David Lacalle Castillo, Ryosuke Horiuchi, Ashish Solanki, Matthieu Maitre, ONNX Team, Azure Global, Vowpal Wabbit Team, Light GBM Team, MSFT Garage Team, MSR Outreach Team, Speech SDK Team

Learn More

| <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/website.jpg"> |<img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/philosphy.svg"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/synapse_recolor.svg"> |
|:--:|:--:|:--:|
| Visit [our new website](https://aka.ms/spark) for the latest docs, demos, and examples | Read more about SynapseML in the [Microsoft Research Blog](https://www.microsoft.com/en-us/research/blog/synapseml-a-simple-multilingual-and-massively-parallel-machine-learning-library) | [Get started](https://docs.microsoft.com/en-us/azure/synapse-analytics/machine-learning/overview-cognitive-services) with SynapseML on Azure Synapse Analytics |
| <img width="500" src="https://techcommunity.microsoft.com/t5/image/serverpage/image-id/322907iC428A7A3896E5862/image-size/large?v=v2&px=999"> |<img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/paper_2.jpg"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/webinar.jpg"> |
| Read the Synapse Analytics [Ignite Announcements](https://techcommunity.microsoft.com/t5/azure-synapse-analytics/ignite-2021-announcements-accelerate-time-to-insight-with-azure/ba-p/2912147) | Read our [Paper from IEEE Big Data '21](https://arxiv.org/pdf/2009.08044.pdf) | Watch our [ODSC Webinar](https://app.aiplus.training/courses/working-with-ai-services-at-scale) on working with AI services at scale |

0.9.2

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.



Changes:

* 81f5f80bc68918840c51023a0ba8a3cbae55a814 chore: release synapseml 0.9.2 (1237)
* 127c70a9f806c6f412e56c2d766b4b65d53d342e docs: add explanation dashboard integration example notebook (1236)
* 9b9c2fbb2341949f9a3c85837a7f6b1acb7b9b13 fix: fix publish to central maven (1233)
* 7059573dd873494851d8e1db9c5ea9ad44a945a1 fix: fix website (1234)
* d47f014159d99c999c14153c1fc7b51622c21999 fix: fix typo in sbt install
* 336eff5606a965358ef1bbff7f7f970697479e4e fix: lightgbm default params should not be specified if optional (1232)
* 3d92dd730e52d8194470347eb7fb43aca3f09343 feat: support direct pip install (1223)
* 2771853c4d956c3c5f349bc3156f4d2f7f12b0f8 docs: fix links to developer readme and R setup (1229)
* ea91189db473b7a82e66eaf3e42122b9223bcfb0 fix: fix website broken links (1230)
* bbd874407161367c6927636c1bdb6dd791bbb36e perf: website enhancement (1221)
<details><summary><b>See More</b></summary>

* c5e174214f4ada3cb9bb534140f6c4d759bd4150 feat: Measure transformers for Data Balance Analysis (1218)
* 73c6a657a1cebc580fa6fa8da56dc34eb85dc36e fix: improve azure search writer error message in Array[Array[]] case
* d8344c5b4efa6b33fbdbbba06f715d4b7f8af2a1 feat: Add the FormOntologyLearner
* 2d81b5056dce57f9191ac2beb279c554f960259c fix: update baseUrl and fix static images (1217)
* e23041f47f3bad97435eb5564e0ca451fc70aee2 fix: Fixing flaky unit tests (1215)
* 5d31e3e1054a7bcd571225f3f24e7c4990e95c78 fix: Docker image should install openjdk-8-jre as opposed to default-… (1211)
* 9623b3ea1530f32f15610459e657dcf98c0f4d49 Feat: Build our new website (1190)
* 3f74133b8a5d00220eaad6e3e8e0361e7faf8856 fix: Fixing flaky test

This list of changes was [auto generated](https://msdata.visualstudio.com/A365/_build/results?buildId=50869062&view=logs).</details>

0.9.1

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.



Changes:

* 6b814261af82ea1cdcc34c13d78d086107b72385 chore: Bump version to 0.9.1
* 274b110913dcffc2f89742c14aebfc45989533fc fix:fix doc publishing
* 600bc6e84026291a80785923e53f681b67fb1eb3 fix: fix readme badge

This list of changes was [auto generated](https://msdata.visualstudio.com/A365/_build/results?buildId=49515221&view=logs).

0.9.0

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.



Changes:

* a6c7fea6dc6a9bbffcdaeef3e587e5efdb1ada50 chore: release synapseml 0.9.0 (1206)
* 383cb951811908fe29b85253edfd8dffb9b2241c Chore: rename mmlspark to synapseml (1204)
* ecc6868e2280b5f0e2344b7e3cc9c11e19670b1f fix: don't crash on fallback storage location (1183)
* 661e3e5a443d37f24dca68a6f52d4aaae03368a1 feat: updata versions in README.md (1205)

This list of changes was [auto generated](https://msdata.visualstudio.com/A365/_build/results?buildId=49453632&view=logs).

mmlspark-v1.0.0-rc4

<a name="v1.0.0-rc4"></a>
