SynapseML

Latest version: v1.0.9

1.0.2

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.



Changes:

* 522661ae3834f3a54c0ad746350b225663c51d41 chore: bump to v1.0.2 (2140)
* 2a01c8e68ae58f281eab2afc5a3f69aa28dd2fc7 doc: update find_secret on Fabric and doc (2132)
* 23222c08403bcc067c402b95f36e9da89e62b94a fix: Add the error handling for Langchain transformer (2137)
* f3ae1465f5564afe69cf6697ac4e98937a9e0ed4 fix: use java class loader (2135)
* fc3a9992675ff42e5d2a45566abce692ed3fd9b9 docs: update CONTRIBUTING.md (2138)
* 9b20829010ff2818b623e4fb06aa7481f82ab2f9 docs: fix install instructions (2136)
* c10f46ea3d7ede110d219806428932b486a8bbcc docs: fix readme install
* 28cd6db0f02c85a20dabf461e3f7a333de900b8a chore: change udf vec2array to pyspark.ml.functions.vector_to_array (2131)
* 46a1ef816aa12292ad101ef16296bdd5aded557a docs: add audiobook paper to README
* 5e9bae1c442d5f9ea78274b219e18d690f5fe12f build: bump amannn/action-semantic-pull-request from 5.3.0 to 5.4.0 (2125)
<details><summary><b>See More</b></summary>

* 241062fac15ea96815d597f38acbc03984ef185c docs: add analyze text document (2127)
* 4623219956d4629b74bf76f7b382252d30dbd187 review docs (2128)
* 9195deef8b3c260983934010bc7f60efa93e6817 fix: Support to Bool input for Onnx models (2130)
* 4c4fc8aa5d9e080ee13b1026a000edb15a4d6485 chore: fix failing notebooks (2134)
* 90ded807fc28f8b6b6cdf25250b908543432c61e docs: use the new AnalyzeText API in docs (2126)
* 5cd78c9f610bc14a429f49e8bfe32ca72e5cfe37 Improve LightGBM Network logs (2124)
* a187cd063e4c7e6ac7f913187d41c373d66ab5f2 docs: removing spark 3.2 instructions

This list of changes was [auto generated](https://msdata.visualstudio.com/A365/_build/results?buildId=111937444&view=logs).</details>

1.0.1

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML.



Changes:

* cb4fd82835e6193ac4c4283f21faa1ed4e69660c chore: bump to v1.0.1 (2123)
* 91e8c8525df06110345ea774fc4417812af4ec49 chore: add back in exclusions (2122)
* d240cbb1f4a6916fb46a622d3a33089e9001dae1 docs: pointing cognitive apis to azure ai (2119)
* 77be64100870889b563c759a079d86c6bca23ce1 docs: bump readme to spark 3.4
* ef435a2917bc383a251e574c6d88cf909b1336e3 chore: bump to v1.0.0 (2120)
* c2fdb05f44d6c705c954dc80e2c7c0f33b96a71b chore: Adding Spark34 support (2052) (2116)
* 903dc6b94e5ae617b995d94490dfafc8ff2ca4aa docs: move cognitive namespace to services namespace (2118)
* fd00b8700441ef47205950b72c7bcbe84b0f5b36 chore: refactor cognitive package to services (2117)
* b0caf2e5ff920094f2d7f80bc2dd8145009c4863 build: bump babel/traverse from 7.18.9 to 7.23.2 in /website (2098)
* c12afc51b0b68c8a3aa7188955b3795dcfc0a1c8 chore: bump speech sdk version (2107)
<details><summary><b>See More</b></summary>

* 1af71ed4d40ca52e14774623651c6fc2c784615f docs: update anomaly detector docs (2103)
* 377df2f57d485f91bdef14139c658c14003c3576 build: bump ossf/scorecard-action from 2.3.0 to 2.3.1 (2108)
* cd43ee7c73268a0545c612599fb598398923d0d7 fix: unit test break in TranslatorSuite (2111)
* cc77eda925ceeda0de4354daa7b3624a6a26f84a chore: removing gpt-review (2113)
* 70dc523114768eea12ff0648c03fcfc3785f69de fix: gpt-review action (2112)

This list of changes was [auto generated](https://msdata.visualstudio.com/A365/_build/results?buildId=109321486&view=logs).</details>

1.0.0

<a name="SynapseML v1.0.0"></a>

<img width="100%" src="https://mmlspark.blob.core.windows.net/graphics/emails/email_header_synapseml.jpg" alt="SynapseML: Simple and distributed machine learning" href="https://github.com/Azure/mmlspark">

We are excited to announce the release and general availability of SynapseML v1.0 following seven years of continuous development. SynapseML is an open-source library that aims to streamline the development of massively scalable machine learning pipelines. It unifies several existing ML Frameworks and new Microsoft algorithms in a single, scalable API that is usable across Python, R, Scala, and Java. SynapseML is usable from any Apache Spark platform and is now generally available with enterprise support on Microsoft Fabric.

Highlights

|<img width="400" src="https://mmlspark.blob.core.windows.net/graphics/emails/langchain.jpg"> | <img width="400" src="https://mmlspark.blob.core.windows.net/graphics/emails/AzureCogSearch.svg"> | <img width="400" src="https://mmlspark.blob.core.windows.net/graphics/emails/powerbi.svg"> |
|:--:|:--:|:--:|
|**Distributed Langchain** | **Vector Search Indices** | **Semantic Link** |
| Deploy your LLM apps on millions of documents | Quickly create semantic and multi-modal search engines | Work with PowerBI datasets natively from Microsoft Fabric |
| [View Notebook](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/OpenAI/Langchain/) | [Try an Example](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/AI%20Services/Quickstart%20-%20Document%20Question%20and%20Answering%20with%20PDFs/) | [Learn More](https://aka.ms/fabric-semantic-link) |

|<img width="400" src="https://mmlspark.blob.core.windows.net/graphics/emails/azure_ai_services2.svg"> | <img width="400" src="https://mmlspark.blob.core.windows.net/graphics/emails/EconML-Logo-MSFT-colorXL.png"> |
|:--:|:--:|
|**Keyless AI Services** | **Orthogonal Forests** |
| Use built-in AI services without keys in Microsoft Fabric | Discover and measure heterogeneous causal effects |
| [Learn More](https://aka.ms/fabric-ai-services) | [Try an Example](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/Causal%20Inference/Quickstart%20-%20Measure%20Heterogeneous%20Effects/) |



New Features

General ✨
- Add support for spark 3.4.1 ([2052](https://github.com/Microsoft/SynapseML/issues/2052)) ([#2116](https://github.com/Microsoft/SynapseML/issues/2116))
- Enterprise support on Microsoft Fabric

Open AI and Langchain 🦜

- Add the `LangchainTransformer` for orchestrating LLMs at scale ([1925](https://github.com/Microsoft/SynapseML/issues/1925), [#2036](https://github.com/Microsoft/SynapseML/issues/2036))
- Add ChatGPT through the `OpenAIChatCompletion` transformer ([1887](https://github.com/Microsoft/SynapseML/issues/1887))
- Add Langchain notebook ([2002](https://github.com/Microsoft/SynapseML/issues/2002), [#2013](https://github.com/Microsoft/SynapseML/issues/2013))
- Add OpenAI document Q+A notebook ([2029](https://github.com/Microsoft/SynapseML/issues/2029), [#2033](https://github.com/Microsoft/SynapseML/issues/2033))
- Add custom chatbot creation to form recognition demo ([1888](https://github.com/Microsoft/SynapseML/issues/1888))

Azure AI Services 🧠

- Add Support for Azure Cognitive Search Vector Indices ([2041](https://github.com/Microsoft/SynapseML/issues/2041))
- Add keyless Azure AI services on Microsoft Fabric ([2070](https://github.com/Microsoft/SynapseML/issues/2070), [#1859](https://github.com/Microsoft/SynapseML/issues/1859))
- Support new form recognizer APIs ([1882](https://github.com/Microsoft/SynapseML/issues/1882))
- Support streaming multivariate anomaly detection ([1893](https://github.com/Microsoft/SynapseML/issues/1893))
- Add prerequisites page for setting up OpenAI and Azure AI services ([2008](https://github.com/Microsoft/SynapseML/issues/2008))


Deep Learning 🕸
- ONNX models support variable size inputs ([1851](https://github.com/Microsoft/SynapseML/issues/1851))
- Add [distributed training overview](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/Deep%20Learning/Distributed%20Training/) ([#1879](https://github.com/Microsoft/SynapseML/issues/1879))

Causal Learning 📈
- Add OrthogonalForestDML for causal learning with heterogeneous effects ([1873](https://github.com/Microsoft/SynapseML/issues/1873))
- Add [Heterogeneous Effect Quickstart](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/Causal%20Inference/Quickstart%20-%20Measure%20Heterogeneous%20Effects/)
- Support custom reference distribution in `DistributionBalanceMeasures` to detect data drift ([1885](https://github.com/Microsoft/SynapseML/issues/1885))
- Add statistical significance reporting for causal learners using `getPValue` ([1863](https://github.com/Microsoft/SynapseML/issues/1863))

LightGBM 🌳
- LightGBM streaming mode is now default ([2088](https://github.com/Microsoft/SynapseML/issues/2088))
- LightGBM supports passing Reference datasets to speed repeated execution ([1977](https://github.com/Microsoft/SynapseML/issues/1977))
- Add [LightGBM streaming mode docs](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/LightGBM/Overview/#data-transfer-mode) ([1992](https://github.com/Microsoft/SynapseML/issues/1992))


Additional Updates


Bug Fixes 🐞
- Improve LGBM exception and logging ([2037](https://github.com/Microsoft/SynapseML/issues/2037))
- AI Services and other HTTP Clients no longer retry 4XX codes other than 429 ([2005](https://github.com/Microsoft/SynapseML/issues/2005))
- Make geospatial services robust to 404s thrown by the service ([2007](https://github.com/Microsoft/SynapseML/issues/2007))
- Fix bug [1869](https://github.com/Microsoft/SynapseML/issues/1869), where DoubleML `.setFitIntercept` should default to true ([#1876](https://github.com/Microsoft/SynapseML/issues/1876))
- Fix Multivariate Anomaly error handling ([1991](https://github.com/Microsoft/SynapseML/issues/1991))
- Fix import error when using AI services on Azure Machine Learning clusters ([1951](https://github.com/Microsoft/SynapseML/issues/1951))
- Fix default values of `aadToken` & `url` on Fabric ([1918](https://github.com/Microsoft/SynapseML/issues/1918))
- Fix ONNX model shape inference on batches with shape `[-1]` ([1906](https://github.com/Microsoft/SynapseML/issues/1906))
- Add `getPValue` to python API of DoubleML ([1909](https://github.com/Microsoft/SynapseML/issues/1909))
- Add diagnosticsInfo in Multivariate Anomaly detection response ([1892](https://github.com/Microsoft/SynapseML/issues/1892))
- Fix Double ML timeout on large datasets ([1903](https://github.com/Microsoft/SynapseML/issues/1903))
- Retry OnnxHub calls to improve test reliability ([1889](https://github.com/Microsoft/SynapseML/issues/1889))
- Remove case matching for erased generic types ([1880](https://github.com/Microsoft/SynapseML/issues/1880))
- Remove extraneous `Foo` type from Python codegen ([1867](https://github.com/Microsoft/SynapseML/issues/1867))
- Update OpenAIEmbedding Schema to account for internalServiceType
- Update Maven package to include correct GitHub path ([2073](https://github.com/Microsoft/SynapseML/issues/2073))
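The retry change listed above (4XX responses other than 429 are no longer retried) can be illustrated with a minimal Python sketch. This is not SynapseML's actual HTTP client code: the names `should_retry` and `call_with_retries` are hypothetical, and retrying 5XX responses is an assumption that the notes do not state.

```python
import time
from typing import Callable


def should_retry(status_code: int) -> bool:
    """Decide whether a response with this status code is worth retrying."""
    if status_code == 429:
        # Throttling: the request may succeed once the rate limit resets.
        return True
    if 400 <= status_code < 500:
        # Other client errors (bad request, not found, ...) will not fix themselves.
        return False
    # Assumption (not stated in the release notes): server errors are retried.
    return status_code >= 500


def call_with_retries(
    send: Callable[[], int],
    max_attempts: int = 3,
    backoff_seconds: float = 0.0,
) -> int:
    """Call `send` until it returns a non-retryable status or attempts run out.

    `send` is a hypothetical callable that performs one request and returns
    the HTTP status code.
    """
    status = send()
    attempts = 1
    while should_retry(status) and attempts < max_attempts:
        # Linear backoff between attempts; 0.0 disables waiting.
        time.sleep(backoff_seconds * attempts)
        status = send()
        attempts += 1
    return status
```

With this policy, a 404 from a service is returned to the caller after a single attempt, while a 429 is retried up to `max_attempts` times.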


Documentation 📘
- Automatically create Azure docs from notebooks ([2021](https://github.com/Microsoft/SynapseML/issues/2021), [#1976](https://github.com/Microsoft/SynapseML/issues/1976), [#1911](https://github.com/Microsoft/SynapseML/issues/1911), [#2023](https://github.com/Microsoft/SynapseML/issues/2023), [#2043](https://github.com/Microsoft/SynapseML/issues/2043))
- Improve OpenAI Docs ([1938](https://github.com/Microsoft/SynapseML/issues/1938), [#1937](https://github.com/Microsoft/SynapseML/issues/1937), [#1999](https://github.com/Microsoft/SynapseML/issues/1999))
- Improve LightGBM docs ([2003](https://github.com/Microsoft/SynapseML/issues/2003))
- Improve Vowpal Wabbit Docs ([1971](https://github.com/Microsoft/SynapseML/issues/1971), [#1972](https://github.com/Microsoft/SynapseML/issues/1972), [#1970](https://github.com/Microsoft/SynapseML/issues/1970), [#1969](https://github.com/Microsoft/SynapseML/issues/1969), [#1968](https://github.com/Microsoft/SynapseML/issues/1968), [#2072](https://github.com/Microsoft/SynapseML/issues/2072))
- General notebook quality improvements ([1979](https://github.com/Microsoft/SynapseML/issues/1979), [#1932](https://github.com/Microsoft/SynapseML/issues/1932))
- Improve Causal learning docs ([1905](https://github.com/Microsoft/SynapseML/issues/1905))
- Remove old notebooks and demos ([1934](https://github.com/Microsoft/SynapseML/issues/1934))
- Fix R-setup.md docs ([1946](https://github.com/Microsoft/SynapseML/issues/1946))
- Fix broken links across website and repo ([2079](https://github.com/Microsoft/SynapseML/issues/2079), [#2076](https://github.com/Microsoft/SynapseML/issues/2076), [#2042](https://github.com/Microsoft/SynapseML/issues/2042), [#2032](https://github.com/Microsoft/SynapseML/issues/2032), [#2027](https://github.com/Microsoft/SynapseML/issues/2027), [#2026](https://github.com/Microsoft/SynapseML/issues/2026), [#2025](https://github.com/Microsoft/SynapseML/issues/2025), [#2022](https://github.com/Microsoft/SynapseML/issues/2022), [#1864](https://github.com/Microsoft/SynapseML/issues/1864), [#2035](https://github.com/Microsoft/SynapseML/issues/2035), [#2049](https://github.com/Microsoft/SynapseML/issues/2049))
- Fix Recommenders repo URL ([2086](https://github.com/Microsoft/SynapseML/issues/2086))
- Fix website developer API link ([1877](https://github.com/Microsoft/SynapseML/issues/1877))
- Migrate from Cognitive Services to Azure AI services ([2119](https://github.com/Microsoft/SynapseML/issues/2119), [#2118](https://github.com/Microsoft/SynapseML/issues/2118))
- Update anomaly detector docs due to service deprecation ([2103](https://github.com/Microsoft/SynapseML/issues/2103))
- Fix docker link ([2019](https://github.com/Microsoft/SynapseML/issues/2019))
- Fix installation instructions ([2000](https://github.com/Microsoft/SynapseML/issues/2000), [#1961](https://github.com/Microsoft/SynapseML/issues/1961), [#1921](https://github.com/Microsoft/SynapseML/issues/1921))


Maintenance 🔧

- Rename Cognitive Services to AI Services and move `cognitive.*` APIs to `services.*` ([2117](https://github.com/Microsoft/SynapseML/issues/2117))
- Upgrade the Azure AI Speech sdk version to fix proxy issues and segfaults ([2107](https://github.com/Microsoft/SynapseML/issues/2107))
- Improve telemetry and logging across library ([2047](https://github.com/Microsoft/SynapseML/issues/2047), [#2099](https://github.com/Microsoft/SynapseML/issues/2099), [#2097](https://github.com/Microsoft/SynapseML/issues/2097), [#2045](https://github.com/Microsoft/SynapseML/issues/2045), [#1917](https://github.com/Microsoft/SynapseML/issues/1917), [#2109](https://github.com/Microsoft/SynapseML/issues/2109))
- Onboard to ESRP release process ([2083](https://github.com/Microsoft/SynapseML/issues/2083))
- Publish binaries to ADO Feeds ([1995](https://github.com/Microsoft/SynapseML/issues/1995))
- Upload example notebooks to storage account on every build ([2001](https://github.com/Microsoft/SynapseML/issues/2001))
- Allow publishing of custom versions ([1998](https://github.com/Microsoft/SynapseML/issues/1998))
- Add fabric to the find_secret API ([1948](https://github.com/Microsoft/SynapseML/issues/1948))
- Ensure nightly build runs every night
- Scrub Shared Access Signatures from logs ([1939](https://github.com/Microsoft/SynapseML/issues/1939))
- Clean azure search indexes during tests ([1901](https://github.com/Microsoft/SynapseML/issues/1901))
- Maintain tests ([2122](https://github.com/Microsoft/SynapseML/issues/2122), [#1994](https://github.com/Microsoft/SynapseML/issues/1994), [#2092](https://github.com/Microsoft/SynapseML/issues/2092), [#2077](https://github.com/Microsoft/SynapseML/issues/2077), [#2071](https://github.com/Microsoft/SynapseML/issues/2071), [#1982](https://github.com/Microsoft/SynapseML/issues/1982), [#1981](https://github.com/Microsoft/SynapseML/issues/1981), [#1927](https://github.com/Microsoft/SynapseML/issues/1927), [#1959](https://github.com/Microsoft/SynapseML/issues/1959), [#1861](https://github.com/Microsoft/SynapseML/issues/1861), [#1896](https://github.com/Microsoft/SynapseML/issues/1896), [#2111](https://github.com/Microsoft/SynapseML/issues/2111))
- Maintain Build System ([2024](https://github.com/Microsoft/SynapseML/issues/2024), [#1983](https://github.com/Microsoft/SynapseML/issues/1983), [#1963](https://github.com/Microsoft/SynapseML/issues/1963), [#1944](https://github.com/Microsoft/SynapseML/issues/1944), [#1916](https://github.com/Microsoft/SynapseML/issues/1916), [#1915](https://github.com/Microsoft/SynapseML/issues/1915), [#1866](https://github.com/Microsoft/SynapseML/issues/1866), [#1949](https://github.com/Microsoft/SynapseML/issues/1949), [#1908](https://github.com/Microsoft/SynapseML/issues/1908), [#1984](https://github.com/Microsoft/SynapseML/issues/1984), [#1954](https://github.com/Microsoft/SynapseML/issues/1954), [#1904](https://github.com/Microsoft/SynapseML/issues/1904), [#2102](https://github.com/Microsoft/SynapseML/issues/2102))
- Added and removed GPT PR Reviews ([2113](https://github.com/Microsoft/SynapseML/issues/2113), [#1957](https://github.com/Microsoft/SynapseML/issues/1957), [#2112](https://github.com/Microsoft/SynapseML/issues/2112), [#2069](https://github.com/Microsoft/SynapseML/issues/2069))
- Add .trunk to .gitignore ([2078](https://github.com/Microsoft/SynapseML/issues/2078))
- Add .bloop to .gitignore ([1897](https://github.com/Microsoft/SynapseML/issues/1897))
- Bump SynapseML Versions ([2123](https://github.com/Microsoft/SynapseML/issues/2123), [#2120](https://github.com/Microsoft/SynapseML/issues/2120), [#2110](https://github.com/Microsoft/SynapseML/issues/2110), [#2085](https://github.com/Microsoft/SynapseML/issues/2085), [#2084](https://github.com/Microsoft/SynapseML/issues/2084), [#2011](https://github.com/Microsoft/SynapseML/issues/2011), [#1933](https://github.com/Microsoft/SynapseML/issues/1933))
- Fix website security issues ([2098](https://github.com/Microsoft/SynapseML/issues/2098), [#1874](https://github.com/Microsoft/SynapseML/issues/1874), [#1870](https://github.com/Microsoft/SynapseML/issues/1870), [#2012](https://github.com/Microsoft/SynapseML/issues/2012))
- Keep GH Actions up to date ([2108](https://github.com/Microsoft/SynapseML/issues/2108), [#2091](https://github.com/Microsoft/SynapseML/issues/2091), [#2082](https://github.com/Microsoft/SynapseML/issues/2082), [#2067](https://github.com/Microsoft/SynapseML/issues/2067), [#2065](https://github.com/Microsoft/SynapseML/issues/2065), [#2030](https://github.com/Microsoft/SynapseML/issues/2030), [#1993](https://github.com/Microsoft/SynapseML/issues/1993), [#1962](https://github.com/Microsoft/SynapseML/issues/1962), [#1960](https://github.com/Microsoft/SynapseML/issues/1960), [#1907](https://github.com/Microsoft/SynapseML/issues/1907), [#1898](https://github.com/Microsoft/SynapseML/issues/1898), [#1878](https://github.com/Microsoft/SynapseML/issues/1878))
- Stop running CodeQL for markdown-only changes ([1865](https://github.com/Microsoft/SynapseML/issues/1865))
- Normalize line-endings across repository ([1883](https://github.com/Microsoft/SynapseML/issues/1883))
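Because the `cognitive.*` APIs moved to `services.*` in this release, existing Python code may need its imports updated. The following is a rough sketch of a helper that rewrites import paths in source text. It assumes the Python package prefix is `synapse.ml` (the notes only name `cognitive.*` and `services.*`); `migrate_imports` and `SomeTransformer` are hypothetical names used for illustration.

```python
import re

# Assumption: the old Python namespace is `synapse.ml.cognitive` and the new
# one is `synapse.ml.services`. The release notes only say that
# `cognitive.*` moved to `services.*`.
_OLD_NAMESPACE = re.compile(r"\bsynapse\.ml\.cognitive\b")
_NEW_NAMESPACE = "synapse.ml.services"


def migrate_imports(source: str) -> str:
    """Return `source` with the old namespace replaced by the new one."""
    return _OLD_NAMESPACE.sub(_NEW_NAMESPACE, source)


if __name__ == "__main__":
    # `SomeTransformer` is a placeholder, not a real SynapseML class.
    old_line = "from synapse.ml.cognitive import SomeTransformer"
    print(migrate_imports(old_line))
```

A plain text substitution like this only covers the namespace rename. Classes may also have moved between submodules, so the result should be checked against the current API docs.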


Contributor Spotlight

We are excited to highlight the contributions of the following SynapseML contributors:

| <img width="200px" src="https://mmlspark.blob.core.windows.net/graphics/emails/aydan.jpg"> |<img width="200px" src="https://mmlspark.blob.core.windows.net/graphics/emails/sheryl.jpg"> | <img width="200px" src="https://mmlspark.blob.core.windows.net/graphics/emails/markus.jpg"> |
|:--:|:--:|:--:|
| **Aydan Aksoylar** | **Sheryl Zhao** | **Markus Cozowicz** |
|Aydan is a Senior Applied AI Engineer and a first-time contributor to SynapseML. Aydan recently joined Azure Data but quickly led the efforts to add the new integration with Azure Cognitive Search's Vector Indices. This feature allows users to quickly create flexible semantic search engines powered by rich models like GPT-4. Aydan went above and beyond on this project and also contributed a [Document Question and Answering with PDFs quickstart](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/AI%20Services/Quickstart%20-%20Document%20Question%20and%20Answering%20with%20PDFs/) to showcase how to use these new features. | Sheryl is a Principal Applied Scientist on the SynapseML team and a first-time contributor to SynapseML. Sheryl worked hard to devise an elegant connection between LangChain and SynapseML to enable deploying chains on large datasets. She also designed and built [a lovely quickstart](https://microsoft.github.io/SynapseML/docs/Explore%20Algorithms/OpenAI/Langchain/) to showcase how to build a distributed arXiv reader with only a few lines of code. | Markus is a Principal Applied Scientist on the SynapseML team and a SynapseML veteran developer. Markus has contributed algorithms running the gamut from reinforcement learning and LLMs to anomaly detectors. This release, Markus contributed an ambitious and full-featured integration between SparkSQL and PowerBI data models. This allows users to explore their existing PowerBI datasets and measures with the full generality of PySpark or (Scala) Spark. This dramatically expands the automation possibilities within Microsoft Fabric. Markus never ceases to out-do his prior contributions and we are excited to see what he has in store next. |
| <img width="200px" src="https://mmlspark.blob.core.windows.net/graphics/emails/amir.jpg"> |<img width="200px" src="https://mmlspark.blob.core.windows.net/graphics/emails/aadharsh.jpg"> | <img width="200px" src="https://mmlspark.blob.core.windows.net/graphics/emails/brendan.jpg"> |
| **Amir Jafari** | **Aadharsh Kannan** | **Brendan Walsh** |
| Amir Jafari is a Senior Product Manager on the SynapseML team and has recently taken over the role of the official SynapseML PM. Amir's passion to advance the library was instrumental in driving us to v1.0. He is fiercely productive and has a knack for simplifying and improving the SynapseML user experience. Additionally, Amir isn’t afraid to roll up his sleeves and contribute notebooks and blogs. He drove several efforts to create new quickstarts and documentation for a variety of SynapseML features. | Aadharsh is a Vice President and Head of Economics and Data Science at Western Digital. Aadharsh is also a new SynapseML contributor whose first contribution significantly generalized our causal inference stack to support fast estimation of heterogeneous causal treatment effects with Orthogonal Random Forests. This was a nontrivial and mathematically intensive contribution, and we are grateful for Aadharsh's expertise and persistence in getting this through our build system. | Brendan is a Senior Engineer on the SynapseML team and a talented developer. Brendan's contributions range from core improvements to the SynapseML build and documentation generation system, to spearheading customer engagements and onboarding AI services. Most recently, Brendan used SynapseML to create and donate thousands of audiobooks to the open source community in partnership with Project Gutenberg. This effort was considered one of [TIME's top 200 inventions of 2023](https://time.com/collection/best-inventions-2023/6324762/project-gutenberg-open-audiobook-collection/). You can learn more about Brendan’s awesome technical philanthropy efforts at https://aka.ms/audiobook. |
| <img width="200px" src="https://mmlspark.blob.core.windows.net/graphics/emails/Jessica.jpg"> |<img width="200px" src="https://mmlspark.blob.core.windows.net/graphics/people/serena_color.jpg"> | <img width="200px" src="https://mmlspark.blob.core.windows.net/graphics/emails/cruise.jpg"> |
| **Jessica Wang** | **Serena Ruan** | **Cruise Li** |
| Jessica is a Software Engineer who recently joined the SynapseML team. Already, Jessica has grown into the role of the SynapseML benevolent “doc”tator. This release, Jessica has worked hard to ensure that the SynapseML notebooks work across a wide variety of Spark platforms and are easy and simple to get started with. This work requires knowledge of the entire library’s surface area, and we are thankful Jessica has worked so hard to learn this breadth of content. Furthermore, Jessica was also instrumental in building our Azure Doc auto-generation system to ensure all docs are tested as part of our CI build. | Serena is a Software Engineer at Databricks, an MLflow maintainer, and a prolific SynapseML contributor. Serena's impact can be felt throughout almost every aspect of the library, and she is personally responsible for the new Form Recognizer V3 update, new streaming anomaly detection APIs, distributed deep network training, and many more features. Additionally, Serena laid the foundations of keyless authentication on Fabric, and pioneered our integration with MLflow. | Cruise is a Software Engineer II on the SynapseML team in Beijing. Cruise has been instrumental in building and testing the keyless Azure AI services on Microsoft Fabric. With this contribution, Fabric users can configure their workspaces to use OpenAI, Langchain, and a variety of other AI services without the hassle of managing keys or authentication. Cruise has also worked hard to ensure AAD authentication works with Azure AI services and has helped the effort to standardize logging and telemetry across SynapseML and its sister projects. |

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of SynapseML:

Markus Weimer markusweimer,
Eric Dettinger sandshadow,
Scott Votaw svotaw,
Mark Niehaus niehaus59,
Aydan Aksoylar aydan-at-microsoft,
Sheryl Zhao sherylZhaoCode,
Markus Cozowicz eisber,
Brendan Walsh BrendanWalsh,
Jessica Wang JessicaXYWang,
Tom Finley TomFinley,
Sailesh Baidya saileshbaidya,
Keerthi Yanda KeerthiYandaOS,
Kyle Rush k-rush,
Aadharsh Kannan AKannanMSFT,
Serena Ruan serena-ruan,
Cruise Li mslhrotk lhrotk,
Jason Wang memoryz,
Haizhou (Dylan) Wang dylanw-oss,
Sarah Shy sarahshy,
Kashyap Patel ms-kashyap,
Puneet Pruthi ppruthi,
Ilya Matiach imatiach-msft,
Amir Jafari amhjf,
Nellie Gustafsson,
Bogdan Crivat,
Justyna Lucznik juluczni,
Richard Wydrowski richwyd,
Tania Arya taniaarya,
Adithya Mukund adithyamukund,
Roman Batoukov RomanBat,
Alexandra Savelieva alsavelv,
Jessica Wolk msplants,
Luis França luisffranca,
Paul Koch paulbkoch,
Rich Caruana,
Avrilia Floratou,
Martha Laguna martthalch marthalc,
Jeff Zheng,
Sciong Yang,
Peixian Gong,
Ruixin Xu,
Chris Hoder,
Derek Legenzoff,
Misha Desai,
Eren Orbey,
Beverly Kodhek,
Louise Han jr-MS,
Raj Rikhy,
Brice Chung,
Marcos Campos,
Mike Estee,
Kim Manis,
Mitrabhanu Mohanty,
Anand Raman,
Sudarshan Raghunathan drdarshan,
William T. Freeman,
John Moyer,
Vidip Acharya,
Ashit Gosalia,
Miguel Fierro miguelgfierro,
Ismaël Mejía iemejia,
Kartavya Neema kartavyaneema,
Daniel Ciborowski dciborow,
Mark Tabladillo marktab,
Guilherme Beltramini gcbeltramini,
Akshaya Annavajhala (AK),
James Verbus jverbus,
Mopé Akande msakande,
Frank Solomon fbsolo-ms1,
ONNX Team,
Azure Global,
Vowpal Wabbit Team,
LightGBM Team,
MSFT Garage Team,
MSR Outreach Team,
Speech SDK Team,
MLflow Team,
Azure Docs Team


Learn More

| <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/synapseml_website.jpg"> |<img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/pg_tile.jpg"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/Thumbnail%20-%20Mark%20-%20Intro%20to%20SynapseML.png"> |
|:--:|:--:|:--:|
| [Visit our website for the latest docs, demos, and examples](https://aka.ms/spark) | [Learn about our effort to create thousands of free audiobooks](https://aka.ms/audiobook) | [Learn the basics of SynapseML](https://www.youtube.com/watch?v=ycQPtC--VKU) |
| <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/F-103D-WgAIXx8I.jpg"> |<img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/SynapseML%20Part%204%20Thumbnail.png"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/pg_poster.jpg"> |
|[Read our full list of SynapseML Ignite Announcements](https://blog.fabric.microsoft.com/blog/microsoft-fabric-november-2023-update?ft=Data-science:category#post-5122-_Toc362352225) | [Apply OpenAI language models to your large datasets](https://www.youtube.com/watch?v=L1Cdccp1neA) | [Read our Paper on Custom Voice Audiobook Creation](https://arxiv.org/abs/2309.03926)|

1.0.0rc4

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.



Changes:

* 5fc65abbe43f520529970d2173f671e39004e510 chore: bump version numbers (1203)
* 993da81a0ab947a65cabea89fb9cc0a52d4498bb chore: Fix pom for sbt dependencies (1202)
* 327be83c6c711d3cba3be84cda85b997dd087c44 feat: Update Text Analytics API to V3.1 (1193)
* 661057752d7baea4592842ba5af05fbdc6f3bd9c fix: fix setLinkedService in Synapse
* e08a8e2918fbf62ec2e83ddfa709023006edb0ba chore: Add script to clean and back up ACR
* d85aae8dbe489b20299892406be32c32a73c362f fix: fix cognitive service errors (1176)
* c6925dbb87b6e7c65a8b9c9c9a4b2d0161a770aa fix: fix anomaly detector test cases
* b52c36101f9eecc9f306b16ebef1b03700ad421c fix: rename NERPii to PII
* 2ce1ba6be91e2f39b2ad97550685efd474e979b6 fix: fix scala style error
* 1000fdb38ddbfbd2f4b4b52870d22b260e1e25df feat: add NERPii
<details><summary><b>See More</b></summary>

* 4682199012edc35b1ccefad7167b7aee3c844106 fix: fix cog service test flakes
* 0c4d32d4b25cbd6c32d65c7fce0f0bca95a0ff2e doc: add predictive maintenence notebook
* 80889120ff06f242310e1778130cac0ed47f30fd fix: fix setLinkedService issues in Synapse (1177)
* 2d65668b194f4cbcf070302765227352379844a0 update notebook link
* 586e6761bb242fa7124e13845c030b24648ebf42 chore: fix bug in testgen parallelism
* 5ed9a8cfab0a20b18eed982dcfcc02beae69032c chore: testing new build
* f00272ec2dc402ce5521ae5f721195c168e82323 chore: disable failing synapse e2e tests
* fdf756292c6e3679be602ef30faa8993fad65c50 chore: fix flaky serialization fuzzing test
* f5b9c5ee67b67f9913d72eafaaa13f3175967d38 chore: disable failing doc translator test
* 3ae67abdfee5f0bedd89a086b82101e7153b3b9c feat: Add Infrastructure to Run Tests on Synapse (1014)
* de4b47b8b6643575eb8dec470dec0dadfd1d836b Security upgrade required for openjdk from 8-alpine to 17-ea-22-jdk-oracle (1165)
* 21d5ec86c6fa5c4be7d627d77c56567f233c9013 docs: Adding document and notebooks for ONNXModel (1164)
* 1f9135f40b76f894b8bcea5983ba8ca37249e123 feat: rename Read to ReadImage (1163)
* 8ec07e72d85f4fcc03b51d263856823eda7f7874 fix: improve LGBM error message for invalid slot names (1160)
* 448f893684e1f503b6c5cf0d3e3543aa80b61163 feat: ONNX model inference on Spark (1152)
* a5135b2ed9bba9f785764f115df6bbeeba7c3797 feat: update DocumentTranslator to support setLinkedService in Synapse (1151)
* d5470ffecf1778a6f9ba2df32b0f07049b582e7c chore: fix flakiness in python tests (1144)
* 204799258ca23539a275bdc9ee155a6090460f93 update Cognitive Services - Overview notebook (1126)
* 6ef2d28a9a3d57d63e40202e3d50ba15ae9ee3d0 fix: flaky lime test
* 5a6f8946ec24d9f3aa957b19c6c3d8b10160a7db fix: fix flaky conversation transcription test
* cf1281d0014bb6e88c0d9f0411e5b6d6a23b4d4e build: add two teired security for build secrets
* 8eda1df878256eb68e5921eef9f0c8b6bfef5bb6 feat: add setLinkedService (1136)
* 4167921e646619186bc5ae90f2544ddffb0068ed fix: fix SpeechToTextSDK setLinedService (1138)
* 87ec5f7442e2fca4003c952d191d0ea5f7d61eac fix: fix generated python code (1121)
* 84d8d246a2c853e00743db1ea2341c47fcef67dd feat: add translator (1108)
* d287be6185ca2e2a9a7fe9940a592eda362e727d fix: update notebookUtils class path (1118)
* 0f69cf5ac9e12db78ccee67c8fc768ef3b864cb8 feat: add singleton dataset mode for faster performance and use old sparse dataset create method to reduce memory usage (1066)
* 41bfd055175f6c8f3aee437b89ca1083f394d20c fix: LIME returns NaN weight if a feature contains a single value or when the sampler cannot obtain a different state for a feature due to data skew. It returns zero weights for all other features. (1117)
* fe70f31766818d39ae059ef2e4473735014f8168 fix: fix Guava version issue in Azure Synapse and Databricks (1103)
* 115f9214562b1f9a5ac3827f9f674c86bb66eee8 fix: fix flakiness in spark session stopping
* a825a7430ee49a1c56533b7f844e9094c1e0f898 chore: auto-update packages in docker
* 9314f82c7713a140311496faaeb229727886ad51 fix: Fix result parsing for forms
* 0c6490d2394e88ed09121e3a75dde638568464a1 chore: fix flaky notebook
* 94f04a8b78460826e55eabfcd64caacfa76ec44d fix: LIME sometimes return nan weights (1112)
* 85f089d0ae7aaaaefe6afa83c8aa96268bf6db14 feat: add form recognizer support (1099)
* 931cb42b25e0d637ef251b18a524d7027bbea127 update: reformat code
* 8c69739c8ff9d714613f46528504ef4fcc67d5a5 update: update setLocation
* 124b9c651211a3a580ff4d9fa254c627dc6ae866 update: remove parens
* c2e31923b68862f8ae6890491ac1d80a44eba44f fix: reformat code
* 20a795b9bf13ac70f658c379cbd7c4998ae25496 update: use HasSetLinkedService trait
* f075a97f6f0d1bd3446caa8d8389255ec18bd0a2 update: add more cognitive service
* 13a7126bbee60a287c5cf175060a44cf9a355dae update: add more cognitive service
* 8114ccce08a88f11ce7df9353d56e18d43dbe503 update: add more cognitive service
* e5b2a20d276c0c5472045d879d9fd4e64f77e803 update: add more cognitive service
* f6e6591237c994f02ab79f53f347a79d23c02277 update: remove test code
* d01fa1818e09d8c3c38ac6bf8c4e63348c5e7196 update: add test code
* d85fc59960d871060fc0f7866e5d4d55120e6f95 update: remove testing code
* 873ed329d8324b2814c1517e62e4c18feb52087a update: add sample code for test
* d842f6205ec4bbb8562a3f60c79de96eb8ba4a53 update: add sample code for test
* 2318af64c0f08fb2605621c28c2dc5565da6f86d update: add sample code for test
* 3034b59a570af404bdc5b2f395759e6badc3f5fd update: add sample code for test
* 74215972bb6ca3d02b8d1c94c20aa54aba7f376a update: add sample code for test
* 5b7e574ebe5a01a810ebed9137b258a457b63596 update: add sample code for test
* e633635611cbd79610c835a4aed543b005b7badf update: add reflection
* df9098d5aba54940278df5e47d8ad53a5123d478 update: remove example in test files
* 2deca5ee1a6a6b32befcffbe3473ad1a9c1bbee2 update: add class path
* 80b7a08ac4d3b8ff451cffc8bae2de796df240a5 update: add reflection
* f480aff79d2e2a2c04efe0fc83564ed239af22b4 Docs update
* 40f7fbf50d1f7fef6c86d04c00117bcd89c1c2f1 Reformat notebooks with jupyter lab
* 774af7297b5f61c03b59b350923677172537898b update notebooks
* bafc8d470fcf0ef1b309831113faabf93e7e7974 Update docs, reformat notebooks
* 171ed8958126eb274d6138605540c3024dfdd80a update: notebook
* c255e6617cca64f777a49a887977fc27bfb5cffd Deprecate old lime code and update readme
* a9b55425f129aa2d251c3cfd3acb76fd2778a64c docs: Documentation and notebooks for Interpretability on Spark
* 26b9b077431b9ad76689e189225e2ecbb779461f explainer notebooks
* 84f96e9a46e756396fafd243159aa7225644bbee chore: remove unused code
* 541f76f7dc1c31a07adb4f7f8c903199b303a4ff fix: explainers return wrong results when targetClassesCol is specified
* e54406a32ba9a5b56e65d1a12195c824bbbc6f4b chore: fix codecov logging of wrapper generation (1098)
* a5b265e41d387ddb32fecf74e6b25f35f6034d9b feat: split library into subprojects (1073)
* c84ab47020e358fe875a29160037c4971c0a77a7 fix: Unit test OOM error (1093)
* 725a92dce673b05798a410d24658a751ffa89b2e fix: Update codeowners (1092)
* 7dd6bb1cf082bdba6298cc0a85b0b6ba95ed1f0e feat: new LIME and KernelSHAP explainers (1077)
* 00bac62b94284ab5ac94c30ff1f174571622e836 update: update spark version to 3.1.2 (1086)
* 21d6c0444e1e2747b759f65f1c63f13cca12c7f8 feat: refactor to have separate dataset utils and partition processor (1089)
* e8a97ed9ecf3b6c11a164543482ada6576f8abd2 feat: refactoring of lightgbm code in preparation for single dataset mode (1088)
* e7d4ecafc3f524906ae4548b0879c37bc8633a2d build: Fixing build warnings (1080)
* ebee5dc3ac7c0ae69b120dc2b0d50da8c6e0be53 fix: BingImageSearch fails randomly in E2E test (1082)
* 0632f1bf61ab6dc793095f1a639cbf3b0754a0d7 fix: [Workaround] CNTKModel does not output correct result (1076)
* 36ee274e93e1f7a07fc863061ad726e5ca5b49ee feat: move partition consolidator and add LocalAggregator API (1071)
* 2a716c100fc99a66d01c849256b75ced383eb23a feat: add number of threads parameter (1055)
* 63ce4ef62a916982002b0b6f8a55e3f7d12b830e fix: small issue with null in bing image response (1067)
* 6aecdf1c0c212950344f210f11aea2dfb8760009 Add sparse vector support to KNN. (1063)
* ab15ca4237225caab9c8ea6e937bbed3d911b660 fix: fix flaky conversation transcription test
* 45379694813458c5e113d84186c09b3a5c455cdc fix: avoid strange issue with databricks json parser
* 4baaf4964fc1c91a532d690a58468c13e32526ad fix: fix dependency exclusions and build secret querying
* d6b1726d9078f9fd0560c986e3913b47101fe5f7 docs: Add explicit pointer to HDI install
* ae8004afc2924304ce554c1b67e1ad4c316c7100 feat: add custom objective function to lightgbm learners (1054)
* d8bb51f8d4c8b5a9cd2e9a046fb0355dabc356f2 fix: Fix issue in tabular lime sampler (1058)
* 663d9650d3884ece260a457d9b016088380c2cb9 feat: Add more notebook samples for documentation (1043)
* 12cea2df9e479077813b611c1b098ca39b1a3133 feat: add matrix type parameter and improve auto logic (1052)
* 03b8b7d141332b2913fdb9b9b1ee3671fdd12ab7 fix: Bing search URL update (1048)
* b704515f2180ea839e67ac37753c8796f759ef1a Update Classification - Adult Census.ipynb
* bd63cc8d5ab4de1e0ae73779bda6f094d28bc720 feat: add several parameters related to dart boosting type (1045)
* b7f29e8300b85e82798c8bfee96cb95207e5b727 feat: added chunk size parameter for copying java data to native (1041)
* 1c4691f1b77b93b9fe756e726f053ea77abe77c9 Update pr.yml
* aad223e045512f5c59249e838cfff2fd5d279e2d fix: early stopping test and average precision metric (1034)
* 04a9876fd30f0162f4b17c81059753c0290a5564 fix: refactor python wrappers to use common class (758)
* f5479ddfcf9fa9e776a5e83fefe4371db0d6abcc fix: java params patch (1027)
* d7b86d34502507dc6aef01a47c186d9b6ab1cfbd Create pr.yml
* c20aee805bafa17652e014e343fbe18d1981f98f Update ado-integration.yml
* e3cffa5751c369c44186dd44adb54f91bc0626a9 Delete ado-pr-integration.yml
* 11f8dbbe6d884f55bdbcaeadcc0b741ff8baf93d Update ado-integration.yml
* 369bb8326602c55a3695d6848d32e2abedc6d12f Update ado-pr-integration.yml
* a53003f3f249bf7c1c3de87b702be418afabe405 Update ado-pr-integration.yml
* 05cb62622b214927021437e0d97426559b639d74 Rename ado-pr-integration to ado-pr-integration.yml
* 03f6f29d572d3b634375da4865c26b2def437811 Create ado-pr-integration
* 19b305f0a1170458027ea1ed35cde50ad8e870e0 Update ado-integration.yml
* a7dbeb83a78caaae7c1520c26e17d9a7aafd077e Update ado-integration.yml
* 3b8e046cfc514ace79f5bae9554d415c40438978 Update ado-integration.yml
* acbb268f93db61a863e7921ad0550d9039127d6f Create ado-integration.yml (1039)
* 1e2f33b3fa5a3ab0a58093c9dc8df6f58034d024 feat: Add MMLSpark logging infrastructure (1019)
* 99b580f5ee7c671fb662908623dddff632bedc9d feat: Add R wrapper gen
* bf337941f4fed2b4675d307aa446e0e3b54ef251 fix: missing returns in new python lightgbm model methods
* 99047351f1ec4a3d547ec622c6027506c328da68 chore: update to lightgbm 3.2.110
* 61d2bf18991b78402a405085f914366c8792afe6 feat: add num iteration and start iteration to lightgbm model (1024)
* 2c223f664c506acba4fd1ef4f53b4541df3fcc25 fix: fix issue with r bindings silently failing
* c33451fb22b7c140749ac443d5a68c98a44c1c0a fix: fix conversation transcription participant column functionality
* bc9e81ef2cf3fe5b0a1a1a586ace925fa1270d1f perf: tune chunking code, fix memory leak
* 8942198727fd652d8cae5dbf75ca7404da4e07ee fix: reduce verbosity to prevent RPC disassociated errors
* 0c44344a6354f2aae4754ec825fbbc97275eacad perf: moving to new streaming API for dense data to reduce memory usage
* 1b46782818b53c0bb6cce9cb95a6eb98bf49d177 chore: fix badge publishing
* 1e3a4a44c68fd0d5257b8708c1c5e3885330c760 fix: Fix performance slip in Featurize
* 8d4c405daec9adbe4482ba20849de6596e217bef feat: Refactor code generation system
* cd79ecda47bacec8acfa6babf6e585240e617ad0 chore: upgrade lightgbm to 3.2.100
* ffe2507ed8c1b9c20ea7efe6d3d7407c4bc88506 fix: add timeout for stt
* 3b91af32cdc1bcd24d59db28240eb23b118cb502 build: update ubuntu version to 18.04
* 4446afa5d8c6748560c650deae877374e4f7793c fix: update subscription in build secrets
* 01a8cb4f2bcce7e953d7305f80b439646fc590d8 Update developer-readme.md
* 54379bf7cdfd7fb2f27f3a0bb5f055c95e560c36 chore:remove flaky LGBMtest
* 4e915d4312ea1ad11a8dc5fba499f6507c2f8825 feat: add automated python test generation infrastructure (998)
* 9b7518316cfcc2f5debce549bbffa3566c2cb865 fix: Add ffmpeg time limit enforcing for flaky streams (1001)
* ec7cb7856381cfa1169a3f6fb119a67062510cbc fix: fix upload python whl file to blob(1000)
* 96f66447ce69e1cd24ca6ec3b69c4b980255842a fix: adding more recommendation code owners (996)
* d496aa7d437e0c7edd3237a85951e43951eee1c5 fix: cleanup python tests (994)
* 0717ac4c603ab69f5f8fcc4c87dc2bfebc90e2bc fix: Fix read schemas (988)
* 9cff1e6495a4509bcaae832a44205592ecaaa05b chore: update build to new subscription (991)
* 7a1f28b0c163979baf48ff23863752c9280a2009 Update pipeline.yaml for Azure Pipelines
* 657e6b1d969932cd68f29033001abedec6760952 Update pipeline.yaml for Azure Pipelines
* 3661a443a38111a7971f236f009fa32fd7533f74 Update pipeline.yaml for Azure Pipelines
* 7ce0c5ff8cc1e0bf470d66354c324a128da35c93 Update pipeline.yaml for Azure Pipelines
* 19672c485798d65e82bb76846d0d912ed64990e7 Update pipeline.yaml for Azure Pipelines
* f913bdd94d8cf5230e1e2274c95ee768b21680df docs: fix typo (990)
* 59b684178ed12c82c292e24d0bd1ded4effeadd4 Update README.md
* 062a470e1eb714cf4443c939e97c974f98d99d17 doc: Add CyberML link to README.md (989)
* b1c1400802a55b2899f3fa21656e187a3b6fd808 feat: add TextLIME
* d4fa5771142e3a0a02953da4792622bf1362832a fix: fix issue with NER suite test
* 86beddec070a4ccdf45d41b4dfd57183a94d5269 fix: make concurrent timeout infinite
* 89fa081b82f93d6f1240b3229c7918b166571f89 fix: Make rate limiting retry indefinitely
* f14623e21b70f6ed44ba7828f7886436e21bf496 fix: Recommender Patch for Spark 3 Update (982)
* 13ce0c974963d3ccda028658886b4cf323898071 Update developer-readme.md
* 6218a5b4fdb19a1329c8b91d6ec9148bb12f3d87 Spark 3 (970)
* 5a5147addc42036282d1b45088fb91333d45b2d3 fix: fix typo in text sentiment schema
* 4fe354826d79feffcd852bd166d91402eb1384a1 feat: Add ReadAPI
* 4dab861e080248b7b938a4b2468d5633ef4be17b feat: add conversation transcription
* 218913a131a55b9de62cd200cafe9de940cadd38 fix: change ints to longs for offset and duration in STT
* 1daca68096e595e8774938bbf5d7abb98c000e80 feat: add m4a codec
* 8e0c9b0f024c0917ae2086245c8bb52d502c0d58 chore: fix Detect face suite (968)
* 0571ae25f9c25f7e1491809756687e56e5c2e84e doc: Add example cyberML notebook (958)
* b04d6d655e37c22d043cb8de4359ec5b8ba5745a fix: fix python tests in build
* 15eb55bdf8704c2375ea6e3fdd01b6fe2620c08e chore: remove issue in scalastyle file for new IJ
* 66ffeca190390115a5cd0c3c1b1c819d57ee8ece chore: lower threshold for STT tests
* 55a3c1043813ec78a00755068d6028724b91aa41 build: fix build for new intellij
* 7b1830e53fc88f6cb9efc8fc6e6bd885cd08bcef fix: fix processing sparse vector size
* 0596de944e7681d8811b2aef4390527df9dfa37e Update developer-readme.md
* 05359cfa6bf69bc67ca02e07f77e2bd91dd871e6 Update developer-readme.md
* 0a30d1ae5583bcde95a20264af0a41b0d7175149 fix: Fix Double User agent setting bug
* 1f077baa295f6c1426d5a28ba45d958e2a058edb Update pipeline.yaml for Azure Pipelines
* 52463b1750db48adbcdbc073d00574345d996363 Update pipeline.yaml for Azure Pipelines
* 78083a7ac03b5ac57e031a02d6cfe36d653470da build: fix livy dependency resolution
* c2a3921739263914d605b5f8847ec01e0000d8d2 fix:remove preview api from NERv2
* 98a827194b7f17f926a055ae5ab94aca54ba669e docs: Bump python install to top to make it clearer

This list of changes was [auto generated](https://msdata.visualstudio.com/A365/_build/results?buildId=49186594&view=logs).</details>

mmlspark-v1.0.0-rc3
<a name="v1.0.0-rc3"></a>

1.0.0rc3

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.

jackgerrits rohit21agrawal



mmlspark-v1.0.0-rc2

<img width="100%" src="https://mmlspark.blob.core.windows.net/graphics/emails/email_header_rc2.jpg" alt="Microsoft ML for Apache Spark v1.0.0-rc2" href="https://github.com/Azure/mmlspark">

Highlights
| <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/isolation forest 3.svg"> |<img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/cyberml.svg"> | <img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/speech_to_text_2.svg"> | <img width="400" src="https://mmlspark.blob.core.windows.net/graphics/emails/conditional_knn.svg"> | <img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/lightgbm_on_spark.svg"> |
|:--:|:--:|:--:|:--:|:--:|
| **Isolation Forest on Spark** | **CyberML** | **Speech To Text** | **Conditional KNN** | **LightGBM + SHAP** |
| Distributed Nonlinear Outlier Detection | Machine Learning Tools for Cyber Security | Custom Speech to Text with Streaming Support | Scalable KNN Models with Conditional Queries | Interpret LightGBM Models using Additive Shapley Explanations |

New Features

Isolation Forest on Spark ⛺️
- Added LinkedIn's Isolation Forest outlier detection algorithm
- Read [the original work](https://github.com/linkedin/isolation-forest) for more info

CyberML 🧙‍♂️
- CyberML aims to provide open source tools for distributed cybersecurity workflows. This first release includes an algorithm that learns user-resource access patterns to detect anomalous access patterns. For more information, see the [docs](https://github.com/Azure/mmlspark/blob/master/docs/cyber.md).

Cognitive Services for Big Data🧠
- Added `SpeechToTextSDK` transformer. This new transformer transcribes raw audio files and live audio streams into text. Transcription supports realtime audio streaming, automatic splitting into utterances, and profanity detection. Supports several languages and [Custom Speech Models](https://docs.microsoft.com/en-us/azure/cognitive-services/speech-service/how-to-custom-speech).
- Added `TextSentimentV3` transformer to leverage the new [Cognitive Services v3 API](https://docs.microsoft.com/en-us/azure/cognitive-services/text-analytics/whats-new)
- add save and load methods to AccessAnomalyModel ([905](https://github.com/Azure/mmlspark/issues/905))
- stream robustness, output audio stream to file, and custom speech
- Add m3u8 streaming for `SpeechToTextSDK`
- enable mp3 file streaming in stt sdk ([822](https://github.com/Azure/mmlspark/issues/822))


Conditional K-Nearest Neighbors 🏡🏡
- Added `ConditionalKNN` estimator and model for efficient search of high dimensional KNNs with conditional predicates.
- Added Conditional KNN demo [here](https://github.com/Azure/mmlspark/blob/master/notebooks/samples/ConditionalKNN%20-%20Exploring%20Art%20Across%20Cultures.ipynb)
- Find hidden artistic connections with the [Mosaic](https://aka.ms/mosaic) application.
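
To illustrate what a conditional nearest-neighbor query means, here is a minimal brute-force sketch in plain Python. It is not the `ConditionalKNN` estimator or its ball-tree index; the function name `conditional_knn`, the sample points, and the labels are all hypothetical and exist only for illustration. The idea shown is that each point carries a label, and a query returns the nearest points only among those whose label satisfies the condition.

```python
import heapq
import math


def conditional_knn(query, points, labels, allowed, k):
    """Return up to k (distance, index) pairs, nearest first,
    considering only points whose label is in `allowed`.

    Hypothetical helper: a linear scan, not the library's tree-based search.
    """
    candidates = (
        (math.dist(query, point), index)
        for index, (point, label) in enumerate(zip(points, labels))
        if label in allowed  # the conditional predicate
    )
    return heapq.nsmallest(k, candidates)


# Made-up 2-D features with a categorical label per point.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
labels = ["painting", "sculpture", "painting", "sculpture"]

# Nearest sculpture to the origin, ignoring the closer paintings.
nearest = conditional_knn((0.0, 0.0), points, labels, {"sculpture"}, k=1)
print(nearest)  # [(1.0, 1)]
```

A linear scan like this costs one distance computation per point per query, which is why an index structure matters for high-dimensional data at scale.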

HTTP on Spark 🌐
- Added integration with the Python Requests library so that Requests calls can be accelerated with HTTP on Spark
- Optimized HTTP on Spark asynchronous performance

Vowpal Wabbit on Spark 🐇
- add barrier mode support for VW ([832](https://github.com/Azure/mmlspark/issues/832))
- add support for VW readable model, invert hash and re-using a previously trained VW Spark model ([821](https://github.com/Azure/mmlspark/issues/821))
- support generic numeric types for weights and labels ([817](https://github.com/Azure/mmlspark/issues/817))


LightGBM on Spark 🌳
- add featuresShapCol to LightGBMClassifierModel ([863](https://github.com/Azure/mmlspark/issues/863))
- Expose parameter bin_construct_sample_cnt in spark for LightGBM ([780](https://github.com/Azure/mmlspark/issues/780))
- add interface function for updating learning_rate per each iteration in LightGBMDelegate ([849](https://github.com/Azure/mmlspark/issues/849))
- add delegate to monitor training ([847](https://github.com/Azure/mmlspark/issues/847))
- Add the option to get Feature Contributions in LightGBMBooster used by LightGBMRanker ([791](https://github.com/Azure/mmlspark/issues/791))
- Add option to add tolerance to improvement in metric evolution ([786](https://github.com/Azure/mmlspark/issues/786))
- added pred leaf index for LightGBMClassifier
- Adding a new param for explicitly setting slot names. ([752](https://github.com/Azure/mmlspark/issues/752))
- added the top_k param for voting parallel ([762](https://github.com/Azure/mmlspark/issues/762))
- Adding a feature for positive and negative bagging fraction params. ([754](https://github.com/Azure/mmlspark/issues/754))
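
The tolerance option for metric improvement listed above can be pictured with a small, library-independent sketch. This is an assumption about the general technique (early stopping that ignores improvements smaller than a threshold), not the LightGBM on Spark implementation; the function name `best_iteration_with_tolerance` and its parameters are hypothetical.

```python
def best_iteration_with_tolerance(scores, patience, tolerance):
    """Return the index of the last iteration that improved the best score
    by more than `tolerance`. Higher scores are better.

    Training is considered stopped once `patience` iterations pass
    without such an improvement. Hypothetical helper for illustration.
    """
    best_index = 0
    best_score = scores[0]
    for index in range(1, len(scores)):
        if scores[index] > best_score + tolerance:
            # A large enough gain resets the patience window.
            best_score, best_index = scores[index], index
        elif index - best_index >= patience:
            break  # no meaningful gain for `patience` rounds
    return best_index


# Made-up validation scores, one per boosting iteration.
scores = [0.70, 0.75, 0.751, 0.752, 0.80]

# With a tolerance of 0.01 the tiny gains at iterations 2 and 3 are ignored,
# so training stops before the jump at iteration 4 is ever seen.
print(best_iteration_with_tolerance(scores, patience=2, tolerance=0.01))  # 1

# With zero tolerance every small gain counts and training runs to the end.
print(best_iteration_with_tolerance(scores, patience=2, tolerance=0.0))  # 4
```

The trade-off is visible in the example: a tolerance avoids spending iterations on negligible gains, but it can also stop before a later, larger improvement.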


Learn More

| <img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/header-image.jpg" href="https://aka.ms/mosaic"> |<img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/spark_summit.jpeg" href="https://databricks.com/session_eu19/scalable-ai-for-good"> | <img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/Mark-Hamilton_Podcast_Social_09_2019_1200x627.png" href="https://www.microsoft.com/en-us/research/podcast/mmlspark-empowering-ai-for-good-with-mark-hamilton/"> |
|:--:|:--:|:--:|
| **MosAIc Finds Hidden Connections in World Art ([Article](https://news.artnet.com/art-world/mit-mosaic-ai-curator-1900193), [Demo](https://aka.ms/mosaic), [Webinar](https://note.microsoft.com/MSR-Webinar-Visual-Analogies-Registration-On-Demand.html))** | **[Watch the Spark Summit Europe Keynote on MMLSpark](https://databricks.com/session_eu19/scalable-ai-for-good)** | **[Learn about AI for Good and MMLSpark on the MSR Podcast](https://www.microsoft.com/en-us/research/podcast/mmlspark-empowering-ai-for-good-with-mark-hamilton/)** |


| <img width="700" src="https://mmlspark.blob.core.windows.net/graphics/emails/cognitive-services-big-data-overview.svg" href="https://docs.microsoft.com/en-us/azure/cognitive-services/big-data/cognitive-services-for-big-data"> |<img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/tree.jpg" href="https://arxiv.org/abs/2007.07177"> | <img width="700" src="https://mmlspark.blob.core.windows.net/graphics/emails/paper_2.jpg" href="https://arxiv.org/abs/2009.08044"> |
|:--:|:--:|:--:|
| **[New Docs for the Cognitive Services for Big Data](https://docs.microsoft.com/en-us/azure/cognitive-services/big-data/cognitive-services-for-big-data)** | **[Read our New Paper on Conditional KNN Trees](https://arxiv.org/abs/2007.07177)** | **[Read our New Paper on Microservices in Databases](https://arxiv.org/abs/2009.08044)** |


Bug Fixes 🐞
- Updating regular Docker Images for helm chart. ([885](https://github.com/Azure/mmlspark/issues/885))
- improve error message for invalid slot names ([897](https://github.com/Azure/mmlspark/issues/897))
- categorical parameter regression on dense dataset caused by missing whitespace ([909](https://github.com/Azure/mmlspark/issues/909))
- fix cyberml test imports
- add "s" to failing publicwasb download
- spark.executor.cores' default value based on master when counting workers ([855](https://github.com/Azure/mmlspark/issues/855))
- fix flakiness in BiLSTM notebook
- make file type case insensitive
- Add support for URI parameters and default filetypes
- remove save_resume/preserve_performance_counters options as it breaks SGD/BFGS chaining ([828](https://github.com/Azure/mmlspark/issues/828))
- fix optional parsing for the CustomOutputParser ([835](https://github.com/Azure/mmlspark/issues/835))
- Fix flakiness in io tests
- Improve codegen readability and added getters and setters to generated models
- move tests to a separate package and refactor common code
- added multiclass init score support ([805](https://github.com/Azure/mmlspark/issues/805))
- LightGBMRanker should repartition by grouping column ([778](https://github.com/Azure/mmlspark/issues/778))
- Fix possible multithreading issue where two scoring calls arriving in parallel might not safely fill pointer values ([799](https://github.com/Azure/mmlspark/issues/799))
- Guarantee one boosterPtr is allocated and freed per LightGBMBooster instance ([792](https://github.com/Azure/mmlspark/issues/792))
- Fix subtle bug in reverse index creation
- add cap on max allowed port in network init ([759](https://github.com/Azure/mmlspark/issues/759))
- added min_data_in_leaf parameter ([760](https://github.com/Azure/mmlspark/issues/760))
- Reorder ADB Status Checks to fix flakiness
- increase library install timeout ([763](https://github.com/Azure/mmlspark/issues/763))
- Fix an issue with the sparkContext not being instantiated at eval time
- Fix GH release badge display
- Codegen dataframe param fixes

Build 🏭
- bump version
- Ignore existing installation when running installPipPackageTask ([895](https://github.com/Azure/mmlspark/issues/895))
- update ffmpeg on build server
- make python test loop easier
- updating lightgbm to 2.3.180 ([850](https://github.com/Azure/mmlspark/issues/850))
- split cog services on spark tests
- Split e2e and publishing ([836](https://github.com/Azure/mmlspark/issues/836))
- Add Caching to build pipeline
- added isolation forest test to build pipeline ([800](https://github.com/Azure/mmlspark/issues/800))
- exclude scala from fat jar

Code Style 🎶
- Removing redundant file in the root directory: sp.txt ([796](https://github.com/Azure/mmlspark/issues/796))
- ball tree style fixes

Documentation 📘
- Adding section to readme for installing with apache livy ([785](https://github.com/Azure/mmlspark/issues/785))
- Add fix for maven resolver
- Added two classification examples using Vowpal Wabbit ([733](https://github.com/Azure/mmlspark/issues/733))

Maintenance 🔧
- add Roy to CODEOWNERS
- fix flaky analyze image test
- move build to new subscription ([888](https://github.com/Azure/mmlspark/issues/888))
- Update codeowners file to fix helm owners
- remove flaky lightGBM test and add retries to Cog service tests
- Update CODEOWNERS ([831](https://github.com/Azure/mmlspark/issues/831))
- Add time in httpv2 tests to reduce flakiness on build VMs
- fixes to improve test flakiness
- updated lightgbm to 2.3.150 ([757](https://github.com/Azure/mmlspark/issues/757))
- improve efficiency of lightgbm tests
- Add more cluster status checks
- fix flakiness in IdentifyFacesSuite
- bump heap size in build
- add default UA

Acknowledgements 🙌
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.

- Ilya Matiach imatiach-msft
- Markus Cosowicz eisber
- Lucy Zhang zhang-lucy
- Roy Levin rolevin
- Keunhyun Oh ocworld
- James Verbus
- Christina Lee
- Anand Raman
- William T Freeman
- Lei Zhang
- Rohit Agrawal
- Nisheet Jain
- Chris Hoder
- Chris Templeman
- Chenhui Hu chenhuims
- Ryan Hurey
- Jun Ki Min loomlike
- Dotan Patrich
- Addy Santo
- Anil Francis Thomas
- Amrit Bhattacharya
- Moshe Israel
- Dalitso Banda
- Joan Fontanals JoanFM
- Jack Gerrits jackgerrits
- Akshaya Annavajhala
- Heiko Rahmel
- Felix Tran felixtran39
- Stephanie Fu
- Parker Levy
- Casey Hillenburg
- Vick Wowo
- Brendan Walsh
- Nick Gonsalves
- Mindren Lu
- Nurudín Álvarez
- Guolin Ke
- Chris Smith chris-smith-zocdoc
- David Lacalle Castillo WaterKnight1998
- Fokko Driesprong Fokko
- Diego Mazon
- Tommy Li tommyzli
- Azure CAT
- Vowpal Wabbit Team
- LightGBM Team
- MSFT Garage Team
- MSR Outreach Team
- Speech SDK Team


Changes:

* 81e73a27477be66788aa37c042eea27fa1c9bab6 chore: add Roy to CODEOWNERS
* b12be504b1bf21c6894c13c795639d27e92353f0 build: bump version
* b431a61b06f48ec1e7bf8ff2e9809025dc5f1bf6 fix: Updating regular Docker Images for helm chart. (885)
* 96f0b7775629d6e7b521d1ed8ca0e54655deef00 fix: improve error message for invalid slot names (897)
* 95c1f8a782191e3578587a49313e1d57abee5da3 fix: categorical parameter regression on dense dataset caused by missing whitespace (909)
* 040ad34964aaa266a6318a6974f324102a8302aa feat: add save and load methods to AccessAnomalyModel (905)
* 8f8c504dee24dae8bc9262a84c00d2e5d273352c fix: fix cyberml test imports
* 9aed00480b08bd5e6378c255990ec80a0e7f9709 chore: fix flaky analyze image test
* 826cfc22b9c4e8db37e1f520302079ef993cd321 fix: add "s" to failing publicwasb download
* 22e19e5f52698a653dfae467f133c6552bc26e50 feat: CyberML (890)
<details><summary><b>See More</b></summary>

* 54a623d445442f10e5d57d2c958b3762d4d9e331 build: Ignore existing installation when running installPipPackageTask (895)
* f1b4a946bb0d573d2e0de7705ff6694a3645f04a chore: move build to new subscription (888)
* f07e5584459e909223a470e6d2e11135b292f3ea Merge pull request 882 from ocworld/fix-rename-clusterutils-numcores
* e741993efa34357f17ba5b2d1db357e8a6a68940 build: update ffmpeg on build server
* 9f9ae53e8927f7c91283b611d0556e1c332f5757 feat: stream robustness, output audio stream to file, and custom speech
* 0319650f275c8f4539c1ab14d4ac0660352ae32e build: make python test loop easier:
* 65a13bc1c11b1799f1beb35cc83e5d5723b32526 chore: Update codeowners file to fix helm owners
* 7409ba58f1ef25be349c19cf429c880c8d7eb4dc Add num tasks override parameter for LightGBM learners (881)
* 64481e9437db43eb5f25cb33e31f097bcc59eccf fix: spark.executor.cores' default value based on master when counting workers (855)
* 4ae0fe87699d32c65dc75fa2b1787a0d70d71d75 reduce network communication overhead cost on reduce step for LightGBM learners (869)
* b4137492445060f5bcb5cab955e4bf4f91fb9543 fixed shap values shape for multiclass case and improved pyspark API (870)
* 840781a2ae6c3e9ee0a065294c893e53df576de7 unify APIs across LightGBM learner types and add SHAP feature importances to regressor (864)
* 84b392c3a46cff8d2138326da960a912ed0baf75 re-disable flaky test (866)
* d86a9370a9f0baf966f264e686751ddcdd29215c build: updating lightgbm to 2.3.180 (850)
* 6bb4a45f5bcc9f67392f934e6ec94670145bac3f feat: add featuresShapCol to LightGBMClassifierModel (863)
* 82e7a8eb59d809a4ff5a66d06bedca1ea958bbe3 Bump Apache Spark to 2.4.5
* a0db5b330b75e0211f629846cb36558b576e339f build: split cog services on spark tests
* 537b611d9df7bbf2927666095d04f3c785dad66a 1) add functions for before/after batch training (852)
* ed435b82e8db55f902c15d18fe1fb52cba1631bd feat: Add m3u8 streaming for `SpeechToTextSDK`
* 4d998794c114b43a5c60f5b2ed1182fb3f656c7a feat: add interface function for updating learning_rate per each iteration in LightGBMDelegate (849)
* be366c514d08e820ca0b1112072db2fb76e6f65f feat: add delegate to monitor training (847)
* c695d7a93b1a5b86941c8c9c2e4b586f0a6c421e add option for driver listen port
* 99795bc38fc4decc20667ff2b6a6c34e64196209 fix: Codegen dataframe param fixes
* 37e336ef7534bbcc881ec0d999ff60812057d10f feat: add barrier mode support for VW (832)
* 9c9a93b857d46a24458cc53f74aa8dfb95135a8a fix: fix flakiness in BiLSTM notebook
* 5d9410a032ef3dfdae647c98fe771d547b910cd0 fix: make file type case insensitive
* 55765f8e13cc8fa28a0acbacc71e833871a9cd36 chore: remove flaky lightGBM test and add retries to Cog service tests
* b1e37972644ddd66b109cbeef4e1fb2c8578e20c fix: Add support for URI parameters and default filetypes
* 5ae664affe7946a79fb6dbe096edc81b062d17f7 improvement: support numeric types (not just double) for weight/label (817)
* 9f15b6cd1a6d582dec9891b61430aeafad24b3b4 feat: add support for VW readable model, invert hash and re-using a previous… (821)
* 038b26b3d266a2f99d6b9f094aa97188be108fec fix: remove save_resume/preserve_performance_counters options as it breaks SGD/BFGS chaining (828)
* 7dd467092d83e162116cea5bb3084c359207cb87 build: Split e2e and publishing (836)
* ca05d1b5b99e6fa93aaf8d9916e55f4c7579d226 extended test case to validate duplicate passes parameter (834)
* 2ff6a36c64797847c0a57fb0a75ba697f7dd3e99 fix: fix optional parsing for the CustomOutputParser (835)
* f9a56e886ad02ed6b233114424b88ede71f30d7a chore: Update CODEOWNERS (831)
* c79dd12abca579d416acdb46c049132b4b41cd0d chore: Add time in httpv2 tests to reduce flakiness on build VMs
* c7eed5a9f7e6c16a9c8d3270a012177fbe5ab6d5 build: Add Caching to build pipeline
* c5b8b1579afd30fd7b63d1234023f97b2c2668e4 fix: Fix flakiness in io tests
* 3abd9b44324ee5abb7a134d24bd96aa69c67680b chore:Split up io tests into 2 sections
* 5489271aaa42736a6700d1684fc331bd5cd2354c fix:remove error prone IO from notebook tests
* b4a60e5655d585c8ef7b91abf00a0d9dd205a59b fix:remove error prone IO from notebook tests
* 2455cbeb5c4b8de746d2f56089445d3175bc715e chore: fixes to improve test flakiness
* 6d7cfb5f17ca0b5a9ec807da21a77ec78c65b0f3 fix: Improve codegen readability and added getters and setters to generated models
* 015d4ea0c27fd9bd710be0c6467c410afc58dc3a fix: move tests to a separate package and refactor common code
* 6b2edc34a6116717267f65dead8582488d91cd9f feat: enable mp3 file streaming in stt sdk (822)
* 8005c1702dcf2c3d22fc821e45b14637a78c5c1f feat: Add `TextSentimentV3` Transformer (812)
* df0244c7b4f48e6c86bd8a5478d4b674a9a554ce fix: added multiclass init score support (805)
* e745784c7e60a07652bfc24e3039ed5906754541 fix: LightGBMRanker should repartition by grouping column (778)
* f7029211e737cbbc3019a2da48f5bc72d9f213e9 feat: Add the option to get Feature Contributions in LightGBMBooster used by LightGBMRanker (791)
* 875f89de89d3d7ed46a6bb4f73aca336fa276f09 build: added isolation forest test to build pipeline (800)
* 290f5cfca57606728a531076cca521d6d2bfda11 fix: Possible multithreading issue when two scores may come in parallel they may not safely fill pointer values (799)
* fb3ac9932d56c094a50dd3dceaf08ef6fdbe3ae1 docs: Adding section to readme for installing with apache livy (785)
* 7b8efa593037c3a53d2363c702e43391bb6e3304 fix: Guarantee one boosterPtr is allocated and freed per LightGBMBooster instance (792)
* 4c812d793a31bd4537e9f7d53cda6c90f08d7c44 style: Removing redundant file in the root directory: sp.txt (796)
* bd2f71e6a59c2b8ad730c8bafadc598faf189779 feat: Integration of LinkedIn's Isolation Forest (781)
* 9c61053fa126959c962c0707aa543451ef077574 feat: Add option to add tolerance to improvement in metric evolution (786)
* dbb281821542dff82386e6855e807d2e906c11bf feat: Expose parameter bin_construct_sample_cnt in spark for LightGBM (780)
* fde2d3cd4b6b72b789cbd74087d354c5164deb12 fix: Fix subtle bug in reverse index creation
* 4b4af04893966d745bc1cc9cf34cda80837992d4 feat: add demo for `ConditionalKNN`
* cf48d53c5fae480f9278b104fea4597cb966af6a chore: remove keys from demo
* 2618422c6f1249f364ef9da4de5df3c9b648ecd9 feat: Add `SpeechToTextSDK` Transformer
* 4da1ff2a1f4e88f5fe7b2a634510bd7dcbcc2993 style: ball tree style fixes
* 849527d58972a67addd9934717d5db09f3f39897 feat: Add python bindings for `ConditionalBallTree`
* d4d4ca82b809fd18671a2bf629b0b6201fc9a4f8 feat: Add KNN and ConditionalKNN Estimators
* 134ddb5beba80165516d58740a364cc152531f65 fix bug in serialization
* a00c141ce632a4259e285605853a2889ecca04cc fix review points
* 9cf33cef3387d3edd12dffe8b476b60a976b5203 feat: added pred leaf index for LightGBMClassifier
* 461d27d535414fe9e2547dc05187853ef1facc4e feat: added pred leaf index for LightGBMClassifier
* 3a7a8130ee4f82fbfd99edc24f2562cc63cded6a feat: added pred leaf index for LightGBMClassifier
* f3d624dbd3b15b64061ab8fb9b4a3f6a61f35f99 feat: Adding a new param for explicitly setting slot names. (752)
* 280cab7b020c4afdc9084db74ac33ddcb9abcd8f Expose dump model method on MMLSpark-LightGBM so that models can be saved as json.
* 3da5d4f4cb68b6a6708d26b11497e48393594aa8 fix: add cap on max allowed port in network init (759)
* 91652f2e2302a4ae9d309534badff8ca8a2fd517 fix: added min_data_in_leaf parameter (760)
* 6bb042909df38ea82d4f2ec608e5235400cdc3a2 chore: updated lightgbm to 2.3.150 (757)
* 344dbbda12a2c6b309df290c268b94e4a6d83d1c feat: added the top_k param for voting parallel (762)
* ae634973d683514c32aaef95250fa80517bfeaca chore: improve efficiency of lightgbm tests
* d9568dc2f8b5d1dc53c4293508a4bbb12a4d2653 chore: Add more cluster status checks
* a9b05b91b59ae2a3fe213c9752c7c7343fd86bd6 chore: fix flakiness in IdentifyFacesSuite
* 988403ff42630b18e453c01aa6cabf12f9b91fe0 fix: Reorder ADB Status Checks to fix flakiness
* e1dc2b3df3a9d1a4b3ad5674676ec6d9838d4743 fix: increase library install timeout (763)
* a47922f7a45dbcb2c19406960b1b895f82582a8d change labelGain description
* 43b4e63462a641dace34d16354c7dbef9fefd2e7 feat: Adding a feature for positive and negative bagging fraction params. (754)
* 087f290f301d7ec0ae1d9c6fb0a06cda3140fdfb docs: Add fix for maven resolver
* 3da1d148c07d9de2a8ec7f46bd0de801873ca9cd docs: Added two classification examples using Vowpal Wabbit (733)
* dece5aed536658720c7daa8066ea6576bc7cf72c chore: bump heap size in build
* 8bb7d861981fd519d4623cc903cbe9919762cb3f build: exclude scala from fat jar
* 2465d4e3a9bf34a929d340e30aaf608997e311ac fix: Fix an issue with the sparkContext not being instantiated at eval time
* d091b37c050a9c546ac4cb05186f878d93be8282 chore: add default UA
* 614a4448aed1ffe1bdc58c2fcb7e84539e9fd42c perf: remove async bottlenecks from HTTP on Spark
* 3caf8f0ef7eb7309f4c896d9e1c44c64e2e12e2d feat: Add wrappers for integrating with python Requests
* 2fdfe3e852f7010507a1af382ad483a0b977bac6 added max_bin_by_feature, min_gain_to_split, max_delta_step parameters (712)
* 95b7ef006d5cdb77346beb826130dc31239fa1db Fix scalastyle
* 56046025c0fb90816ba176d74ac54e7a411376b9 Fix default case check. Add test cases for countCardinality
* 491c01cd3de5796a6f9a6abfaf18f6bc67219b37 change getTrainingCols from Option[DataType] -> Seq[DataType]
* 25425a006d19d2185349f2c0f570e6333b8ab1fd Use a case class instead of anonymous tuple
* c58b216f477a4d0506f0a3f1ba61e55a1356c1cd Support the group column being a string
* f22aa732960abdd5c4db00a0d25b88b86b5c28fa Fix: Fix GH release badge display

This list of changes was [auto generated](https://msazure.visualstudio.com/Cognitive%20Services/_build/results?buildId=33991903&view=logs).</details>

mmlspark-v1.0.0-rc1

<a name="v1.0.0-rc1"></a>

1.0.0rc1

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.


Changes:

* 8d31c026a252677654717768e942e1cf1adc9082 chore: Bump Version Number to 1.0.0-rc1
* 2701aedc2a5115860cdeeab7b30e94515f45b828 fixed early stopping test for validation (711)
* 6b07829ab302a0e79c34af36fdb12082c83794fa docs: Example notebook of VW vs LightGBM (641)
* 163dead1c86c8c3b0506c65d32758a5bb9712f2f fix: fix num cores per executor if config not specified (709)
* bc0e0108316927c477b3d3211a4c1193f405d591 chore: ignore flaky test for now
* ea7d89903163b0efdff815d0e6f3646cf913d11e feat: Add brands and objects to analyze image transformer
* 04a2fbd31ea3adc857d7d29d6155e00df7532414 feat: added label conversion for VW binary classifier (0/1 -> -1/1) (700)
* da124d79f31dde9237c881e7d5d11c83433eece8 feat: Add VowpalWabbit ngram support (696)
* a44dafd42562821bc28ab0f9fff39c6991336d49 fix validation data and ranker preprocessing
* 403786950ce981ac46b99eae767fe0534d379d7f feat: Add automatic schema inference for writing to Azure Search (704)
<details><summary><b>See More</b></summary>

* 77bb67817d9361c0a8829d06948c5eebbf20d3fc update lightgbm to 2.3.100, remove generateMissingLabels, fix lightgbm getting stuck on unbalanced data
* 2e45613e6c42949368eaa139989f2e7b18cabfe8 build: Add ability to create fat jars (702)
* 035fcd91787cdc1b1b07cfb1bc7c13d5d9f5fa84 cleanup duplication in unit tests (695)
* 932ec8667644ae991fcb71b0f527392f6f797677 adding debug for client mode issue and future investigations
* 95061d0422f32c50f30b4adb13e674b4517eca50 fix: Vowpal Wabbit kwargs + improvements (692)
* 3ea5bc53cd0200ec3c9c7f9916aab48aca414961 fix: cast errors for label, weight and init score columns
* f2bf39fb02ad648de7b5fe77a37ec35919162b5a fix categorical handling on lightgbm learners
* 671b68892ace5967e60c7a064effd42dd5a21ec7 re-enabling windows tests for lightgbm
* 8361eadff3ca1e5a7410825643801f49b78e5190 add eval_at parameter to lightgbm ranker
* c0921fb0f70612fc0e1c2003e9cdb0f40148d911 Better error message when the group column is not an Int/Long
* 05a2bef54fa88a2293020215cf4cae34f2d212c5 fix: update lightgbm to 2.2.400, fix probabilities and some win errors
* 16ea090cbc038a466880514fae81dd111b2f099b chore: improve code-quality
* ef14350ef283ba4bb92724ed11db78e6227877ef build: databricks tests use instance pools to remove state (673)
* 8b27d888824bbca6a385b4d3b7b0364b0150b903 feat: add metric parameter to lightgbm learners (672)
* 9805996143d4cf174895ff2e08bb61fd2c99c4f1 fix: fix barrier execution mode with repartition for spark standalone (651)
* 1e186adf29ba605a2220228ccc9ffb788555bec7 chore: move to new subscription (661)
* 360f2f7d8116a931bf373874cd558c43d7d98973 refactor: clean up distributed HTTP tests
* 5eedc9360411610555de2323570d223fea0af340 fix: mitigate flakiness in speechToText test
* 029038610ca56177f3566937dd15747df2b33d67 refactor: clean up continuous http tests
* 8ed3aeb140eb951208a77fc8a6093a6ac24f8a47 refactor: clean up LightGBM tests
* f99c9f402c60418f3043eb6aa50aae7b8cf476c2 docs: Update Cog Service docs (659)
* df089cdc39512d59592fe70b09acd4b8337a63ce docs: fix typo in spark serving docs (656)
* b369244e20d7155029d9c44d90fa4419dee0a6aa docs: add vw to related software
* 876553a300f245a23c5b5db3eb6cfe71e7674216 docs: add links to readme
* 81360227321e7a6befc9cbba86721dc10969404e docs: change paper badge color
* f974a6a30e5d85cea7dd72eb957d0a16d8b86cb2 docs: improve README
* 8190eb5c721e45b27840c453ee958cdebeabc47f Add links to API documentation
* 241a48640a06859d468f13178907267f3d34eb83 docs: add centOS to vw on spark docs

This list of changes was [auto generated](https://msazure.visualstudio.com/Cognitive%20Services/_build/results?buildId=25880214&view=logs).</details>

mmlspark-v0.18.1

<a name="v0.18.1"></a>

