Mmlspark


1.0.0rc1

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.


Changes:

* 8d31c026a252677654717768e942e1cf1adc9082 chore: Bump Version Number to 1.0.0-rc1
* 2701aedc2a5115860cdeeab7b30e94515f45b828 fixed early stopping test for validation (711)
* 6b07829ab302a0e79c34af36fdb12082c83794fa docs: Example notebook of VW vs LightGBM (641)
* 163dead1c86c8c3b0506c65d32758a5bb9712f2f fix:fix num cores per executor if config not specified (709)
* bc0e0108316927c477b3d3211a4c1193f405d591 chore: ignore flaky test for now
* ea7d89903163b0efdff815d0e6f3646cf913d11e feat: Add brands and objects to analyze image transformer
* 04a2fbd31ea3adc857d7d29d6155e00df7532414 feat: added label conversion for VW binary classifier (0/1 -> -1/1) (700)
* da124d79f31dde9237c881e7d5d11c83433eece8 feat: Add VowpalWabbit ngram support (696)
* a44dafd42562821bc28ab0f9fff39c6991336d49 fix validation data and ranker preprocessing
* 403786950ce981ac46b99eae767fe0534d379d7f feat: Add automatic schema inference for writing to Azure Search (704)
<details><summary><b>See More</b></summary>

* 77bb67817d9361c0a8829d06948c5eebbf20d3fc update lightgbm to 2.3.100, remove generateMissingLabels, fix lightgbm getting stuck on unbalanced data
* 2e45613e6c42949368eaa139989f2e7b18cabfe8 build: Add ability to create fat jars (702)
* 035fcd91787cdc1b1b07cfb1bc7c13d5d9f5fa84 cleanup duplication in unit tests (695)
* 932ec8667644ae991fcb71b0f527392f6f797677 adding debug for client mode issue and future investigations
* 95061d0422f32c50f30b4adb13e674b4517eca50 fix: Vowpal Wabbit kwargs + improvements (692)
* 3ea5bc53cd0200ec3c9c7f9916aab48aca414961 fix: cast errors for label, weight and init score columns
* f2bf39fb02ad648de7b5fe77a37ec35919162b5a fix categorical handling on lightgbm learners
* 671b68892ace5967e60c7a064effd42dd5a21ec7 re-enabling windows tests for lightgbm
* 8361eadff3ca1e5a7410825643801f49b78e5190 add eval_at parameter to lightgbm ranker
* c0921fb0f70612fc0e1c2003e9cdb0f40148d911 Better error message when the group column is not a Int/Long
* 05a2bef54fa88a2293020215cf4cae34f2d212c5 fix: update lightgbm to 2.2.400, fix probabilities and some win errors
* 16ea090cbc038a466880514fae81dd111b2f099b chore: imporve code-quality
* ef14350ef283ba4bb92724ed11db78e6227877ef build: databricks tests use instance pools to remove state (673)
* 8b27d888824bbca6a385b4d3b7b0364b0150b903 feat: add metric parameter to lightgbm learners (672)
* 9805996143d4cf174895ff2e08bb61fd2c99c4f1 fix: fix barrier execution mode with repartition for spark standalone (651)
* 1e186adf29ba605a2220228ccc9ffb788555bec7 chore: move to new subscription (661)
* 360f2f7d8116a931bf373874cd558c43d7d98973 refactor: clean up distributed HTTP tests
* 5eedc9360411610555de2323570d223fea0af340 fix: mitigate flakiness in speechToText test
* 029038610ca56177f3566937dd15747df2b33d67 refactor: clean up continuous http tests
* 8ed3aeb140eb951208a77fc8a6093a6ac24f8a47 refactor: clean up LightGBM tests
* f99c9f402c60418f3043eb6aa50aae7b8cf476c2 docs: Update Cog Service docs (659)
* df089cdc39512d59592fe70b09acd4b8337a63ce docs: fix typo in spark serving docs (656)
* b369244e20d7155029d9c44d90fa4419dee0a6aa docs: add vw to related software
* 876553a300f245a23c5b5db3eb6cfe71e7674216 docs: add links to readme
* 81360227321e7a6befc9cbba86721dc10969404e docs: change paper badge color
* f974a6a30e5d85cea7dd72eb957d0a16d8b86cb2 docs: improve README
* 8190eb5c721e45b27840c453ee958cdebeabc47f Add links to API documentation
* 241a48640a06859d468f13178907267f3d34eb83 docs: add centOS to vw on spark docs

This list of changes was [auto generated](https://msazure.visualstudio.com/Cognitive%20Services/_build/results?buildId=25880214&view=logs).</details>
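The VW binary classifier change listed above converts labels because Spark ML classifiers use 0/1 labels while Vowpal Wabbit's logistic loss expects -1/+1. A minimal sketch of that mapping, assuming plain numeric labels; `to_vw_binary_label` is a hypothetical helper, not the MMLSpark API:

```python
def to_vw_binary_label(label: float) -> int:
    """Map a Spark-style 0/1 label to VW's -1/+1 convention (hypothetical helper)."""
    if label == 0:
        return -1
    if label == 1:
        return 1
    # Anything else is not a valid binary label for this conversion.
    raise ValueError(f"expected a 0/1 label, got {label!r}")


print([to_vw_binary_label(y) for y in [0, 1, 1, 0]])  # [-1, 1, 1, -1]
```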

0.18.1

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.

Ilya Matiach, Markus Cozowicz

Changes:

* 62946d1adf7baa4817f54f6c166db38cea9900db chore: bump version number
* d518b8aa3aae7ace6608742271f7873decb76b84 fix: fix lightgbm stuck in multiclass scenario and added stratified repartition transformer (618)
* 85fb3fc4fa60de7dbe2c20aeb05c4712f0c48d38 fix: fix schema issue with databricks e2e tests (653)
* 258cafbd74727b9eed1b7ae66d07e7f85b7b07a6 fix: update VW dependency to 8.7.0.2 built on CentOS and optimized for portability (652)
* 376cc6a86e43a2c50d9fee2adb92c34193ebd606 build: add proper secrets to publishing step (650)
* 0be08e91cd6c3cc20bd22e98a0f65061df88dbcf docs: Remove script action section

This list of changes was [auto generated](https://msazure.visualstudio.com/Cognitive%20Services/_build/results?buildId=24368418&view=logs).

0.18.0

Highlights
| <img width="800" src="https://mmlspark.blob.core.windows.net/graphics/emails/vw-blue-dark-orange.svg"> |<img width="800" src="https://mmlspark.blob.core.windows.net/graphics/emails/devops_recolor_2.svg"> | <img width="800" src="https://mmlspark.blob.core.windows.net/graphics/emails/lightgbm_on_spark.svg"> | <img width="800" src="https://mmlspark.blob.core.windows.net/graphics/emails/speech_to_text_2.svg"> |
|:--:|:--:|:--:|:--:|
| **Vowpal Wabbit on Spark** | **Quality and Build Refactor** | **LightGBM Ranking and More** | **Anomaly Detection and Speech To Text** |
| Fast, Sparse, and Scalable Text Analytics | New Azure Pipelines build with Code Coverage, CI/CD, and an organized package structure. | Barrier Execution mode, performance improvements, increased parameter coverage | New cognitive services on Spark |

New Features

Vowpal Wabbit on Spark: Fast and Sparse Text Analytics
- VW on Spark is a new collaboration between the [Vowpal Wabbit library](https://github.com/VowpalWabbit/vowpal_wabbit) and the Apache Spark community
- For full documentation check out the [VW on Spark Docs](https://github.com/Azure/mmlspark/blob/master/docs/vw.md)
- Added `VowpalWabbitClassifier` and `VowpalWabbitRegressor`
- Added [Vowpal Wabbit - Quantile Regression for Drug Discovery.ipynb](https://github.com/Azure/mmlspark/blob/master/notebooks/samples/Vowpal%20Wabbit%20-%20Quantile%20Regression%20for%20Drug%20Discovery.ipynb)
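Much of VW's speed on sparse text comes from the hashing trick: tokens are hashed straight into a fixed-size feature space instead of going through a vocabulary lookup. The sketch below illustrates the idea with `zlib.crc32` as a stand-in hash (VW itself uses its own MurmurHash-based hashing and namespaces); `hash_features` is a hypothetical helper, and 18 bits mirrors VW's default `-b 18`.

```python
import zlib
from collections import defaultdict

NUM_BITS = 18  # size of the hashed feature space: 2**18 slots


def hash_features(tokens, num_bits=NUM_BITS):
    """Hash tokens into a sparse {index: count} vector of size 2**num_bits."""
    mask = (1 << num_bits) - 1
    vec = defaultdict(float)
    for tok in tokens:
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        idx = zlib.crc32(tok.encode("utf-8")) & mask
        vec[idx] += 1.0  # colliding tokens simply share a slot
    return dict(vec)


vec = hash_features("the quick brown fox the".split())
print(vec)
```

Collisions are accepted as a trade-off for constant memory; increasing the bit count makes them rarer.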

LightGBM on Spark
- Now supports barrier execution mode
- Added the `LightGBMRanker`
- Added `is_provide_training_metric` to LightGBMRanker.
- Enabled continued training with init score column
- Added batch training support
- Reduced memory usage
- Fixed issues with frozen jobs
- Fixes for multiclass classification
- Fixed an issue where multiclass classification hung when some partitions did not contain all classes
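The new `LightGBMRanker` is typically evaluated with NDCG at a cutoff k (LightGBM's ranking metric, with cutoffs set via `eval_at`). A minimal sketch of the standard definition, assuming exponential gain `2^rel - 1` and a `log2(position + 1)` discount; `dcg_at_k` and `ndcg_at_k` are hypothetical helpers, and libraries differ on how they score a query with no relevant items (this sketch returns 0.0).

```python
import math


def dcg_at_k(relevances, k):
    """DCG of the top-k items, in the order the model ranked them."""
    # rank is 0-based, so position = rank + 1 and the discount is log2(rank + 2).
    return sum((2 ** rel - 1) / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels of one query's documents, in predicted order.
print(ndcg_at_k([1, 3], k=2))
```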

HTTP on Spark
- Added `AnomalyDetector` and `SimpleAnomalyDetector` APIs
- Added `SpeechToText` transformer
- Improved service concurrency
- Added robustness to socket timeouts
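Robustness to socket timeouts usually means retrying the request with backoff instead of failing the whole Spark task. A minimal sketch of that pattern, not MMLSpark's actual (Scala) HTTP handler; `call_with_retries` and its parameters are hypothetical:

```python
import socket
import time


def call_with_retries(fn, max_attempts=3, base_delay=0.5):
    """Call fn(), retrying on socket timeouts with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return fn()
        except socket.timeout:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the timeout to the caller
            # Wait 0.5s, 1s, 2s, ... before the next attempt.
            time.sleep(base_delay * (2 ** attempt))
```

In practice `fn` would wrap one HTTP request; capping attempts keeps a dead service from stalling a job indefinitely.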

Miscellaneous
- Codegen support for wrapping `Ranker` classes
- Notebooks now leverage public blob for faster execution
- Fixed column handling in `SummarizeData`
- Improved error messages in `ComputeModelStatistics`
- Upgraded to Spark 2.4.3
- Added Spark on Kubernetes Helm Charts
- Added `StratifiedRepartition` transformer for ensuring partitions contain all classes
- Fixed issue where `ImageFeaturizer` could not be executed on Databricks 2.4.3
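The idea behind `StratifiedRepartition` is to spread each class across partitions so that every LightGBM worker sees every label. A minimal sketch of one way to do this (round-robin within each class) on plain Python lists; `stratified_partitions` is hypothetical and is not the transformer's implementation:

```python
from collections import defaultdict


def stratified_partitions(rows, label_of, num_partitions):
    """Assign rows to partitions round-robin within each class."""
    partitions = [[] for _ in range(num_partitions)]
    by_class = defaultdict(list)
    for row in rows:
        by_class[label_of(row)].append(row)
    # Dealing each class out separately guarantees that a class with at least
    # num_partitions rows lands in every partition.
    for class_rows in by_class.values():
        for i, row in enumerate(class_rows):
            partitions[i % num_partitions].append(row)
    return partitions


rows = [(i, i % 3) for i in range(30)]  # (id, label) pairs with labels 0, 1, 2
parts = stratified_partitions(rows, label_of=lambda r: r[1], num_partitions=4)
```

A class with fewer rows than partitions still cannot appear everywhere, which is why very rare classes need separate handling.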

Build, Quality, and Infrastructure Refactor

Azure Pipelines Integration
- Tests are parallelized on Azure Pipelines, so builds now take ~25 min instead of ~90 min!
- Serverless builds: queue as many builds as needed with no machine maintenance costs
- Test results, error messages, and timings are viewable from the GitHub PR page
- Individual tests can be re-queued from the GitHub PR page
- Builds can be queued by commenting `/azp run` on a pull request
- For full details, comment `/azp help`
- The CI pipeline is specified entirely in a small `.yaml` file in the git repo

<img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/build.jpg">

Local Developer Support
- Dramatically simpler developer setup (all through SBT)
- Local developer setup now works on any platform, including Windows!
- Local setup no longer needs a VM, Vagrant, or 30 minutes to import the library
- All build stages are SBT tasks and can be run locally for rapid testing
- This includes publishing Maven packages to local repositories and the MMLSpark Maven repo
- All secrets are now managed by a centralized Azure Key Vault
- IntelliJ picks up all scalastyle rules for editor-level style feedback while typing

Code Quality Gates
- Code coverage is now computed for every PR and reported in the PR comments and a badge
- Coverage is now a check-in gate: it must never decrease
- Test coverage increased and dead code removed from the library
- Custom and auto-generated Python tests now supported
- CODEOWNERS file for better code reviews and maintenance
- Codacy integration for automated PR reviews
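The "coverage must never decrease" gate reduces to comparing the PR's line coverage against the base branch's. A minimal sketch of that rule, assuming covered/total line counts are already available (the real gate runs through Codecov, not this code); `coverage_gate` is a hypothetical helper:

```python
def coverage_gate(base_covered, base_total, pr_covered, pr_total):
    """Pass only if the PR's line coverage is at least the base branch's."""
    base_pct = 100.0 * base_covered / base_total
    pr_pct = 100.0 * pr_covered / pr_total
    passed = pr_pct >= base_pct
    message = f"coverage {base_pct:.2f}% -> {pr_pct:.2f}%: {'pass' if passed else 'FAIL'}"
    return passed, message


passed, message = coverage_gate(800, 1000, 810, 1000)
print(message)  # coverage 80.00% -> 81.00%: pass
```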

<img width="600" src="https://mmlspark.blob.core.windows.net/graphics/emails/codecov_2.gif">

Streamlined Library Structure
- MMLSpark now supports a true Scala/Java idiomatic package hierarchy
- The namespace hierarchy is also reflected in the PySpark code
- **Note: This will require changes to existing MMLSpark programs. For support migrating, please contact `mmlspark-support@microsoft.com`**

Maintainability and Community Management
- Issue and PR templates
- Gitter channel
- Welcome bot to greet new contributors
- Semantic Commits for autogenerating release notes
- Badges to display current and master versions in the README
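Semantic commits make release notes like these generatable: each subject starts with a type such as `fix:`, `docs:`, or `build:`, optionally with a `(scope)`. A minimal sketch of grouping subjects by type, assuming that `type(scope): description` convention; `group_commits` is hypothetical and is not the tool that produced the lists on this page:

```python
import re
from collections import defaultdict

# "type: description" or "type(scope): description"; the space after ":" is optional.
COMMIT_RE = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]*\))?:\s*(?P<desc>.+)$")


def group_commits(subjects):
    """Group commit subjects by semantic type; non-conforming ones go to 'other'."""
    groups = defaultdict(list)
    for subject in subjects:
        m = COMMIT_RE.match(subject)
        if m:
            groups[m.group("type")].append(m.group("desc"))
        else:
            groups["other"].append(subject)
    return dict(groups)


groups = group_commits([
    "fix: fix socket timeout error (640)",
    "docs: Add gitter badge",
    "fix:fix num cores per executor if config not specified (709)",
    "Fix pipeline retry",
])
for kind, descs in sorted(groups.items()):
    print(kind, descs)
```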

Migration Support:
- If you already have an MMLSpark developer setup, please read the new developer guide to reconfigure it.
- If you have open PRs that need help rebasing, please reach out to `mmlspark-support@microsoft.com`
- Please report any bugs or feedback!

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.

- Ilya Matiach, Markus Cozowicz, Scott Graham, Daniel Ciborowski, Christina Lee, Dalitso Banda, Shaochen Shi, Sudarshan Raghunathan, Anand Raman, Eli Barzilay, Nick Gonsalves, Tao Wu, Jeremy Reynolds, Miguel Fierro, Robert Alexander, AI CAT Team, Azure Search Team

Contributions, Collaborations, and Feedback Welcome!

|<img width="200" src="https://mmlspark.blob.core.windows.net/graphics/emails/spark.svg"> | <img width="500" src="https://mmlspark.blob.core.windows.net/graphics/emails/spacer.jpg"> | <img width="200" src="https://upload.wikimedia.org/wikipedia/commons/thumb/9/96/Microsoft_logo_%282012%29.svg/800px-Microsoft_logo_%282012%29.svg.png"> |
|:--:|:--:|:--:|





Changes:

* 3bb48b8400e92d660355c10c9c6770f5d37f681a chore: bump version number
* b0797b37929968063a860ff8bc16900732c624a9 docs: Improve cog services on spark docs
* 8e966b3c098e6a6170221620638479fb7ec561c3 docs: Docs for Cognitive Services (647)
* eb0a421c360835b22dfefced8a841d0d39c10db8 docs: Improve VW on Spark Docs
* 54dbcadb21a5b4bc5147f61803a975436d7126ba docs: add VowpalWabbit documentation
* fb5b79f460dd3c57a19c6b658cb60ee64db0c949 docs: fix vw on spark description
* c0d5786aee8d41dda3361a5e5111a88275592327 docs: update readme badges and icons
* 071b6b0ab0ada8f3c1720949a6f3f84a16c2da87 docs: Add gitter badge
* 5c343567003af3546e3b62183b901429889edf76 docs: Add VW on Spark to table
* 1bdcdbfb4314d1e464c566b27806dace14a7bc20 chore: ignore .github folder for CI
<details><summary><b>See more</b></summary>

* 01d498c2f7c18bb57a3ecd2327482fc9696acd46 build: add sonatype publishing
* 8fab72d2662ed933d5fe551b1394a711b6145797 build: make e2e cancellable
* ddc7a4f910d391cb7b1b2d500fe37c48f3ecbc87 build: remove broken codecov flags (will reinstate when codecov fixes their service_
* 188cbdbf5a6d74e00e2351dfe78b994708bb0270 chore: Update issue templates
* f67b16aba8133cffeda350cd7be37577e64175a8 chore: fix welcome bot indenting
* eeb7eba1e0b3eda3996ed7a47451d1aa24b2286f fix: Fix logistic regression error when passing "--link logistic" (644)
* b6a4f9320697c264bf73b19879ca15c1e59b75f3 fix: fix socket timeout error (640)
* 856db6d5619ad30368576b6ee55577d24e91e030 build: add mcr publishing
* c6e44f95d96d3adc403e21985404e8527cebd6bf fix: fix issue with socket timeout in advanced handler
* 2425b7adbb7cc5f5a0ae56b19c864ebcc7445dc4 fix: update detect anomaly suite to make anomaly more pronounced
* 07c7fecf78af53d56f66565dd9b5033019eb71b1 style: run markdown through markdown linter
* a0e85f5a98ce01c14a3cf3ffca856282a3029822 build: increase setup timeouts
* 5c190f8eecd158fe32a318325ddd9f8fb94eb15d style: Fix style issues
* 4bf6f712fa64d43af0efd759813faaae94cf37a5 build: Add build cancel timeouts
* 915d68334eaeac2ed2fa8022bb5b4b3a3dadb039 build: add release job to Azure Pipelines
* e48f9cbea3c446888cf2005c129f8ede9cf513db build: Add github version badges
* 73581cbf19558df899cc909cb7e1aee3d7e5c72e build: fix flaky codecov upload
* ce1e66d3b17ca035a71dae9148d3adce611e1c37 build: fix e2e notebook cluster check
* 19aeb8037e3589fb6dbd25fe5840b54b2378ed98 build: Add behavior bot
* 72ccae226876f57f71cb8ff8e388b34ce05b7031 build: Make task retry part of bash script
* 16dd7f4eb55d7fa740c83d776599fb94598e361c Update formatting
* 3fe4db5934552edd34cc9f025faec0c5b2526a64 adding vagrant doc and fixing indentation in vagrantfile
* d58d6f41909ecafa057a5327374c1825331f66ce Vowpal Wabbit on Spark
* 95dc73464714793997dffa8050451e1e50cae4dc adding vagrant file back in, updated for sbt (622)
* 605c98f914a51661eb868a9d83adeaac3b6e2e37 Add flaky test retry
* 4ebbb41a08e73f731d556d97cf76a2df52a75b42 remove brittle dataset downloading from demos
* e572a9aa584616d249652a23f8bc218e3b64ebe6 try to Fix codecov upload
* fac542e2f6f80e51d8c62b5886b5804cc7481873 Add codecov to python tests
* b6ba62f4c6ae6d2e9a1d0df7bd9c3bf4e1c4cc52 Add test publishing tobuild
* 5cada6f78fee649adf2e7c413684b431edc8be23 Increase coverage and remove dead code
* ae191a6cb777ee7dde9572ff1bdf80e366a29a70 Fix build summary
* e18ec2e9cdf2af07c40682b5c228fb876001e8d5 leverage codecov.io's coverage capabilities
* 8e7626332f5da8757a12d2614ffb27b87ff3746f Improve noisy neighbor problems for e2e tests
* 6ab8916cc236dfc81c2d9b4d912f2903248083b8 add codecov file
* 70881b2930321019c48b175e38ed9b7998bdf9d4 improve test coverage
* 41da2b7af2bace4ce0715b50a1db050cd67207e3 improve flakiness
* aa3c98f22f26ea6f02eebaeea2ffa5a8d8e42cfe improve coverage
* 237d38821e9dbf23d6d187aa33b0de106066a724 Add Code Coverage badge
* 7146b9bc2af6da655b2c3061d9cf7edfcfdc517d Add unit test timeout
* fa87e427996ac270a9763b844d62411c610d48e6 Fix noisy neighbor search index tests
* 0f98f7df3169e4e648c5d01ecc54173baf8d8f10 add codeowners file
* 43218097e2b787b4b9009074b20a042e20367292 add codeowners file
* 80aecab8321423fb20c2d5bbc23362d514180472 Add upload to codecov.io
* 66db39fbed3e9660b9cdbf90afb065db9ce581d5 Split LGBM tests for speed
* a6998ec6b0fe068f064ad9600fa204c349b932b0 Update README.md
* 027e6d72f5473b8d570ca40385aad4019b39d15c Remove unused code
* 0205b7e692b70433775617e8013f665642df791e Squash with partition fix
* dc1554f00e0ed2829e65d0414da847ad59094e45 Add r package upload
* 2fbd81cacfcf5eaf526ca4f9f7332446c88836fe Fix pipeline retry
* 0fde5941b96e2993576a2453748fdca6bb6cb878 attempt to fix partition consolidator flakiness
* 7940967acb21c6fc77a05537c6cbdeb9db55da42 Add codecov
* 7e8225f7e34f7efa5bc44aa0e6731ab087424725 fix retry logic
* d8c0eb49080193aaa5ca36d0b39c9e65b9a4056e Increase timeout for e2e notebook tests
* ff059a310ef48aa408d1c01909526880376947d8 Add ability to retry pipeline
* 8cf91cabb166796726de86e81f64f0734a23c25a Simplify build pipeline
* 5c8c9032986138964f0d9d0acb6533ce3b8b8004 Delete runme
* 210b522324e93824bcf6e81897c81eb31d87a9b4 Update CNTK code in README
* da6e4977c1a1eb93495ec23ca97de18e34e6369a Update pipeline.yaml for Azure Pipelines
* e94631885c63de61b33dda7229902469e7d6bc12 Add build status bar
* 37d36af2acf66a46a1c44eec4ae403543061064f Enable PR builds
* 6c56326c1a5d78460052f51150ccaf70fd3b1f4c transition to new build system
* fb3e99e53d46ef5536dd2fa765e25b3d7ded07d8 Update dockerfile
* 637df9d34f508cd1c83542a69e922bc342b1fe0d Update documentation for new build
* e9ef538cdf75de1e243a21fb4a46e473d5f138a0 Improve test robustness
* d34f9d173d6f5cb0fbaa93a078bc339c28618549 Remove unused build scripts
* 4034a4fc9eeef54fac4f3710fdc738a904e026a7 Add doc publishing to build
* 36d8c3bd53686e94a8a054faf3f2efd161aa85eb Fixup after rebase
* 7c5e7b676974c21486704e71a3fa793d08f25d1c Get e2e tests working
* 07316a8c7db982f7f7b9cf9bc6793001c8cf9dbd Fix serialization fuzzing error
* f6df90771e93a209c4a846c462141de494c379ed Make recomendation tests faster
* dd99937b6eb3c023d2955a91f58e7133ca4bf248 Add python tests
* 02a8ac6c46acd0261c5b6bafa8a7ab4a05b14949 Add publish task
* 3a526c8c6ac0720e15ca22a7e0faeb24cac08bb6 Fix Test Errors and Improve Reliability
* 4a696c5548be2e505411b39a64af2bc669640a96 Parallelize Tests
* 2b75b62b8bd50239564ff5d1f50a94b003881bd2 Make build windows compatible
* 94e9b218a4bc1d6fc9134987d583924f4a83b983 Add developer-readme.md
* 5659287842bc09710076efe5fc5af2dcc82229a4 Fix python testing
* 987c7c49b9e10f9c3aa20f47c69fb133067387c9 Get python codegen to work
* 90089fa36a41260f8366d7ecce0cc24c06081f47 Add scalastyle and unidoc
* 79d41102fae2dd6e20f4aeafd77bdc9336ad1a24 Add secrets
* 5742c0e164d54f3b87e2e9007c249d45944f61ec Refactor build
* 77d7cb4f3c7f0c5eaf46883980754f9149d5d851 Move library into a single package
* 29c15cb52055d2598f25bd2249a738d0f2261c3d add barrier execution mode
* aac05361c454e4a4d383ca4f551f3a4051f1b35c fix default value for double array param in codegen
* 2bd2faf1295c8ffae43c9f528e676ddb2f0909ba fix wrapper generator for ranker models
* 6885ef5ea42942b6e134a341cd9f6f008e20e156 added lightgbm ranker model pyspark api
* 08b308585eefeebffb48df5857be1579bc6c5364 fix summarize data columns
* 044d0b5698fd99d30c874e3328a6b24cbda55acc reduce memory usage, fix frozen jobs, add more debug logging
* 45c91f98c7ed425beefec23bcd436690e1540dd7 defer lightgbm probability calculation to native core to fix multiclass bug in some scenarios (578)
* 44735200184151e180a3188fa315fa15a7fd18fa squish runs together
* 00ebf64bb34148d1cdc17f6108f31d471ec279c4 use right python version
* 216abea6317115d4a168cd533c1212ac2063bff3 updated readme. more mini images
* 3232d848d8de65a23a77908213ee9667f2c3a7a5 Fix flakey test
* e9a612bb803a346e8b3d3cbfdd18cc8f36653d39 Fix Entity Detector Suite
* ba3dbd0ea6eb654beb130bc79b9527ac62c2ef0e Improve service concurrency
* 75819a51fe88a16126e71bcb8f3376a8d8c4837e Add simple Anamoly Detector
* 17a765e6747dca6ab0f28cce047c7068bd3c31f2 Add `is_provide_training_metric` to LightGBMRanker.
* ceb52918c125ad844cf27fb812f30e9bcb5077ac Print metrics of validation data as well.
* b54363c9f78308505a25d0826c989326312b2c9a Implement `is_provide_training_metric` in Scala codes through JNI.
* c7e31e61fb93f198128a5777a5c786cdb9d8458f fix query column to support long type
* 6a6d57f40ecd25a23efae29b2d18671647dbdb3f Poke Build System
* 11fe799a3e6142c0788ec5a314d83e2c4f8cb1ee Fixing Cog Service Test
* 6eba0b6f4d612a35e4464bd955859efdf45eb803 ignore flaky test
* 53c4b9e0fd917b91cd7fb195ebe44822cdd212ee adding LightGBMRanker
* fa7785734a54c5e45c98c66196846be3e4682dbf add init score column for continued training
* 32ac35348312e57599c9275fcdba800765efc638 Add anomaly detection and speech to text services
* 06273b252d753be61c353a15a2a20455c92e3af2 improved compute model statistics error message
* e7a309c3d9ea0462cfd055e2d794cae7dfbe5fca pass through slot names to native structure
* b295dae1a53c7fe127a498e974554f854b316075 add batch training support in lightgbm classifier and regressor

This list of changes was [auto generated](https://msazure.visualstudio.com/Cognitive%20Services/_build/results?buildId=24338255&view=logs).</details>

0.17

Highlights

- LightGBM evaluation 3-4x faster!
- Spark Serving v2
- LightGBM training supports early stopping and regularization
- LIME on Spark significantly faster

New Features

Spark Serving v2:

- Both microbatch and continuous modes have sub-millisecond latency
- Supports fault tolerance
- Can reply from anywhere in the pipeline
- Fail-fast modes for warning callers about JSON parsing errors
- Fully based on DataSource API v2
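A fail-fast mode rejects a batch as soon as one request body is not valid JSON, while a permissive mode keeps going and records per-request errors. A minimal sketch contrasting the two policies with Python's `json` module; `parse_requests` and `fail_fast` are hypothetical and do not reflect Spark Serving's actual options:

```python
import json


def parse_requests(bodies, fail_fast=True):
    """Parse JSON request bodies, either failing fast or collecting errors."""
    parsed, errors = [], []
    for i, body in enumerate(bodies):
        try:
            parsed.append(json.loads(body))
        except json.JSONDecodeError as exc:
            if fail_fast:
                # Surface the first bad request immediately, with its position.
                raise ValueError(f"request {i}: malformed JSON ({exc.msg})") from exc
            errors.append((i, exc.msg))  # keep going; report errors afterwards
    return parsed, errors


parsed, errors = parse_requests(['{"a": 1}', "not json"], fail_fast=False)
print(parsed, errors)
```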

LightGBM:

- 3-4x evaluation performance improvement
- Added early stopping capabilities
- Added L1 and L2 regularization parameters
- Made network initialization more robust
- Fixed bug caused by empty partitions
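Early stopping in gradient boosting halts training once the validation metric has not improved for a given number of rounds (LightGBM calls this `early_stopping_round`) and keeps the best iteration. A minimal sketch of that rule over a precomputed list of validation losses, lower being better; `best_iteration` is a hypothetical helper:

```python
def best_iteration(val_losses, patience):
    """Return the best iteration, stopping after `patience` rounds without improvement."""
    best_iter, best_loss = 0, float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            best_iter, best_loss = i, loss
        elif i - best_iter >= patience:
            break  # no improvement for `patience` rounds: stop training here
    return best_iter


losses = [0.9, 0.7, 0.6, 0.61, 0.62, 0.63, 0.5]
print(best_iteration(losses, patience=3))  # 2
```

With `patience=3` training stops at iteration 5, so the later improvement at iteration 6 is never reached; a larger patience trades training time for that chance.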

LIME on Spark:

- LIME parallelization is significantly faster for large datasets
- Tabular LIME is now supported
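For reference, the objective from the original LIME paper (Ribeiro et al., 2016) is shown below: the explanation of model $f$ at point $x$ is the simple model $g$ (e.g. a sparse linear model) that best matches $f$ on perturbed samples $z$ weighted by their proximity $\pi_x$ to $x$, plus a complexity penalty $\Omega(g)$. Here $z'$ is the interpretable representation of $z$, $D$ a distance, and $\sigma$ the kernel width. MMLSpark's exact sampling scheme, kernel, and regularizer are not described in these notes and may differ.

```latex
\xi(x) = \operatorname*{arg\,min}_{g \in G} \; \mathcal{L}(f, g, \pi_x) + \Omega(g),
\qquad
\mathcal{L}(f, g, \pi_x) = \sum_{z, z' \in \mathcal{Z}} \pi_x(z)\,\bigl(f(z) - g(z')\bigr)^2,
\qquad
\pi_x(z) = \exp\!\left(-\frac{D(x, z)^2}{\sigma^2}\right)
```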

Other:

- Added UnicodeNormalizer for working with complex text
- Recognize Text exposes parameters for its polling handlers
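Unicode normalization folds visually equivalent text (ligatures, full-width characters, composed accents) into a canonical form before tokenizing. A minimal sketch using Python's `unicodedata`, assuming NFKD plus lowercasing as the defaults; `normalize_text` and its parameter names are illustrative and not necessarily those of `UnicodeNormalizer`:

```python
import unicodedata


def normalize_text(text, form="NFKD", lower=True):
    """Apply a Unicode normalization form, then optionally lowercase."""
    normalized = unicodedata.normalize(form, text)
    return normalized.lower() if lower else normalized


print(normalize_text("\ufb01le"))  # the "fi" ligature expands to "fi", giving "file"
```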

Acknowledgements

We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.

- Ilya Matiach, Markus Cozowicz, Scott Graham, Daniel Ciborowski, Jeremy Reynolds, Miguel Fierro, Robert Alexander, Tao Wu, Sudarshan Raghunathan, Anand Raman, Casey Hong, Karthik Rajendran, Dalitso Banda, Manon Knoertzer, Lars Ahlfors, The Microsoft AI Development Acceleration Program, Cognitive Search Team, Azure Search Team

0.16

New Features
- Added the `AzureSearchWriter` for integrating Spark with [Azure Search](https://azure.microsoft.com/en-us/services/search/)
- Added the [Smart Adaptive Recommender (SAR)](https://github.com/Azure/mmlspark/blob/master/docs/SAR.md) for better recommendations in SparkML
- Added [Named Entity Recognition Cognitive Service](https://azure.microsoft.com/en-us/services/cognitive-services/text-analytics/) on Spark
- Several new [LightGBM features](LightGBM-on-Spark) (Multiclass Classification, Windows Support, Class Balancing, Custom Boosting, etc.)
- Added Ranking Train Validation Splitter for easy ranking experiments
- All Computer Vision Services can now send binary data or URLs to Cognitive Services
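SAR scores items by combining a user's affinity for items they have interacted with and an item-to-item similarity built from co-occurrence counts. A minimal sketch using Jaccard similarity, assuming in-memory `{user: {item: affinity}}` data and omitting SAR's time decay and other options; `sar_recommend` is hypothetical, not the MMLSpark `SAR` API:

```python
from collections import defaultdict
from itertools import combinations


def sar_recommend(user_items, user):
    """Score unseen items for `user` as affinity x Jaccard item similarity."""
    # cooc[(i, i)]: users who touched i; cooc[(i, j)]: users who touched both.
    cooc = defaultdict(int)
    for items in user_items.values():
        for i in items:
            cooc[(i, i)] += 1
        for i, j in combinations(sorted(items), 2):
            cooc[(i, j)] += 1
            cooc[(j, i)] += 1

    def jaccard(i, j):
        both = cooc.get((i, j), 0)
        return both / (cooc[(i, i)] + cooc[(j, j)] - both)

    all_items = {i for items in user_items.values() for i in items}
    seen = user_items[user]
    # Only recommend items the user has not interacted with yet.
    scores = {
        cand: sum(aff * jaccard(item, cand) for item, aff in seen.items())
        for cand in all_items - set(seen)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


user_items = {
    "u1": {"a": 1.0, "b": 1.0},
    "u2": {"a": 1.0, "c": 1.0},
    "u3": {"b": 1.0, "c": 1.0, "d": 1.0},
}
ranked = sar_recommend(user_items, "u1")
print(ranked)
```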


New Examples
- Learn how to use the Azure Search writer to create a visual search system for The Metropolitan Museum of Art with: [AzureSearchIndex - Met Artworks.ipynb](https://github.com/Azure/mmlspark/blob/master/notebooks/samples/AzureSearchIndex%20-%20Met%20Artworks.ipynb)

Updates and Improvements

General
- MMLSpark Image Schema now unified with Spark Core
- Now supports query pushdown and [Deep Learning Pipelines](https://github.com/databricks/spark-deep-learning)
- Bugfixes for Text Analytics services
- `PageSplitter` now propagates nulls
- HTTP on Spark now supports socket and read timeouts
- `HyperparamBuilder` python wrappers now return idiomatic python objects

LightGBM on Spark
- Added multiclass classification
- Added multiple types of boosting (Gradient Boosting Decision Tree, Random Forest, Dropout meet Multiple Additive Regression Trees, Gradient-based One-Side Sampling)
- Added Windows OS support and bug fixes
- LightGBM version bumped to `2.2.200`
- Added native support for categorical columns, either through Spark's StringIndexer, MMLSpark's ValueIndexer or list of indexes/slot names parameter
- Added `isUnbalance` parameter for unbalanced datasets
- Added boost from average parameter
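Native categorical support expects each categorical column to hold integer indexes, which is what indexers such as `StringIndexer` or `ValueIndexer` produce. A minimal sketch of that mapping on a plain list, assuming sorted distinct values and nulls or unseen values mapped to `None`; `fit_value_index` and `apply_value_index` are hypothetical, and the real indexers' ordering and null handling may differ:

```python
def fit_value_index(values):
    """Map each distinct non-null value to a stable integer index (sorted order)."""
    distinct = sorted({v for v in values if v is not None})
    return {v: i for i, v in enumerate(distinct)}


def apply_value_index(values, index):
    """Replace values with their indexes; nulls and unseen values become None."""
    return [index.get(v) for v in values]


index = fit_value_index(["red", "blue", "red", None])
print(apply_value_index(["red", "green", None], index))  # [1, None, None]
```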


Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.

- Ilya Matiach, Casey Hong, Daniel Ciborowski, Karthik Rajendran, Dalitso Banda, Manon Knoertzer, Sudarshan Raghunathan, Anand Raman, Markus Cozowicz, The Microsoft AI Development Acceleration Program, Cognitive Search Team, Azure Search Team

0.15

New Features
- Added the `TagImage` and `DescribeImage` services
- Added Ranking Cross Validator and Evaluator

New Examples
- Learn how to use HTTP on Spark to work with arbitrary web services at scale in [HttpOnSpark - Working with Arbitrary Web APIs.ipynb](https://github.com/Azure/mmlspark/blob/master/notebooks/samples/HttpOnSpark%20-%20Working%20with%20Arbitrary%20Web%20APIs.ipynb)

Updates and Improvements

LightGBM
- Fixed issue with `raw2probabilityInPlace`
- Added weight column
- Added `getModel` API to `TrainClassifier` and `TrainRegressor`
- Improved robustness of getting executor cores

HTTP on Spark and Spark Serving
- Improved robustness of Gateway creation and management
- Improved Gateway documentation

Version Bumps
- Updated to Spark 2.4.0
- Updated LightGBM to 2.1.250

Misc
- Fixed flaky tests
- Removed autogeneration of scalastyle
- Increased training dataset size in snow leopard example

Acknowledgements
We would like to acknowledge the developers and contributors, both internal and external, who helped create this version of MMLSpark.

- Ilya Matiach, Casey Hong, Karthik Rajendran, Daniel Ciborowski, Sebastien Thomas, Eli Barzilay, Sudarshan Raghunathan, flybywind, wentongxin, haal
