Mmlspark

Latest version: v0.0.11111111

0.7

New functionality:

* New transforms: `EnsembleByKey`, `Cacher`, `Timer`; see the documentation.

Updates:

* Miniconda version 4.3.21, including Python 3.6.

* CNTK version 2.1, using Maven Central.

* Use OpenCV from the OpenPnP project from Maven Central.

Improvements:

* Spark's `binaryFiles` function regressed between versions 2.0 and 2.1,
causing performance issues; this release works around the regression
for now. Data frame operations that follow a use of `BinaryFileReader`
(e.g., reading images) are significantly faster as a result.

* The Spark installation is now patched with `hadoop-azure` and
`azure-storage`.

* Includes additional bug fixes and improvements.

0.6

New functionality:

* Similar to Spark's `StringIndexer`, a new `ValueIndexer` can index
values of any type instead of only strings. A reverse mapping is also
provided via `IndexToValue`, similar to Spark's `IndexToString`
transform.

* A new "clean missing" data estimator, example:

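      // `dataset` and `someCustomValue` are placeholders supplied by the caller.
      val cmd = new CleanMissingData()
        .setInputCols(Array("some-column"))
        .setOutputCols(Array("some-column"))
        .setCleaningMode(CleanMissingData.customOpt)
        .setCustomValue(someCustomValue)
      // Fit the estimator, then apply the resulting model to the same data.
      val cmdModel = cmd.fit(dataset)
      val result = cmdModel.transform(dataset)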

* New default featurization for Spark date and timestamp types and for
our internal image type. Date columns are converted to double features:
year, day of week, month, and day of month. Timestamp columns get the
same features as date columns, plus hour of day, minute of hour, and
second of minute. Image columns are featurized as the image data
converted to doubles, together with width and height information.

* Starting the Docker image without an `ACCEPT_EULA` variable setting
used to throw an error. Instead, it now starts a tiny web server that
shows the EULA and replaces itself with the Jupyter interface when you
click the `AGREE` button.
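
As a rough illustration of the date and timestamp featurization described above, the following plain-Scala sketch expands a date into four double features and a timestamp into seven. It is not MMLSpark's implementation and does not use Spark at all. The object name `DateFeatureSketch`, its method names, the order of the features, and the ISO day-of-week numbering (Monday = 1) are all assumptions made for this example.

```scala
import java.time.{LocalDate, LocalDateTime}

// Hypothetical helper for illustration only; not part of the MMLSpark API.
object DateFeatureSketch {

  // Date -> year, day of week, month, day of month, in the order the notes list them.
  def dateFeatures(d: LocalDate): Array[Double] = Array(
    d.getYear.toDouble,
    d.getDayOfWeek.getValue.toDouble, // assumed ISO numbering: Monday = 1 ... Sunday = 7
    d.getMonthValue.toDouble,
    d.getDayOfMonth.toDouble
  )

  // Timestamp -> the date features, followed by hour, minute, and second.
  def timestampFeatures(t: LocalDateTime): Array[Double] =
    dateFeatures(t.toLocalDate) ++ Array(
      t.getHour.toDouble,
      t.getMinute.toDouble,
      t.getSecond.toDouble
    )
}
```

For example, `DateFeatureSketch.timestampFeatures(LocalDateTime.of(2017, 7, 4, 13, 5, 9))` yields the seven values 2017, 2, 7, 4, 13, 5, 9 as doubles, since 4 July 2017 was a Tuesday. The library's actual feature order and day-of-week encoding may differ.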

Breaking changes:

* Renamed `ImageTransform` to `ImageTransformer`.

Notable bug fixes and other changes:

* Improved sample notebooks, and a new one: "303 - Transfer Learning by
DNN Featurization - Airplane or Automobile".

* Fix serialization bugs in generated Python `PipelineStage`s.

Acknowledgments

Thanks to Ali Zaidi for some notebook beautifications.
