New functionality:
* Similar to Spark's `StringIndexer`, the new `ValueIndexer` indexes
values of any type, not only strings. The reverse mapping is available
through `IndexToValue`, which is analogous to Spark's `IndexToString`
transform.
* A new "clean missing" data estimator. Example:

      val cmd = new CleanMissingData()
        .setInputCols(Array("some-column"))
        .setOutputCols(Array("some-column"))
        .setCleaningMode(CleanMissingData.customOpt)
        .setCustomValue(someCustomValue)
      val cmdModel = cmd.fit(dataset)
      val result = cmdModel.transform(dataset)

* New default featurization for Spark date and timestamp types and for our
internal image type. Date columns are converted to double-valued
features: year, day of week, month, and day of month. Timestamp columns
produce the same features plus hour of day, minute of hour, and second
of minute. Image columns are featurized as the image data converted to
doubles, together with the width and height.
* Starting the Docker image without an `ACCEPT_EULA` variable setting
previously threw an error. It now starts a tiny web server that
shows the EULA and replaces itself with the Jupyter interface when you
click the `AGREE` button.
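
To illustrate the date and timestamp featurization listed above, here is a minimal sketch in plain Scala using `java.time`. It is not the library's implementation. The object and method names (`DateFeaturizationSketch`, `featurizeDate`, `featurizeTimestamp`) are hypothetical, and the day-of-week numbering (ISO-8601, Monday = 1) is an assumption, since the notes do not say which encoding is used.

```scala
import java.time.{LocalDate, LocalDateTime}

// Hypothetical helper, for illustration only.
object DateFeaturizationSketch {
  // Date -> [year, day of week, month, day of month], all as doubles.
  // Assumption: day of week follows ISO-8601 (Monday = 1, Sunday = 7).
  def featurizeDate(d: LocalDate): Array[Double] =
    Array(
      d.getYear.toDouble,
      d.getDayOfWeek.getValue.toDouble,
      d.getMonthValue.toDouble,
      d.getDayOfMonth.toDouble)

  // Timestamp -> the date features, followed by
  // [hour of day, minute of hour, second of minute].
  def featurizeTimestamp(t: LocalDateTime): Array[Double] =
    featurizeDate(t.toLocalDate) ++ Array(
      t.getHour.toDouble,
      t.getMinute.toDouble,
      t.getSecond.toDouble)
}
```

Under these assumptions, `featurizeDate(LocalDate.of(2017, 11, 6))` gives four doubles (2017, 1, 11, 6), and a timestamp on the same day at 13:45:30 adds 13, 45, and 30.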
Breaking changes:
* Renamed `ImageTransform` to `ImageTransformer`.
Notable bug fixes and other changes:
* Improved the sample notebooks and added a new one: "303 - Transfer
Learning by DNN Featurization - Airplane or Automobile".
* Fixed serialization bugs in generated Python `PipelineStage`s.
Acknowledgments:
Thanks to Ali Zaidi for some notebook beautifications.