New release of gordo-components!
Small changes:
- All dependencies are updated, including pandas (0.24.2 → 0.25.0)
- Fix an issue where the IROC reader used 1 thread by default (#409)
- Add exponential retries to the influx forwarder (#413)
- Filter bad data (code 0) from the datalake (#423)
- Add a wrapper enabling use of standard scikit-learn scorers (#427)
Major change:
Change all our Keras neural networks to take an explicit `y` instead of using the passed (and possibly scaled) `X` as the target.
This gives more freedom in several ways:
- It allows training towards an unscaled `y` with a scaled `X`, or having them scaled in different ways.
- It allows the `y` and `X` to be different sets of tags. The target `y` can be a subset of `X` or even a completely different set of tags.
- It follows the standard scikit-learn pattern, making it easier to use e.g. standard scikit-learn scorers (more on this below).
But it also involves some changes in the model definitions to get the same behavior as before.
Change in model format:
Previous model definition:
```yaml
model:
  sklearn.pipeline.Pipeline:
    steps:
      - sklearn.preprocessing.data.MinMaxScaler
      - gordo_components.model.models.KerasLSTMAutoEncoder:
          kind: lstm_hourglass
          lookback_window: 10
```
New model definition:
```yaml
model:
  gordo_components.model.anomaly.diff.DiffBasedAnomalyDetector:
    base_estimator:
      sklearn.compose.TransformedTargetRegressor:
        transformer: sklearn.preprocessing.data.MinMaxScaler
        regressor:
          sklearn.pipeline.Pipeline:
            steps:
              - sklearn.preprocessing.data.MinMaxScaler
              - gordo_components.model.models.KerasLSTMAutoEncoder:
                  kind: lstm_hourglass
                  lookback_window: 10
```
Explanation:
The first class, `gordo_components.model.anomaly.diff.DiffBasedAnomalyDetector`, takes a base estimator as a parameter and provides a new method, `anomaly`, in addition to any methods the `base_estimator` already has (like `fit` and `predict`). In `DiffBasedAnomalyDetector`, a call to `anomaly(X, y)` is implemented by calling `predict` on the `base_estimator`, scaling the output, scaling the passed `y`, calculating the absolute values of the differences, and then calculating the norm. The output of `anomaly(X, y)` is a multi-level dataframe with the original input and output of the base estimator, in addition to the per-sensor errors (absolute differences) and the total error score. The major difference from before is that the error calculation is now an explicit class which can be used in e.g. notebooks, instead of existing as a function in the server class.
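The error calculation described above can be sketched in plain numpy/scikit-learn. This is an illustrative sketch of the listed steps only, not the actual `DiffBasedAnomalyDetector` implementation; the choice of `MinMaxScaler`, what the scaler is fitted on, and the use of the Euclidean norm are assumptions here:

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

def diff_based_errors(y_true, y_pred):
    # Assumed: scale both the target and the model output with the same scaler
    scaler = MinMaxScaler().fit(y_true)
    y_true_scaled = scaler.transform(y_true)
    y_pred_scaled = scaler.transform(y_pred)
    # Per-sensor error: absolute value of the differences
    per_sensor = np.abs(y_true_scaled - y_pred_scaled)
    # Total error score per sample: the norm across sensors
    total = np.linalg.norm(per_sensor, axis=1)
    return per_sensor, total

rng = np.random.RandomState(0)
y = rng.rand(100, 10)
pred = y + rng.normal(scale=0.01, size=y.shape)  # pretend model output
per_sensor, total = diff_based_errors(y, pred)
print(per_sensor.shape, total.shape)  # (100, 10) (100,)
```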
The second new class in the config above is `sklearn.compose.TransformedTargetRegressor`. This is a standard scikit-learn class which scales the target `y` before the model is fitted, and then inverse-scales the output of the `base_estimator` when `predict` is called. It is needed if you want the Keras network to train towards a scaled `y`, as it did before; if you do not want this, you can simply omit the `sklearn.compose.TransformedTargetRegressor`.
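A minimal standalone example of how `TransformedTargetRegressor` behaves: it fits its regressor on the transformed `y` and inverse-transforms predictions back to the original scale. A `LinearRegression` stands in for the Keras pipeline here, purely to keep the example small and fast:

```python
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import MinMaxScaler

model = TransformedTargetRegressor(
    regressor=LinearRegression(),  # stand-in for the Keras pipeline
    transformer=MinMaxScaler(),    # y is min-max scaled before fitting
)

rng = np.random.RandomState(0)
X = rng.rand(100, 10)
y = rng.rand(100, 3) * 1000  # target on a much larger scale than X

model.fit(X, y)          # regressor sees MinMax-scaled y internally
pred = model.predict(X)  # output is inverse-transformed back to y's scale
print(pred.shape)  # (100, 3)
```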
Using scikit-learn scorers
It is now possible to use standard scikit-learn scorers with a simple wrapper.
Example:
```python
from gordo_components import serializer
import yaml
import numpy
from sklearn.metrics import r2_score

config = yaml.safe_load(
    """
    sklearn.pipeline.Pipeline:
      steps:
        - sklearn.preprocessing.data.MinMaxScaler
        - gordo_components.model.models.KerasLSTMAutoEncoder:
            kind: lstm_hourglass
            lookback_window: 10
            epochs: 20
    """
)
model = serializer.pipeline_from_definition(config)
X = numpy.random.rand(100, 10)
y = numpy.random.rand(100, 10)
model.fit(X, y)

# This will fail, since the output and the target are of different lengths:
r2_score(X, model.predict(X))
```

The fix:

```python
from gordo_components.model.utils import metric_wrapper

metric_wrapper(r2_score)(X, model.predict(X))
```