Features
- Add advanced option `predict_single_probability_for_binary_classification`
to generate prediction tensors of shape [batch_size, 2] for binary
classification models.
- Add support for weighted training.
- Add support for permutation variable importance in the GBT learner with the
`compute_permutation_variable_importance` parameter.
- Add support for `tf.int8` and `tf.int16` values.
- Add support for distributed gradient boosted trees training. Currently, the
TF ParameterServerStrategy distribution strategy is only available in
monolithic TF-DF builds. In other builds, the Yggdrasil Decision Forests GRPC
distribution strategy can be used instead.
- Add support for training from datasets stored on disk in CSV and RecordIO
formats (instead of creating a TensorFlow dataset). This option is currently
more efficient for distributed training (until ParameterServerStrategy
supports per-worker datasets).
- Add `max_vocab_count` argument to the model constructor. The existing
`max_vocab_count` argument in `FeatureUsage` objects takes precedence.
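
For readers unfamiliar with permutation variable importance (the quantity
enabled by `compute_permutation_variable_importance` above), the following is
a minimal, library-independent sketch of the general technique. It does not
reproduce the GBT learner's implementation: `toy_model`, `accuracy`, and
`permutation_importance` are hypothetical names introduced for illustration,
and accuracy is assumed as the evaluation metric.

```python
import random
from typing import Callable, List

Row = List[float]


def accuracy(model: Callable[[Row], int], rows: List[Row], labels: List[int]) -> float:
    """Fraction of rows for which the model predicts the correct label."""
    correct = sum(1 for row, label in zip(rows, labels) if model(row) == label)
    return correct / len(labels)


def permutation_importance(
    model: Callable[[Row], int],
    rows: List[Row],
    labels: List[int],
    feature_idx: int,
    seed: int = 0,
) -> float:
    """Drop in accuracy after shuffling one feature column across rows."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows, labels)
    # Shuffle the values of a single feature, leaving the others untouched.
    column = [row[feature_idx] for row in rows]
    rng.shuffle(column)
    shuffled_rows = [
        row[:feature_idx] + [value] + row[feature_idx + 1:]
        for row, value in zip(rows, column)
    ]
    return baseline - accuracy(model, shuffled_rows, labels)


def toy_model(row: Row) -> int:
    """Hypothetical classifier that only looks at feature 0."""
    return int(row[0] > 0.5)


rows = [[0.1, 5.0], [0.9, 3.0], [0.2, 7.0], [0.8, 1.0]]
labels = [0, 1, 0, 1]

importance_used = permutation_importance(toy_model, rows, labels, feature_idx=0)
importance_unused = permutation_importance(toy_model, rows, labels, feature_idx=1)
```

Shuffling a feature the model ignores (feature 1 here) cannot change its
predictions, so its importance is exactly zero. Shuffling a feature the model
relies on can only lower the accuracy in this toy setup, since the baseline is
already perfect.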
Fixes
- Add the missing filtering of unique values in the categorical-set training
feature accumulator. Its absence was responsible for a small drop in accuracy
(e.g. ~0.5% on the SST2 dataset) compared to the C++ API.
- Fix broken support for `max_vocab_count` in a `FeatureUsage` with type
`CATEGORICAL_SET`.
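
To illustrate why unique-value filtering matters for categorical-set
features, here is a minimal sketch. It assumes that the accumulator counts how
often each item occurs across training examples, and that the fix amounts to
counting each item at most once per example. The actual accumulator is not
shown in these notes, and `count_items` is a hypothetical helper.

```python
from collections import Counter
from typing import List


def count_items(examples: List[List[str]], filter_unique: bool) -> Counter:
    """Count item occurrences across examples of a categorical-set feature."""
    counts: Counter = Counter()
    for items in examples:
        # A categorical set has no duplicates, so each item should
        # contribute at most once per example.
        counts.update(set(items) if filter_unique else items)
    return counts


examples = [["a", "a", "b"], ["a", "c"]]

with_filter = count_items(examples, filter_unique=True)
without_filter = count_items(examples, filter_unique=False)
```

Without the filter, the repeated "a" in the first example inflates its count
to 3 instead of 2. Under the stated assumption, this skews any statistic
derived from the counts, such as which items fit within `max_vocab_count`.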