# Release Notes

## Breaking Changes
+ `cftool.param_utils` has been moved to `cftool.ml.param_utils`
+ `cftool.ml.param_utils.ParamsGenerator` only supports `dict` input now
+ Gradient Descent framework has been moved to `cftool.optim.gd`
+ `cftool.ml.utils.Grid` has been moved to `cftool.misc.Grid`
+ `cftool.ml.utils.Metrics.score` has been renamed to `.metric`
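For example, imports should be updated to match the new locations (a quick before / after sketch based on the list above):

```python
# before: from cftool.param_utils import ParamsGenerator
from cftool.ml.param_utils import ParamsGenerator

# before: from cftool.ml.utils import Grid
from cftool.misc import Grid

# the Gradient Descent framework now lives in `cftool.optim.gd`
from cftool.optim.gd import GradientDescentMixin
```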
## `cftool.ml.hpo`
We're happy to announce that a general **Hyper-Parameter Optimization** (`HPO`) framework has been implemented, including:
+ grid search
+ random search
+ Bayesian optimization (which leverages `cftool.optim.bo`, mentioned below)
In order to utilize `HPO`, we need to define a model first. For example, a *LinearRegression* model based on Gradient Descent:
> The full example can be found in `tests/usages/test_hpo.py`
```python
import numpy as np

from typing import *
from cftool.ml.param_utils import *
from cftool.optim.gd import GradientDescentMixin


class LinearRegression(GradientDescentMixin):
    def __init__(self, dim, lr, epoch):
        self.w = np.random.random([dim, 1])
        self.b = np.random.random([1])
        self._lr, self._epoch = lr, epoch

    @property
    def parameter_names(self) -> List[str]:
        return ["w", "b"]

    def loss_function(self,
                      x_batch: np.ndarray,
                      y_batch: np.ndarray,
                      batch_indices: np.ndarray) -> Dict[str, Any]:
        predictions = self.predict(x_batch)
        diff = predictions - y_batch
        return {
            "diff": diff,
            "loss": np.abs(diff).mean().item()
        }

    def gradient_function(self,
                          x_batch: np.ndarray,
                          y_batch: np.ndarray,
                          batch_indices: np.ndarray,
                          loss_dict: Dict[str, Any]) -> Dict[str, np.ndarray]:
        # gradients of the l1 loss w.r.t. w & b
        diff = loss_dict["diff"]
        sign = np.sign(diff)
        return {
            "w": (sign * x_batch).mean(0, keepdims=True).T,
            "b": sign.mean(0)
        }

    def fit(self, x, y):
        self.setup_optimizer("adam", self._lr, epoch=self._epoch)
        self.gradient_descent(x, y)
        return self

    def predict(self, x):
        return x.dot(self.w) + self.b
```
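Before wiring it into `HPO`, the model can be sanity checked on its own (a minimal sketch which only uses the methods defined above):

```python
x = np.random.random([100, 3])
y = x.dot(np.random.random([3, 1]))
# dim=3, lr=0.01, epoch=10
model = LinearRegression(3, 0.01, 10).fit(x, y)
predictions = model.predict(x)  # shape: [100, 1]
```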
After that, we need to define a `pattern_creator`, which creates a `ModelPattern`, the common interface used in `cftool.ml`:
```python
from cftool.ml import *

def pattern_creator(features, labels, param):
    # `dim_` is defined in the training snippet below
    model = LinearRegression(dim_, **param)
    model.show_tqdm = False
    model.fit(features, labels)
    return ModelPattern(init_method=lambda: model)
```
And also the parameter space we want to search over:
```python
params = {
    "lr": Float(Exponential(1e-5, 0.1)),
    "epoch": Int(Choice(values=[2, 20, 200])),
}
```
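For intuition, every candidate that `HPO` hands over to `pattern_creator` will simply be a `dict` sampled from this space, e.g. (the values here are purely illustrative):

```python
# a hypothetical sample: "lr" is drawn from Exponential(1e-5, 0.1),
# "epoch" is drawn from Choice(values=[2, 20, 200])
param = {"lr": 0.0032, "epoch": 20}
```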
With these pieces in place, running `HPO` in `cftool` is pretty straightforward:
```python
dim_ = 10
w_true = np.random.random([dim_, 1])
b_true = np.random.random([1])
x_ = np.random.random([1000, dim_])
y_ = x_.dot(w_true) + b_true
estimators = list(map(Estimator, ["mae", "mse"]))
# You can use Bayesian Optimization by replacing "naive" with "bo"
hpo = HPOBase.make("naive", pattern_creator, params)
hpo.search(
    x_, y_, estimators,
    num_jobs=1, use_tqdm=True, verbose_level=1
)
```
## `cftool.optim.bo`
We're also happy to announce that a general **Bayesian Optimization** (`BO`) framework has been implemented.
### Example
```python
from cftool.ml.param_utils import *
from cftool.optim.bo import BayesianOptimization

float_uniform = Float(Uniform(-10, 10))
string_choice = String(Choice(values=["a", "b", "c"]))
params = {
    "x1": Iterable([float_uniform, float_uniform]),
    "x2": Iterable([string_choice, string_choice])
}

def fn(p):
    x1, x2 = p["x1"], p["x2"]
    # `r1` is the negated Booth function, whose maximum is at x1 = [1, 3]
    r1 = -(x1[0] + 2 * x1[1] - 7) ** 2 - (2 * x1[0] + x1[1] - 5) ** 2
    r2 = (x2[0] == "b") + (x2[1] == "c")
    return r1 + 10. * r2

# ground truth is [[1, 3], ["b", "c"]]
bo = BayesianOptimization(fn, params).maximize()
print(bo.best_result)
# calling `maximize` again continues the search from where it left off
bo.maximize()
print(bo.best_result)
bo.maximize()
print(bo.best_result)
```
## `cftool.ml`

### `EnsemblePattern`
Utility class which creates an interface for users to leverage `Comparer` & `HPO` with ensembled models.
#### Parameters
+ **model_patterns** : `List[ModelPattern]`, list of `ModelPattern` we want to ensemble from.
+ **ensemble_method** : `Union[str, collate_fn_type]`, ensemble method used to collate the results.
    + If `str`, then `EnsemblePattern` will use `getattr` to get the corresponding collate function.
        + Currently only `"default"` is supported, which implements voting for classification and averaging for regression.
    + If `collate_fn_type`, then `EnsemblePattern` will use it to collate the results directly. A sketch of this route is shown below.
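Note that the exact signature of a custom collate function is an assumption here (we assume it receives the individual patterns' predictions), so please check `cftool.ml.utils` before relying on this sketch:

```python
import numpy as np

# hypothetical custom collate function: simple averaging
# over the individual patterns' predictions
def mean_collate(predictions):
    return np.mean(predictions, axis=0)

# ensemble = EnsemblePattern(model_patterns, ensemble_method=mean_collate)
```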
#### Examples
```python
import numpy as np

from cftool.ml.utils import ModelPattern, EnsemblePattern

x, y = map(np.atleast_2d, [[1., 2., 3.], [0., 2., 1.]])
identical = lambda x_: x_
minus_one = lambda x_: x_ - 1
identical_pattern = ModelPattern(predict_method=identical)
minus_one_pattern = ModelPattern(predict_method=minus_one)
# averaging `identical` & `minus_one` -> "minus_0.5"
ensemble = EnsemblePattern([identical_pattern, minus_one_pattern])
```
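Assuming `EnsemblePattern` exposes the same `predict` interface as `ModelPattern` (the snippet above does not show this explicitly), the ensembled predictions can then be inspected:

```python
# with the "default" (averaging) ensemble method, this should
# behave like `x - 0.5`
print(ensemble.predict(x))
```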