This is a major new release of SKLL.
What's new
- Add several new scikit-learn learners, along with reasonable default parameter grids for tuning where appropriate (issues 256 & 375, PR 377):
  - `BayesianRidge`
  - `DummyRegressor`
  - `HuberRegressor`
  - `Lars`
  - `MLPRegressor`
  - `RANSACRegressor`
  - `TheilSenRegressor`
  - `DummyClassifier`
  - `MLPClassifier`
  - `RidgeClassifier`
- Allow computing any number of evaluation metrics in addition to the tuning objective (issue 350, PR 384).
- Rename `cv_folds_file` configuration option to `folds_file`. The former is still supported with a deprecation warning but will be removed in the next release (PR 367).
- Add a new configuration option [`use_folds_file_for_grid_search`](http://skll.readthedocs.io/en/latest/run_experiment.html#use-folds-file-for-grid-search-optional), which controls whether the inner-loop grid search in a cross-validation experiment with a custom folds file also uses the folds from that file. It defaults to `True`; when set to `False`, the inner loop ignores the file and uses regular 3-fold cross-validation instead (PR 367).
- Add the corresponding keyword argument, `use_custom_folds_for_grid_search`, to the `Learner.cross_validate()` method (PR 367).
- Learning curves can now be plotted from existing summary files using the new [`plot_learning_curves`](http://skll.readthedocs.io/en/latest/utilities.html#plot-learning-curves) command line utility (issue 346, PR 396).
- Overhaul logging in SKLL. All messages are now logged both to the console (if running interactively) and to log files. Read more about the SKLL log files in the [Output Files section](http://skll.readthedocs.io/en/latest/run_experiment.html#output-files) of the documentation (issue 369, PR 380).
- `neg_log_loss` is now available as an objective function for classification (issue 327, PR 392).
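Most of the new experiment options above are set in the configuration file. As a rough illustration, the Python sketch below uses the standard `csv` and `configparser` modules to produce a folds file and a cross-validation configuration that combines two of the new learners, a tuning objective with additional metrics, and `folds_file` with `use_folds_file_for_grid_search`. The experiment name, paths, feature set name, fold assignments, and the `id,fold` header are placeholders, and the section in which each option appears reflects our reading of the SKLL configuration documentation rather than a tested run, so check the linked docs before relying on it.

```python
import configparser
import csv
import io

# Hypothetical folds assignment mapping example IDs to fold IDs.
folds = {"EX1": "1", "EX2": "1", "EX3": "2", "EX4": "2", "EX5": "3", "EX6": "3"}

# Folds file (assumed layout): a header row, then one "id,fold" row per example.
folds_csv = io.StringIO()
writer = csv.writer(folds_csv)
writer.writerow(["id", "fold"])
writer.writerows(folds.items())

# Option names come from these release notes; section placement is an assumption.
config = configparser.ConfigParser()
config.read_dict({
    "General": {
        "experiment_name": "new_learners_demo",  # placeholder name
        "task": "cross_validate",
    },
    "Input": {
        "train_directory": "train",  # placeholder path
        "featuresets": '[["example_features"]]',  # placeholder feature set
        # New learners are referenced by their scikit-learn class names.
        "learners": '["MLPClassifier", "RidgeClassifier"]',
        # Formerly cv_folds_file, which is now deprecated.
        "folds_file": "folds.csv",
    },
    "Tuning": {
        "grid_search": "true",
        "objectives": '["f1_score_micro"]',
        # Outer loop uses folds.csv; the inner grid search uses 3-fold CV.
        "use_folds_file_for_grid_search": "false",
    },
    "Output": {
        # Computed and reported in addition to the tuning objective.
        "metrics": '["accuracy", "f1_score_macro"]',
        "results": "output",
        "log": "output",
    },
})

config_text = io.StringIO()
config.write(config_text)
print(folds_csv.getvalue())
print(config_text.getvalue())
```

In the Python API, the counterpart of `use_folds_file_for_grid_search` is the `use_custom_folds_for_grid_search` keyword argument of `Learner.cross_validate()`.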
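The `neg_log_loss` name presumably follows scikit-learn's scorer of the same name, which is the negative of the log loss, so that, as with other objectives, larger values are better. The stdlib sketch below computes that quantity, (1/N) times the sum of log p_i(y_i), assuming integer class labels that index into each row of predicted probabilities. Unlike scikit-learn's `log_loss`, it does not renormalize rows after clipping.

```python
import math


def neg_log_loss(y_true, y_prob, eps=1e-15):
    """Mean log-probability assigned to the true class (the negated log loss).

    Assumes y_true holds integer labels 0..K-1 that index into each row of y_prob.
    """
    total = 0.0
    for label, probs in zip(y_true, y_prob):
        # Clip so that a zero probability does not produce log(0).
        p = min(max(probs[label], eps), 1 - eps)
        total += math.log(p)
    return total / len(y_true)


# Two examples whose true classes received probabilities 0.9 and 0.8.
print(neg_log_loss([0, 1], [[0.9, 0.1], [0.2, 0.8]]))  # about -0.164
```

A perfect, fully confident classifier scores close to 0, and every confident mistake pushes the score sharply negative.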
Changes
- SKLL now supports Python 3.6. Although Python 3.4 and 3.5 will still work, 3.6 is now the officially supported Python 3 version. Python 2.7 is still supported (issue 355, PR 360).
- The required version of scikit-learn has been bumped up to 0.19.1 (issue 328, PR 330).
- The learning curve y-limits are now computed a bit more intelligently (issue 389, PR 390).
- Raise a warning if the ablation flag is used for an experiment that uses `train_file`/`test_file`, since this combination is not supported (issue 313, PR 392).
- Raise a warning if both `fixed_parameters` and `param_grids` are specified (issue 185, PR 297).
- Disable grid search if no default parameter grids are available in SKLL and the user doesn't provide parameter grids either (issue 376, PR 378).
- SKLL has a copy of scikit-learn's `DictVectorizer` because it needs some custom functionality. _Most_ (but not all) of our modifications have now been merged into scikit-learn, so our custom version is now much smaller and consists of just a single method (issue 263, PR 374).
- Improve outputs for cross-validation tasks (issues 349 & 371, PRs 365 & 372):
  - When a folds file is specified, the log no longer erroneously shows the full folds dictionary.
  - When a folds file is specified, show the number of cross-validation folds in the results as `<n> via folds file`.
  - When the grid search ends up using the folds file, show the number of grid search folds in the results as `<n> via folds file`.
  - Do not show the stratified folds information in the results when a folds file is specified.
  - Show the value of `use_folds_file_for_grid_search` in the results when appropriate.
  - Show grid-search-related information in the results only when grid search is actually performed.
- The Travis CI plan was broken up into multiple jobs in order to get around the 50 minute limit (issue 385, PR 387).
- For the conda package, some of the dependencies are now sourced from the `conda-forge` channel.
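The three configuration checks described above (ablation with `train_file`/`test_file`, `fixed_parameters` together with `param_grids`, and a missing parameter grid) amount to a few simple rules. The function below is a hypothetical sketch of those rules, not SKLL's code: `resolve_grid_search`, the `settings` dictionary, and its keys are our own stand-ins for a parsed configuration, and the sketch turns grid search off silently because these notes do not say whether SKLL also logs that.

```python
import warnings


def resolve_grid_search(settings, has_default_grid):
    """Hypothetical check of experiment settings; returns whether grid search runs.

    `settings` is a plain dict standing in for a parsed configuration file and
    `has_default_grid` says whether a default grid exists for the learner.
    """
    uses_train_test_files = "train_file" in settings or "test_file" in settings
    if settings.get("ablation") is not None and uses_train_test_files:
        warnings.warn("ablation is not supported with train_file/test_file")
    if settings.get("fixed_parameters") and settings.get("param_grids"):
        warnings.warn("both fixed_parameters and param_grids are specified")
    grid_search = bool(settings.get("grid_search", False))
    if grid_search and not has_default_grid and not settings.get("param_grids"):
        # No default grid and no user-supplied grid: nothing to search over.
        grid_search = False
    return grid_search


print(resolve_grid_search({"grid_search": True}, has_default_grid=False))  # False
```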
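The cross-validation output changes listed above are conditional display rules. The sketch below models them as a function that returns which fold details a results summary would include; the function name, its arguments, and the dictionary keys are hypothetical and do not reflect SKLL's actual results format. Only the `<n> via folds file` wording comes from these notes, and "when appropriate" for `use_folds_file_for_grid_search` is interpreted here as "a folds file is used and grid search runs".

```python
def cv_result_fields(num_cv_folds, using_folds_file, stratified, grid_search,
                     num_grid_search_folds, use_folds_file_for_grid_search):
    """Hypothetical sketch of which fold details a CV results summary shows."""
    fields = {}
    if using_folds_file:
        fields["cv_folds"] = f"{num_cv_folds} via folds file"
    else:
        fields["cv_folds"] = str(num_cv_folds)
        # Stratification info is omitted when the folds come from a file.
        fields["stratified_folds"] = stratified
    # Grid search details appear only if grid search actually runs.
    if grid_search:
        if using_folds_file and use_folds_file_for_grid_search:
            fields["grid_search_folds"] = f"{num_grid_search_folds} via folds file"
        else:
            fields["grid_search_folds"] = str(num_grid_search_folds)
        if using_folds_file:
            fields["use_folds_file_for_grid_search"] = use_folds_file_for_grid_search
    return fields


print(cv_result_fields(10, False, True, True, 3, True))
# {'cv_folds': '10', 'stratified_folds': True, 'grid_search_folds': '3'}
```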
Bugfixes
- Fix a bug that caused the inner grid-search loop of a cross-validation experiment to use a single job instead of the number specified via `grid_search_jobs` (issue 363, PR 367).
- Fix unbound variable in `readers.py` (issue 340, PR 392).
- Fix bug when running a learning curve experiment via `gridmap` (issue 386, PR 390).
- Fix a mismatch between the default number of grid search folds and the default number of slots requested via `gridmap` (issue 342, PR 367).
Documentation
- Update documentation and tests for all of the above changes and new features.
- Update tutorial and installation instructions (issues 383 and 394, PR 399).
- Standardize all of the function and method docstrings to be NumPy style. Add docstrings where missing (issue 373, PR 397).