SKLL

Latest version: v5.0.1

1.1.0

The biggest changes in this release are that the required version of scikit-learn has been bumped up to 0.16.1 and config file parsing is much more robust and gives much better error messages when users make mistakes.

Implemented enhancements
- Base estimators other than the defaults are now supported for `AdaBoost` classifiers and regressors (238)
- User can now specify number of cross-validation folds to use in the config file (222)
- Decision Trees and Random Forests no longer need dense inputs (207)
- Stratification during cross-validation is now optional (160)

Fixed bugs
- Bug when checking if `hasher_features` is a valid option (234)
- Invalid/missing/duplicate options in configuration are now detected (223)
- Stop modifying global numpy random seed (220)
- Relative paths specified in the config file are now relative to the config file location instead of to the current directory (213)
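
The relative-path fix (213) can be pictured with a small sketch. The helper name `resolve_config_path` and the example paths are hypothetical and are not SKLL's implementation. The sketch only illustrates anchoring a relative path at the config file's directory instead of the current working directory.

```python
from pathlib import PurePosixPath


def resolve_config_path(config_file, value):
    """Resolve `value` relative to the directory containing `config_file`.

    Hypothetical helper for illustration; not SKLL's actual code.
    """
    path = PurePosixPath(value)
    if path.is_absolute():
        # Absolute paths are returned unchanged.
        return str(path)
    # Relative paths are anchored at the config file's directory,
    # not at the current working directory.
    return str(PurePosixPath(config_file).parent / path)


resolved = resolve_config_path("/home/user/exp/config.cfg", "train")
absolute = resolve_config_path("/home/user/exp/config.cfg", "/data/train")
print(resolved)
print(absolute)
```

With this behavior, an experiment directory can be moved or launched from anywhere without editing the paths in its config file.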

Closed issues
- Incompatibility with the latest version of scikit-learn (v0.16.1) (235, 241, 233)
- Learner.model_params will return weights with the wrong sign if sklearn is fixed (111)

Merged pull requests
- Overhaul configuration file parsing (desilinguist, 246)
- Several minor bugfixes (desilinguist, 245)
- Compatibility with scikit-learn v0.16.1 (desilinguist, 243)
- Expose cv_folds and stratified (aoifecahill, 240)
- Adding Report tests (brianray, 237)

[Full Changelog](https://github.com/EducationalTestingService/skll/compare/v1.0.1...v1.1.0)

1.0.1

This is a fairly minor bugfix release. Changes include:
- Update links in README.
- Fix crash when trying to run experiments with integer labels (Issue 225, PR 219)
- Update documentation about ablation to note that there will always be a run with all features (Issue 224, PR 226)
- Update documentation about format of `cv_folds_file` (Issue 225, PR 228)
- Remove duplicate words in documentation (PR 218)
- Fixed `KeyError` when trying to build conda recipe.
- Update outdated parameter grids in `run_experiment` documentation (commit 80d78e4)

1.0.0

The 1.0 release is finally here! It's been a little over a year since our first public release, and we're ready to say that SKLL is 1.0. Read our massive release notes:

:warning: We did make some API- and config-file-breaking changes. They are listed at the end of the release notes. They should all be addressable by a quick find-and-replace.

Bug fixes
- Fixed path problems in iris example (issue 103, PR 171)
- Fixed bug where `ablated_features` field was incorrect when config file contained multiple feature sets (issue 125)
- Fixed bug where CV would crash with rare classes (issue 109, PR 165)
- Fixed issue where warning about extremely large feature values was being issued before rescaling
- Fixed issue where some warning messages mixed new-style and old-style replacement strings while using old-style formatting.
- Fixed a number of bugs with filtering `FeatureSet` objects and writing filtered sets to files.
- Fixed bug in `FeatureSet.__sub__` where feature names were being passed instead of indices.
- Fixed issue where `MegaMWriter` could not print numbers in Python 2.7.

New features
- SKLL releases are now for specific versions of scikit-learn. 1.0.0 requires scikit-learn 0.15.2 (issue 138, PR 170)
- Added [tutorial](https://skll.readthedocs.org/en/master/tutorial.html) to documentation that walks new users through using SKLL in much the same way as our PyData talks (issue 153).
- Added support for custom learners (issue 92, PR 183)
- Added two command-line utilities, `join_features` and `filter_features`, for joining and filtering feature files. These replace `join_megam` and `filter_megam` (issue 79, PR 198)
- Added support for specifying the field in ARFF, CSV, or TSV files that contains the IDs for each instance (issue 204, PR 206)
- Added train/test set sizes to result files (issue 150, PR 161)
- Added intercept to `print_model_weights` output (issue 155, PR 163)
- Added total time and end time-stamp to experiment results (issue 91, PR 167)
- Added exception when `featureset_name` is longer than 210 characters (issue 121, PR 168)
- Added regression example data, `boston` (issue 162)
- Added ability to specify number of grid search folds (issue 122, PR 175)
- Added warning message when the number of features in the trained model differs from the number in the `FeatureSet` passed to `Learner.predict()` (issue 145)
- Added `conda.yaml` file to repository to make conda package creation simpler (issue 159, PR 173)
- Added loads more unit tests, greatly increased unit test coverage, and generally cleaned up test modules (issues 97, 148, 157, 188, and 202; PRs 176, 184, 196, 203, and 205)
- Added `train_file` and `test_file` fields to config files, which can be used to specify single file feature sets. This greatly simplifies running simple experiments (issue 12, PR 197)
- Added support for merging feature sets with IDs in different orders (issue 149, PR 177)
- Added `ValueError` when invalid tuning objective is specified (issues 117 and 179; PRs 174 and 181)
- Added `shuffle` option to config files to decide whether training data should be shuffled before training. By default this is `False`, but if `grid_search` is `True`, we will automatically `shuffle`. Previously, the default was `True`, and there was no option in the config files. (issue 189, PR 190)
- Updated documentation to indicate that we're using `StratifiedKFold` (issue 160)
- Added `FeatureSet.__eq__` and `FeatureSet.__getitem__` methods.
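
The interaction between `shuffle` and `grid_search` described in the list above can be summarized in a few lines. This is a minimal sketch of the stated rule only. The function name `effective_shuffle` is hypothetical and does not correspond to anything in SKLL.

```python
def effective_shuffle(shuffle=False, grid_search=False):
    """Return whether training data ends up shuffled before training.

    Hypothetical helper: `shuffle` defaults to False, but enabling
    `grid_search` turns shuffling on automatically.
    """
    return bool(shuffle or grid_search)


default_case = effective_shuffle()
grid_case = effective_shuffle(grid_search=True)
explicit_case = effective_shuffle(shuffle=True)
print(default_case, grid_case, explicit_case)
```

The practical consequence is that experiments relying on the old default (always shuffled) need `shuffle` set explicitly unless they also run a grid search.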

Minor changes without issues
- Overhauled and cleaned up all documentation. [Look](https://skll.readthedocs.org) how pretty it is!
- Updated docstrings all over the place to be more accurate.
- Updated `generate_predictions` to use new `Reader` API.
- Added `argv` optional argument to all utility script `main` functions to simplify testing.
- Added `mock` tests, so SKLL now requires `mock` to work with Python 2.7.
- Added prettier SVG badges to README.
- Added link to Data Science at the Command Line to README.
- `LibSVMReader` now converts UTF-8 replacement characters that are used by `LibSVMWriter` when a feature name contains an `=`, `|`, `#`, `:`, or ` ` back to the original ASCII characters.

:warning: API breaking changes :warning:
- `FeatureSetWriter` :arrow_right: `Writer`
- `load_examples(path)` :arrow_right: `Reader.for_path(path).read()`
- `write_feature_file(...)` :arrow_right: `Writer.for_path(FeatureSet(...)).write()`
- `FeatureSet.classes` :arrow_right: `FeatureSet.labels`
- All other instances of the word "classes" changed to "labels" (166)
- `FeatureSet.feat_vectorizer` :arrow_right: `FeatureSet.vectorizer`
- `run_ablation(all_combos=True)` :arrow_right: `run_configuration(ablation=None)`
- `run_ablation()` :arrow_right: `run_configuration(ablation=1)`
- `ExamplesTuple(ids, classes, features, vectorizer)` :arrow_right: `FeatureSet(name, ids, classes, features, vectorizer)`
- Removed `feature_hasher` argument to all `Learner` methods, because it's unnecessary
- `Learner.model_type` is now the actual type of the underlying model instead of just a string.
- `FeatureSet.__len__` now returns the number of examples instead of the number of features.
- Removed `skll.learner._REGRESSION_MODELS` and now we check for regression by seeing if model is subclass of `RegressorMixin`.
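
Since the simple renames in the list above are described as addressable by find-and-replace, a small script can apply them. The following is a sketch under assumptions: the regular expressions, the function name `migrate_source`, and the variable `fs` in the sample text are all hypothetical. Only the three pure renames are covered. Changes that alter call structure, such as `load_examples(path)`, still need manual edits.

```python
import re

# Old -> new names taken from the list above. The regex patterns
# themselves are an illustrative assumption, not an official tool.
RENAMES = [
    (r"\bFeatureSetWriter\b", "Writer"),
    (r"\.classes\b", ".labels"),
    (r"\.feat_vectorizer\b", ".vectorizer"),
]


def migrate_source(text):
    """Apply each rename in order and return the updated source text."""
    for pattern, replacement in RENAMES:
        text = re.sub(pattern, replacement, text)
    return text


# `fs` stands in for a hypothetical FeatureSet variable in user code.
old_source = (
    "labels = fs.classes\n"
    "vec = fs.feat_vectorizer\n"
    "writer_cls = FeatureSetWriter\n"
)
new_source = migrate_source(old_source)
print(new_source)
```

The word-boundary anchors keep the patterns from touching unrelated names such as scikit-learn's `classes_` attribute, but the result should still be reviewed by hand.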

:warning: Config file breaking changes :warning:
- Removed all short names for learners (PR 199)
- Can no longer use `classifiers` instead of `learners`
- `train_location` :arrow_right: `train_directory`
- `test_location` :arrow_right: `test_directory`
- `cv_folds_location` :arrow_right: `cv_folds_file`
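
The key renames above could be applied to an existing config file with the standard-library `configparser` module. This is a rough sketch, not an official migration tool: the function name `migrate_config`, the `[Input]` section name, and the sample values are assumptions, and `test_directory` as the target for `test_location` is assumed by analogy with `train_directory`.

```python
import configparser

# Old key -> new key. `train_location` and `cv_folds_location` follow
# the list above; the `test_directory` target is an assumption.
KEY_RENAMES = {
    "train_location": "train_directory",
    "test_location": "test_directory",
    "cv_folds_location": "cv_folds_file",
}


def migrate_config(text):
    """Return a ConfigParser with the old keys renamed in every section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    for section in parser.sections():
        for old_key, new_key in KEY_RENAMES.items():
            if parser.has_option(section, old_key):
                value = parser.get(section, old_key)
                parser.remove_option(section, old_key)
                parser.set(section, new_key, value)
    return parser


# The section name below is hypothetical sample data.
old_config = """
[Input]
train_location = train
test_location = test
"""
migrated = migrate_config(old_config)
print(dict(migrated["Input"]))
```

Removed learner short names and the `classifiers` field are not handled here, because they require knowing the full learner names used in each experiment.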

0.28.1

Bug fix release that fixes an issue where `python setup.py install` would not work because the `skll.data` package wasn't included in the list of packages.

0.28.0

This release has some big behind-the-scenes changes. First, we split the `data.py` module up into a sub-package (147). There is also a new `FeatureSet` class that replaces the old `namedtuple`-based `ExamplesTuple` (81), so `ExamplesTuple` is now deprecated and will be removed in SKLL 1.0.0.

Speaking of which, we're having an all-day SKLL sprint on October 17th where we hope to resolve all the remaining issues preventing the 1.0 release.

Other changes include:
- Fixed a bunch of minor problems with loading/writing LibSVM files
- Added file reading/writing progress indicators
- Fixed crash with `generate_predictions` when the model was not trained with `probability` set to True (144).
- Deprecated `write_feature_file` function in favor of using a `FeatureSetWriter` object.
- Deprecated `load_examples` function in favor of using a `Reader` object.
- Temporarily added replacement version of scikit-learn `DictVectorizer` class until the version from scikit-learn/scikit-learn#3683 is included in a release. This allows us to make file loading substantially more memory efficient.

0.27.0

The main new feature in this release is that `.libsvm` files are now fully supported by `skll_convert` and `run_experiment`. Because of this change, we've removed `megam_to_libsvm`.

Other changes include:
- Integer keys are now allowed in `fixed_parameters` and `param_grids` (134). Therefore, SKLL now requires PyYAML to function properly.
- Added documentation about using `class_weights` to manage imbalanced datasets (132)
- Added information about pre-specified folds (via `cv_folds_location`) to results JSON and plain-text files. (108)
- Added warning when encountering classes that are not in `class_map`. (114)
- Fixed issue where sampler `random_state` parameter would be overridden.
- Fixed license headers in CLI package. They were still GPL for some reason.
- Fixed issue 112 by switching to `joblib.pool.MemmappingPool` for handling parallel file loading. SKLL now requires joblib 0.8 to function properly.
- Fixed issue 104 by making result formatting more consistent.
- `compute_eval_from_predictions` now supports string-valued classes, as it should have. (135)
- We now raise an exception instead of allowing you to overwrite your results by including the same learner in the `learners` list in your config file twice (140).
- Fixed warning about files being left open in Python 3.4 (by not leaving them open anymore).
- Short names for learners have been deprecated and will be removed in SKLL 1.0.
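
The duplicate-learner check (140) mentioned in the list above amounts to validating the `learners` list before any results are written. A minimal sketch follows. The function name `check_unique_learners`, the error message, and the choice of `ValueError` are assumptions, since the notes only say that an exception is raised.

```python
from collections import Counter


def check_unique_learners(learners):
    """Raise if any learner name appears more than once.

    Hypothetical helper; ValueError is an assumed exception type.
    """
    counts = Counter(learners)
    duplicates = sorted(name for name, n in counts.items() if n > 1)
    if duplicates:
        raise ValueError("Duplicate learners in config: " + ", ".join(duplicates))
    return list(learners)


try:
    check_unique_learners(["SVC", "LogisticRegression", "SVC"])
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

Failing early like this avoids the situation where the second run for the same learner silently overwrites the results of the first.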
