Formulaic

Latest version: v1.0.1


Page 3 of 4

0.4.0

This is a major new release with some new features, greatly improved ergonomics
for structured formulae, matrices and specs, and a few small breaking changes
(most with backward compatibility shims). All users are encouraged to upgrade.

**Breaking changes:**

* `include_intercept` is no longer an argument to `FormulaParser.get_terms`,
and is instead an argument of the `DefaultFormulaParser` constructor. If you
want to modify the `include_intercept` behaviour, please use:
```python
Formula("y ~ x", _parser=DefaultFormulaParser(include_intercept=False))
```

* Accessing terms via `Formula.terms` is deprecated since `Formula` became a
subclass of `Structured[List[Terms]]`. You can directly iterate over, and/or
access nested structure on the `Formula` instance itself. `Formula.terms`
is now a deprecated property that returns a reference to the formula itself in
order to support legacy use-cases. It will be removed in 1.0.0.
* `ModelSpec.feature_names` and `ModelSpec.feature_columns` are deprecated in
favour of `ModelSpec.column_names` and `ModelSpec.column_indices`. Deprecated
properties remain in-place to support legacy use-cases. These will be removed
in 1.0.0.

**New features and enhancements:**

* Structured formulae (and their derived matrices and specs) are now mutable.
Internally `Formula` has been refactored as a subclass of
`Structured[List[Terms]]`, and can be incrementally built and modified. The
matrix and spec outputs now have explicit subclasses of `Structured`
(`ModelMatrices` and `ModelSpecs` respectively) to expose convenience methods
that allow these objects to be largely used interchangeably with their
singular counterparts.
* `ModelMatrices` and `ModelSpecs` are now surfaced as top-level exports of the
`formulaic` module.
* `Structured` (and its subclasses) gained improved integration of nested tuple
structure, as well as support for flattened iteration, explicit mapping
output types, and lots of cleanups.
* `ModelSpec` was made into a dataclass, and gained several new
properties/methods to support better introspection and mutation of the model
spec.
* `FormulaParser` was renamed `DefaultFormulaParser`, and made a subclass of the
new formula parser interface `FormulaParser`. In this process
`include_intercept` was removed from the API, and made an instance attribute
of the default parser implementation.
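
The parser change in the last item moves `include_intercept` from a per-call argument to configuration held on the parser instance. The following is a toy sketch of that design only, not Formulaic's code: `ToyFormulaParser`, `ToyDefaultParser` and the naive `+` splitting are all hypothetical stand-ins.

```python
from abc import ABC, abstractmethod
from typing import List


class ToyFormulaParser(ABC):
    """Hypothetical parser interface: subclasses turn a formula string into terms."""

    @abstractmethod
    def get_terms(self, formula: str) -> List[str]:
        ...


class ToyDefaultParser(ToyFormulaParser):
    """Hypothetical default parser; the intercept policy lives on the instance."""

    def __init__(self, include_intercept: bool = True) -> None:
        self.include_intercept = include_intercept

    def get_terms(self, formula: str) -> List[str]:
        # Naive split on "+"; a real parser tokenizes and builds an AST.
        terms = [term.strip() for term in formula.split("+") if term.strip()]
        if self.include_intercept and "1" not in terms:
            terms.insert(0, "1")
        return terms


with_intercept = ToyDefaultParser().get_terms("x + z")
without_intercept = ToyDefaultParser(include_intercept=False).get_terms("x + z")
```

Because the setting is an instance attribute, callers of `get_terms` no longer need to know about it, and alternative parsers are free to ignore the concept entirely.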

**Bugfixes and cleanups:**

* Fixed AST evaluation for large formulae that caused the evaluation to hit the
recursion limit.
* Fixed sparse categorical encoding when the dataframe index is not the standard
range index.
* Fixed a bug in the linear constraints parser when more than two constraints
were specified in a comma-separated string.
* Avoid implicit changing of the sparsity structure of CSC matrices.
* If manually constructed `ModelSpec`s are provided by the user during
materialization, they are updated to reflect the output-type chosen by the
user, as well as whether to ensure full rank/etc.
* Allowed use of older pandas versions. All versions >=1.0.0 are now supported.
* Various linting cleanups as `pylint` was added to the CI testing.
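
The recursion-limit fix in the first item concerns very deeply nested ASTs. As a general illustration of the technique (not a reproduction of Formulaic's actual fix), a tree nested deeper than the interpreter's recursion limit can be evaluated with an explicit stack rather than recursive calls. `Node` and `evaluate_iteratively` are hypothetical names, and the toy tree only supports addition.

```python
import sys
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Toy AST node: a leaf holding `value`, or an addition of `left` and `right`."""

    value: Optional[int] = None
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def evaluate_iteratively(root: Node) -> int:
    """Sum every leaf using an explicit stack instead of recursive calls."""
    total = 0
    stack = [root]
    while stack:
        node = stack.pop()
        if node.value is not None:
            total += node.value
        else:
            stack.append(node.left)
            stack.append(node.right)
    return total


# Build a left-nested chain several times deeper than the recursion limit.
depth = sys.getrecursionlimit() * 5
tree = Node(value=1)
for _ in range(depth):
    tree = Node(left=tree, right=Node(value=1))

result = evaluate_iteratively(tree)  # one leaf per level, plus the innermost leaf
```

A naive recursive evaluator would raise `RecursionError` on the same tree, since each nesting level would consume a Python stack frame.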

**Documentation:**

* Apart from the `.materializer` submodule, most code now has inline
documentation and annotations.

0.3.4

This is a backward compatible major release that adds several new features.

**New features and enhancements:**

* Added support for customizing the contrasts generated for categorical
features, including treatment, sum, deviation, helmert and custom contrasts.
* Added support for the generation of linear constraints for `ModelMatrix`
instances (see `ModelMatrix.model_spec.get_linear_constraints`).
* Added support for passing `ModelMatrix`, `ModelSpec` and other formula-like
objects to the `model_matrix` sugar method so that pre-processed formulae can
be used.
* Improved the way tokens are manipulated for the right-hand-side intercept and
substitutions of `0` with `-1` to avoid substitutions in quoted contexts.
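
To make the contrast terminology in the first item concrete, here is a minimal sketch of the textbook treatment and sum coding matrices in plain Python. It does not use Formulaic's contrast classes, the function names are hypothetical, and Formulaic's own output (column ordering, naming, scaling) may differ.

```python
from typing import List


def treatment_contrasts(n_levels: int, reference: int = 0) -> List[List[int]]:
    """Treatment (dummy) coding: the reference level is the all-zero row."""
    columns = [level for level in range(n_levels) if level != reference]
    return [[1 if row == col else 0 for col in columns] for row in range(n_levels)]


def sum_contrasts(n_levels: int) -> List[List[int]]:
    """Sum-to-zero coding: the last level is coded as -1 in every column."""
    rows = [
        [1 if row == col else 0 for col in range(n_levels - 1)]
        for row in range(n_levels - 1)
    ]
    rows.append([-1] * (n_levels - 1))
    return rows


treatment = treatment_contrasts(3)  # rows: levels, columns: non-reference levels
sum_coded = sum_contrasts(3)        # each column sums to zero across levels
```

Both matrices have one row per category level and one fewer column than levels, which is what keeps the resulting model matrix full rank when an intercept is present.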

**Bugfixes and cleanups:**

* Fixed variable sanitization during evaluation, allowing variables with
special characters to be used in Python transforms; for example:
``bs(`my|feature%is^cool`)``.
* Fixed the parsing of dictionaries and sets within Python expressions in the
formula; for example: `C(x, {"a": [1,2,3]})`.
* Bumped requirement on `astor` to >=0.8 to fix issues with AST generation in
Python 3.8+ when numerical constants are present in the parsed Python
expression (e.g. `bs(x, df=10)`).

0.3.3

This is a minor patch release that migrates the package tooling to [poetry](https://python-poetry.org/), resolving a version inconsistency when packaging for conda.

0.3.2

This is a minor patch release that fixes an attempted import of `numpy.typing` when the installed numpy is older than version 1.20. (Thanks to bashtage for noticing and fixing this.)

0.3.1

This is a minor patch release that fixes how output types, NA-handling, and full-rank guarantees are preserved for factors that evaluate to pre-encoded columns when constructing a model matrix from a pre-defined `ModelSpec`. The benchmarks were also updated.

0.3.0

This is a relatively massive release bringing lots of new features and bugfixes. All users are encouraged to upgrade. The full changelog can be viewed in the [documentation website](https://matthewwardrop.github.io/formulaic/changelog/). The highlights include:

**Breaking changes:**

* The minimum supported version of Python is now 3.7 (up from 3.6).
* Moved transform implementations from `formulaic.materializers.transforms` to
the top-level `formulaic.transforms` module, and ported all existing
transforms to output `FactorValues` types rather than dictionaries.
`FactorValues` is an object proxy that allows output types like
`pandas.DataFrame`s to be used as they normally would, with some additional
metadata for formulaic accessible via the `__formulaic_metadata__`
attribute. This makes non-formula direct usage of these transforms much more
pleasant.
* `~` is no longer a generic formula separator, and can only be used once in a
formula. Please use the newly added `|` operator to separate a formula into
multiple parts.
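
As a rough picture of the new separator rules in the last item, the sketch below splits a formula string once on `~` and then splits each side on `|`. It is a toy string splitter under simplifying assumptions (no quoting, no operator precedence, no term expansion), not Formulaic's parser; `split_formula` and its output shape are hypothetical.

```python
from typing import Dict, List


def split_formula(formula: str) -> Dict[str, List[str]]:
    """Split on a single `~`, then split each side into `|`-separated parts."""
    if formula.count("~") > 1:
        raise ValueError("`~` can only be used once in a formula.")
    lhs, separator, rhs = formula.rpartition("~")
    parts = {"rhs": [part.strip() for part in rhs.split("|")]}
    # Only report a left-hand side when `~` is present and something precedes it.
    if separator and lhs.strip():
        parts["lhs"] = [part.strip() for part in lhs.split("|")]
    return parts


two_sided = split_formula("y ~ a + b | c")
one_sided = split_formula("a | b")
```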

**New features and enhancements:**

* Added support for "structured" formulas, and updated the `~` operator to use
them. Structured formulas can have named substructures, for example: `lhs`
and `rhs` for the `~` operator. The representation of formulas has been
updated to show this structure.
* Added support for context-sensitivity during the resolution of operators,
allowing more flexible operators to be implemented (this is exploited by the
`|` operator which splits formulas into multiple parts).
* The `formulaic.model_matrix` syntactic sugar function now accepts `ModelSpec`
and `ModelMatrix` instances as the "formula" spec, making generation of
matrices with the same form as previously generated matrices more
convenient.
* Added the `poly` transform (compatible with R and patsy).
* `numpy` is now always available in formulas via `np`, allowing formulas like
`np.sum(x)`. For convenience, `log`, `log10`, `log2`, `exp`, `exp10` and
`exp2` are now exposed as transforms independent of user context.
* Pickleability is now guaranteed and tested via unit tests. Failure to pickle
any formulaic metadata object (such as formulas, model specs, etc) is
considered a bug.
* The capturing of user context for use in formula materialization has been
split out into a utility method `formulaic.utils.context.capture_context()`.
This can be used by libraries that wrap Formulaic to capture the variables
and/or transforms available in a user's environment where appropriate.
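
The idea behind capturing user context, described in the last item, can be sketched with the standard library alone: walk up the call stack and snapshot the variables visible in the chosen frame. This is an illustrative stand-in, not the implementation of `formulaic.utils.context.capture_context()`; the name `capture_context_sketch` and its `depth` parameter are assumptions made for this example.

```python
import inspect
from typing import Any, Dict


def capture_context_sketch(depth: int = 0) -> Dict[str, Any]:
    """Snapshot the variables visible `depth` frames above the direct caller."""
    frame = inspect.currentframe()
    # Step out of this helper (one frame), then `depth` further frames.
    for _ in range(depth + 1):
        if frame is None or frame.f_back is None:
            raise ValueError("Requested depth exceeds the call stack.")
        frame = frame.f_back
    # Locals shadow globals, mirroring normal Python name resolution.
    return {**frame.f_globals, **frame.f_locals}


def library_entry_point() -> Dict[str, Any]:
    """Stand-in for a wrapping library function whose scope should be captured."""
    local_offset = 3
    return capture_context_sketch()


context = library_entry_point()
```

A wrapping library would typically pass a larger `depth` so that the captured variables come from the end user's frame rather than from the library's own internals.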

