Template Expressions
`TemplateExpressionSpec` allows you to define a specific structure for your equations. For example:
```python
from pysr import TemplateExpressionSpec

expression_spec = TemplateExpressionSpec(
    function_symbols=["f", "g"],
    combine="((; f, g), (x1, x2, x3)) -> sin(f(x1, x2)) + g(x3)",
)
```
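To make the `combine` string above concrete: the search learns the sub-expressions `f` and `g`, while the outer skeleton `sin(f(x1, x2)) + g(x3)` stays fixed, and `f` can only see `x1` and `x2` while `g` can only see `x3`. The sketch below evaluates that fixed structure in plain NumPy with two hand-picked stand-ins for `f` and `g`. The helper name `combine` and the stand-in functions are hypothetical, chosen only for illustration; this is not how PySR evaluates templates internally.

```python
import numpy as np


def combine(f, g, x1, x2, x3):
    """Mirror the template ((; f, g), (x1, x2, x3)) -> sin(f(x1, x2)) + g(x3).

    The search would fill in f and g; the sin(...) + ... skeleton never changes.
    """
    return np.sin(f(x1, x2)) + g(x3)


# Hypothetical stand-ins for sub-expressions the search might discover.
def f(a, b):
    return a * b  # only ever receives x1 and x2


def g(c):
    return c ** 2  # only ever receives x3


x1 = np.array([0.0, 1.0, 2.0])
x2 = np.array([1.0, 0.5, 0.25])
x3 = np.array([3.0, 2.0, 1.0])
print(combine(f, g, x1, x2, x3))
```

Because `x3` never reaches `f`, the template rules out whole families of expressions before the search starts, which is the main point of constraining the structure.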
Parametric Expressions
`ParametricExpressionSpec` enables fitting expressions that can adapt to different categories of data with per-category parameters:
```python
from pysr import ParametricExpressionSpec, PySRRegressor

expression_spec = ParametricExpressionSpec(max_parameters=2)
model = PySRRegressor(
    expression_spec=expression_spec,
    binary_operators=["+", "*", "-", "/"],
)
model.fit(X, y, category=category)  # Pass category labels
```
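A parametric expression shares one structure across all rows while some constants take a different value for each category. The sketch below hand-writes such an expression, `p1[c] * x + p2[c]`, using NumPy fancy indexing, and shows the shape of the `category` array passed to `fit`: one integer label per row. The expression, the parameter values, and the helper name are hypothetical; we assume `max_parameters=2` caps how many such per-category parameters the search may use, which is why this example uses two.

```python
import numpy as np

# Two hypothetical per-category parameter vectors (one entry per category 0, 1, 2).
p1 = np.array([1.0, -2.0, 0.5])
p2 = np.array([0.0, 3.0, 10.0])


def parametric_expr(x, category):
    """Same structure for every row; the constants are looked up by category."""
    return p1[category] * x + p2[category]


x = np.array([2.0, 2.0, 2.0, 4.0])
category = np.array([0, 1, 2, 1])  # one integer label per sample
y = parametric_expr(x, category)
print(y)  # y == [2.0, -1.0, 11.0, -5.0]
```

Rows 1 and 3 share category 1 and therefore the same constants, even though their inputs differ.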
Improved Logging with TensorBoard
The new `TensorBoardLoggerSpec` logs the search process and records hyperparameters, exposing the backend's `AbstractSRLogger` feature:
```python
from pysr import PySRRegressor, TensorBoardLoggerSpec

logger_spec = TensorBoardLoggerSpec(
    log_dir="logs/run",
    log_interval=10,  # Log every 10 iterations
)
model = PySRRegressor(logger_spec=logger_spec)
```
Features logged include:
- Loss curves over time at each complexity level
- Population statistics
- Pareto "volume" logging (measures performance over all complexities with a single scalar)
- The min loss over time
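The notes do not define the Pareto "volume" metric precisely. One common way to collapse a loss-versus-complexity front into a single scalar is to measure the area it encloses against a reference loss, so that lowering the loss at any complexity increases the score. The sketch below computes such an area in log10-loss space. The reference-loss choice, the use of log10, the staircase interpretation, and the function name are all assumptions for illustration, not the formula the backend actually logs.

```python
import math


def pareto_volume(front, max_complexity, reference_loss):
    """Area between a Pareto staircase and a reference loss, in log10 space.

    front: list of (complexity, loss) pairs with positive losses. Each point's
    loss is treated as the best achievable loss from its complexity up to the
    next point's complexity (or max_complexity for the last point).
    """
    points = sorted(front)  # sort by complexity
    volume = 0.0
    for i, (complexity, loss) in enumerate(points):
        if i + 1 < len(points):
            next_complexity = points[i + 1][0]
        else:
            next_complexity = max_complexity
        width = next_complexity - complexity
        # Only count improvement below the reference loss.
        height = max(0.0, math.log10(reference_loss) - math.log10(loss))
        volume += width * height
    return volume


front = [(1, 10.0), (3, 1.0), (5, 0.01)]
print(pareto_volume(front, max_complexity=7, reference_loss=10.0))
```

Working in log space means a tenfold loss reduction counts the same whether it happens at a high or a low loss, which keeps one very accurate expression from dominating the scalar.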
Algorithm Improvements
Updated Default Parameters
The default hyperparameters have been significantly revised based on testing:
- Increased default `maxsize` from 20 to 30: I noticed that many people use the defaults, and the larger maxsize allows for more accurate expressions.
- New mutation operator weights optimized for better performance, along with the new "rotate tree" mutation.
- Improved search parameters tuned using Pareto front volume calculations.
- Default `niterations` increased from 40 to 100, also to support better accuracy (at the expense of slightly longer default search times).
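The notes name the new "rotate tree" mutation without describing it. Assuming it resembles the classic rotation used in balanced binary search trees, applied at a node of the expression tree, a right rotation turns `(a + b) * c` into `a + (b * c)`: the in-order sequence `a + b * c` is preserved, but the parent and child operators swap roles, which changes the expression's meaning. The sketch below implements that generic rotation on a minimal, hypothetical `Node` class with binary operators only; it is not the SymbolicRegression.jl implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Minimal expression-tree node: a binary operator with two children, or a leaf."""
    value: str
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    def __str__(self) -> str:
        if self.left is None and self.right is None:
            return self.value
        return f"({self.left} {self.value} {self.right})"


def rotate_right(root: Node) -> Node:
    """Promote root.left to the top; root adopts the pivot's former right child.

    Returns root unchanged if its left child is a leaf (nothing to rotate).
    """
    pivot = root.left
    if pivot is None or pivot.left is None:
        return root
    root.left = pivot.right
    pivot.right = root
    return pivot


# (a + b) * c
tree = Node("*", Node("+", Node("a"), Node("b")), Node("c"))
rotated = rotate_right(tree)
print(rotated)  # (a + (b * c))
```

A rotation keeps the same operators and leaves while restructuring the tree, so it explores nearby expressions of identical size, which may be why it complements mutations that add or delete nodes.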
Core Changes
- New output organization: Results are now stored in `outputs/<run_id>/` rather than in the current working directory.
- Improved performance with better parallelism handling
- Support for Python 3.10+
- Updated Julia backend to version 1.10+
- Fix for aliasing issues in crossover operations
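Because each run now gets its own `outputs/<run_id>/` directory, scripts that used to read a results file from the working directory need to locate the run folder first. The helper below is a hypothetical convenience, not part of PySR: it returns the most recently modified subdirectory of `outputs/` and makes no assumptions about the file names inside it.

```python
from pathlib import Path
from typing import Optional


def latest_run_dir(output_directory: str = "outputs") -> Optional[Path]:
    """Return the most recently modified run directory, or None if there is none."""
    root = Path(output_directory)
    if not root.is_dir():
        return None
    runs = [p for p in root.iterdir() if p.is_dir()]
    if not runs:
        return None
    # Pick the run directory with the newest modification time.
    return max(runs, key=lambda p: p.stat().st_mtime)


run = latest_run_dir()
print(run if run is not None else "no runs found")
```

If you choose the `run_id` yourself, joining `output_directory` and `run_id` directly is simpler and avoids relying on modification times.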
Breaking Changes
- Minimum Python version is now 3.10
- Output file structure has changed to use directories
- Parameter name updates:
  - `equation_file` → `output_directory` + `run_id`
  - Added clearer naming for parallelism options, such as `parallelism="serial"` rather than the old `multithreading=False, procs=0` which was unclear
Documentation
The documentation has a new home at https://ai.damtp.cam.ac.uk/pysr/