L2metrics

Latest version: v3.1.0


Page 3 of 5

2.7.0

- Minor performance optimizations
- Implemented per-task outlier clamping
- Implemented mean method for aggregating lifetime metrics
- Implemented input settings file and saving of the settings used to produce metrics
- Handled scenario type in evaluation
- Added validation script to look for required evaluation files
- Simplified command-line arguments
- Output data frames using feather instead of pickle
- Allowed evaluation block data in STE logs
- Stored intermediately processed data in report log data
- Standardized task names when parsing data range file for normalization
- Implemented feature to save settings as JSON file
- Increased verbosity of error message in data range validation
- Handled storing multiple STE runs with option to save in write or append mode
- Implemented different options for averaging STE runs (time or metrics)
- Modified command-line arguments for normalization and smoothing methods
- Implemented option to show lines between evaluation blocks
- Implemented command-line argument for smoothing window length
- Updated example logs
- Implemented recursive flag in l2metrics package
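
The per-task outlier clamping entry above can be illustrated with a minimal sketch. It assumes clamping means limiting each task's values to percentile bounds computed from that task's own data; the function name `clamp_per_task`, the 10th/90th percentile defaults, and the dictionary input are hypothetical and are not taken from the l2metrics API.

```python
from statistics import quantiles


def clamp_per_task(values_by_task, lower_pct=10, upper_pct=90):
    """Clamp each task's values to that task's own percentile bounds.

    Hypothetical helper for illustration only; not part of l2metrics.
    Each task needs at least two data points.
    """
    clamped = {}
    for task, values in values_by_task.items():
        # quantiles(n=100) returns the 99 cut points for percentiles 1..99.
        cuts = quantiles(values, n=100, method="inclusive")
        low, high = cuts[lower_pct - 1], cuts[upper_pct - 1]
        # Bounds are computed per task, so tasks on different scales
        # do not influence each other.
        clamped[task] = [min(max(v, low), high) for v in values]
    return clamped


example = {
    "task_a": list(range(11)),            # values on a 0..10 scale
    "task_b": list(range(0, 1100, 100)),  # values on a 0..1000 scale
}
result = clamp_per_task(example)
```

Computing the bounds separately for each task keeps an outlier in one task from changing the clamping range of another, which is the point of doing this per task rather than globally.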

2.6.0

- Threw warnings instead of raising an exception when checking for alternating blocks
- Printed valid application measures in error message when input is invalid
- Implemented feature to add Gaussian noise to log data
- Added relative Single-Task Expert plotting
- Modified evaluation script to loop over every agent configuration in the given evaluation directory
- Implemented function for recursively unzipping logs in evaluation directory
- Added fields to output TSV and also report task-level metrics in JSON format
- Modified terminal performance calculation to average over all evaluation block data
- Added functionality for normalization and outlier removal
- Created input parameter for passing in task performance ranges for normalization
- Separated LL metrics into individual modules
- Implemented alternative methods for performance maintenance (MRTLP and MILER)
- Recombined forward and backward transfer into a single transfer metric module
- Created parallel evaluation script with multiprocessing
- Added functionality to plot raw data behind smoothed performance curve
- Updated README and example logs/files
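
Two of the entries above, adding Gaussian noise to log data and normalizing with task performance ranges, can be sketched as follows. This is an illustration under stated assumptions, not the package's implementation: the function names, the `seed` parameter, and the 0 to 100 output scale are all hypothetical.

```python
import random


def add_gaussian_noise(values, mean=0.0, std=1.0, seed=None):
    """Return a copy of values with Gaussian noise added to each entry.

    Hypothetical helper; the seed makes the noise reproducible.
    """
    rng = random.Random(seed)
    return [v + rng.gauss(mean, std) for v in values]


def normalize_to_range(values, task_min, task_max, scale=100.0):
    """Linearly map values from [task_min, task_max] onto [0, scale].

    The 0..100 output scale is an assumption made for this sketch.
    """
    if task_max <= task_min:
        raise ValueError("task_max must be greater than task_min")
    span = task_max - task_min
    return [(v - task_min) / span * scale for v in values]


noisy = add_gaussian_noise([1.0, 2.0, 3.0], std=0.5, seed=0)
normalized = normalize_to_range([0, 5, 10], task_min=0, task_max=10)
```

Passing the range in explicitly, rather than inferring it from the data being normalized, mirrors the idea of an input parameter for task performance ranges: the same range can then be applied consistently across runs.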

2.5.1

- Handled scenarios with tasks that are not all trained

2.5.0

- Handled LL logs in evaluation directories not contained within top-level scenario log directories
- Fixed import error in example metric calculation script
- Separated regime metric calculations into a class method and stored the results as a member variable
- Added Python notebook for calculating single lifetime metrics with additional summaries
- Sorted task names by the order in which they are trained to make interpreting the transfer matrix easier
- Handled NaNs in calculating regime metrics
- Filtered log data by completed experiences before filling in regime number to handle misnumbering
- Handled NaNs and empty lists in saturation and terminal performance calculations
- Updated evaluation README to describe directory structure assumptions

2.4.0

- Added explanation for transfer matrix output in README
- Removed task name simplification for greater compatibility with learnkit and custom task names
- Added computational cost parser in evaluation scripts
- Fixed minor bug when no STE data is stored

2.3.1

- Updated reference to metrics specification document 0.66
- Fixed manual path generation bug between Windows and Linux when getting STE task names
- Removed ambiguous imports in l2metrics init
- Removed performance measure filter when loading log data to pass validation
- Fixed dropping of metric values when NaNs were present in the same column
- Converted all task names to lowercase for more robust name comparison
- Fixed handling of NaN task parameters
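
The lowercase task name comparison and NaN handling entries above can be illustrated with a short sketch. The helper names `normalize_task_name` and `nan_mean` are hypothetical, and the whitespace stripping is an added assumption; the package's own handling may differ.

```python
import math


def normalize_task_name(name):
    """Lowercase (and strip) a task name so comparisons ignore case."""
    return name.strip().lower()


def nan_mean(values):
    """Mean of the non-NaN entries; NaN if nothing valid remains."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else math.nan


same_task = normalize_task_name("Task-A") == normalize_task_name("task-a")
average = nan_mean([1.0, float("nan"), 3.0])
```

Skipping only the NaN entries, instead of discarding every value that shares a column with one, keeps valid metric values from being dropped.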

