Released on 18 December, 2023
Description
- Conflicting directives in the SmartSim packaging instructions were
fixed
- `sacct` and `sstat` errors are now fatal for
Slurm-based workflow executions
- Added documentation section about ML features and TorchScript
- Added TorchScript functions to Online Analysis tutorial
- Added multi-DB example to documentation
- Improved test stability on HPC systems
- Added support for producing & consuming telemetry outputs
- Split tests into groups for parallel execution in CI/CD pipeline
- Changed the signature of `Experiment.summary()`
- Exposed the `first_device` parameter for scripts, functions, and models
- Added support for `MINBATCHTIMEOUT` in model execution
- Removed support for RedisAI 1.2.5 in favor of a RedisAI 1.2.7 commit
- Added support for multiple databases
Detailed Notes
- Several conflicting directives between `setup.py` and `setup.cfg`
were fixed to mitigate warnings issued when building the pip wheel.
([SmartSim-PR435](https://github.com/CrayLabs/SmartSim/pull/435))
- When the Slurm commands `sacct` and `sstat` returned an error, the
error was ignored and SmartSim's state could become inconsistent. To
prevent this, errors raised by `sacct` or `sstat` now result in an
exception.
([SmartSim-PR392](https://github.com/CrayLabs/SmartSim/pull/392))
- A section named *ML Features* was added to the documentation. It
contains multiple examples of how ML models and functions can be
added to and executed on the database. TorchScript-based
post-processing was also added to the *Online Analysis* tutorial.
([SmartSim-PR411](https://github.com/CrayLabs/SmartSim/pull/411))
- An example of how to use multiple Orchestrators concurrently was
added to the documentation
([SmartSim-PR409](https://github.com/CrayLabs/SmartSim/pull/409))
- The test infrastructure was improved. Tests on HPC systems are now
stable, and issues such as `Orchestrators` that were never stopped or
experiments created in the wrong paths have been fixed
([SmartSim-PR381](https://github.com/CrayLabs/SmartSim/pull/381))
- A telemetry monitor was added to check updates and produce events
for SmartDashboard
([SmartSim-PR426](https://github.com/CrayLabs/SmartSim/pull/426))
- Tests were split into `group_a`, `group_b`, and `slow_tests` for
parallel execution in the CI/CD pipeline
([SmartSim-PR417](https://github.com/CrayLabs/SmartSim/pull/417),
[SmartSim-PR424](https://github.com/CrayLabs/SmartSim/pull/424))
- The `format` argument of `Experiment.summary()` was renamed to
`style`; this is an API break
([SmartSim-PR391](https://github.com/CrayLabs/SmartSim/pull/391))
- Added support for the `first_device` parameter for scripts,
functions, and models. This causes them to be loaded onto
`num_devices` devices, beginning with device number `first_device`
([SmartSim-PR394](https://github.com/CrayLabs/SmartSim/pull/394))
- Added support for `MINBATCHTIMEOUT` in model execution, which caps
the time spent waiting for a minimum number of model execution
requests to accumulate before they are executed as a batch
([SmartSim-PR387](https://github.com/CrayLabs/SmartSim/pull/387))
- RedisAI 1.2.5 is no longer supported. The only supported RedisAI
version is now 1.2.7. Because the officially released RedisAI 1.2.7
has a bug that breaks the build process on macOS, SmartSim uses
commit
[634916c](https://github.com/RedisAI/RedisAI/commit/634916c722e718cc6ea3fad46e63f7d798f9adc2)
from RedisAI's GitHub repository, where this bug has been fixed.
This applies to all operating systems.
([SmartSim-PR383](https://github.com/CrayLabs/SmartSim/pull/383))
- Added support for creating multiple databases with unique
identifiers.
([SmartSim-PR342](https://github.com/CrayLabs/SmartSim/pull/342))
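The `sacct`/`sstat` change (SmartSim-PR392) can be illustrated with a minimal sketch of fail-fast error handling around a Slurm query command. The names `run_slurm_query` and `SlurmQueryError` are hypothetical and not part of SmartSim's API; the sketch only shows the idea that a nonzero exit status raises an exception instead of being silently ignored.

```python
import subprocess
import sys
from typing import List


class SlurmQueryError(RuntimeError):
    """Hypothetical exception raised when a Slurm query command fails."""


def run_slurm_query(cmd: List[str]) -> str:
    """Run a query command (e.g. sacct or sstat) and return its stdout.

    Hypothetical helper: any nonzero exit status is treated as fatal,
    so callers never act on incomplete job-state information.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise SlurmQueryError(
            f"{cmd[0]} failed with exit code {result.returncode}: "
            f"{result.stderr.strip()}"
        )
    return result.stdout


if __name__ == "__main__":
    # Simulate a failing command with the current Python interpreter,
    # since sacct/sstat may not be installed on this machine.
    failing = [sys.executable, "-c", "import sys; sys.stderr.write('boom'); sys.exit(1)"]
    try:
        run_slurm_query(failing)
    except SlurmQueryError as err:
        print(f"fatal: {err}")
```

Raising a dedicated exception type lets a caller decide whether to abort the workflow, rather than letting stale job state propagate.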
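The `first_device` note (SmartSim-PR394) describes loading onto a contiguous range of devices. A minimal sketch of that range, assuming devices are addressed as `<DEVICE>:<index>` (for example `GPU:2`); `enumerate_devices` is a hypothetical helper, not SmartSim's implementation.

```python
from typing import List


def enumerate_devices(device: str, first_device: int, num_devices: int) -> List[str]:
    """Hypothetical helper: list the device slots a script, function, or
    model would be loaded onto, starting at first_device.

    Assumes numbered devices named "<DEVICE>:<index>".
    """
    if first_device < 0 or num_devices < 1:
        raise ValueError("first_device must be >= 0 and num_devices >= 1")
    # Devices first_device, first_device + 1, ..., first_device + num_devices - 1
    return [f"{device}:{i}" for i in range(first_device, first_device + num_devices)]


print(enumerate_devices("GPU", first_device=2, num_devices=3))
# ['GPU:2', 'GPU:3', 'GPU:4']
```

With `first_device=0` this reduces to the first `num_devices` devices, which is the behavior before the parameter was exposed.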
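The `MINBATCHTIMEOUT` note (SmartSim-PR387) describes a cap on how long execution waits for a minimum batch to form. The sketch below is a simplified, hypothetical model of that rule, not RedisAI's scheduler: the first batch runs when `min_batch_size` requests have arrived, but no later than `min_batch_timeout_ms` after the first request. The function name and parameters are illustrative assumptions.

```python
from typing import List, Optional


def first_batch_time(
    arrivals_ms: List[float],
    min_batch_size: int,
    min_batch_timeout_ms: Optional[float] = None,
) -> Optional[float]:
    """Return when the first batch would execute, or None if it never does.

    Simplified, hypothetical model of minimum-batch-size batching with an
    optional timeout measured from the first request's arrival.
    """
    if not arrivals_ms or min_batch_size < 1:
        raise ValueError("need at least one request and min_batch_size >= 1")
    arrivals = sorted(arrivals_ms)
    # Time at which enough requests have accumulated (if ever).
    size_reached = arrivals[min_batch_size - 1] if len(arrivals) >= min_batch_size else None
    # Without a timeout, execution waits for the minimum batch size indefinitely.
    if min_batch_timeout_ms is None:
        return size_reached
    deadline = arrivals[0] + min_batch_timeout_ms
    if size_reached is None:
        return deadline
    return min(size_reached, deadline)


# Three requests arrive quickly: the batch runs when the third one arrives.
print(first_batch_time([0, 5, 10], min_batch_size=3, min_batch_timeout_ms=100))  # 10
# Only two requests ever arrive: the timeout caps the wait at 20 ms.
print(first_batch_time([0, 5], min_batch_size=3, min_batch_timeout_ms=20))  # 20
```

The second call shows why the timeout matters: without it, a batch that never reaches its minimum size would wait forever.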
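The multi-database note (SmartSim-PR342) relies on each database having a unique identifier. A minimal sketch of enforcing that constraint, assuming identifiers are plain strings; `DatabaseRegistry` and its methods are hypothetical and do not reflect SmartSim's internal bookkeeping.

```python
from typing import List, Set


class DatabaseRegistry:
    """Hypothetical registry that rejects duplicate database identifiers."""

    def __init__(self) -> None:
        self._identifiers: Set[str] = set()

    def add(self, db_identifier: str) -> None:
        # A second database with the same identifier would be ambiguous
        # for clients, so reject it up front.
        if db_identifier in self._identifiers:
            raise ValueError(f"database identifier {db_identifier!r} is already in use")
        self._identifiers.add(db_identifier)

    def identifiers(self) -> List[str]:
        return sorted(self._identifiers)


registry = DatabaseRegistry()
registry.add("training_db")
registry.add("inference_db")
print(registry.identifiers())  # ['inference_db', 'training_db']
```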