New features
- The implementation of the methods for constructing segmentation pixel arrays from a `highdicom.seg.Segmentation` object (`highdicom.seg.Segmentation.get_pixels_by_source_instance()`, `highdicom.seg.Segmentation.get_pixels_by_source_frame()`, and `highdicom.seg.Segmentation.get_pixels_by_dimension_index_values()`) has been considerably refactored, with a general focus on improving usability for large segmentation objects (https://github.com/ImagingDataCommons/highdicom/pull/208). These changes are compatible with existing code, except that in some cases the methods may return numpy arrays with a smaller unsigned integer data type than they previously did. User code should see significant speed-ups without any changes. The new versions have several improvements:
  - Improvements in computational efficiency due to a redesign of the way the frame look-up table is stored under the hood. An in-memory SQLite database is now used through the Python standard library `sqlite3` module. This allows for considerably faster and more flexible querying.
  - Significant improvements in memory efficiency for the case where `combine_segments=True`. Previously the memory usage scaled as O(n) in the number of segments; now it is constant (O(1)).
  - When combining segments, the methods now automatically determine and return the smallest unsigned integer data type that can represent all segments. This has been observed to both reduce memory usage and improve speed (largely by reducing the need to allocate memory for unnecessarily large numpy arrays).
  - There is a new parameter, `dtype`, that allows the user to choose the data type of the output array (overriding the automatically determined default).
  - There is a further new boolean parameter, `skip_overlap_checks`, which allows the user to skip the check for overlapping segments in the case where `combine_segments=True`. Skipping the check makes a significant difference to runtime. If the check is skipped and two segments do overlap, the segment with the highest output segmentation number will be placed into the output array preferentially. The default behaviour matches the previous behaviour: checks for overlapping segments are performed, and an error is raised if any two segments overlap.
- The user guide has been updated to describe the preferred way of accessing pixel data using the above methods.
- There is now an optional parameter in `from_dataset()` methods called `copy`. By default, this parameter is `True`, meaning that a full deepcopy of the original dataset is made before conversion to the highdicom class, which matches the previous behaviour. This is the "safest" option, as it prevents potentially unwanted behaviour downstream if the user tries to re-use the original dataset. However, if the user sets this parameter to `False`, the deepcopy is skipped and the original dataset is updated in place. This can give a very significant speed-up when the segmentation objects are large. Additionally, this is used in the `segread` and `srread` functions to give a significant speed-up, as it is never necessary to deepcopy the temporary object read from file (https://github.com/ImagingDataCommons/highdicom/pull/207).
- Added a new function `highdicom.sr.srread()`, similar to the existing `highdicom.seg.segread()`, to read a dataset representing a supported Structured Report SOP Class from a file and convert it to the appropriate highdicom class automatically (https://github.com/ImagingDataCommons/highdicom/pull/215).
- Users may now pass a single-element Sequence to the `content` parameter of the `__init__` methods of Structured Report SOP classes, as an alternative to passing a `pydicom.Dataset`. This is more intuitive for users who have constructed a `highdicom.sr.MeasurementReport` object and wish to use it as the content of a new Structured Report (https://github.com/ImagingDataCommons/highdicom/pull/216).
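
To illustrate why an in-memory SQLite look-up table suits the frame queries described above, here is a minimal sketch using only the standard library `sqlite3` module. The table name, column names, UIDs, and the `frames_for` helper are all hypothetical and chosen for illustration; they are not highdicom's internal schema or API.

```python
import sqlite3

# Hypothetical frame look-up table: one row per segmentation frame.
# Table and column names are illustrative, not highdicom's internals.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE frame_lut ("
    "frame_number INTEGER PRIMARY KEY, "
    "segment_number INTEGER NOT NULL, "
    "source_sop_instance_uid TEXT NOT NULL)"
)

# Made-up example rows: (frame number, segment number, source instance UID).
rows = [
    (1, 1, "1.2.3.1"),
    (2, 1, "1.2.3.2"),
    (3, 2, "1.2.3.1"),
    (4, 2, "1.2.3.3"),
]
conn.executemany("INSERT INTO frame_lut VALUES (?, ?, ?)", rows)

# An index lets the database answer the query below without a full scan.
conn.execute(
    "CREATE INDEX idx_source "
    "ON frame_lut (source_sop_instance_uid, segment_number)"
)


def frames_for(source_uids, segment_numbers):
    """Return (source UID, segment number, frame number) tuples for the
    requested source instances and segments (hypothetical helper)."""
    uid_marks = ",".join("?" * len(source_uids))
    seg_marks = ",".join("?" * len(segment_numbers))
    query = (
        "SELECT source_sop_instance_uid, segment_number, frame_number "
        "FROM frame_lut "
        f"WHERE source_sop_instance_uid IN ({uid_marks}) "
        f"AND segment_number IN ({seg_marks}) "
        "ORDER BY source_sop_instance_uid, segment_number"
    )
    return conn.execute(query, [*source_uids, *segment_numbers]).fetchall()


matches = frames_for(["1.2.3.1", "1.2.3.3"], [1, 2])
print(matches)
```

Expressing the look-up as a single declarative query is what makes this approach flexible: filtering by a different combination of columns only changes the `WHERE` clause, not the data structure.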
Enhancements
- The library's repository was moved to the [ImagingDataCommons](https://github.com/ImagingDataCommons) organization on GitHub, and all URLs were updated (https://github.com/ImagingDataCommons/highdicom/pull/212).
- The library's GitHub Actions now run the tests using Python 3.11 in addition to older versions (https://github.com/ImagingDataCommons/highdicom/pull/217) to ensure that highdicom supports the latest Python version.
Bug fixes
- A minor tweak was made to the routine for segmentation construction to avoid creating a copy of large portions of the input array just to find the unique values (https://github.com/ImagingDataCommons/highdicom/pull/221).
- A bug, resulting in the `ReferencedImageSequence` of a `highdicom.ann.MicroscopyBulkSimpleAnnotations` always being empty, was resolved (https://github.com/ImagingDataCommons/highdicom/pull/220).
- A mistake in the docstrings of the `PixelToReferenceTransformer`, `ReferenceToPixelTransformer`, and `ImageToReferenceTransformer` classes was fixed (https://github.com/ImagingDataCommons/highdicom/pull/209).
- A bug that resulted in GSPS creation failing when the referenced images have multiple values for WindowWidth, WindowCenter and/or WindowCenterWidthExplanation was fixed (https://github.com/ImagingDataCommons/highdicom/pull/211).