Highlights
Slideflow 2.1 includes a number of new features and optimizations, with a focus on improving Multiple-Instance Learning (MIL) model development and deployment. Key improvements include an **MIL / Attention Heatmaps extension** for Slideflow Studio, improvements to both **feature extraction** and **MIL training**, **new QC algorithms**, and dozens of other enhancements and bug fixes.
Table of Contents
1. **Slideflow Studio: MIL & Attention Heatmaps**
2. **MIL Training Enhancements**
a. Rebuilding feature extractors used for MIL
b. Single-slide predictions, without feature bags
c. QOL improvements
3. **Streamlined Feature Extraction**
a. Features from layer activations of an ImageNet-pretrained model
b. Features from layer activations of a fine-tuned model
c. Features from a public pretrained network
d. Features from a SimCLR model (self-supervised learning)
e. Using feature extractors
4. **Slideflow Studio: Tile Extraction Preview & More**
5. **Slide Filtering / QC Updates**
a. DeepFocus
b. GaussianV2
6. **Smaller updates**
a. PyTorch Image Preprocessing Improvements
b. Mini-batch sample diversity for PyTorch dataloaders
c. TFRecord optimizations
d. Other new features
e. Other improvements
f. Bug fixes
Slideflow Studio: MIL & Attention Heatmaps
Slideflow Studio now includes an [MIL extension](https://slideflow.dev/studio/#multiple-instance-learning), allowing you to generate MIL predictions for slides and visualize attention as a heatmap.
Start by navigating to the Extensions tab in the bottom-left corner, and enable the "Multiple-instance Learning" extension.
![image](https://github.com/jamesdolezal/slideflow/assets/48372806/c94974ba-7ff6-4f30-85ed-490c1f3bd1ee)
A new icon will appear in the left-hand toolbar. Use this button to open the MIL widget. Models can be loaded by clicking the "Load MIL model" button, with "File -> Load MIL Model...", or by dragging-and-dropping an MIL model folder onto the window.
Information about the feature extractor and MIL model will be shown in the toolbar. MIL model architecture and hyperparameters can be viewed by clicking the "HP" button. Click "Predict Slide" to generate a whole-slide prediction. If applicable, attention will be displayed as a heatmap. The heatmap color and display can be customized in the Heatmap widget.
![image](https://github.com/jamesdolezal/slideflow/assets/48372806/d5a838df-a654-4538-bd94-1aa6a63de32d)
MIL Training Enhancements
Several changes in the MIL training process have been made to improve the user experience and facilitate deployment of trained MIL models on new slides.
Rebuilding feature extractors used for MIL
One of the previous challenges with MIL models was the reliance on generated feature "bags", even for model evaluation. Slideflow now includes tools to generate predictions from MIL models without manually generating feature bags, greatly simplifying evaluation and single-slide testing.
When image tile features are calculated and exported for a dataset (either with `Project.generate_feature_bags()` or `DatasetFeatures.to_torch()`), the feature extractor configuration is now saved as `bags_config.json` in the same directory as the exported feature bags. This configuration file contains all information necessary for rebuilding the feature extractor. An example file is shown below.
```json
{
  "extractor": {
    "class": "slideflow.model.extractors.retccl.RetCCLFeatures",
    "kwargs": {
      "center_crop": true
    }
  },
  "normalizer": {
    "method": "macenko",
    "fit": {
      "stain_matrix_target": [
        [
          0.5062568187713623,
          0.22186939418315887
        ],
        [
          0.7532230615615845,
          0.8652154803276062
        ],
        [
          0.4069173336029053,
          0.42241501808166504
        ]
      ],
      "target_concentrations": [
        1.7656903266906738,
        1.2797492742538452
      ]
    }
  },
  "num_features": 2048,
  "tile_px": 299,
  "tile_um": 302
}
```
The feature extractor can then be rebuilt with `sf.model.rebuild_extractor()`:
```python
from slideflow.model.extractors import rebuild_extractor

# Recreate the feature extractor
# and stain normalizer, if applicable
extractor, normalizer = rebuild_extractor("/path/to/bags_config.json")
```
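Because `bags_config.json` is plain JSON, it can also be inspected without rebuilding anything. The sketch below is illustrative only: it assumes the field names shown in the example file above, and `summarize_bags_config` is a hypothetical helper, not part of Slideflow.

```python
import json
import os
import tempfile


def summarize_bags_config(path):
    """Return a short summary of a bags_config.json file (hypothetical helper)."""
    with open(path) as f:
        config = json.load(f)
    normalizer = config.get("normalizer")
    return {
        # Keep only the class name, dropping the module path.
        "extractor": config["extractor"]["class"].rsplit(".", 1)[-1],
        "normalizer": normalizer["method"] if normalizer else None,
        "num_features": config["num_features"],
        "tile_px": config["tile_px"],
        "tile_um": config["tile_um"],
    }


# Demonstration with a trimmed copy of the example configuration above.
example = {
    "extractor": {
        "class": "slideflow.model.extractors.retccl.RetCCLFeatures",
        "kwargs": {"center_crop": True},
    },
    "normalizer": {"method": "macenko"},
    "num_features": 2048,
    "tile_px": 299,
    "tile_um": 302,
}
with tempfile.TemporaryDirectory() as tmpdir:
    config_path = os.path.join(tmpdir, "bags_config.json")
    with open(config_path, "w") as f:
        json.dump(example, f)
    summary = summarize_bags_config(config_path)

print(summary)
```

A check like this can help confirm that bags in two directories were generated with the same extractor and tile size before they are combined.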
Single-slide predictions, without feature bags
The new `sf.mil.predict_slide()` function generates a whole-slide prediction (and attention heatmap) from a saved MIL model, without requiring you to manually generate feature bags.
This is accomplished by including feature extraction information in the `mil_params.json` file stored in MIL model folders. When performing single-slide inference, Slideflow will automatically rebuild the feature extractor, calculate features for all tiles in the given slide, and pass these features to the loaded MIL model.
You can generate single-slide predictions using a path to a slide:
```python
from slideflow.mil import predict_slide

slide = '/path/to/slide.svs'
model = '/path/to/mil_model'

# Calculate predictions and attention heatmap
y_pred, y_att = predict_slide(model, slide)
```
You can also generate single-slide predictions from a loaded `WSI` object, allowing you to customize slide processing or QC before generating predictions:
```python
import slideflow as sf
from slideflow.mil import predict_slide
from slideflow.slide import qc

# Load slide and apply Otsu thresholding
slide = '/path/to/slide.svs'
wsi = sf.WSI(slide, ...)
wsi.qc(qc.Otsu())

# Calculate predictions and attention heatmap
y_pred, y_att = predict_slide('/path/to/mil_model', wsi)
```
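Conceptually, the last step of this pipeline reduces a bag of tile-level feature vectors to one slide-level prediction plus a per-tile attention score. The following is a minimal, generic sketch of attention-based pooling in pure Python. It is not Slideflow's implementation, and the weights (`attention_weights`, `classifier_weights`) are made-up values chosen for illustration.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def attention_mil_predict(bag, attention_weights, classifier_weights):
    """Pool tile features with attention, then classify the pooled vector.

    bag: list of tile feature vectors (one per tile).
    attention_weights: vector scoring how informative each tile is.
    classifier_weights: one weight vector per output class.
    """
    # 1. Score every tile and normalize the scores across the slide.
    attention = softmax([dot(tile, attention_weights) for tile in bag])
    # 2. Attention-weighted average of the tile features.
    n_features = len(bag[0])
    pooled = [
        sum(a * tile[i] for a, tile in zip(attention, bag))
        for i in range(n_features)
    ]
    # 3. Slide-level class probabilities from the pooled features.
    y_pred = softmax([dot(pooled, w) for w in classifier_weights])
    return y_pred, attention


# Toy bag: three tiles with two features each (illustrative values).
bag = [[0.1, 0.9], [0.8, 0.2], [0.7, 0.3]]
y_pred, y_att = attention_mil_predict(
    bag,
    attention_weights=[2.0, -1.0],
    classifier_weights=[[1.0, 0.0], [0.0, 1.0]],
)
print(y_pred, y_att)
```

The per-tile attention values sum to 1 across the slide, which is what makes them suitable for display as a heatmap.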
QOL improvements for MIL training
Several smaller quality of life improvements have been made for MIL training. In addition to the feature extraction configuration, the `mil_params.json` file now also includes information about the input and output shapes of the MIL network and outcome labels. An example file is shown below.
```json
{
  "trainer": "fastai",
  "params": {
    ...
  },
  "outcomes": "histology",
  "outcome_labels": {
    "0": "Adenocarcinoma",
    "1": "Squamous"
  },
  "bags": "/mnt/data/projects/example_project/bags/simclr-263510/",
  "input_shape": 1024,
  "output_shape": 2,
  "bags_encoder": {
    "extractor": {
      "class": "slideflow.model.extractors.simclr.SimCLR_Features",
      "kwargs": {
        "center_crop": false,
        "ckpt": "/mnt/data/projects/example_project/simclr/00001-EXAMPLE/ckpt-263510.ckpt"
      }
    },
    "normalizer": null,
    "num_features": 1024,
    "tile_px": 299,
    "tile_um": 302
  }
}
```
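One practical use of the stored outcome labels is translating a raw prediction vector into a readable class name. The sketch below assumes the `outcome_labels` and `output_shape` fields shown above; `label_prediction` is a hypothetical helper written for this example, not a Slideflow function.

```python
import json

# Trimmed mil_params.json content, using fields from the example above.
mil_params = json.loads("""
{
    "outcomes": "histology",
    "outcome_labels": {"0": "Adenocarcinoma", "1": "Squamous"},
    "input_shape": 1024,
    "output_shape": 2
}
""")


def label_prediction(y_pred, params):
    """Return (label, probability) for the highest-scoring class.

    JSON object keys are always strings, so the winning index is
    converted to str before the lookup.
    """
    if len(y_pred) != params["output_shape"]:
        raise ValueError("Prediction length does not match output_shape")
    best = max(range(len(y_pred)), key=lambda i: y_pred[i])
    return params["outcome_labels"][str(best)], y_pred[best]


label, prob = label_prediction([0.12, 0.88], mil_params)
print(f"{mil_params['outcomes']}: {label} ({prob:.2f})")
```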
When exporting feature bags for MIL training with `Project.generate_feature_bags()`, memory consumption is reduced by performing the feature bag calculation in smaller batches of slides at a time. [261]
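The general idea behind batched export can be sketched as follows. This is a simplified illustration of the pattern rather than Slideflow's code, and `compute_features` and `save_bag` are stand-in callables invented for the example.

```python
def batched(items, batch_size):
    """Yield successive lists of at most `batch_size` items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def export_in_batches(slides, batch_size, compute_features, save_bag):
    """Compute and save features one batch of slides at a time.

    Only one batch of features is held in memory at once; each batch is
    written out before the next one is computed.
    """
    n_saved = 0
    for batch in batched(slides, batch_size):
        features = {slide: compute_features(slide) for slide in batch}
        for slide, bag in features.items():
            save_bag(slide, bag)
            n_saved += 1
    return n_saved


# Toy usage with stand-in callables (hypothetical; not Slideflow functions).
saved = {}
slides = [f"slide_{i}" for i in range(7)]
count = export_in_batches(
    slides,
    batch_size=3,
    compute_features=lambda slide: [len(slide)],
    save_bag=lambda slide, bag: saved.update({slide: bag}),
)
print(count, list(batched(slides, 3)))
```

Peak memory then scales with the batch size rather than with the total number of slides in the dataset.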
Finally, when validating or evaluating MIL models with a categorical outcome, accuracy within each class is reported separately. [265] (thank you andrewsris)
```
INFO Validation metrics for outcome histology:
INFO slide-level AUC (cat 0): 0.993 AP: 0.998 (opt. threshold: 0.565)
INFO slide-level AUC (cat 1): 0.993 AP: 0.974 (opt. threshold: 0.439)
INFO Category 0 acc: 97.3% (146/150)
INFO Category 1 acc: 92.3% (36/39)
```
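For reference, per-category accuracy is the fraction of slides in each true category that were predicted correctly. A minimal sketch, using toy labels constructed to match the counts in the log above (the helper `per_category_accuracy` is illustrative, not part of Slideflow):

```python
from collections import Counter


def per_category_accuracy(y_true, y_pred):
    """Return {category: (n_correct, n_total)} for each true category."""
    totals = Counter(y_true)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return {cat: (correct[cat], totals[cat]) for cat in sorted(totals)}


# Toy labels: 150 slides in category 0 and 39 slides in category 1.
y_true = [0] * 150 + [1] * 39
y_pred = [0] * 146 + [1] * 4 + [1] * 36 + [0] * 3

for cat, (n_correct, n_total) in per_category_accuracy(y_true, y_pred).items():
    pct = 100 * n_correct / n_total
    print(f"Category {cat} acc: {pct:.1f}% ({n_correct}/{n_total})")
```

Reporting accuracy per class makes poor performance on a minority class visible even when overall accuracy is high.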
Streamlined Feature Extraction
Extracting features from image tiles - commonly used for training [Multiple-instance Learning (MIL)](http://slideflow.dev/mil/) models - has been streamlined with `sf.model.build_feature_extractor()`, providing a common API for preparing many types of feature extractors.
Features from layer activations of an ImageNet-pretrained model
Generate features from a neural network pretrained on ImageNet simply by passing the name of the network to `sf.model.build_feature_extractor()`. If a tile size is specified, input tiles will be center cropped before calculating features.
```python
from slideflow.model import build_feature_extractor

resnet50_extractor = build_feature_extractor(
    'resnet50',
    tile_px=299
)
```
This will calculate features using activations from the post-convolutional layer of the network. You can also concatenate activations from multiple layers and apply pooling for layers with 2D output shapes.
```python
extractor = build_feature_extractor(
    'resnet50',
    layers=['conv1_relu', 'conv3_block1_2_relu'],
    pooling='avg',
    tile_px=299
)
```
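To make the pooling and concatenation step concrete, here is a generic pure-Python sketch: each layer's spatial activation map is averaged over height and width, and the resulting per-channel vectors are joined end to end. The function names and toy values are assumptions for illustration, and this does not reflect how Slideflow implements it internally.

```python
def global_average_pool(feature_map):
    """Average a (height x width x channels) nested list over height and width."""
    n_channels = len(feature_map[0][0])
    n_positions = len(feature_map) * len(feature_map[0])
    sums = [0.0] * n_channels
    for row in feature_map:
        for pixel in row:
            for c, value in enumerate(pixel):
                sums[c] += value
    return [s / n_positions for s in sums]


def concatenate_layer_features(layer_outputs):
    """Pool each layer's activations and join them into one feature vector."""
    features = []
    for feature_map in layer_outputs:
        features.extend(global_average_pool(feature_map))
    return features


# Two toy "layers": a 2x2 map with 2 channels and a 1x2 map with 3 channels.
layer_a = [[[1.0, 0.0], [3.0, 0.0]],
           [[5.0, 4.0], [7.0, 4.0]]]
layer_b = [[[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]]

features = concatenate_layer_features([layer_a, layer_b])
print(features)
```

The length of the final feature vector is the sum of the channel counts of the selected layers, here 2 + 3 = 5.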
Features from layer activations of a fine-tuned model
Generate features from a model fine-tuned in Slideflow by calculating activations at any number of arbitrary neural network layers.
```python
extractor = build_feature_extractor(
    '/path/to/trained_model.zip'
)
```
Features from a public pretrained network
Generate features from the pretrained CTransPath or RetCCL networks. Weights for these pretrained networks will be automatically downloaded from [HuggingFace](https://huggingface.co/jamesdolezal/retccl/).
```python
extractor = build_feature_extractor(
    'retccl',
    tile_px=299
)
```
Features from a SimCLR model (self-supervised learning)
Generate features from a model trained with [self-supervised learning](https://slideflow.dev/ssl) using SimCLR. Specify a saved model folder or path to a model checkpoint (`*.ckpt`).
```python
extractor = build_feature_extractor(
    'simclr',
    ckpt='/path/to/simclr.ckpt'
)
```
Using feature extractors
All feature extractors can then be used to calculate features from individual image tiles, [generate feature bags](https://slideflow.dev/mil/#exporting-features) for MIL training, or calculate features for an entire slide using a loaded `WSI` object.
Slideflow Studio: Tile Extraction Preview & More
Studio now facilitates quickly previewing tile extraction. Tile extraction parameters - such as slide-level processing / QC, grayspace/whitespace filtering, and stride - can be customized in the "Slide Processing" section. The "Display" section allows users to preview tile extraction by displaying outlines around tiles. When generating whole-slide predictions from a loaded model, only the shown tiles will be used.
![image](https://github.com/jamesdolezal/slideflow/assets/48372806/a4911b16-9b5a-4289-9d46-41c95f31acda)
Additional updates to Studio include:
- Gracefully handle invalid/incompatible slides with an error message, instead of crashing
- Zoom to a specific MPP in a slide with `View -> Zoom to MPP (Ctrl +/)` [270] (thank you skochanny)
- Remove status bar when capturing main view [270]
- Add MacOS M1 / MPS compatibility when generating StyleGAN images
- Fix ROI annotations on high-DPI devices
- Various stability improvements & bug fixes
Slide Filtering / QC Updates (DeepFocus, GaussianV2)
Slideflow includes two new slide filtering / QC algorithms: `DeepFocus` and `GaussianV2`.
DeepFocus
An official implementation of the DeepFocus QC algorithm is now included in Slideflow, and can be used like any other QC algorithm. By default, DeepFocus is applied to slides at 40X magnification, although this can be customized with the `tile_um` argument.
```python
from slideflow.slide import qc

# `slide` is a loaded sf.WSI object
deepfocus = qc.DeepFocus(tile_um='20x')
slide.qc(deepfocus)
```
You can also retrieve raw predictions from the DeepFocus model by passing the argument `threshold=False`:
```python
preds = deepfocus(slide, threshold=False)
```
GaussianV2
A new, optimized Gaussian ("blur") filter has been implemented as `sf.slide.qc.GaussianV2`. This method reduces computational time and memory consumption by first splitting the slide into smaller chunks, performing Gaussian filtering on each chunk separately (accelerated with multiprocessing), and then merging the chunks (eliminating areas of overlap to reduce stitching artifacts). `GaussianV2` will be used by default when using the QC methods `'blur'` or `'both'`.
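The chunk-and-merge strategy can be illustrated in one dimension. The sketch below uses a simple moving-average blur as a stand-in for a Gaussian filter and processes chunks sequentially rather than with multiprocessing, so it is a simplified illustration of the idea and not the `GaussianV2` implementation. Each chunk is extended by an overlap equal to the filter radius, filtered, and then trimmed back before merging, which makes the result identical to filtering the whole signal at once.

```python
def box_blur(signal, radius):
    """Moving-average blur; windows are truncated at the signal edges."""
    out = []
    for i in range(len(signal)):
        lo, hi = max(0, i - radius), min(len(signal), i + radius + 1)
        window = signal[lo:hi]
        out.append(sum(window) / len(window))
    return out


def chunked_box_blur(signal, radius, chunk_size):
    """Blur chunk by chunk, padding each chunk with `radius` samples of overlap."""
    out = []
    for start in range(0, len(signal), chunk_size):
        stop = min(start + chunk_size, len(signal))
        # Extend the chunk so values near its borders see their true neighbors.
        lo, hi = max(0, start - radius), min(len(signal), stop + radius)
        blurred = box_blur(signal[lo:hi], radius)
        # Discard the overlap and keep only the chunk's own samples.
        out.extend(blurred[start - lo:stop - lo])
    return out


signal = [float((i * 7) % 11) for i in range(50)]
full = box_blur(signal, radius=3)
chunked = chunked_box_blur(signal, radius=3, chunk_size=8)
print(chunked == full)
```

Because each chunk depends only on its own padded region, chunks can be filtered independently, which is what makes parallel processing possible.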
Smaller updates
Slideflow includes a number of other new features and enhancements, as detailed below.
PyTorch Image Preprocessing Improvements
Image preprocessing and augmentations in the PyTorch backend have been refactored to use torchvision transformations. This improves computational efficiency and makes custom transformation pipelines easier to work with: Gaussian blur augmentation in PyTorch is now 3-4x faster [145], and PyTorch stain normalization is also faster.
Custom PyTorch transformations or augmentations can be used in any PyTorch dataloader by passing a callable function to `Dataset.torch(augment=...)` or `Dataset.torch(transform=...)`. For example, to apply a resize transformation on images:
```python
import slideflow as sf
from torchvision import transforms

# Load a project and dataset
P = sf.load_project(...)
dataset = P.dataset(tile_px=299, tile_um=302)

# Establish a resize transformation
resize = transforms.Resize(512)

# Create a PyTorch dataloader with this
# transformation applied to images
dl = dataset.torch(transform=resize)
```
Custom transformations can also be used in any Tensorflow dataset using the same API. Pass a callable function to the `transform` argument of `Dataset.tensorflow()`:
```python
import slideflow as sf
import tensorflow as tf

@tf.function
def custom_resize(image):
    return tf.image.resize(image, (512, 512))

# Load a project and dataset
P = sf.load_project(...)
dataset = P.dataset(tile_px=299, tile_um=302)

# Create a Tensorflow dataset with this
# resize transformation applied to images
dl = dataset.tensorflow(transform=custom_resize)
```
Mini-batch sample diversity for PyTorch dataloaders
This update addresses a long-standing issue where mini-batches assembled with PyTorch tended to contain multiple tiles from the same slides. PyTorch dataloaders now enforce greater sample diversity, reducing the chance that multiple tiles from the same slide will be present in a single batch (unless the number of slides is less than the batch size). Performance auditing has revealed that this change may improve model generalizability.
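One simple way to obtain this kind of diversity is to draw tiles round-robin across slides. The sketch below illustrates that idea only. It is an assumption about one possible strategy, not a description of how Slideflow's dataloaders are implemented, and `diverse_batches` is a hypothetical helper.

```python
import random
from collections import defaultdict


def diverse_batches(tiles, batch_size, seed=0):
    """Group (slide, tile_id) pairs into batches that avoid repeating a slide.

    Tiles are shuffled within each slide and then drawn round-robin across
    slides. While at least `batch_size` slides still have tiles left, each
    batch contains tiles from distinct slides.
    """
    rng = random.Random(seed)
    by_slide = defaultdict(list)
    for slide, tile_id in tiles:
        by_slide[slide].append((slide, tile_id))
    for slide_tiles in by_slide.values():
        rng.shuffle(slide_tiles)

    slide_order = list(by_slide)
    rng.shuffle(slide_order)

    # Round-robin: take one tile from each slide in turn until all are used.
    ordered = []
    while any(by_slide[s] for s in slide_order):
        for slide in slide_order:
            if by_slide[slide]:
                ordered.append(by_slide[slide].pop())
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


# Five slides with four tiles each, grouped into batches of four.
tiles = [(f"slide_{s}", t) for s in range(5) for t in range(4)]
batches = diverse_batches(tiles, batch_size=4)
for batch in batches:
    print([slide for slide, _ in batch])
```

With five slides and a batch size of four, no batch contains two tiles from the same slide. If the batch size exceeded the number of slides, repeats would be unavoidable, which matches the exception noted above.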
TFRecord optimizations
TFRecord index files now store tile location information, greatly improving efficiency of reading TFRecords by tile location (which is performed for various internal functions, such as calculating dataset features). Existing TFRecord indices will be automatically updated with location information when used, but this process can be manually triggered with `Dataset.rebuild_index()`. Tile locations can be read from a TFRecord's index file with `sf.io.get_locations_from_tfrecord()`.
Other new features
- Add support for slide images that do not contain 'levels', such as multi-page TIFFs and Versa-scanned SVS files. (Thank you emmachancellor and skochanny)
- `Dataset.verify_slide_names()`: verify that TFRecord filenames match the slide names inside
- `sf.WSI.area()`: calculate the area of a slide that has passed QC
- `sf.slide.backends.vips.vips_padded_crop()`: enable extracting tiles outside the bounds of a slide, padding out-of-bounds area with white or black background.
- New `use_edge_tiles` option for `sf.WSI`. If True, will allow extracting edge tiles from the slide. Empty areas are rendered as white, in both cuCIM and VIPS backends.
- Add optional `loc`, `ncol`, and `legend_kwargs` arguments (passed to `ax.legend()`) to `Slidemap.plot()`, for customizing the UMAP plot axes. [275] (Thank you emmachancellor)
- Add support for training SimCLR with stain augmentation
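The padded cropping behavior described in the list above (for `vips_padded_crop()` and `use_edge_tiles`) can be illustrated on a small grayscale image stored as nested lists. This is a generic sketch of the technique with a hypothetical `padded_crop` helper; it does not use or reproduce the libvips-based implementation.

```python
def padded_crop(image, x, y, width, height, fill=255):
    """Crop a region from a 2D grayscale image, filling out-of-bounds pixels.

    image: list of rows, each a list of pixel values.
    (x, y): top-left corner of the crop, which may lie outside the image.
    fill: value used for out-of-bounds pixels (255 = white, 0 = black).
    """
    img_h = len(image)
    img_w = len(image[0]) if image else 0
    crop = []
    for row in range(y, y + height):
        crop.append([
            image[row][col] if 0 <= row < img_h and 0 <= col < img_w else fill
            for col in range(x, x + width)
        ])
    return crop


# 3x3 toy image; the 2x2 crop at (2, 2) hangs over the bottom-right corner.
image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
edge_tile = padded_crop(image, x=2, y=2, width=2, height=2)
print(edge_tile)
```

Only the top-left pixel of this edge tile comes from the image; the other three positions are filled with white.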
Other improvements
- Improve clarity of slide backend error messages [266] (thank you cswpy)
- Include Libvips version info in `sf.about()`
- Improve PyTorch training speed by using channels-last memory format.
- Improve handling of `linalg` errors during Macenko normalization. If an error is encountered with Macenko normalization, the original image is returned instead of raising the error. This behavior can be disabled by passing `StainNormalizer.transform(allow_errors=False)`.
- Improve quality of slide thumbnail in PDF extraction report. Also adds ability to provide thumbnail keywords arguments when extracting tiles via `thumb_kwargs` (thank you skochanny)
- Improved CPU core detection in Linux. All functions which detect the number of CPU cores now use `sf.util.num_cpu()` instead of `os.cpu_count()`. This will first check available cores with `os.sched_getaffinity(0)`, which reflects available CPU cores with OS-level scheduling. If this fails (e.g. on Windows and macOS systems), it will fall back to `os.cpu_count()`.
- SimCLR default arguments have been updated to reflect the default parameters of the original paper:
  - `learning_rate`: 0.3 -> 0.075
  - `learning_rate_scaling`: 'linear' -> 'sqrt'
  - `weight_decay`: 1e-6 -> 1e-4
- Fix issue where Otsu's thresholding on MRXS files would occasionally fail to identify any foreground tissue. This was due to very small images in the MRXS pyramid. (thank you siddhir)
- Fix issue where MRXS slides could not be extracted when using a buffer, due to the presence of an associated folder with the MRXS file format. [300]
- Close file handles when deleting PyTorch dataloader
- Improve accuracy of mosaic map grid
- Deprecate `Project.generate_features_for_clam()`, replacing it with `Project.generate_feature_bags()`
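The CPU core detection fallback described in the list above can be sketched with the standard library alone. This is an approximation of the described behavior rather than the source of `sf.util.num_cpu()`. The final `or 1` is an addition made for this sketch, since `os.cpu_count()` may return `None`.

```python
import os


def num_cpu():
    """Return the number of CPU cores available to this process."""
    try:
        # Respects OS-level scheduling limits (e.g. taskset, some containers).
        return len(os.sched_getaffinity(0))
    except AttributeError:
        # os.sched_getaffinity is not available on Windows or macOS.
        return os.cpu_count() or 1


print(num_cpu())
```

On a Linux machine where the process is restricted to a subset of cores, the affinity-based count can be smaller than `os.cpu_count()`, which avoids starting more worker processes than can actually run concurrently.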
Bug fixes
- Fix reported concordance index for survival models, which was previously being incorrectly reported as `1 - c_index`
- Fix 'input Tensor too large' error with PyTorch GPU normalizers. Fix is applied by capping the batch size for normalization at 32.
- Fix `sf.DatasetFeatures.to_csv()` [260]
- Fix mixed precision training in PyTorch
- Improve protobuf dependency versioning. Slideflow requires protobuf version <=3.20.\*. Previously, setup.py listed protobuf requirements as <=3.20.2; this has been updated to <3.21 to include any additional 3.20.\* patch releases. This also specifies tensorflow_datasets<4.9.0 to prevent protobuf version >= 4. [289] (thank you sebp)
- Pin required version of cellpose to `<2.2`
- Pin required version of pandas to `<2`
- Pin required version of timm to `<0.9` (thank you quark412)
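For context on the concordance index fix listed above: the C-index depends on the sign convention of the model output, and reversing the direction of the scores turns a value of `c` into `1 - c` (when there are no tied scores). The sketch below is a generic implementation of Harrell's C written to show that relationship. It is not Slideflow's code and makes no claim about the actual cause of the bug.

```python
def concordance_index(times, scores, events):
    """Harrell's C: fraction of comparable pairs that are ranked correctly.

    times: observed follow-up times.
    scores: predicted risk (higher = expected to fail sooner).
    events: 1 if the event was observed, 0 if censored.
    """
    concordant, comparable = 0.0, 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when i had an observed event before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if scores[i] > scores[j]:
                    concordant += 1
                elif scores[i] == scores[j]:
                    concordant += 0.5
    return concordant / comparable


times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.4, 0.1]  # toy risk scores (illustrative values)

c_index = concordance_index(times, risk, events)
flipped = concordance_index(times, [-r for r in risk], events)
print(c_index, flipped)
```

Here four of the five comparable pairs are ranked correctly, and negating the scores leaves only one correctly ranked pair.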