We're releasing three lighter-weight versions of DreamSim that each use only one ViT model (instead of the full ensemble). The backbone options are DINO-ViTB/16, CLIP-ViTB/32, and OpenCLIP-ViTB/32.
To load a single-backbone version of DreamSim, pass the new `dreamsim_type` argument (it defaults to `"ensemble"`). For example:
```python
from dreamsim import dreamsim

dreamsim_dino_model, preprocess = dreamsim(pretrained=True, dreamsim_type="dino_vitb16")
```
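Switching between backbones can be wrapped in a small helper. The following is a minimal sketch, not part of the package: `resolve_dreamsim_type`, `load_dreamsim`, and `perceptual_distance` are hypothetical names. Only `"dino_vitb16"` is confirmed by the example above; the `"clip_vitb32"` and `"open_clip_vitb32"` strings, the `device` argument, and the `model(img_a, img_b)` call pattern are assumptions about the package's API that you should check against your installed version.

```python
# Mapping from the backbone names used in this README to dreamsim_type strings.
# Only "dino_vitb16" is confirmed above; the other two strings are assumed.
SINGLE_BACKBONES = {
    "DINO-ViTB/16": "dino_vitb16",
    "CLIP-ViTB/32": "clip_vitb32",            # assumed identifier
    "OpenCLIP-ViTB/32": "open_clip_vitb32",   # assumed identifier
}


def resolve_dreamsim_type(backbone="ensemble"):
    """Return the dreamsim_type string for a README backbone name."""
    if backbone == "ensemble":
        return "ensemble"
    if backbone in SINGLE_BACKBONES:
        return SINGLE_BACKBONES[backbone]
    raise ValueError(f"Unknown backbone: {backbone!r}")


def load_dreamsim(backbone="ensemble", device="cpu"):
    """Load a pretrained model and its preprocessing function (hypothetical helper)."""
    from dreamsim import dreamsim  # imported lazily so the mapping works without the package

    # The device keyword is an assumption about the dreamsim() signature.
    return dreamsim(
        pretrained=True,
        device=device,
        dreamsim_type=resolve_dreamsim_type(backbone),
    )


def perceptual_distance(model, preprocess, path_a, path_b, device="cpu"):
    """Compute the distance between two image files (assumes model(img_a, img_b) returns it)."""
    import torch
    from PIL import Image

    img_a = preprocess(Image.open(path_a)).to(device)
    img_b = preprocess(Image.open(path_b)).to(device)
    with torch.no_grad():
        return model(img_a, img_b).item()
```

With this in place, `load_dreamsim("DINO-ViTB/16")` would load the same model as the one-line example above, and passing `"ensemble"` falls back to the default.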
Here's how the single-backbone finetuned models compare to the ensemble on NIGHTS:
* **Ensemble**: 96.2%
* **OpenCLIP-ViTB/32**: 95.5%
* **DINO-ViTB/16**: 94.6%
* **CLIP-ViTB/32**: 93.9%
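To make the trade-off concrete, the short sketch below computes how many percentage points each single-backbone model gives up relative to the ensemble, using only the numbers listed above. The helper name `gaps_to_ensemble` is ours, for illustration only.

```python
# NIGHTS scores (in percent) copied from the list above.
nights_scores = {
    "Ensemble": 96.2,
    "OpenCLIP-ViTB/32": 95.5,
    "DINO-ViTB/16": 94.6,
    "CLIP-ViTB/32": 93.9,
}


def gaps_to_ensemble(scores, reference="Ensemble"):
    """Return each model's gap to the reference, in percentage points."""
    ref = scores[reference]
    return {
        name: round(ref - score, 1)
        for name, score in scores.items()
        if name != reference
    }


gaps = gaps_to_ensemble(nights_scores)
# Print the models from smallest to largest gap.
for name, gap in sorted(gaps.items(), key=lambda item: item[1]):
    print(f"{name}: {gap:.1f} points below the ensemble")
```

The smallest gap belongs to OpenCLIP-ViTB/32, which makes it the natural first choice when a single backbone is needed.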
For more details, please refer to our paper.