- Previously, the ID split was done uniformly at random, meaning that images from the same sequence (i.e., taken within a few seconds of each other by the same camera) could be found across all of the training / validation (ID) / test (ID) sets.
- In v2.0, we have redone the ID split so that all images taken on the same day by the same camera are in only one of the training, validation (ID), or test (ID) sets. In other words, these sets still comprise images from the same cameras, but taken on different days.
- In line with the new iWildCam 2021 challenge on Kaggle, we have also removed the following images:
  - images that include humans or that were taken indoors.
  - images with non-animal categories such as `start` and `unidentifiable`.
  - images in categories such as `unknown`, `unknown raptor` and `unknown rat`.
- We added back location 537, which had previously been removed because we mistakenly believed its images were corrupted.
- We have re-split the data into training, validation (ID), test (ID), validation (OOD), and test (OOD) sets. This is a different random split from v1.0.
- Since we remove any classes that do not end up in the train split, removing those images and redoing the split gave us a different set of species. There are now 182 classes instead of 186. Specifically, the following classes have been removed: `['unknown', 'macaca fascicularis', 'proechimys sp', 'unidentifiable', 'turtur calcospilos', 'streptopilia senegalensis', 'equus africanus', 'macaca nemestrina', 'start', 'paleosuchus sp', 'unknown raptor', 'unknown rat', 'misfire', 'mustela lutreolina', 'canis latrans', 'myoprocta pratti', 'xerus rutilus', 'end', 'psophia crepitans', 'ictonyx striatus']`. The following classes have been added: `['praomys tullbergi', 'polyplectron chalcurum', 'ardeotis kori', 'phaetornis sp', 'mus minutoides', 'raphicerus campestris', 'tigrisoma mexicanum', 'leptailurus serval', 'malacomys longipes', 'oenomys hypoxanthus', 'turdus olivaceus', 'macaca sp', 'leiothrix argentauris', 'lophura sp', 'mazama temama', 'hippopotamus amphibius']`. For convenience, we have also added a `categories.csv` that maps from label IDs to species names.
- To speed up downloading and model training (by reducing the I/O bottleneck), we have also resized all images to a height of 448px while keeping the original aspect ratio. All images are wider than they are tall, so their smaller dimension is now 448px. Note that because JPEG compression is lossy, these pre-resized images differ from what you would get by loading the full-sized image and resizing it in code.
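
The day-level ID split described in the list above can be illustrated with a small sketch. This is not the code used to build the dataset; it only shows the general idea of assigning every (camera, day) group to exactly one split. The function name `split_by_camera_day`, the record keys `camera` and `date`, and the split fractions are all hypothetical. Unlike the actual split, this simplified version does not guarantee that every camera appears in every split.

```python
import random
from collections import defaultdict


def split_by_camera_day(records, fractions, seed=0):
    """Assign each (camera, date) group to exactly one split.

    records:   list of dicts with the (hypothetical) keys 'camera' and 'date'.
    fractions: dict mapping split name -> fraction of groups, summing to 1.
    Returns a list of split names aligned with `records`.
    """
    records = list(records)

    # Collect the indices of all images taken by the same camera on the same day.
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[(rec["camera"], rec["date"])].append(idx)

    # Shuffle whole groups (not individual images) with a fixed seed.
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)

    assignment = [None] * len(records)
    names = list(fractions)
    start = 0
    cumulative = 0.0
    for i, name in enumerate(names):
        cumulative += fractions[name]
        # The last split takes all remaining groups so that none are dropped.
        end = len(keys) if i == len(names) - 1 else round(cumulative * len(keys))
        for key in keys[start:end]:
            for idx in groups[key]:
                assignment[idx] = name
        start = end
    return assignment
```

Because whole groups are shuffled rather than single images, frames from the same sequence can no longer end up on both sides of a split, which is the leakage that the uniform random split allowed.
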
Minor updates to existing datasets
We made two backwards-compatible changes to existing datasets. We encourage all users to update these datasets; these updates should leave results unchanged (modulo training randomness). In future versions of the WILDS package, we will deprecate the older versions of these datasets.