New features
- **Datasets**
- Added a new dataset `geolifeclef2024_pre_extracted` following the 2024 edition of the Kaggle challenge [GeoLifeCLEF](https://www.kaggle.com/competitions/geolifeclef-2024/data/)
- Computed rolling `mean` and rolling `std` values of the GeoLifeCLEF2024 dataset for each modality. These values are stored in the dataset's transform functions
- **Models**
- Added a new model "MultimodalEnsemble" in `geolifeclef2024_multimodal_ensemble` based on picekl's work on [GeoLifeCLEF2024](https://www.kaggle.com/code/picekl/sentinel-landsat-bioclim-baseline-0-31626)
- **Scripts**
- Added new scripts `split_obs_spatially.py`, `sort_files_glc_fashion.sh` and `split_obs_per_species_frequency.py`
- `split_obs_spatially.py`: splits a CSV observation dataset into a _training_ and a _val_ subset where _val_ observation plots are spatially separated from _training_ ones. This script uses the new **`verde`** package.
- `sort_files_glc_fashion.sh`:
> This script re-organizes the files of one folder into folders and sub-folders in the same way as for the GeoLifeCLEF challenge.
>
> Each file is re-arranged in folders and sub-folders in the following way:
> A file named 'ABCDWXYZ.pt' located at 'root_path/' will be moved to
> 'root_path/YZ/WX/ABCDWXYZ.pt'.
>
> Each file name must be at least 3 characters long. For instance:
> A file named 'XYZ.pt' located at 'root_path/' will be moved to
> 'root_path/YZ/X/XYZ.pt'.
- `split_obs_per_species_frequency.py`: splits a CSV observation dataset into a _training_ and a _val_ subset based on species frequency
- Added `split_obs_spatially.py` and `split_obs_per_species_frequency.py` scripts to Malpolon as modules in `malpolon.data.utils`
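
The folder layout produced by `sort_files_glc_fashion.sh` (described above) can be illustrated with a short Python sketch. This is not the shell script itself: the function names `glc_target_path` and `sort_files_glc_fashion`, the `pattern` parameter and the choice to apply the 3-character minimum to the file name without its extension are assumptions made for illustration only.

```python
from pathlib import Path
import shutil


def glc_target_path(root: Path, filename: str) -> Path:
    """Return the destination of `filename` under `root` (hypothetical helper).

    The first sub-folder is the last two characters of the name (without
    extension), the second one is the (up to) two characters before them.
    """
    stem = Path(filename).stem
    # Assumption: the 3-character minimum applies to the name without extension.
    if len(stem) < 3:
        raise ValueError(f"'{filename}' must be at least 3 characters long")
    last_two = stem[-2:]      # 'ABCDWXYZ' -> 'YZ'
    previous = stem[-4:-2]    # 'ABCDWXYZ' -> 'WX', 'XYZ' -> 'X'
    return root / last_two / previous / filename


def sort_files_glc_fashion(root, pattern="*.pt"):
    """Move every file matching `pattern` in `root` to its sub-folder."""
    root = Path(root)
    moved = []
    # List the files first so that moving them does not disturb the iteration.
    for src in sorted(root.glob(pattern)):
        if not src.is_file():
            continue
        dst = glc_target_path(root, src.name)
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst))
        moved.append(dst)
    return moved
```

Under these assumptions, calling `sort_files_glc_fashion("root_path")` on a folder containing `ABCDWXYZ.pt` and `XYZ.pt` would move them to `root_path/YZ/WX/ABCDWXYZ.pt` and `root_path/YZ/X/XYZ.pt` respectively.
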
Changes
- Renamed `scripts` folder to `toolbox`
- Renamed scenarios from {"Ecologists", "Inference", "Kaggle"} to {"Custom_train", "Inference", "Benchmarks"} and re-organized experiments
- Fixed example-related bugs and file links, removed duplicate files and cleaned up config files
- Updated code documentation, repository READMEs and examples tutorial files