- Add Python 3.11 support.
- Add pre-processor support when parsing addresses.
- Add `pin_memory=True` when using a CUDA device to increase performance, as suggested by the
  [Torch documentation](https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html) (see the first sketch after this list).
- Add `torch.no_grad()` context manager in `__call__()` to increase performance.
- Reduce memory transfers between the CPU and the GPU by instantiating tensors directly on the GPU device (both
  changes are illustrated in the inference sketch after this list).
- Improve the clarity of some warnings (namely, their category and message).
- Fix macOS multiprocessing. It was unusable in multiprocess settings since we were not checking whether the Torch
  multiprocessing start method was set properly. We now set it properly and raise a warning instead of an error (see
  the multiprocessing sketch after this list).
- Drop Python 3.7 support since newer Python versions are faster
and [Torch 2.0 does not support Python 3.7](https://dev-discuss.pytorch.org/t/dropping-support-for-cuda-11-6-and-python-3-7-from-pytorch-2-0-release/1021).
- Improve error handling when a wrong checkpoint is loaded through the AddressParser `retrain_path` argument.
- Add `torch.compile` integration to improve performance (Torch 1.x is still supported) with `mode="reduce-overhead"`, as
  suggested in the [documentation](https://pytorch.org/tutorials//intermediate/torch_compile_tutorial.html). It
  improves performance by roughly 1% (see the `torch.compile` sketch after this list).
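
The `pin_memory=True` change can be illustrated with a plain `DataLoader`. This is a minimal sketch: the random
dataset, batch size, and copy loop below are made up for the example and are not deepparse's internal data pipeline.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in dataset; deepparse vectorizes addresses internally.
dataset = TensorDataset(torch.randn(256, 10), torch.randint(0, 2, (256,)))

use_cuda = torch.cuda.is_available()
loader = DataLoader(
    dataset,
    batch_size=32,
    # pin_memory=True allocates host batches in page-locked memory, which
    # speeds up host-to-GPU copies; it only helps when a CUDA device is used.
    pin_memory=use_cuda,
)

device = torch.device("cuda" if use_cuda else "cpu")
for features, labels in loader:
    # non_blocking=True pairs with pinned memory to allow asynchronous copies.
    features = features.to(device, non_blocking=True)
    labels = labels.to(device, non_blocking=True)
```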
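
A minimal sketch of the `torch.no_grad()` and device-direct tensor changes. The `TinyParser` class, its `__call__()`
signature, and the linear model are invented for illustration; they are not deepparse's actual `AddressParser`.

```python
import torch


class TinyParser:
    """Toy model wrapper used only to illustrate the two inference changes."""

    def __init__(self) -> None:
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = torch.nn.Linear(8, 4).to(self.device)

    def __call__(self, raw_features):
        # No autograd graph is built during inference, which lowers memory
        # use and speeds up the forward pass.
        with torch.no_grad():
            # Creating the tensor directly on the target device avoids an
            # extra CPU allocation followed by a CPU-to-GPU copy.
            features = torch.tensor(raw_features, device=self.device, dtype=torch.float32)
            return self.model(features)


parser = TinyParser()
print(parser([[0.0] * 8]))
```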
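
The macOS multiprocessing fix boils down to setting the Torch multiprocessing start method and downgrading the
failure case to a warning. The snippet below shows the general pattern under that assumption; it is not deepparse's
exact code.

```python
import warnings

import torch.multiprocessing as mp

try:
    # "spawn" is the safe start method on macOS (and Windows); "fork" can
    # deadlock or crash with Torch.
    mp.set_start_method("spawn")
except RuntimeError:
    # The start method can only be set once per process. If the calling
    # application already set it, warn instead of raising an error.
    warnings.warn(
        f"Torch multiprocessing start method is already set to '{mp.get_start_method()}'; leaving it unchanged.",
        category=UserWarning,
    )
```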
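
The `torch.compile` sketch below guards the call so a Torch 1.x installation falls back to the uncompiled model. The
placeholder model and the `hasattr` guard are one possible way to keep backward compatibility, not necessarily how
deepparse implements it.

```python
import torch

# Placeholder model standing in for the address-parsing network.
model = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))

# torch.compile only exists from Torch 2.0 onward; skip it on Torch 1.x.
if hasattr(torch, "compile"):
    # "reduce-overhead" trades extra compilation time for lower per-call
    # overhead, which suits repeated small inference batches.
    model = torch.compile(model, mode="reduce-overhead")

with torch.no_grad():
    # The first call triggers compilation; subsequent calls reuse the compiled graph.
    print(model(torch.randn(1, 16)))
```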