## ✨ New features and improvements
* Add the [Mish](https://github.com/digantamisra98/Mish) activation. Use it via the `thinc.v2v.Mish` layer, which computes `f(X) = mish(W X + b)`. CUDA and Cython kernels are included to make the activation efficient. See the usage sketch after this list.
* Add experimental support for [RAdam](https://github.com/LiyuanLucasLiu/RAdam) to the optimizer. Enable it by setting the keyword argument `use_radam` to `True`. In preliminary testing, it's a small change that seems worth enabling.
* Add experimental support for [Lookahead](https://github.com/alphadl/lookahead.pytorch) to the optimizer. Enable it by setting the keyword argument `lookahead_k` to a positive integer. In preliminary testing, it helps if you're not using parameter averaging, but with averaging it's a bit worse.
* Add experimental support for [LARS](https://arxiv.org/abs/1708.03888) to the optimizer. Enable it by setting `use_lars` to `True`. In preliminary testing, this hasn't worked well at all – possibly our implementation is broken.
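The snippet below is a rough usage sketch, not a definitive recipe: it assumes the Thinc 7.x API, where layers take `(nO, nI)` arguments like `Affine`, models are composed with `thinc.api.chain`, and the optimizer is `thinc.neural.optimizers.Adam` constructed from an `ops` object and a learning rate. Only the `thinc.v2v.Mish` layer and the `use_radam`, `lookahead_k` and `use_lars` keyword arguments come from the notes above; the layer sizes, learning rate and `lookahead_k` value are illustrative.

```python
# Rough sketch, assuming the Thinc 7.x API. Sizes and hyperparameters are
# illustrative; only Mish, use_radam, lookahead_k and use_lars come from
# the release notes above.
from thinc.v2v import Mish, Softmax
from thinc.api import chain
from thinc.neural.ops import NumpyOps
from thinc.neural.optimizers import Adam

# A small feed-forward model: a Mish(W X + b) hidden layer into a softmax.
# The (nO, nI) argument order follows the usual thinc.v2v layer convention.
model = chain(Mish(128, 300), Softmax(10, 128))

# Adam with the experimental extras. The flags can be combined or used
# alone; LARS is left off here since it isn't working well yet.
optimizer = Adam(
    NumpyOps(),
    0.001,
    use_radam=True,    # rectified Adam (RAdam) variance correction
    lookahead_k=6,     # sync Lookahead's "slow" weights every 6 updates
    use_lars=False,    # layer-wise adaptive rate scaling (LARS)
)
```

Training then works as it did before: `model.begin_update(X)` returns predictions and a backprop callback, and the optimizer is passed to that callback via its `sgd` keyword argument.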
## 🙏 Acknowledgements
Big thanks to [digantamisra98](https://github.com/digantamisra98) for the [Mish](https://github.com/digantamisra98/Mish) activation, especially for the extensive experiments and the simple gradient calculation. We expect to use the activation in the next round of spaCy models.
Gratitude to the [fast.ai](https://fast.ai) community for their crowd-sourced experiments, and especially to users LessW2020, MGrankin and others for their optimizer implementations, which we referenced heavily when implementing these optimizer features in Thinc. More importantly, it's super helpful to have a community filtering the deluge of papers for techniques that work across a few different datasets. [This thread](https://forums.fast.ai/t/meet-radam-imo-the-new-state-of-the-art-ai-optimizer/52656) on optimization research was particularly helpful.