* Implement DecentralizedAverager for averaging model parameters & statistics across DHT peers (119 123 134 140 141)
* Accelerate RemoteMixtureOfExperts beam search with a new key structure (97 101 103 109)
* Implement lossy compression algorithms for tensors (102 106 112)
* Detect anomalies in RemoteMixtureOfExperts (132)
* Configure gRPC channels for long-term stability (129 131)
* Load expert checkpoints on server startup (138)
* Support attention mask in example TransformerEncoder layer (126)
* Add the contribution guide (156)
Bugfixes:
* Fix wrong getattr in hivemind.Server (122)
Enhancements:
* Support Python 3.9 and PyTorch 1.7 (142)
* Blacklist nonresponsive peers with exponential backoff (114)
* Reuse gRPC channels between calls (120)
* Verify DHT peer accessibility and local clock (137)
* Improve logging, remove duplicate log entries (135)
* Improve test coverage (116)
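
For readers unfamiliar with the "exponential backoff" blacklisting mentioned under Enhancements, the general idea can be sketched as below. This is a minimal illustration, not hivemind's actual implementation: the class `PeerBlacklist`, its methods, and its parameters (`base_delay`, `factor`, `max_delay`, `clock`) are all hypothetical names chosen for this example.

```python
import time


class PeerBlacklist:
    """Hypothetical helper (not a hivemind API): each consecutive failure
    multiplies the ban duration by `factor`, capped at `max_delay` seconds."""

    def __init__(self, base_delay=1.0, factor=2.0, max_delay=60.0, clock=time.monotonic):
        self.base_delay = base_delay
        self.factor = factor
        self.max_delay = max_delay
        self.clock = clock          # injectable so the logic can be tested without sleeping
        self.failures = {}          # peer -> number of consecutive failures
        self.banned_until = {}      # peer -> clock value at which the ban expires

    def register_failure(self, peer):
        """Record a failed call and return the resulting ban duration in seconds."""
        count = self.failures.get(peer, 0) + 1
        self.failures[peer] = count
        delay = min(self.base_delay * self.factor ** (count - 1), self.max_delay)
        self.banned_until[peer] = self.clock() + delay
        return delay

    def register_success(self, peer):
        """A successful call resets the failure count and lifts any ban."""
        self.failures.pop(peer, None)
        self.banned_until.pop(peer, None)

    def is_banned(self, peer):
        """True while the peer's ban has not yet expired."""
        return self.clock() < self.banned_until.get(peer, float("-inf"))
```

In this sketch, a caller would check `is_banned(peer)` before contacting a peer, call `register_failure(peer)` on a timeout, and call `register_success(peer)` once the peer responds again, so that a recovered peer is not penalized for past failures.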