API changes:
- `send`, `recv` and `allreduce` now return the underlying MPI library's return code instead of asserting that it is zero within these functions (this opens up the possibility of using these functions in a multi-threaded environment, and makes `numba-mpi` a more lightweight wrapper)
- `allreduce` no longer allocates memory for the result, but rather expects the result buffer to be passed as an argument (its type and size are not checked!)
thanks Delcior!
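
Since the status code is no longer asserted and the result buffer is no longer checked, callers may want to do both themselves. Below is a minimal sketch of one way to do that in plain Python; `checked_allreduce` and `single_rank_allreduce` are hypothetical names for illustration only, and the assumed call shape `allreduce(send_buf, recv_buf)` returning an integer status should be verified against the installed `numba-mpi` version.

```python
import numpy as np


def checked_allreduce(allreduce, send_buf, recv_buf):
    """Hypothetical helper: validate the caller-supplied result buffer,
    call `allreduce`, and raise if the returned status is non-zero."""
    # The library does not check the result buffer, so check it here.
    if recv_buf.dtype != send_buf.dtype or recv_buf.shape != send_buf.shape:
        raise ValueError("result buffer must match the send buffer in dtype and shape")
    # Assumed call shape: allreduce(send_buf, recv_buf) -> int status.
    status = allreduce(send_buf, recv_buf)
    # Zero means success; this used to be asserted inside the library.
    if status != 0:
        raise RuntimeError(f"allreduce failed with MPI return code {status}")
    return recv_buf


def single_rank_allreduce(send_buf, recv_buf):
    """Stand-in for the real call so the sketch runs without MPI:
    on a single rank, a sum-reduction is just a copy."""
    recv_buf[...] = send_buf
    return 0


data = np.arange(3, dtype=np.float64)
result = np.empty_like(data)  # the caller now allocates the result
checked_allreduce(single_rank_allreduce, data, result)
```

In an actual MPI run, the stand-in would be replaced by `numba_mpi.allreduce` (again, assuming it accepts the send and result buffers in that order). Inside `@numba.njit`-compiled code, exceptions and function arguments behave differently, so there the returned status would more likely be compared against zero inline.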