This converts the `MPIShared` class to use POSIX shared memory underneath. We had been using MPI shared windows for this, even though we manage write access to the memory ourselves. Most MPI implementations limit the number of global shared memory buffers that can be allocated, and that limit is too low for many use cases. The new code is limited only by the number of shared memory segments per node that the kernel supports. The unit tests ran successfully on up to 8192 processes.
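
For readers unfamiliar with the mechanism, here is a minimal sketch of the general idea using Python's standard-library `multiprocessing.shared_memory` module, which is backed by POSIX shared memory on Unix-like systems. This is not the `MPIShared` implementation: the helper names (`create_segment`, `attach_segment`, `demo`) and the segment naming scheme are hypothetical, and both handles are opened in a single process only to keep the example self-contained. The sketch assumes that in an MPI setting one process per node would create the segment and the others would attach to it by name.

```python
import os
import struct
from multiprocessing import shared_memory


def create_segment(name, n_doubles):
    """Create a named shared memory segment sized for n_doubles float64 values.

    Assumption: in an MPI setting, one process per node would call this.
    """
    return shared_memory.SharedMemory(name=name, create=True, size=n_doubles * 8)


def attach_segment(name):
    """Attach to an existing segment by name, as the other processes would."""
    return shared_memory.SharedMemory(name=name, create=False)


def demo():
    # Hypothetical naming scheme; the name only needs to be unique on the node.
    name = f"shm_sketch_{os.getpid()}"
    owner = create_segment(name, 4)
    try:
        reader = attach_segment(name)
        try:
            # Write through one handle, read through the other: both map
            # the same kernel-managed segment, with no MPI window involved.
            struct.pack_into("4d", owner.buf, 0, 1.0, 2.0, 3.0, 4.0)
            values = struct.unpack_from("4d", reader.buf, 0)
        finally:
            reader.close()
    finally:
        owner.close()
        owner.unlink()  # remove the segment name once every handle is closed
    return values
```

Nothing in this sketch synchronizes access to the buffer. As noted above, write access is managed separately by the class, so a real implementation still needs its own coordination between writers and readers.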