Add an efficient multicore sampler
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The new sampler relies on forking, rather than pickling, to make the ``sample_one``, ``simulate_one`` and ``accept_one`` functions available to its worker processes. On a single multicore machine this is substantially faster than ``multiprocessing.Pool.map``-style execution, which pickles these functions again and again.
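
The mechanism can be illustrated with a minimal sketch. It assumes a POSIX system where the ``fork`` start method is available, and the names ``sample_forked`` and ``_worker`` are hypothetical, not the sampler's actual API. A forked child starts as a copy of the parent's memory, so the function passed to it is never pickled, and only the resulting samples are sent back through a queue:

.. code-block:: python

    import multiprocessing as mp


    def _worker(simulate_one, n, queue):
        # Hypothetical worker: runs in a forked child that inherited
        # simulate_one from the parent's memory, so it was never pickled.
        queue.put([simulate_one() for _ in range(n)])


    def sample_forked(simulate_one, n_per_proc, n_procs):
        """Hypothetical helper: draw n_per_proc samples in each of n_procs forks."""
        ctx = mp.get_context("fork")  # POSIX only; not available on Windows
        queue = ctx.Queue()
        procs = [
            ctx.Process(target=_worker, args=(simulate_one, n_per_proc, queue))
            for _ in range(n_procs)
        ]
        for p in procs:
            p.start()
        results = []
        for _ in procs:
            # Drain the queue before joining to avoid a deadlock on full pipes.
            results.extend(queue.get())
        for p in procs:
            p.join()
        return results


    if __name__ == "__main__":
        offset = 10
        # A lambda closure cannot be pickled, but forking does not need to.
        samples = sample_forked(lambda: offset + 1, n_per_proc=3, n_procs=4)
        print(len(samples), set(samples))  # prints: 12 {11}

Handing the same lambda to ``multiprocessing.Pool.map`` would fail, because the pool must pickle the function to send it along with each batch of tasks; with forking, only the returned data crosses process boundaries.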