Single-threaded asyncio is at least twice as fast as a ThreadPool here, using just one thread and coroutines, all from the standard library in Python >= 3.5. The benefit is especially noticeable on resource-limited systems such as the Raspberry Pi.
MATLAB calling this Python coroutine code is now much faster as well.
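Below is a minimal sketch of the two approaches. It assumes the workload is I/O-bound, which it simulates with hypothetical `blocking_task` / `coroutine_task` functions that just sleep for 0.1 s. The thread pool ties up one worker thread per waiting call. The asyncio version starts every coroutine at once with `asyncio.gather`, and they all wait together on a single thread. `run_asyncio` is a plain synchronous function, so callers that cannot `await` (such as MATLAB) can still use it. It avoids f-strings and `asyncio.run` so it stays compatible with Python 3.5.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_task(i):
    """Hypothetical stand-in for blocking I/O: sleeps 0.1 s, then returns 2*i."""
    time.sleep(0.1)
    return i * 2


async def coroutine_task(i):
    """Coroutine equivalent: yields to the event loop instead of blocking the thread."""
    await asyncio.sleep(0.1)
    return i * 2


def run_threadpool(n, workers=4):
    # Each call occupies one worker thread for the whole time it sleeps.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(blocking_task, range(n)))


async def _gather_all(n):
    # All n coroutines wait concurrently on one thread; results keep input order.
    return await asyncio.gather(*(coroutine_task(i) for i in range(n)))


def run_asyncio(n):
    """Synchronous entry point, so callers need not know about coroutines."""
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)  # needed on early 3.5.x so gather() finds this loop
    try:
        return loop.run_until_complete(_gather_all(n))
    finally:
        loop.close()
        asyncio.set_event_loop(None)


if __name__ == "__main__":
    for name, fn in (("ThreadPool", run_threadpool), ("asyncio", run_asyncio)):
        t0 = time.perf_counter()
        fn(20)
        print("{}: {:.2f} s".format(name, time.perf_counter() - t0))
```

With these illustrative parameters, the pool's 4 workers must work through 20 sleeps in 5 successive batches, while the 20 coroutines all sleep at the same time. From MATLAB, a call might look like `py.async_demo.run_asyncio(int32(20))`, where the module name `async_demo` is hypothetical. The `int32` conversion is there because a MATLAB double arrives in Python as a `float`, which `range` rejects.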