In this new version:
1. Fully parallelized: it scales readily no matter how many cells or how many sets of annotations you have (no need to tune the number of cores, the program figures it out by itself).
2. More careful garbage collection and memory management, to keep subprocesses from consuming too much memory as far as possible.
3. Intermediate files are written out so you can re-run and debug.
4. In case the memory overhead of the parallel version is too high, I added a sequential version of scTriangulate.
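
The four points above can be illustrated with a minimal Python sketch. This is not scTriangulate's actual code or API: `choose_n_workers`, `score_annotation`, `run_parallel`, `run_sequential` and `load_result` are hypothetical names, and the per-annotation job (counting cells per cluster label) is only a stand-in for the real computation. It assumes each set of annotations can be processed independently.

```python
import gc
import os
import pickle
from multiprocessing import Pool


def choose_n_workers(n_tasks):
    """Pick a worker count from the machine, capped by the number of tasks."""
    return max(1, min(n_tasks, os.cpu_count() or 1))


def score_annotation(task):
    """Hypothetical per-annotation job: count cells per cluster label."""
    name, labels, out_dir = task
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    # Write an intermediate file so a failed run can be inspected or re-run.
    path = os.path.join(out_dir, f"{name}.pkl")
    with open(path, "wb") as fh:
        pickle.dump(counts, fh)
    # Release the local result and collect garbage before returning.
    del counts
    gc.collect()
    return name, path


def run_parallel(annotations, out_dir):
    """One task per set of annotations, spread over worker processes."""
    tasks = [(name, labels, out_dir) for name, labels in annotations.items()]
    # maxtasksperchild=1 with chunksize=1 replaces each worker after one task,
    # so memory held by a finished task is freed when that process exits.
    with Pool(processes=choose_n_workers(len(tasks)), maxtasksperchild=1) as pool:
        return dict(pool.map(score_annotation, tasks, chunksize=1))


def run_sequential(annotations, out_dir):
    """Same work in the current process, for when memory is the bottleneck."""
    tasks = [(name, labels, out_dir) for name, labels in annotations.items()]
    return dict(score_annotation(task) for task in tasks)


def load_result(path):
    """Read one intermediate file back for debugging."""
    with open(path, "rb") as fh:
        return pickle.load(fh)
```

Both runners return a mapping from annotation name to the path of its intermediate file, so switching between them does not change downstream code. On platforms that start subprocesses with `spawn`, `run_parallel` should be called from under an `if __name__ == "__main__":` guard.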