This maintenance release delivers a new algorithm for the detection of phaselines in graphs which reduces the runtime from O(V*E) to O(V+E).
From the docs string:
Detecting phaselines is useful for determining which tasks can be performed in parallel.
Each phase in the phaselines must be completed to assure that the tasks in the next phase can be performed with complete input.
This is in contrast to Topological sort that only generates a queue of tasks, may be fine for a single processor, but has no mechanism for coordination that all inputs for a task have been completed so that multiple processors can work on them.
Here is an example of a DAG with tasks:
u1 u4 u2 u3
\ \ \_______\
csg cs3 append
\ \ \
op1 \ op3
\ \ \
op2 \ cs2
\ \___________\
cs1 join
\ \
map1 map2
\___________\
save
The phaselines would be:
phaselines = {
"u1": 0, "u4": 0, "u2": 0, "u3": 0,
"csg": 1, "cs3": 1, "append": 1,
"op1": 2, "op3": 2,
"op2": 3, "cs2": 3,
"cs1": 4, "join": 4,
"map1": 5, "map2": 5,
"save": 6,
}
From this example it is visible that processing the 4 'uN' (uploads) is the highest degree of concurrency. This can be determined as follows:
d = defaultdict(int)
for _, pl in graph.phaselines():
d[pl] += 1
max_processors = max(d, key=d.get)