I received a few interesting comments on my previous entry regarding the new multithreaded topological sort I implemented in TestNG and there is one in particular from Rafael Naufal that I wanted to address:

The @Priority annotation couldn’t be adapted to say which free methods get scheduled first? BTW, the responsibility of knowing which nodes are free couldn’t be moved to the graph of test methods?

This is pushing my current algorithm even further in the sense that we not only want to schedule free nodes as they become available, we also want to schedule them in order of importance.
What makes a node more important than another? Its level of dependencies. The more a node is depended upon, the more beneficial it is to schedule it as soon as possible since this will end up freeing more nodes. Admittedly, you are still bounded by the size of your thread pool, but this is exactly what we want: increasing the pool size should lead to more parallelism and therefore better performance, but the current scheduling algorithm being fair (or random) means that we are not guaranteed to see this performance increase.
My first reaction was to modify my Executor, but as it turns out, you can do this with the existing implementation. Every ThreadPoolExecutor constructor takes a BlockingQueue as a parameter, which is the queue the Executor uses to hold the workers waiting to run. Unsurprisingly, there is already a priority queue implementation called PriorityBlockingQueue.
All you need to do is use that queue instead of the default one when you create your Executor and then make sure that the workers you pass it have a natural ordering, that is, they implement Comparable. In this case, the weight of a worker is how many other workers depend on it, which is very easy to calculate.
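To make this concrete, here is a minimal sketch of the idea, not the actual TestNG code. The names PriorityExecutorSketch and WeightedWorker, the weights, and the single-thread pool are all assumptions chosen for illustration. A gate task keeps the only thread busy so that the three workers wait in the queue and their ordering becomes observable.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PriorityExecutorSketch {
  /** Runs three hypothetical workers on one thread and returns the order in which they ran. */
  static List<String> runInPriorityOrder() {
    List<String> order = Collections.synchronizedList(new ArrayList<String>());
    // The priority queue replaces the default work queue of the executor.
    ThreadPoolExecutor executor = new ThreadPoolExecutor(
        1, 1, 0L, TimeUnit.MILLISECONDS, new PriorityBlockingQueue<Runnable>());
    CountDownLatch gate = new CountDownLatch(1);
    try {
      // Keep the only thread busy so the next three workers wait in the queue.
      executor.execute(() -> {
        try {
          gate.await();
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
        }
      });
      // Use execute(), not submit(): submit() wraps the task in a FutureTask,
      // which is not Comparable and would make the priority queue throw.
      executor.execute(new WeightedWorker("a", 1, order));
      executor.execute(new WeightedWorker("b", 3, order));
      executor.execute(new WeightedWorker("c", 0, order));
      gate.countDown();
      executor.shutdown();
      executor.awaitTermination(10, TimeUnit.SECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
    return new ArrayList<>(order);
  }

  public static void main(String[] args) {
    System.out.println(runInPriorityOrder()); // prints [b, a, c]
  }
}

/** Hypothetical worker whose weight is the number of other workers that depend on it. */
class WeightedWorker implements Runnable, Comparable<WeightedWorker> {
  private final String name;
  private final int weight;
  private final List<String> order;

  WeightedWorker(String name, int weight, List<String> order) {
    this.name = name;
    this.weight = weight;
    this.order = order;
  }

  @Override
  public void run() {
    order.add(name);
  }

  // The queue serves the smallest element first, so a higher weight must compare as smaller.
  @Override
  public int compareTo(WeightedWorker other) {
    return Integer.compare(other.weight, weight);
  }
}
```

With a larger pool the ordering only applies to workers that actually wait in the queue, so the effect is most visible when there are more free nodes than threads.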
On a related topic, I wanted to get a closer look at how the algorithm I described in my previous blog post actually works. I described the theory and I have tests that show that it seems to work as expected, but it occurred to me that I could actually “view it from the inside” with little effort.
First, I added a method called toDot which generates a Graphviz file representing the current graph. This turned out to be trivial:

/**
 * @return a .dot file (Graphviz) version of this graph.
 */
public String toDot() {
  String FREE = "[style=filled color=yellow]";
  String RUNNING = "[style=filled color=green]";
  String FINISHED = "[style=filled color=grey]";
  StringBuilder result = new StringBuilder("digraph g {\n");
  Set<T> freeNodes = getFreeNodes();
  String color;
  for (T n : m_nodesReady) {
    color = freeNodes.contains(n) ? FREE : "";
    result.append("  " + getName(n) + color + "\n");
  }
  for (T n : m_nodesRunning) {
    color = freeNodes.contains(n) ? FREE : RUNNING;
    result.append("  " + getName(n) + color + "\n");
  }
  for (T n : m_nodesFinished) {
    result.append("  " + getName(n) + FINISHED + "\n");
  }
  result.append("\n");
  for (T k : m_dependingOn.getKeys()) {
    List<T> nodes = m_dependingOn.get(k);
    for (T n : nodes) {
      String dotted = m_nodesFinished.contains(k) ? "style=dotted" : "";
      result.append("  " + getName(k) + " -> " + getName(n) + " [dir=back " + dotted + "]\n");
    }
  }
  result.append("}\n");
  return result.toString();
}

Then I modified the executor to dump the graph every time a worker terminates, and finally, I wrote a shell script to convert these dot files into images and to create an HTML file. I ran a simple test case, processed the files with the shell script and here is the final result.
A yellow node is “free”, green means that the node is “running” (it has been handed to the thread pool), grey is “finished” and white nodes haven’t been processed yet. Dotted arrows represent dependencies that have been satisfied.

As you can see, the execution matches very closely what you would expect based on my description of the algorithm and I confirmed that changing the size of the thread pool creates different executions.
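The dump step described above can be sketched in a few lines. This is only an illustration under assumptions: the GraphSnapshots class, the graph-NNN.dot file naming and the HTML layout are hypothetical and stand in for the executor hook and shell script I actually used. It expects each snapshot to be rendered separately with the Graphviz command line, for example `dot -Tpng graph-001.dot -o graph-001.png`.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical helper that saves one .dot file per step and builds an HTML index. */
class GraphSnapshots {
  private final Path dir;
  private final AtomicInteger counter = new AtomicInteger();

  GraphSnapshots(Path dir) {
    this.dir = dir;
  }

  /** Writes the given dot source to the next numbered file and returns its path. */
  Path dump(String dot) {
    int step = counter.incrementAndGet();
    Path file = dir.resolve(String.format("graph-%03d.dot", step));
    try {
      Files.createDirectories(dir);
      Files.write(file, dot.getBytes(StandardCharsets.UTF_8));
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return file;
  }

  /** Builds a page with one image per snapshot, assuming each .dot was rendered to a .png. */
  String toHtml() {
    StringBuilder html = new StringBuilder("<html><body>\n");
    for (int step = 1; step <= counter.get(); step++) {
      html.append(String.format("<p>Step %d</p><img src=\"graph-%03d.png\">\n", step, step));
    }
    html.append("</body></html>\n");
    return html.toString();
  }

  public static void main(String[] args) {
    GraphSnapshots snapshots = new GraphSnapshots(Paths.get("snapshots"));
    // In the executor, this call would receive graph.toDot() each time a worker terminates.
    snapshots.dump("digraph g {\n  a -> b [dir=back]\n}\n");
    System.out.println(snapshots.toHtml());
  }
}
```

The counter is atomic because workers can terminate on different threads, so two dumps may be requested at the same time.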
