I recently received an alarming bug report about the TestNG Eclipse plug-in:

I’m talking about TestNG plugin for eclipse and its performance. Since some time this plugin contains new feature – searching already finished tests. (just “Search: ” with a field).
Performing this search makes that my eclipse does not response for minutes. After all I can see as a result – all historical run tests (according to my search request).

A bug that makes Eclipse unresponsive for minutes is certainly unacceptable, so I started digging into this problem right away. However, I still wasn't quite sure what this user was doing. I use and work on the plug-in on a regular basis without ever seeing this, so the problem had to be happening under some very specific and unusual circumstances. A few emails later, I received the following clarification:

Yes, I’m talking about the Search box and filtering.
‘historical data” means all collected test results. In case I don’t restart eclipse for weeks and start million tests (via providers) performance plays here important role.
In fact I don’t need the search box feature. Simply today I filled it up by mistake and lost half an hour… First to find, second to remove from the box…

I have to say I laughed when I read "I filled it up by mistake and lost half an hour". Okay, that was probably not very funny from this user's standpoint. It was funny to me because I could immediately see how having a million test instances could make the plug-in unresponsive.

The problem is caused by the “Search filter”, a feature that appeared in the plug-in a few months ago.

When you type characters in this text box, the results displayed in the tree are filtered so that only the nodes matching the text are shown. This is very convenient when you want to inspect the results of a specific test.

This poor user created a test run with one million test instances and accidentally typed a letter in the search box. The plug-in then diligently went through the million nodes, retaining only those that contain this letter. Deleting this character causes the exact same traversal to happen again, except that this time the entire test suite gets restored in the tree view.

This code is fairly naive and not optimized for the creation of a million TreeItems, so I wasn't surprised to hear that doing so would make Eclipse unresponsive for a while. After all, these SWT objects have to be created on the UI thread (SWT's counterpart of Swing's Event Dispatch Thread) at some point. Whether you add them one by one or in bulk (which is what the plug-in does), the UI thread is pretty much guaranteed to get severely hogged.
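Setting the SWT side aside, the filtering pass itself is linear in the size of the tree. The sketch below is a minimal illustration of that kind of naive filter, assuming a hypothetical `Node` class rather than the plug-in's actual data structures: every keystroke visits every node and rebuilds the surviving branches, so a million test instances mean a million string comparisons per character typed or deleted.

```java
import java.util.ArrayList;
import java.util.List;

public class NaiveFilter {
    // Hypothetical stand-in for a result node; not the plug-in's actual class.
    static final class Node {
        final String label;
        final List<Node> children = new ArrayList<>();

        Node(String label) { this.label = label; }

        // Adds a child and returns this node, so calls can be chained.
        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    // Returns a pruned copy of the subtree, keeping matching nodes and the
    // ancestors that lead to them, or null if nothing in the subtree matches.
    // Every call visits every node: O(N) work per keystroke.
    static Node filter(Node node, String text) {
        Node copy = new Node(node.label);
        for (Node child : node.children) {
            Node kept = filter(child, text);
            if (kept != null) {
                copy.children.add(kept);
            }
        }
        boolean matches = node.label.contains(text); // case-sensitive substring match
        return (matches || !copy.children.isEmpty()) ? copy : null;
    }

    // Counts the nodes in a (possibly filtered) tree.
    static int count(Node node) {
        int n = 1;
        for (Node child : node.children) {
            n += count(child);
        }
        return n;
    }

    public static void main(String[] args) {
        Node login = new Node("LoginTest")
                .add(new Node("testValidPassword"))
                .add(new Node("testExpiredAccount"));
        Node root = new Node("Suite")
                .add(login)
                .add(new Node("CheckoutTest").add(new Node("testEmptyCart")));
        // Keeps Suite > LoginTest > testValidPassword.
        System.out.println(count(filter(root, "Password"))); // prints 3
    }
}
```

Note that an empty filter string matches every label, so clearing the box rebuilds the whole tree, which is exactly the second full traversal the user ran into.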

It's a pretty simple problem with a few obvious solutions; the hard part is finding the best course of action.

My first step in trimming down the solution space was to decide that a test result featuring millions of objects was an unusual case, and that optimizing this part of the code should therefore not be my priority. Still, just for the sake of argument, I explored several ways of doing it and came up with various ideas, such as precaching subwords (for example, so that three-letter words can be matched to nodes very quickly) or virtualizing the tree. I'm sure there are plenty of other techniques, and I'm definitely interested in hearing about them.
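The subword precaching idea is only named above. One standard way to implement it is a trigram index, sketched below as a hypothetical `TrigramIndex` class; this is an illustration of the technique, not something the plug-in contains. Each label is indexed once when its result arrives, and a query then only inspects the nodes that share all of its three-letter substrings instead of scanning all of them.

```java
import java.util.*;

public class TrigramIndex {
    private final List<String> labels = new ArrayList<>();
    // Maps each three-character substring to the ids of the labels containing it.
    private final Map<String, Set<Integer>> postings = new HashMap<>();

    // Indexes a label once, up front; returns its id.
    int add(String label) {
        int id = labels.size();
        labels.add(label);
        for (int i = 0; i + 3 <= label.length(); i++) {
            postings.computeIfAbsent(label.substring(i, i + 3), k -> new HashSet<>()).add(id);
        }
        return id;
    }

    // Returns the sorted ids of labels containing the query (at least three characters).
    List<Integer> search(String query) {
        if (query.length() < 3) {
            throw new IllegalArgumentException("query needs at least 3 characters");
        }
        Set<Integer> candidates = null;
        for (int i = 0; i + 3 <= query.length(); i++) {
            Set<Integer> ids = postings.getOrDefault(query.substring(i, i + 3), Collections.emptySet());
            if (candidates == null) {
                candidates = new HashSet<>(ids);
            } else {
                candidates.retainAll(ids); // intersect posting lists
            }
            if (candidates.isEmpty()) {
                return List.of();
            }
        }
        // Sharing every trigram does not guarantee a substring match:
        // "abcxbcd" contains "abc" and "bcd" but not "abcd". Verify each candidate.
        List<Integer> result = new ArrayList<>();
        for (int id : candidates) {
            if (labels.get(id).contains(query)) {
                result.add(id);
            }
        }
        Collections.sort(result);
        return result;
    }
}
```

The trade-off is memory: every label contributes roughly one posting per character, which is significant with a million nodes. The index also only answers queries of three characters or more, which fits well with a minimum query length.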

But for now, I decided to take a lighter approach, so I made two changes:

1) When I detect that the tree contains more than a certain number of nodes, I configure the text box not to start filtering until it contains at least three characters. This addresses the problem of accidentally typing a letter in the box, and it also guarantees that when filtering is triggered, no more nodes can match than with a single letter, and usually far fewer (since they have to match three letters instead of just one). Obviously, the code still has to go through the million nodes.

This looks simple enough, but I couldn't help trying to devise clever ways of calibrating these numbers: when should this behavior kick in? At 1,000 nodes? 10,000 nodes? Also, should the minimum number of characters be a function of the node count? For example, require two characters for 1,000 nodes but three for 10,000 nodes? How about using a base-1000 logarithm to create these pairs of values? Or should the base be 50?
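To make the calibration question concrete, here is a minimal sketch of the logarithmic variant, which the post only floats as an idea. The class name, the `maxChars` cap and the exact formula (minimum length = 1 + floor(log_base(nodeCount)), capped) are assumptions for illustration:

```java
public class FilterThreshold {
    // Hypothetical calibration: minimum characters = 1 + floor(log_base(nodeCount)),
    // capped at maxChars. Integer arithmetic avoids the rounding errors that a
    // Math.log(n) / Math.log(base) ratio can produce at exact powers of the base.
    static int minChars(long nodeCount, long base, int maxChars) {
        if (base < 2) {
            throw new IllegalArgumentException("base must be at least 2");
        }
        int chars = 1;
        long threshold = base; // smallest node count that requires one more character
        while (chars < maxChars && nodeCount >= threshold) {
            chars++;
            if (threshold > Long.MAX_VALUE / base) {
                break; // the next power would overflow; no long can reach it anyway
            }
            threshold *= base;
        }
        return chars;
    }

    // Filtering only starts once the typed text is long enough for this tree size.
    static boolean shouldFilter(String text, long nodeCount, long base, int maxChars) {
        return text.length() >= minChars(nodeCount, base, maxChars);
    }

    public static void main(String[] args) {
        long[] sizes = {500, 1_000, 10_000, 1_000_000};
        for (long n : sizes) {
            System.out.println(n + " nodes: base 1000 -> " + minChars(n, 1000, 3)
                    + " chars, base 50 -> " + minChars(n, 50, 3) + " chars");
        }
    }
}
```

With a cap of three, base 1000 requires 1, 2, 2 and 3 characters for 500, 1,000, 10,000 and 1,000,000 nodes, while base 50 requires 2, 2, 3 and 3. The smaller base reaches the three-character minimum much sooner, which is exactly the kind of tuning decision described above.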

2) I added a new “Clear results” icon to the toolbar.

Clicking it wipes the results displayed in all the panes, which guarantees that typing text in the search filter will do nothing (the search text box actually gets disabled).

The only concern I have with this approach is that users might be confused to see the search filter activate after one character in some cases and require three in others. They might even think the search filter is broken if nothing happens after they type two characters.

You can work around this problem by adding a tooltip to the text box or, better, by displaying a helpful hint in a greyed-out font inside the text box itself, saying something like “[Type at least three characters to search]”.
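The last two points (disabling the box when there is nothing to filter, and telling users how many characters are needed) can be captured in a small piece of state logic. The sketch below uses a hypothetical `SearchBoxState` helper and prints the count as a digit rather than a word. In SWT, the resulting string could be passed to `Text.setMessage(String)`, which displays hint text in an empty text field, and the enabled flag to `Control.setEnabled(boolean)`.

```java
public class SearchBoxState {
    // The search box is only useful when there are results to filter;
    // after "Clear results" there are none, so it is disabled.
    static boolean isEnabled(long nodeCount) {
        return nodeCount > 0;
    }

    // Hint text shown inside the empty box, so users know how many
    // characters are needed before filtering starts.
    static String hint(int minChars) {
        return minChars <= 1
                ? "[Type to search]"
                : "[Type at least " + minChars + " characters to search]";
    }

    public static void main(String[] args) {
        System.out.println(isEnabled(0) + " " + hint(3));
    }
}
```

Because the hint is recomputed from the same minimum that gates filtering, the message stays accurate whether the current tree needs one character or three.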

I find it interesting how such a simple problem can offer so many different avenues to solve it, and how each one comes with benefits and costs that need to be carefully weighed. I'm hoping I struck the right balance with the current approach, one that solves the problem at hand without adversely affecting most regular users.