Archive for May, 2014

Coding challenge: partial results semantics

Consider the simple following signature:

  List<Person> searchPeople(String query);

This method takes a search string (e.g. “Bill”) and returns a list of Person that match this search string. This includes people with “Bill” as their first name, or as their last name, or maybe even using nicknames (someone whose name is “William” would match).

However, you have millions of people in your database, which means that this function call can potentially return tens of thousands of people, and it can also be quite time consuming. But your caller cannot wait forever and they want to cap the amount of time you spend doing the search, e.g. five seconds.

The function itself doesn’t know about this limit, it just does as much as it can and then it gets interrupted by its caller after the time has run out. You can imagine that the caller invokes this function in a separate thread and then calls get on the Future with a time out of five seconds.

The nice thing about the signature above is that it’s referentially transparent, which offers a lot of nice properties. However, it’s also binary: either it returns everything that matches the search or it gets interrupted before it can finish and the caller gets zero results.

The challenge is to write this function so that when it gets interrupted by the time out, it still returns whatever it has found so far.

The solution is trivial using mutable structures so bonus points if you can implement this solution with immutable data. Any language welcome, and I suggest you use pastebin or a similar service to share your code, since the comment system is not very good at formatting code.

More about language popularity

Hot on the tail of my previous post about language popularity, the latest numbers from the TIOBE are out.

The top five languages are C, Java, Objective-C, C++ and Visual Basic. Every other language beyond that has less than 4% mind share. The next JVM languages are Scala (#35) and Groovy (#48). Clojure didn’t make it in the top 50.

The pitfalls of Test-Driven Development

A few days ago, David Heinemeier Hansson posted a very negative article on Test-Driven Development (TDD) which generated quite a bit of noise. This prompted Kent Beck to respond with a Facebook post which I found fairly weak because it failed to address most of the points that David made in his blog post.

I have never been convinced by TDD myself and I have expressed my opinions on the subject repeatedly in the past (here and here for example) so I can’t say I’m unhappy to see this false idol finally being questioned seriously.

I actually started voicing my opinion on the subject in my book in 2007, so I thought I’d reproduce the text from this book here for context (with a few changes).

The Pitfalls of Test-Driven Development

I basically have two objections to Test-Driven Development (TDD).

  1. It promotes microdesign over macrodesign.
  2. It’s hard to apply in practice.

Let’s go over these points one by one.

TDD Promotes Microdesign over Macrodesign

Imagine that you ask a famous builder and architect to construct a sky scraper. After a month, that person comes back to you and says

“The first floor is done. It looks gorgeous; all the apartments are in perfect, livable condition. The bathrooms have marble floors and beautiful mirrors, the hallways are carpeted and decorated with the best art.”

“However,” the builder adds, “I just realized that the walls I built won’t be able to support a second floor, so I need to take everything down and rebuild with stronger walls. Once I’m done, I guarantee that the first two floors will look great.”

This is what some premises of Test-Driven Development encourage, especially aggravated by the mantra “Do the simplest thing that could possibly work,” which I often hear from Extreme Programming proponents. It’s a nice thought but one that tends to lead to very myopic designs and, worst of all, to a lot of churn as you constantly revisit and refactor the choices you made initially so they can encompass the next milestone that you purposefully ignored because you were too busy applying another widespread principle known as “You aren’t going to need it” (YAGNI).

Focusing exclusively on Test-Driven Development tends to make programmers disregard the practice of large or medium scale design, just because it no is longer “the simplest thing that could possibly work”. Sometimes it does pay off to start including provisions in your code for future work and extensions, such as empty or lightweight classes, listeners, hooks, or factories, even though at the moment you are, for example, using only one implementation of a certain interface.

Another factor to take into consideration is whether the code you are writing is for a closed application (a client or a Web application) or a library (to be used by developers or included in a framework). Obviously, developers of the latter type of software have a much higher incentive to empower their users as much as possible, or their library will probably never gain any acceptance because it doesn’t give users enough extensibility. Test-Driven Development cripples library development because its principles are at odds with the very concept of designing libraries: think of things that users are going to need.

Software is a very iterative process, and throwing away entire portions of code is not only common but encouraged. When I start working on an idea from scratch, I fully expect to throw out and completely rewrite the first if not the first two versions of my code. With that in mind, why bother writing tests for this temporary code? I much prefer writing the code without any tests while my understanding of the problem evolves and matures, and only when I reach what I consider the first decent implementation of the idea is it time to write tests.

At any rate, test-driven developers and pragmatist testers are trying to achieve the same goal: write the best tests possible. Ideally, whenever you write tests, you want to make sure that these tests will remain valid no matter how the code underneath changes. Identifying such tests is difficult, though, and the ability to do so probably comes only with experience, so
consider this a warning against testing silver bullets.

Yes, Test-Driven Development can lead to more robust software, but it can also lead to needless churn and a tendency to over-refactor that can negatively impact your software, your design, and your deadlines.

TDD Is Hard to Apply

Test-Driven Development reading material that I have seen over the years tends to focus on very simple problems:

  • A scorecard for bowling
  • A simple container (Stack or List)
  • A Money class
  • A templating system

TDD works wonders on these examples, and the articles describing this practice usually do a good job of showing why and how.

What these articles don’t do, though, is help programmers dealing with very complex code bases perform Test-Driven Development. In the real world, programmers deal with code bases comprised of millions of lines of code. They also have to work with source code that not only was never
designed to be tested in the first place but also interacts with legacy systems (often not written in Java), user interfaces, graphics, or code that outputs on all kinds of hardware devices, processes running under very stringent real time, memory, network or performance constraints, faulty hardware, and so on.

Notice that none of the examples from the TDD reading materials fall in any of this category, and because I have yet to see a concrete illustration of how to use Test-Driven Development to test a back-end system interacting with a 20-year-old mainframe validating credit card transactions, I certainly share the perplexity of developers who like the idea of Test-Driven
Development but can’t find any reasonable way to apply it to their day jobs.

TestNG itself is a very good candidate for Test-Driven Development: It doesn’t have any graphics, it provides a rich programmatic API that makes it easy to probe in various ways, and its output is highly deterministic and very easy to query. On top of that, it’s an open source project that is not subject to any deadlines except for the whims of its developers.

Despite all these qualities, I estimate that less than 5% of the tests validating TestNG have been written in a TDD fashion for the simple reason than code written with TDD was not necessarily of higher quality than if it had been delivered “tests last.” It was also not clear at all that code produced with TDD ended up being better designed.

No matter what TDD advocates keep saying, code produced this way is not intrinsically better than traditionally tested code. And looking back, it actually was a little harder to produce, if only because of the friction created by dealing with code that didn’t compile and tests that didn’t pass for quite a while.

Extracting the Good from Test-Driven Development

The goal of any testing practice is to produce tests. Even though I am firmly convinced that code produced with TDD is not necessarily better than code produced the traditional way, it is still much better than code produced without any tests. And this is the number one lesson I’d like everybody to keep in mind: how you create your tests is much less important than writing tests in the first place.

Another good quality of Test-Driven Development is that it forces you to think of the exit criteria that your code has to meet before you even start coding. I certainly applaud this focus on concrete results, and I encourage any professional developer to do the same. I simply argue that there are other ways to phrase these criteria than writing tests first, and sometimes even a simple text file with a list of goals is a very decent way to get started. Just make sure that, by the time you are done with an initial version, you have written tests for every single item on your list.

Don’t test first, test smart.


Update: Discussion on reddit

Language popularity on GitHub

RedMonk just published their latest survey of Github’s most popular languages, and given Github’s continuous growth in popularity, they are worth a look.

Here are the results at a glance:

  • Javascript is seeing a consistent and serious growth.
  • Ruby is in sharp decline.
  • Python is showing a decline as well, although not as severe as Ruby.
  • Java is showing some growth, and it’s also the only JVM language in the top 12 listed by Red Monk.

I’m going to go out on a limb and predict that Python is being replaced by Go. I don’t have a lot of information to back up this prediction except that most of the positive articles I read about Go are written by Python developers, and a lot of them say that they are now actively migrating their code base from Python to Go. I don’t see as much enthusiasm for Go from developers using statically typed languages, probably because of Go’s antiquated type system (which is still a big step up from Python, obviously).

It’s interesting to see Java continue to grow twenty years after its introduction. I think this constant growth is fueled by the language’s remarkable versatility and its ability to adapt to new technologies but also driven by a series of constant popularity boosts such as Android five years ago and Java 8 this year.

I’m surprised not to see Groovy in this top 12 of languages, since it’s usually acknowledged as the second most popular language on the JVM and I expected its popularity grow thanks to Gradle, but this doesn’t seem to be enough to crack the top 12 on Red Monk.

Update: Discussion on reddit

A bit I can’t unsee

This is the binary definition of ET’s sprite, in Atari’s infamous game:

ETWalkSprite_A0
.byte $FE ; |XXXXXXX.|
.byte $FF ; |XXXXXXXX|
.byte $C3 ; |XX....XX|
.byte $0F ; |....XXXX|
.byte $FF ; |XXXXXXXX|
.byte $3F ; |..XXXXXX|
.byte $2B ; |..X.X.XX|
.byte $E7 ; |XXX..XXX|
.byte $00 ; |........|

I am actually less disturbed by the 6502 listing than by this extra bit between ET’s legs (which you can see on the screen shot above).