July 10, 2008
Coding challenge wrap-up
Click on the image to see Eric's full comic strip
I certainly didn't expect so many reactions to the coding challenge... More than 130 comments so far, wow.
First of all, I owe an apology to all commenters for my annoying comment system (which prohibits posting URL's that start with "http"). I'm very sorry about that but I receive so much spam that it's a necessity. Some people braved the odds and posted their solution anyway, others used creative ways to submit their code, such as using Google Documents or Paste Bin (which has the benefit of syntax highlighting).
Thank you all for putting up with this and participating in this fun contest anyway.
I've learned a lot from all these solutions and discussions, which featured the following languages: Java, C, Perl, Erlang, Javascript, C#, Groovy, Haskell, AS3, C, Fan, LUA, J, OCaml, Factor, Forth, Lisp, Ursala, Prolog.
Here is a quick wrap-up.
The solutions basically fall into three categories:
- Concise.
- Fast.
- None of the above.
Ruby
(98..103).select { |x| x.to_s.size == x.to_s.split(//).uniq.size }
Python
(i for i in xrange(start, end) if len(str(i)) == len(set(str(i)))):Scala
for (i <- 1 to 100000; s = i.toString; if HashSet[Char](s:_*).size == s.size) println(i)J
f =: [:(#;[:>./2-~/\])(#~([:*/[:~:":)"0)For a minute, I thought that last entry was a joke or that maybe, the poster was disconnected in the middle of posting his solution. But no, J is a real programming language.
Ursala
#import nat func = ^(nleq$^+ difference*typ,length)+ ~&triK2tkZ2FlS+ iota+successor; * ^lrhPX/~& %nPQuite an intidimidating syntax here too :-)
The problem with the concise solutions is that they eliminate duplicate digits by converting the integer into a string, which results in prohibitive running times (the Ruby code takes 27 hours to complete with max = 10^10, which is the baseline I'll be using from now on).
Let's turn our attention briefly to solutions that are neither concise nor fast...
I was disappointed to see the Erlang code, to be honest, because the (only) solution that was posted is a bit frightening. I would love to see more attempts in Erlang that are either concise or fast. This problem seems to be a good candidate for sharding, since all the numbers that satisfy the requirements can be found in complete isolation of the others, so this a good opportunity for Erlang to show what it's good at.
Also, somebody posted a Prolog solution which is shorter than Erlang's, so I don't think the declarative aspect of Erlang is the reason for the length of this solution.
Can somebody post an Erlang solution that is either concise or fast?
Similarly, Perl and Javascript didn't particularly shine in the contributions that I saw in the comments. People also posted solutions in Forth (a bit hard to read, but my Forth is rusty) and even Factor (which is reasonably concise but also seems to use a lot of libraries).
Crazy Bob was the first one to post a solution that is reasonably concise and also scaringly fast. Not surprisingly, it's not brute force, it only uses primitive integers, it uses a bitmask to keep track of which digits have already been used and to top it all, it's recursive (it's not very often you see recursive code that is faster than everything else, although admittedly, the recursion doesn't go very deep).
Bob's first attempt was able to calculate all numbers from 1 to 10^10 in half a second, which blew away everything that had been posted so far. Interestingly, the C version of his Java code ran at about the same speed as Java.
Quite a few people observed that my problem was similar to generating permutations, with the little twist that '0' is not allowed to appear in first position. With that in mind, I thought I would see people grab the standard implementations for permutations that you can find on the web, adapt them to the constraints of my problem and then post them here. Interestingly, the opposite happened: Bob's solution is not only the fastest, but it can probably form the basis for a canonical solution to solve permutations quickly, especially since it's only limited by the number of characters that you can represent in a bitmask (64 if you want the bitmask to remain a primitive, unbounded if you represent it with a more complex structure).
Then, Mauricio entered the fray with OCaml and his initial attempt took 640ms to run up to 10^10. That was quite a shocker to me. I studied quite a bit of OCaml during my PhD (which I did at INRIA, so this should come as no surprise), but Caml and OCaml quickly fell off my radar when I moved on. I was quite surprised to see a real functional language be as fast as the top contenders in a purely algorithmic contest. Mauricio's version is about as concise as Java, which makes this even more impressive.
Some time later, Mauricio wrote a more functional version of his solution, which ended up running in about the same time as his imperative approach.
And then, Bob came up with a solution that runs twice faster.
It's not very often that I get excited by a piece of Java code, because I think that Java is boring (in a good way). Obviously, Bob's trick is not specific to Java, but I did go "wow" the first time I read it.
Bob's initial algorithm is not very complicated but it does have a few code paths which, at first sight, would be good candidates for possible optimizations. What I found the most interesting in his approach is that he found the optimization in the most unexpected place: by getting rid of incrementations and by adding a class.
Bob introduced a class Digit that is a doubly linked list of digits. Initially, 0 points to 1 which points to both 0 and 2, which points to... You get the idea.
Calling next() on such an object gives you the next digit in the sequence without requiring an increment operation. The trick here is to rearrange the list as soon as you use a digit so that invoking next() will always return a digit that can be added to your number without violating the requirements. For example, as soon as you add 1 to your solution, the Digit list now becomes 0 <-> 2 <-> 3... There are a couple of additional tricks with regard to head tracking and backtracking, but that's pretty much the core of the optimization.
Beautiful code indeed.
Concluding remarks
My first take away from this little exercise is that conciseness only goes so far. I have seen people post a one-liner solution on their blog in language X and conclude "X rocks". Except that their solution will take hours to complete.
A credible language must make it possible for developers to solve problems with either conciseness or performance in mind, and ideally, allow a whole spectrum between these two extremes.
The comments also discussed the definition of "brute force", and after some initial turmoil, everybody seems to agree that Bob's solution is not brute force since it only ever considers valid digits. At this point, I challenge anyone who disagrees to come up with a "really non brute force" solution which should, in theory, be much faster than Bob's solution. Good luck with that :-)
Finally, I'd like to get a better sense of performances for all the languages that have participated in this little contest, so I offer the following follow-up problem: port Bob's solution to your language of choice, post the code along with 1) how fast Bob's Java solution runs on your system and 2) how fast your solution runs.
So far, only Java, C/C++ and OCaml have shown to be up to the task. Can you add your own favorite language to the list?
July 07, 2008
Jazoon presentation available online
In this presentation, I gave a fast and furious tour of the programming languages that you have seen on front pages recently, some of which are seen as potential successors to Java. I also make the point that despite these claims, I don't see any language dethrone Java for a very long time. My prediction is that Java will keep cherry picking the best features from all these languages and continue to adapt to meet the challenges of tomorrow, either at the language level (unlikely to change much) or via frameworks (much more likely).
June 27, 2008
Coding challenge
Update: I posted a wrap-up here.
Here is an interesting coding challenge: write a counter function that counts from 1 to max but only returns numbers whose digits don't repeat.
For example, part of the output would be:
- 8, 9, 10, 12 (11 is not valid)
- 98, 102, 103 (99, 100 and 101 are not valid)
- 5432, 5436, 5437 (5433, 5434 and 5435 are not valid)
- Display the biggest jump (in the sequences above, it's 4: 98 -> 102)
- Display the total count of numbers
- Give these two values for max=10000
Post your solution in the comments or on your blog. All languages and all approaches (brute force or not) are welcome.
I'm curious to see if somebody can come up with a clever solution (I don't have one)...