Archive for category General

The case of the buggy executor

I spent a mystifying half hour chasing down a bug recently, so I thought I would share.

Here is a simple scheduled executor:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class Exec {
    final static Logger logger = LoggerFactory.getLogger(Exec.class);
    private static final ScheduledExecutorService executor =
        Executors.newSingleThreadScheduledExecutor(
            new ThreadFactory() {
                @Override
                public Thread newThread(Runnable r) {
                    Thread result = new Thread();
                    result.setName("BuggyExecutor");
                    return result;
                }
            });
    public static void main(String[] args) {
        executor.scheduleAtFixedRate(() -> logger.info("Tick"),
                0,
                1, TimeUnit.SECONDS);
    }
}

I always use a ThreadFactory in my executors since seeing thread names plainly in your trace simplifies debugging threading issues considerably. Whenever I see a pool-2-thread-3 in my thread dump, I track down the lazy library that caused that monstrosity and I seriously consider replacing it with one that is written by developers more respectful of my time.

Other than that, this code is pretty straightforward and if you run it, you would expect the string “Tick” to be displayed every second:

17:37:18.780 [BuggyExecutor] INFO  com.beust.Exec - Tick
17:37:19.780 [BuggyExecutor] INFO  com.beust.Exec - Tick
17:37:20.780 [BuggyExecutor] INFO  com.beust.Exec - Tick

However, if you run the code as provided above, you will see that it does absolutely nothing.
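If you want to check your own diagnosis, a likely culprit is newThread(): the executor hands the factory a Runnable r (its worker loop), but the factory builds its thread with new Thread() and drops r. The thread starts, runs an empty run() method and exits, so the scheduled task never executes, and once main() returns no non-daemon thread is left to keep the JVM alive. Below is a sketch of a corrected factory. The class and helper names (FixedExec, named(), runTicks()) are my own, and it prints with System.out instead of SLF4J so it has no dependencies; unlike the original, it shuts the executor down after a few ticks so it terminates.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.TimeUnit;

public class FixedExec {
    // Hypothetical helper: names the thread AND hands it the executor's Runnable.
    static ThreadFactory named(String name) {
        return r -> {
            Thread result = new Thread(r);   // passing r is what the buggy version forgot
            result.setName(name);
            return result;
        };
    }

    // Schedules a "Tick" task, waits (up to 5 seconds) for `ticks` executions,
    // then shuts the executor down and returns how many ticks were observed.
    static long runTicks(int ticks, long periodMillis) {
        ScheduledExecutorService executor =
            Executors.newSingleThreadScheduledExecutor(named("FixedExecutor"));
        CountDownLatch latch = new CountDownLatch(ticks);
        try {
            executor.scheduleAtFixedRate(() -> {
                System.out.println("[" + Thread.currentThread().getName() + "] Tick");
                latch.countDown();
            }, 0, periodMillis, TimeUnit.MILLISECONDS);
            latch.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            executor.shutdownNow();
        }
        return ticks - latch.getCount();
    }

    public static void main(String[] args) {
        System.out.println("Ticks observed: " + runTicks(3, 100));
    }
}
```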

Your code doesn’t speak for itself

I recently reviewed a commit that said “Fix the list view bug”. I reviewed it, saw that it was fixing an off-by-one error, approved it and moved on.

A few days later, another commit went by that said “Really fix the list view bug”. This fix was a bit more involved and addressed the actual problem: the first item in the list view sometimes received the wrong styling. I then realized that I shouldn’t have approved the first commit without asking a few more questions.

Here is another scenario. Let’s say you are asked to review the following code:

public static int compare(@Nonnull Long a, @Nonnull Long b) {
	return a.compareTo(b);
}

Seems pretty harmless, doesn’t it? No reason not to approve it.

How about this one:

/**
 * @return 0 if the two numbers are equal, 1 otherwise.
 */
public int cmp(@Nonnull Long a, @Nonnull Long b) {
	return a.compareTo(b);
}

Now we have a problem: the code and the comment do not agree. This should not be approved before asking the developer to fix this (either change the comment or change the code).
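To make the two possible fixes concrete: Long.compareTo() returns a negative number, zero or a positive number, so for a smaller than b, cmp() returns a negative value, which contradicts “1 otherwise”. Here is a sketch of both reconciliations. The method names cmpOrdering() and cmpEquality() are hypothetical, and I dropped the @Nonnull annotations to keep the example free of dependencies.

```java
public class Cmp {
    /**
     * Option 1: keep the code, fix the comment.
     * @return a negative number if a is less than b, 0 if they are equal,
     *         a positive number if a is greater than b.
     */
    public static int cmpOrdering(Long a, Long b) {
        return a.compareTo(b);
    }

    /**
     * Option 2: keep the comment, fix the code.
     * @return 0 if the two numbers are equal, 1 otherwise.
     */
    public static int cmpEquality(Long a, Long b) {
        return a.equals(b) ? 0 : 1;
    }

    public static void main(String[] args) {
        System.out.println(cmpOrdering(1L, 2L));  // negative: compareTo reports ordering
        System.out.println(cmpEquality(1L, 2L));  // 1: the numbers differ
    }
}
```

Either version is fine; what matters is that the reviewer can tell which one the developer meant.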

There is this prevalent notion in the software world that good code doesn’t need comments, that it stands on its own. Or that comments are a code smell.

Irritatingly, this myth just won’t die despite repeated evidence that comments are sometimes vital to code correctness. Proponents of this myth point out that it’s easy for comments to get out of sync with the code (see the example above) and decide that because this approach is not perfect, it should be avoided altogether.

This is a false dichotomy that is easily avoided by making it clear to your teammates that both code and comments need to be reviewed.

The problem with the first example that I gave is that the developer failed to disclose the intent of his code. The code type checks, is correct and fixes a bug, but it turns out to be doing something different from what the developer intended, and the reviewer would have caught this if the developer had explained what his code was meant to do.

Not all code needs comments, but certain pieces of code are useless and can’t be verified without comments.

Your code says “What?”, your comments say “Why?”. Sometimes, you need both in order to assess the correctness of a commit. Just make sure you review comments as seriously as you review code.

Android, Rx and Kotlin: part 2

I haven’t been quite honest with you in my previous post: the code I showed in the article doesn’t exactly produce the application shown in the short video at the top of the article.

If you run the code as is, you will notice something very irritating (and unacceptable in any application): whenever the app is pretending to make a network call, the entire user interface freezes for a second. You can’t type anything and the loading icon stops spinning. This is the classic symptom of blocking the main thread. You will remember that I am simulating network calls by simply sleeping for a little while, and obviously, if you do this on the main thread, you will freeze your UI.

By default, Rx runs everything on your current thread, which is the main thread in Android: the thread that is in charge of updating your user interface. Android is exactly like most graphical toolkits: you should only use the main thread to update your UI but anything else you do (network or file system access, computations, database updates, etc…) needs to be done on a background thread. Rx has a very good solution to this problem.

Threading

Until recently, AsyncTask was the recommended way of performing this kind of task. By creating and executing an AsyncTask, you can split your code into two parts: one that runs on a background thread (doInBackground()) and one that runs on the main thread once that task completes (onPostExecute()).

AsyncTask has a troubled past and it has evolved quite a bit over the many revisions of the Android API: first it was single threaded, then it became multithreaded and more recently, it’s running in the background on one thread in an attempt to provide both parallelism and sequencing at the same time. If you need more information about AsyncTask, this article explains how it evolved.

This is not the only issue with AsyncTask: it’s also fairly challenging to get its behavior right across configuration changes, or when your activity is paused or destroyed while the task is still running.

Rx offers a few solutions to some of these problems, but not all.

Threading and Rx

Rx offers two methods to control your threading model: subscribeOn() and observeOn().

In a nutshell, subscribeOn() defines the thread on which the subscription happens, and therefore where the source starts emitting and where the operators that follow it (map(), filter(), etc…) run by default, while observeOn() switches every step downstream of it, typically your observer, to another thread.

The parameter you give to these methods is a Scheduler, an Rx abstraction that encapsulates a thread (or a pool of threads) on which work can run. Rx defines a few standard ones:

  • Schedulers.computation(): When you are calculating something.
  • Schedulers.io(): When you are doing I/O (network, file system, database access, …).
  • And a few others I won’t get into here.

Additionally, RxAndroid defines the more Android-specific AndroidSchedulers.mainThread(), which is self-explanatory.

A typical pattern on Android is to run a few tasks in the background (network access, expensive computation, database update, etc…) and, based on their result, update your UI. The way to implement this with Rx is straightforward: you subscribe on whichever background scheduler is most appropriate for your actions and you observe on the main thread:

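import com.google.gson.JsonObject
import rx.Observable
import rx.android.schedulers.AndroidSchedulers
import rx.schedulers.Schedulers

interface Server {
    fun findUser(name: String): Observable<JsonObject>
}

data class User(val id: String, val name: String)

// Print a message prefixed with the name of the current thread
fun p(s: String) {
    println("[${Thread.currentThread().name}] $s")
}

// `server` is assumed to be some implementation of Server (e.g. a Retrofit-generated one)
Observable.just("cedric")
    .subscribeOn(Schedulers.io())                  // subscribe, and call the server, on the I/O scheduler
    .flatMap { name ->
        p("Calling server.findUser")
        server.findUser(name)
    }
    .map { jo ->
        p("Mapping to a User object")
        User(jo.get("id").asString, jo.get("name").asString)
    }
    .observeOn(AndroidSchedulers.mainThread())     // deliver the result on the Android main thread
    .subscribe { u -> p("User: $u") }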

We start with a string (which could come from an EditText) and specify that we’ll be subscribing on the I/O scheduler. Then we call the server with that name (on the I/O thread), turn the JSON response into a User object and print that object on the main thread:

[IoThreadScheduler-1] Calling server.findUser
[IoThreadScheduler-1] Mapping to a User object
[Main] User: User(id=123, name=cedric)
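For readers without RxJava at hand, the same “work on a background thread, deliver the result on the UI thread” shape can be sketched with plain java.util.concurrent. This is not Rx: CompletableFuture.supplyAsync() with an I/O executor plays roughly the role of subscribeOn(Schedulers.io()), and thenAcceptAsync() with a dedicated single-thread executor stands in for observeOn(AndroidSchedulers.mainThread()). The findUser() stub, the “io” and “ui” thread names and the executor setup are assumptions for illustration; the “ui” thread here is just an ordinary thread, not Android’s main thread.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class BackgroundThenMain {
    static final class User {
        final String id;
        final String name;
        User(String id, String name) { this.id = id; this.name = name; }
        @Override public String toString() { return "User(id=" + id + ", name=" + name + ")"; }
    }

    // Print a message prefixed with the current thread's name, like p() in the Kotlin code.
    static void p(String s) {
        System.out.println("[" + Thread.currentThread().getName() + "] " + s);
    }

    // Hypothetical stand-in for server.findUser(): sleeps to simulate a network call.
    static Map<String, String> findUser(String name) {
        p("Calling server.findUser");
        try {
            Thread.sleep(100);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return Map.of("id", "123", "name", name);
    }

    static User lookUp(String name) {
        // One executor for "I/O" work, one standing in for the UI thread.
        ExecutorService io = Executors.newSingleThreadExecutor(r -> new Thread(r, "io"));
        ExecutorService ui = Executors.newSingleThreadExecutor(r -> new Thread(r, "ui"));
        try {
            CompletableFuture<User> user = CompletableFuture
                .supplyAsync(() -> findUser(name), io)   // roughly subscribeOn(Schedulers.io())
                .thenApply(json -> {
                    // Runs on the thread that completed the previous stage (normally "io")
                    p("Mapping to a User object");
                    return new User(json.get("id"), json.get("name"));
                });
            // Roughly observeOn(mainThread()): consume the result on the "ui" thread
            user.thenAcceptAsync(u -> p("User: " + u), ui).join();
            return user.join();
        } finally {
            io.shutdown();
            ui.shutdown();
        }
    }

    public static void main(String[] args) {
        lookUp("cedric");
    }
}
```

The Rx version reads better once you chain more steps, but the thread hops are the same idea.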

Note that even though you can call subscribeOn() multiple times, the subscription will happen on the first scheduler you specify (subsequent subscribeOn() calls are ignored). I’m not sure if this is by design or just an oversight, but it’s not really a problem in practice. If you ever want work to happen on multiple schedulers, you can always arrange this in the body of your subscription itself (for example, in the code above, if the server call were actually using Retrofit, you would see that it uses its own thread pool to make that call).

And that’s about all there is to get started with thread management with Rx on Android. As you can see, structuring your code this way makes the intent and thread handling extremely clear and easy to trace through, much more so than with AsyncTask.

With the growing number of Android libraries adding support for Rx, it’s becoming even more trivial to use these libraries within this framework and combine them in straightforward yet powerful ways. You can see in the examples I used in this post and the previous one how Rx makes it trivial to combine network calls and GUI updates, simply because Retrofit returns Observables. You should also take a look at SQLBrite, which wraps SQLiteOpenHelper in Observables to offer you similar flexibility but for database access.

Read part 1, part 3.

“New Comments”: A Chrome extension for reddit and Hacker News



I recently got tired of how awkward it is to follow discussions on Hacker News and reddit, so I decided to write a Chrome extension to address the problem. The result is “New Comments”.

This extension only triggers on Hacker News and reddit. When you read a comment thread on either of these sites, the extension compares the current time with the last time you loaded that thread, locates all the comments posted since then and highlights them. If you have enabled Chrome sync, the extension works across all your computers, so you can start reading a discussion on your laptop and keep reading it later on your desktop with only the comments that are actually new highlighted.
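The core of the highlighting logic fits in a few lines. The extension itself is JavaScript on top of Chrome’s storage API; what follows is only a language-neutral sketch in Java under assumptions of my own: every comment carries a posting timestamp, the last visit time of each thread is kept in a key-value store keyed by the thread’s URL (a plain Map stands in for synced storage here), and nothing is highlighted on a first visit. The class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class NewComments {
    // Hypothetical comment model: an id plus the time it was posted (epoch millis).
    static final class Comment {
        final String id;
        final long postedAt;
        Comment(String id, long postedAt) { this.id = id; this.postedAt = postedAt; }
    }

    // Stand-in for synced storage: thread URL -> time of the last visit (epoch millis).
    private final Map<String, Long> lastVisit = new HashMap<>();

    // Returns the ids of the comments posted since the previous visit to this thread,
    // then records `now` as the new last-visit time. First visit: nothing is new.
    List<String> newCommentIds(String threadUrl, List<Comment> comments, long now) {
        Long previous = lastVisit.get(threadUrl);
        lastVisit.put(threadUrl, now);
        if (previous == null) {
            return List.of();
        }
        return comments.stream()
            .filter(c -> c.postedAt > previous)
            .map(c -> c.id)
            .collect(Collectors.toList());
    }
}
```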

Some of the lessons I learned in the process:

  • Writing a Chrome extension is ridiculously easy and the debugger is top notch.
  • Chrome’s storage API makes it trivial to build on top of Chrome sync.
  • Javascript is productive but still frustrating because of its lack of types. Even when the code was just a handful of functions, I was already beginning to hit bugs caused by the absence of a type system. I used the debugger as a substitute for types way too often.
  • Following this, I realized that my inclination to refactor and polish the code was considerably lessened. The code is pretty ugly as it is, with string literals splattered a bit everywhere and a questionable module structure, but once it was working, I didn’t feel the urge to make it perfect that I usually feel with a statically typed language and a powerful IDE under my fingers.

If you read Hacker News or Reddit and you’d like to optimize your time spent on these sites, try New Comments and let me know what you think!

A Rubik’s cube implementation in Three.js



In 2003, I took the time to put together a page that explains how to solve the Rubik’s cube with a set of formulas that are easy to memorize. The idea was not necessarily to solve the cube quickly (it takes about 50-60 seconds to solve the cube with this approach) but to make it easy for anyone to beat the cube with little effort, as opposed to the hundreds of formulas that speed cubists have to memorize to remain competitive.

Back then, I used a Java applet that displayed a cube and played these formulas to show the reader exactly how they work. As everybody knows, Java applets have fallen even further out of favor today than they already were ten years ago, so I’ve been wanting to update my page with more modern technologies for a while, especially technologies that don’t show a scary warning to everyone who visits my web site.

I finally took the time to update my page and reimplemented the entire cube animation in Javascript with Three.js. Here is how it looks today.

While it’s easy to find quick Rubik’s cube implementations in Three.js, I couldn’t find anything that came remotely close to what I needed, namely, being able to configure a cube and its formula directly from the HTML page and play this formula at the click of a button. Three.js turned out to be a great match for this, with the perfect amount of abstraction and power. And Three.js came with a few free bonuses, such as being able to rotate the cube and zoom in and out effortlessly. Of course, it’s equally trivial to modify the size of the cube, the position of the camera and the field of view, and just as easy to add fancier things such as lighting, shadows and richer materials.

The implementation is open source but I want to write some proper documentation before publishing it.

Overall, the experience was fairly pleasant. My relationship with Javascript is stormy at times (I’m planning to rewrite this in Dart at some point to compare) but in the end, using both IDEA and Eclipse to write the code (switching between both to compare) and with the Chrome debugger in a separate window, the productivity level is pretty high.

Implementing the cube itself was the most interesting part: there are so many ways you can model a Rubik’s cube that such a problem is a software designer’s dream. I must have had three different iterations of the data model before I settled on the version you see now (and I’m already thinking of ways I could improve it).

Three.js has come a long way since I gave it a try two years ago. The web is still filled with incorrect information referencing an old API that has changed since then, but the current API is very intuitive and, most importantly, it allowed me to completely avoid dealing with the WebGL madness. I don’t know if it’s my brain that’s just not wired for this, but I have tried to read countless OpenGL tutorials over the past years and every time, I give up after an hour, my eyes glazing over the intricate, effect-littered, abstraction-free details of OpenGL. The API is probably too low level for someone like me with just a passing interest in gory graphical details.

My knowledge of computer graphics and animation is abysmal overall, so this was a great opportunity to move out of my comfort zone and confront problems that I usually never encounter, such as finding tricks to counter floating point rounding errors and revisiting matrix multiplications and other miscellaneous linear algebra and 3D geometry concepts such as quaternions and gimbal lock.
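As an illustration of the rounding problem (a hypothetical Java sketch, not code from the actual JavaScript implementation): if each small cube is positioned on an integer grid with coordinates from -1 to 1, rotating a face by 90 degrees with cos() and sin() leaves tiny errors, because Math.cos(Math.PI / 2) is about 6.1e-17 rather than 0, and those errors can accumulate over many moves. One simple trick is to snap coordinates back onto the grid after every quarter turn. The grid model and the method names are assumptions.

```java
import java.util.Arrays;

public class CubieRotation {
    // Rotate the point p = (x, y, z) by `angle` radians around the Y axis.
    static double[] rotateY(double[] p, double angle) {
        double c = Math.cos(angle), s = Math.sin(angle);
        return new double[] { c * p[0] + s * p[2], p[1], -s * p[0] + c * p[2] };
    }

    // Snap each coordinate to the nearest integer grid position.
    static double[] snap(double[] p) {
        return new double[] { Math.round(p[0]), Math.round(p[1]), Math.round(p[2]) };
    }

    // Apply `quarterTurns` 90-degree rotations, optionally snapping after each one.
    static double[] turn(double[] p, int quarterTurns, boolean snapEachTime) {
        double[] q = p;
        for (int i = 0; i < quarterTurns; i++) {
            q = rotateY(q, Math.PI / 2);
            if (snapEachTime) {
                q = snap(q);
            }
        }
        return q;
    }

    public static void main(String[] args) {
        double[] corner = {1, 1, 1};
        // Without snapping, the result may end up slightly off the grid.
        System.out.println(Arrays.toString(turn(corner, 1000, false)));
        // With snapping, 1000 quarter turns (a multiple of 4) land exactly on (1, 1, 1).
        System.out.println(Arrays.toString(turn(corner, 1000, true)));
    }
}
```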

Swift, Apple’s new language

Apple just announced a new language called Swift. I took a look at the language manual; here is a quick overview:

  • Statically typed with type inference.
  • Generics.
  • Closures.
  • No exceptions.
  • Extension methods.
  • Properties (syntax similar to C#), including lazy properties with the "@lazy" annotation.
  • Functions, methods and type (static) methods.
  • Support for observers (with "willSet" and "didSet"). Interesting to see the observer pattern baked into a language, although I’m more partial to event buses for this kind of thing.
  • Enums.
  • Classes and structures (structures have restrictions regarding inheritance and other things).
  • For and while loops (statements, not expressions).
  • "mutating" keyword.
  • Named parameters.
  • Deinitializers (finalizers).
  • Protocols (interfaces).
  • Optional chaining with "a?.b?.c" and forced dereference with "!.".
  • Convenient “assign and test”: "if let person = findPerson() ...".
  • Type casting with "is", down casting with "as?" (combines nicely with the "let" syntax. Ceylon does it right too).

Very interesting overall, and a clear step up from Objective-C. From the feature set, I would say the language that Swift has the most overlap with is Kotlin, which is great news for Apple developers.

Update: discussion on reddit.

Coding challenge: partial results semantics

Consider the following simple signature:

  List<Person> searchPeople(String query);

This method takes a search string (e.g. “Bill”) and returns a list of Person objects that match this search string. This includes people with “Bill” as their first name, or as their last name, or maybe even using nicknames (someone whose name is “William” would match).

However, you have millions of people in your database, which means that this function call can potentially return tens of thousands of people, and it can also be quite time consuming. But your caller cannot wait forever and they want to cap the amount of time you spend doing the search, e.g. five seconds.

The function itself doesn’t know about this limit: it just does as much as it can until its caller interrupts it once the time has run out. You can imagine that the caller invokes this function in a separate thread and then calls get() on the resulting Future with a timeout of five seconds.

The nice thing about the signature above is that it’s referentially transparent, which offers a lot of nice properties. However, it’s also binary: either it returns everything that matches the search or it gets interrupted before it can finish and the caller gets zero results.

The challenge is to write this function so that when it gets interrupted by the timeout, it still returns whatever it has found so far.

The solution is trivial using mutable structures so bonus points if you can implement this solution with immutable data. Any language welcome, and I suggest you use pastebin or a similar service to share your code, since the comment system is not very good at formatting code.
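For reference, here is one possible shape of the “trivial” mutable solution, sketched in Java under assumptions of my own. It bends the signature by adding an output parameter: the search publishes an immutable snapshot of its results so far through a shared AtomicReference after every match, and the caller, when get() times out, cancels the task and reads the latest snapshot instead of getting nothing. People are plain names, an in-memory list stands in for the database, and searchWithDeadline() and the matches predicate are hypothetical. Copying the list on every match is O(n²) in the worst case, which is fine for a sketch. It leaves the bonus, a purely immutable solution, open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Predicate;

public class PartialSearch {
    // Scans `database` for matching people, publishing a read-only snapshot of the
    // results found so far after each match. Stops early if the thread is interrupted.
    static List<String> searchPeople(List<String> database, Predicate<String> matches,
                                     AtomicReference<List<String>> soFar) {
        List<String> found = new ArrayList<>();
        for (String person : database) {
            if (Thread.currentThread().isInterrupted()) {
                break;
            }
            if (matches.test(person)) {
                found.add(person);
                soFar.set(List.copyOf(found));
            }
        }
        return List.copyOf(found);
    }

    // Runs the search with a time budget; on timeout, returns whatever was found so far.
    static List<String> searchWithDeadline(List<String> database, Predicate<String> matches,
                                           long timeoutMillis) {
        AtomicReference<List<String>> soFar = new AtomicReference<>(List.of());
        ExecutorService executor = Executors.newSingleThreadExecutor();
        Future<List<String>> future = executor.submit(() -> searchPeople(database, matches, soFar));
        try {
            return future.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true);             // interrupt the search
            return soFar.get();              // partial results instead of nothing
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return soFar.get();
        } catch (ExecutionException e) {
            throw new RuntimeException(e.getCause());
        } finally {
            executor.shutdownNow();
        }
    }
}
```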

More about language popularity

Hot on the heels of my previous post about language popularity, the latest numbers from the TIOBE index are out.

The top five languages are C, Java, Objective-C, C++ and Visual Basic. Every other language beyond that has less than 4% mind share. The next JVM languages are Scala (#35) and Groovy (#48). Clojure didn’t make it in the top 50.

The pitfalls of Test-Driven Development

A few days ago, David Heinemeier Hansson posted a very negative article on Test-Driven Development (TDD) which generated quite a bit of noise. This prompted Kent Beck to respond with a Facebook post which I found fairly weak because it failed to address most of the points that David made in his blog post.

I have never been convinced by TDD myself and I have expressed my opinions on the subject repeatedly in the past (here and here for example) so I can’t say I’m unhappy to see this false idol finally being questioned seriously.

I actually started voicing my opinion on the subject in my book in 2007, so I thought I’d reproduce the text from this book here for context (with a few changes).

The Pitfalls of Test-Driven Development

I basically have two objections to Test-Driven Development (TDD).

  1. It promotes microdesign over macrodesign.
  2. It’s hard to apply in practice.

Let’s go over these points one by one.

TDD Promotes Microdesign over Macrodesign

Imagine that you ask a famous builder and architect to construct a skyscraper. After a month, that person comes back to you and says:

“The first floor is done. It looks gorgeous; all the apartments are in perfect, livable condition. The bathrooms have marble floors and beautiful mirrors, the hallways are carpeted and decorated with the best art.”

“However,” the builder adds, “I just realized that the walls I built won’t be able to support a second floor, so I need to take everything down and rebuild with stronger walls. Once I’m done, I guarantee that the first two floors will look great.”

This is what some of the premises of Test-Driven Development encourage, a problem aggravated by the mantra “Do the simplest thing that could possibly work,” which I often hear from Extreme Programming proponents. It’s a nice thought but one that tends to lead to very myopic designs and, worst of all, to a lot of churn as you constantly revisit and refactor the choices you made initially so they can encompass the next milestone, which you purposefully ignored because you were too busy applying another widespread principle known as “You aren’t going to need it” (YAGNI).

Focusing exclusively on Test-Driven Development tends to make programmers disregard large or medium scale design, just because it is no longer “the simplest thing that could possibly work”. Sometimes it does pay off to start including provisions in your code for future work and extensions, such as empty or lightweight classes, listeners, hooks, or factories, even though at the moment you are, for example, using only one implementation of a certain interface.
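To make “provisions for future work” concrete, here is a hypothetical sketch: a report exporter that has only one output format today but is still written against a small interface, plus a listener hook that callers can register. None of these names come from a real library; the point is only the shape of the code that “the simplest thing that could possibly work” would have left out.

```java
import java.util.ArrayList;
import java.util.List;

public class ReportExporter {
    // Extension point: only one implementation exists today.
    interface Exporter {
        String export(List<String> rows);
    }

    static final class CsvExporter implements Exporter {
        @Override
        public String export(List<String> rows) {
            return String.join(",", rows);
        }
    }

    // Lightweight hook: callers can react to exports without modifying this class.
    interface ExportListener {
        void onExported(String output);
    }

    private final Exporter exporter;
    private final List<ExportListener> listeners = new ArrayList<>();

    ReportExporter(Exporter exporter) {
        this.exporter = exporter;
    }

    void addListener(ExportListener listener) {
        listeners.add(listener);
    }

    // Exports the rows with the configured Exporter and notifies every listener.
    String run(List<String> rows) {
        String output = exporter.export(rows);
        for (ExportListener listener : listeners) {
            listener.onExported(output);
        }
        return output;
    }
}
```

Adding a second format later means writing one new class instead of reworking every caller.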

Another factor to take into consideration is whether the code you are writing is for a closed application (a client or a Web application) or a library (to be used by developers or included in a framework). Obviously, developers of the latter type of software have a much higher incentive to empower their users as much as possible, or their library will probably never gain any acceptance because it doesn’t give users enough extensibility. Test-Driven Development cripples library development because its principles are at odds with the very concept of designing libraries: think of things that users are going to need.

Software is a very iterative process, and throwing away entire portions of code is not only common but encouraged. When I start working on an idea from scratch, I fully expect to throw out and completely rewrite the first if not the first two versions of my code. With that in mind, why bother writing tests for this temporary code? I much prefer writing the code without any tests while my understanding of the problem evolves and matures, and only when I reach what I consider the first decent implementation of the idea is it time to write tests.

At any rate, test-driven developers and pragmatist testers are trying to achieve the same goal: write the best tests possible. Ideally, whenever you write tests, you want to make sure that these tests will remain valid no matter how the code underneath changes. Identifying such tests is difficult, though, and the ability to do so probably comes only with experience, so consider this a warning against testing silver bullets.

Yes, Test-Driven Development can lead to more robust software, but it can also lead to needless churn and a tendency to over-refactor that can negatively impact your software, your design, and your deadlines.

TDD Is Hard to Apply

Test-Driven Development reading material that I have seen over the years tends to focus on very simple problems:

  • A scorecard for bowling
  • A simple container (Stack or List)
  • A Money class
  • A templating system

TDD works wonders on these examples, and the articles describing this practice usually do a good job of showing why and how.

What these articles don’t do, though, is help programmers dealing with very complex code bases perform Test-Driven Development. In the real world, programmers deal with code bases made up of millions of lines of code. They also have to work with source code that not only was never designed to be tested in the first place but also interacts with legacy systems (often not written in Java), user interfaces, graphics, or code that outputs to all kinds of hardware devices, processes running under very stringent real-time, memory, network or performance constraints, faulty hardware, and so on.

Notice that none of the examples from the TDD reading material falls into any of these categories, and because I have yet to see a concrete illustration of how to use Test-Driven Development to test a back-end system interacting with a 20-year-old mainframe validating credit card transactions, I certainly share the perplexity of developers who like the idea of Test-Driven Development but can’t find any reasonable way to apply it to their day jobs.

TestNG itself is a very good candidate for Test-Driven Development: It doesn’t have any graphics, it provides a rich programmatic API that makes it easy to probe in various ways, and its output is highly deterministic and very easy to query. On top of that, it’s an open source project that is not subject to any deadlines except for the whims of its developers.

Despite all these qualities, I estimate that less than 5% of the tests validating TestNG have been written in a TDD fashion, for the simple reason that code written with TDD was not necessarily of higher quality than code delivered “tests last.” It was also not clear at all that code produced with TDD ended up being better designed.

No matter what TDD advocates keep saying, code produced this way is not intrinsically better than traditionally tested code. And looking back, it actually was a little harder to produce, if only because of the friction created by dealing with code that didn’t compile and tests that didn’t pass for quite a while.

Extracting the Good from Test-Driven Development

The goal of any testing practice is to produce tests. Even though I am firmly convinced that code produced with TDD is not necessarily better than code produced the traditional way, it is still much better than code produced without any tests. And this is the number one lesson I’d like everybody to keep in mind: how you create your tests is much less important than writing tests in the first place.

Another good quality of Test-Driven Development is that it forces you to think of the exit criteria that your code has to meet before you even start coding. I certainly applaud this focus on concrete results, and I encourage any professional developer to do the same. I simply argue that there are other ways to phrase these criteria than writing tests first, and sometimes even a simple text file with a list of goals is a very decent way to get started. Just make sure that, by the time you are done with an initial version, you have written tests for every single item on your list.

Don’t test first, test smart.


Update: Discussion on reddit

Language popularity on GitHub

RedMonk just published their latest survey of GitHub’s most popular languages, and given GitHub’s continuous growth in popularity, the results are worth a look.

Here are the results at a glance:

  • Javascript is seeing a consistent and serious growth.
  • Ruby is in sharp decline.
  • Python is showing a decline as well, although not as severe as Ruby.
  • Java is showing some growth, and it’s also the only JVM language in the top 12 listed by RedMonk.

I’m going to go out on a limb and predict that Python is being replaced by Go. I don’t have a lot of information to back up this prediction except that most of the positive articles I read about Go are written by Python developers, and a lot of them say that they are now actively migrating their code base from Python to Go. I don’t see as much enthusiasm for Go from developers using statically typed languages, probably because of Go’s antiquated type system (which is still a big step up from Python, obviously).

It’s interesting to see Java continue to grow twenty years after its introduction. I think this constant growth is fueled by the language’s remarkable versatility and its ability to adapt to new technologies but also driven by a series of constant popularity boosts such as Android five years ago and Java 8 this year.

I’m surprised not to see Groovy in this top 12 of languages, since it’s usually acknowledged as the second most popular language on the JVM and I expected its popularity to grow thanks to Gradle, but this doesn’t seem to be enough to crack RedMonk’s top 12.

Update: Discussion on reddit