Archive for August, 2011

Deus Ex: Human Revolution

“Deus Ex: Human Revolution” is priced at $49.99 on Steam, so I thought I’d check out Amazon. A quick search for “deus ex revolution” shows the following results:

However, if you search for “deus ex standard”, the result is quite different:

$34.99!

Compared to $45-$60 for all the other versions, that’s quite a discount, and I’m a bit disappointed in Amazon for not showing this result unless you explicitly ask for it. And in case you’re wondering: the “standard edition” listed above is the download version, so you don’t get the DVD, the flyer or the OnLive coupon that comes with the boxed copy (not that GameStop would have let you keep that coupon anyway: they were recently caught removing it from boxed copies because OnLive competes with their own online store. Shame).

Once purchased, you can either download it directly from Amazon or get a Steam activation code and download it from Steam.

At $34.99, it’s really impossible to pass up, especially since PC Gamer gave it a rare 94%, and pretty much everything else I’ve read about it is equally glowing. Can’t wait to try it.

Eight differences

This is a tough one; I’ve only found one so far.

Update: Discussions on Google+ (and another one).

I’ve seen things…

I just installed Netflix on a new computer, and the first movie I used to test it is… Blade Runner.

The opening, of course, but especially the legendary Rutger Hauer monologue at the end. I just don’t get tired of this scene, even if it only lasts a minute. Even more incredible is the fact that Hauer added it at the last minute (I think I learned this from the extras on the DVD).

Sadly, the version that Netflix is offering is the one with the voice-over, which I find absolutely horrific, especially for this particular scene. Roy Batty has just died, Deckard is staring at him, still waiting for the final blow, you, the viewer, are holding your breath, trying to process what just happened, and suddenly the voice-over kicks in, completely breaking the suspension of disbelief in a lame attempt to explain Batty’s actions in case the audience didn’t get it. Really lame.

And of course, I’m not particularly thrilled by the rumors of Ridley Scott committing to shoot a sequel. It’s hard for me to imagine how you could improve on one of the best movies ever created.

Watch the creator of Minecraft write a game from scratch

The creator of Minecraft, known as “Notch”, is taking part in Ludum Dare, which is a game competition in which participants have forty-eight hours to create a game from scratch. Notch is streaming his screen for the entire duration of the competition, and I find this absolutely fascinating.

Take a look at the stream (13,000 viewers right now) and you will see Eclipse, Java and… well, how to write a game. I checked an hour ago and since then, he’s added walls to what is beginning to look like a maze.

I don’t know what’s most riveting: hearing him brainstorm with himself (probably for the benefit of his 12,000 viewers, but maybe that’s just how he codes) or watching his game run continuously in the background and update itself as he makes changes. I use Eclipse’s hot replace all the time, but seeing it applied to writing a game is… mesmerizing.

Update 1: Notch stopped the stream because it was too expensive; he’s looking for alternatives.

Update 2: New stream here.

Update 3: Here is the final game, a maze crawl with six bosses and permadeath.

Rolodex wars

I wanted to try Vorp recently, a browser shooter that I had heard good things about. Unfortunately, the first thing the game asked me to do was log in with Facebook. No option to play anonymously, no other authenticator, not even their own accounts: Facebook, and that’s all. Oh, and of course, it asked for permission to access my contacts and all that.

I’m sure Vorp’s intentions are honorable, but there is a big difference between being willing to support indie games and handing them full access to my contacts without the slightest hint of what they are going to do with this information.

Obviously, there are plenty of web sites following this model, and even when they give me the choice to authenticate through Twitter or Google, I still reserve the right to draw the line on how much private information I’m willing to disclose. And when I’m just trying out a new service, my full name and my email address are basically the extent of what I’m comfortable sharing.

So I did something I had been meaning to do for a very long time: create an empty Facebook account. With Google Voice providing me with a one-time SMS number, it took me five minutes to create my empty Facebook persona, which I’m now happy to hand out to any service that wants it. I’ve already used it a few times, and pressing the “Log in with Facebook” button has never felt so great.

Amusingly, Facebook took pity on me, and a few days after I created my empty account, I found the following in my inbox:

Thanks, Facebook, but I got that.

A gentle introduction to dependent types

Most of you are certainly familiar with this piece of code:

List<Integer>

This is the syntax for generic (parameterized) types, which has been available in Java since 2004. And of course, you know what you’re supposed to put between the brackets: a type (class or interface). But have you ever wondered what else we could use there, besides types?

For example, how about:

List<Integer><2> // not legal Java

Instead of a type, I’m now using a value of a certain type (an integer) as a type parameter.

What could this possibly mean? Well, since I’m making up the syntax (and I’ll use that liberty a bit more in the rest of this post), I might as well make up what it means as well. How about we say that the integer here represents the size of the list?

In other words:

List<Integer><2> l = List(1, 2);  // compiles
List<Integer><2> l = List(1);  // does not compile

Or we could decide that if the element type (Integer) supports the + operator, then that value is the sum of all the elements of the list:

List<Integer><2> l = List(2); // compiles
List<Integer><2> l = List(0, 1, 1);  // compiles
List<Integer><2> l = List(0, 1);  // does not compile

You probably guessed it by now: List<Integer><2> is a dependent type. A type parameterized not just by other types but by values of other types. The correct terminology here is that the type is “indexed” by the value (not to be confused with the meaning of “index” in the context of a list).

Different values specify different types, and of course, we don’t have to stop there: the value could be of a different type than Integer, e.g. String (List<Integer><"abc">), etc… Obviously, you’re not limited to List either: you can imagine having maps, trees or any other parameterized type.

Unfortunately, the fun stops there. While the appeal of dependent types looks strong, it’s actually very hard to cross over into the realm of concrete code.

Let’s get back to our list: how do we manipulate a dependent list with a fixed size, so that the compiler will flag any attempt to violate this constraint by refusing to type check our program?

I took the easy way out in the code snippets above by initializing my lists with constants, but obviously, this is not very useful: you pretty much never use lists that are initialized with constants at declaration time, except maybe in tests. Very often, we create a list, we populate it at runtime with some algorithm and, if it’s mutable, we alter it in various ways.

First, we need to create our list without putting any elements in it, but doesn’t this violate the constraint?

List<Integer><2> l = new ArrayList<Integer><2>(); // should not compile

If we can’t even compile this, then we’ll never be able to use anything but lists initialized with constants, so let’s imagine that the compiler will let us get away with it. How do we populate our list now? As it turns out, we haven’t made much progress:

List<Integer><2> l = new ArrayList<Integer><2>(); // assume this compiles
l.add(1);    // will not compile
l.add(2);   // would compile if the previous line compiled

Having a list with one element would be a type error, so the second line in the snippet above should not compile. You could work around this limitation by having add() return a list of a different type: still a List, but indexed by the original value + 1. And now you have to determine how a List<Integer><1> and a List<Integer><2> can interact and how compatible they are.
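To make that idea a bit more concrete, here is a rough sketch in Scala (not real dependent types, and all the names are mine): the size of the list is tracked as a type-level Peano number, so add() returns a list of a different type, indexed by the previous size plus one.

// Sketch only: the size lives in the type as a Peano number, so adding an
// element turns a SizedList[A, N] into a SizedList[A, Succ[N]].
sealed trait Nat
sealed trait Zero extends Nat
sealed trait Succ[N <: Nat] extends Nat

class SizedList[A, N <: Nat](val elems: List[A]) {
  // add() returns a list of a *different* type, indexed by N + 1
  def add(a: A): SizedList[A, Succ[N]] = new SizedList[A, Succ[N]](a :: elems)
}

object SizedList {
  def empty[A]: SizedList[A, Zero] = new SizedList[A, Zero](Nil)
}

object Demo {
  type Two = Succ[Succ[Zero]]
  val ok: SizedList[Int, Two] = SizedList.empty[Int].add(1).add(2)  // compiles
  // val ko: SizedList[Int, Two] = SizedList.empty[Int].add(1)      // does not compile
}

This is obviously much clunkier than the made-up syntax above, but it shows why lists of different sizes end up being distinct types that need explicit rules to interoperate.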

We are at an impasse, and if we want to make progress, we have to allow the compiler to progressively relax the dependent typing constraint. For example, maybe it would only apply the type check to operations that read the content of the list, but would not perform any checking for operations that alter it (writes). Kind of a builder pattern waiver, where the object is allowed to be in an illegal state within well delineated boundaries.
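For illustration, here is what that waiver could look like, again as a rough Scala sketch with made-up names: nothing is checked while the list is being populated, and the declared size is only verified when the list is frozen (at runtime here, since we don’t have a real dependent type checker).

class SizedBuilder[A] {
  private var elems: List[A] = Nil

  // Writes are not checked at all: the builder is allowed to be in an
  // "illegal" state while it is being populated.
  def add(a: A): this.type = { elems = a :: elems; this }

  // The constraint is only enforced when the object leaves the builder's
  // well delineated boundaries.
  def freeze(expectedSize: Int): List[A] = {
    require(elems.size == expectedSize,
      "expected " + expectedSize + " elements, got " + elems.size)
    elems.reverse
  }
}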

Even so, the burden of type checking becomes quite heavy and requires the compiler to not just generate code but actually perform branch analysis to make sure that the number of elements in the list always satisfies the dependency at given points in the program.

Note also that I glossed over another non-trivial aspect: how can we teach the compiler what the syntax above means? Should it be configurable or part of the language definition? What types are allowed to be indexed? What types are allowed as indices? How do types of the same family but with different indices interoperate?

By now, it should be increasingly clear that mutable types can’t realistically be dependent, which confines dependent types to immutable ones. How useful is this, really? Still working on that.

If the subject interests you, the best paper I have found so far is Dependent types at work (PDF), which is a short introduction to dependent types through Agda. Agda is probably the first language you should look at since it’s much more recent than Coq (which I was already studying fifteen years ago, since I was doing my PhD at the same research institute that created Coq). This paper (and most of the literature on the subject) doesn’t touch any of the topics I described in this short introduction; it is more focused on describing how you can use Agda to formalize the proofs of your type system, which in turn will give you guidance on how your type checker should be implemented.

And if you have some interesting practical examples of dependent types, comment away!

Update: discussion on reddit.

Ah George… you prankster

Scala’s parallel collections

Scala 2.9 introduced parallel collections, which mirror most of the existing collections with a parallel version. Collections that have been parallelized this way receive a new method called par, which magically parallelizes certain operations on the collection.

For example, here is a sequential version:

scala> (1 to 5) foreach println
1
2
3
4
5

And the parallel version (notice the extra par call):

scala> (1 to 5).par foreach println
1
4
3
2
5

Obviously, the ordering will change each time you run the parallel version.

This piqued my curiosity and I decided to dig a bit further, starting with investigating what exactly is happening behind the scenes.

First of all, the parallel collections are based on an article called “A Generic Parallel Collection Framework”, by Martin Odersky, Aleksandar Prokopec et al, which I highly recommend. It’s a very interesting analysis of how to decompose the concepts of parallelism, concurrency, collections, ranges and iterators and assemble them in a generic manner.

Sadly, this article ended up being the only highlight of my research in this area, because the more I dug into the Scala parallel collections, the more disappointed I became. By now, I am struggling to find a good use case for parallel collections, and I’m hoping that this article will generate some positive responses about their use.

Here are some of the problems that I found with the parallel collections, starting with the one I think is the most important.

Lack of control

My first reaction when I saw the output above was to try to verify that threads were indeed being spawned, and then to find out how many of them, how to control the size of the thread pool, etc…

I came up pretty much empty on all counts, and if I have missed the piece of the documentation that explains this, I would love to see it, but browsing the sources of ParSeq and other classes produced no useful results.

This is a big deal, and probably the worst problem with this framework. The loop above generated a parallel range of five entries; did it spawn five threads? What happens if I try with 1000? 100000? The answer: it works for all these values, which makes me think that the loop is not allocating one thread per value. So it’s using a thread pool. But again: what size? Is that size configurable? How about the other characteristics of that thread pool?

Digging deeper, what are the saturation and rejection policies? If the pool contains ten threads, what happens when it receives an eleventh value? It probably blocks, but can this be configured? Can the dispatch strategy be configured? Maybe I’m feeding it operations of varying durations and I want to make sure that the expensive operations don’t starve the faster ones; how can I do this?
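For comparison, all of these knobs already exist one level down, in java.util.concurrent; here is the kind of control I would like to reach from the parallel collections (the values below are arbitrary):

import java.util.concurrent.{ArrayBlockingQueue, ThreadPoolExecutor, TimeUnit}

// Explicit pool size, a bounded queue (saturation) and a rejection policy,
// none of which seem reachable through par.
val pool = new ThreadPoolExecutor(
  4,                                         // core pool size
  8,                                         // maximum pool size
  60L, TimeUnit.SECONDS,                     // keep idle threads alive for one minute
  new ArrayBlockingQueue[Runnable](1000),    // bounded work queue
  new ThreadPoolExecutor.CallerRunsPolicy()  // what to do when the queue is full
)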

This absence of configuration is a big blow to the parallel framework, and it relegates its usage to the simplest cases, where it will most likely not bring much speed gain compared to sequential execution.

Silver bullet illusion

Over the past months, I have seen quite a few users pop into the #scala channel and complain that parallel collections are not working. Taking a closer look at their code, it usually becomes quickly obvious that their algorithm is not parallelizable, and either 1) they didn’t realize it or 2) they were aware of that fact but got the impression that par would magically take care of it.

Here is a quick example:

scala> Set(1,2,3,4,5) mkString(" ")
res149: String = 5 1 2 3 4
scala> Set(1,2,3,4,5).par mkString(" ")
res149: String = 5 1 2 3 4

You can run the par version over and over; the result will remain the same. This is confusing. Note that I used a Set this time, which indicates that I don’t care about the ordering of my collection. Calling mkString on the sequential version of my set reflects this: the elements don’t come out in insertion order. With this in mind, I would expect calling mkString on the parallel version of my set to randomize its output, but that’s not what’s happening: I’m getting the same result as the sequential version, over and over.

It should be obvious that not all operations on collections can be parallelized (e.g. folds), but it looks like creating a string out of a set should be, and it’s not. I’m not going to go too far down this path because the explanation is a mix of implementation details and theoretical considerations (the catamorphic nature of folds, the semantics of Set, the Scala inheritance hierarchy and the mkString specification), but the key point here is that the parallelization of collections can lead to non-intuitive results.
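A concrete way to see this constraint is fold: on a parallel collection, fold assumes the operator it is given is associative, and nothing stops you from passing one that isn’t. A quick sketch:

// Subtraction is not associative, so the parallel result is unspecified
// and can change from run to run, unlike the sequential one.
val sequential = (1 to 100).fold(0)(_ - _)      // always -5050
val parallel   = (1 to 100).par.fold(0)(_ - _)  // run-dependent, usually not -5050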

Bloat

I think the decision to retrofit the existing collections with the par operation was a mistake. Parallel operations come with a set of constraints that are not widely applicable to sequential collections, which leads to a situation where not all the collections support par (e.g. there is no ParTraversable) and, more importantly, imposes a burden on everyone, including people who don’t care about this functionality.

In doing this, Scala violates what I consider a fairly important rule for programming languages and APIs in general: you shouldn’t pay for what you don’t use. Not only do the parallel collections add a few megs to a jar file that’s already fairly big, but they probably introduce a great deal of complexity that is going to impact the maintainers of the collections (both sequential and parallel). It looks like anyone who wants to modify the sequential collections will have to make sure their changes don’t break the parallel collections, and vice versa.

Unproven gains

Scala 2.9 is still very recent, so it’s no surprise that we don’t really have any quantitative feedback on real-world gains, but I’ll make a prediction today that the courageous developers who decide to embrace the parallel collections wholeheartedly across their code base will see very little gain. In my experience, inner loops are hardly ever the bottleneck in large code bases, and I’d even go further and suspect that spawning threads for the elements of a loop could have adverse effects (context switching, memory thrashing, cache misses) for loops that iterate over very few elements or that already execute very fast operations. I’m mostly speculating here, I haven’t run any measurements, so I could be completely wrong.

Remedies

Because of all these problems, I am a bit underwhelmed by the usefulness of the parallel collection framework overall, maybe someone who has a more extensive experience with it can chime in to share the benefits they reaped from it.

I have a couple of suggestions that I think might be a better path for this kind of initiative:

  • Split up the parallel and sequential collections, remove par and make sure that both hierarchies can be evolved independently of each other.
  • Provide a nice Scala wrapper around the Executor framework. Executors have everything that anyone interested in low level parallelism could dream of: configurable thread pool sizes, and even thread pools themselves, thread factories, saturation and rejection policies, lifecycle hooks, etc… You could write a Scala wrapper around this framework in a few hundred lines and it would be much more useful than what is currently possible with par (a minimal sketch of the idea follows below).
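Here is that minimal sketch (the names are mine, this is not an existing library): the caller decides the thread count, and the pool is shut down once all the tasks have completed.

import java.util.concurrent.{Executors, TimeUnit}

object Parallel {
  // Run f on every element of items, using exactly `threads` threads.
  def foreach[A](items: Iterable[A], threads: Int)(f: A => Unit) {
    val pool = Executors.newFixedThreadPool(threads)
    try {
      items.foreach(item => pool.execute(new Runnable { def run() { f(item) } }))
    } finally {
      pool.shutdown()
      pool.awaitTermination(Long.MaxValue, TimeUnit.NANOSECONDS)
    }
  }
}

// Same spirit as (1 to 5).par foreach println, but nothing is hidden:
// Parallel.foreach(1 to 5, 2)(println)

A real version would obviously need more (error handling, a way to reuse or share the pool), but even this much gives you control that par doesn’t.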

Thoughts?

Update: Discussion on reddit.

Knowledge casts

There is a lot of great and interesting content on YouTube for anyone interested in learning new things. Between keynotes, conference presentations, lectures from professionals to professionals and classes from teachers to students, the amount you can learn is just staggering. The problem is that I have a very hard time watching videos on the Internet for long periods of time. Even if I purposely carve out half an hour in my day to watch a video, it never takes long before my mind starts wandering and I get tempted to check email, see whether a build has finished, or do any of the dozens of other distracting things that happen on a computer connected to the Internet.

What’s frustrating is that most of this content wouldn’t lose much without the visuals: the slides are usually more useful for the speaker than for the audience, and whatever is written on them is typically a shorter version of what you are about to hear anyway.

Solving the distraction problem requires finding a better device to play this content, which leads me naturally to my phone. Between commuting, walking the dog or waiting in line somewhere, there are many opportunities to play this content throughout the day with a lot less distraction than when sitting at my computer.

With that in mind, I started looking for a solution to this problem, which I decomposed into three parts.

Converting

The first step is to convert these videos into an audio file (mp3 or other). There are a lot of web sites that provide this service for free, and I ended up picking Video2mp3. You enter the URL of the YouTube video you want and a few minutes later, you download the corresponding mp3. This site also offers a Chrome extension that adds a very convenient “Convert to mp3” button to the YouTube page, so converting videos into mp3 has turned into a one-button operation.

It’s probably a service that YouTube should offer.

Transferring

Now that I have an audio file, I need to transfer it to my phone. As an Android user, I have a deep-seated hatred for cables (this is the 21st century, people!), so I’d like this to happen over the air. Another concern is disposing of the file: once I have finished listening, I don’t want to have to delete it manually, as I would have to do if I just copied the file to my SD card and played it with a music player.

Do you know what’s perfect for this? Podcasts. What if I could turn this audio file into a podcast so I could play it with a podcast app? Podcast apps perform this kind of management automatically, so I never have to worry about cleaning up. The only remaining problem is that podcast apps usually connect to podcast feeds (as far as I know; happy to hear about more clever apps), so I would have to set up such a feed.

Setting up a podcast feed turned out to be trivial: it’s a forty line script written in, you guessed it… Perl! Just kidding, it’s PHP, of course.

The script scans a directory and returns the ten most recent files as a podcast RSS feed. Now, all I need to do once I have the audio file is copy it to my server in the right directory and I’m done.
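For the curious, here is roughly what the script does, sketched in Scala instead of the original PHP (the base URL and directory are made-up placeholders, and the feed is stripped down to the bare minimum):

import java.io.File

object PodcastFeed {
  val baseUrl = "http://example.com/podcast/"   // placeholder

  def main(args: Array[String]) {
    // Ten most recently modified mp3 files in the directory
    val recent = new File(args(0)).listFiles
      .filter(_.getName.endsWith(".mp3"))
      .sortBy(-_.lastModified)
      .take(10)

    println("<rss version=\"2.0\"><channel><title>Knowledge casts</title>")
    recent.foreach { f =>
      println("  <item><title>" + f.getName + "</title>" +
        "<enclosure url=\"" + baseUrl + f.getName + "\" type=\"audio/mpeg\"/></item>")
    }
    println("</channel></rss>")
  }
}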

Playing

All that’s left to do is ask the podcast application on my phone to sync, and the file is magically transferred over the air to my Android phone.

I have been using Car Cast for a few months now (there’s a free and a pro version) and I’m very happy with it, but I’m always interested in hearing about alternatives, so feel free to share what Android podcast application you prefer.

Ideally, I’d like to automate this process even further: having a button on the YouTube page that will automatically convert the file to audio and add it to my podcast feed. Until then, the current procedure seems to be working great and it has already provided me with a lot of content I would never have watched otherwise.

Speaking of suggestions, what kind of content would you recommend for this kind of approach? Recently, I’ve enjoyed listening to the OSCON 2011 keynotes (I especially recommend Bob Lee and Martin Odersky), Stephen Hawking’s “Did God create the universe?” and pretty much anything that Neil deGrasse Tyson publishes, especially his Universe series.

What do you listen to?

I like PHP

There, I said it.

I know it’s fashionable to mock PHP for its antiquated syntax and semantic quirks, but I just like it. Here is why.

PHP is like C

This is really the main point of this post, and it’s a realization so simple that I’m surprised more people haven’t made it. I have often observed that developers who mock PHP tend to be much more positive when I ask them about C.

“C is alright, it’s not OO but it was designed with a simple goal in mind and it does that very well. It’s straightforward and very well documented”.

Does that remind you of another language? That’s right, PHP! PHP is exactly like C. Either you like both or you like neither; there is no claim you can make about PHP that can’t be made about C as well, and vice versa. PHP was created with one very simple goal in mind: enabling the insertion of programming logic in web pages. And it does this fantastically well.

The absence of OO functionality in both languages is a bit of a bummer (PHP 5 tried to address that with mixed success), but with some discipline, it’s really not hard to reach a reasonable architecture for your programs.

PHP never let me down

I write PHP code very sporadically, whenever I need to update one of my web sites or put together a small piece of web functionality that requires some programming. I am familiar with the PHP syntax, but I don’t know much of its API because I use it so rarely, so I pretty much need to relearn it from scratch each time. Whenever I need to get something done, I spend a decent amount of time looking up docs on Google. And PHP absolutely shines in this area. It’s not just that looking up a function name will give you its API documentation as the first hit: you can literally type what you need in English (e.g. “most recent file in PHP”) and it’s very likely that you will find how to do it in just a few clicks.

You can even misspell function names (see for example the result of my incorrect query for strtime) and you will still land in the right place.

PHP is robust

I know it sounds silly considering how primitive and old the language is, but the bottom line is that code I wrote more than ten years ago has been working absolutely flawlessly, and without any changes, for all that time. I don’t even bother writing tests for most of the PHP I write (obviously, I would be more thorough if this code were destined for a more mission-critical web site). Not writing tests is not the only software taboo I break when I write PHP: I happily mix presentation and logic all the time. That’s just how PHP is supposed to work, and when the site you are working on is low volume and only of interest to a tiny fraction of users, you probably don’t want to spend too much time on tasks that look like overkill.

PHP’s documentation is great

It’s hard to pinpoint what exactly makes PHP’s online documentation so useful. It’s probably a mix of the content, the syntactically colored code samples, the CSS and most of all, the examples at the bottom. These are absolutely priceless and I don’t understand why not all API documentations do this.

Whether you are looking up an API function or a UNIX command, the first thing you want to see is examples, not a laundry list of its options and switches or a formal definition of its parameters. Very often, reading the example is all you need to carry on with your work and you can always read the more formal documentation if you want to use the function in a more advanced manner.

Universal support

Most hosting providers offer PHP support out of the box, so you don’t need to resort to more expensive VPS providers. I have yet to see this kind of universal support for any language other than PHP. Not even Ruby on Rails, let alone Java, is available on mainstream providers, thereby validating the claim I made five years ago that Ruby on Rails won’t become mainstream (I regularly receive emails asking me about this article, and I keep responding “Nope, still not mainstream”).

High reward

There is nothing more exciting than modifying a file, hitting Refresh on your browser and seeing the result right away. This brings me back to the very first days that I started experimenting with a web browser, more than fifteen years ago. Modifying an HTML file and seeing the result almost instantly hasn’t lost its appeal, and PHP is certainly continuing the tradition.

Sometimes, I don’t even bother editing the files locally and then transferring them: I ssh to my server and modify the files live. If something goes wrong, git makes sure that I can always back up my changes very easily. By the way, the combination of git’s branches and PHP is very powerful, allowing you to switch between entire web sites with just one command.

Conclusion

Even though writing PHP always feels like going back in time, I’m never reluctant to do it because I know it will be rewarding and relatively easy. PHP has by far the highest “get in, code, get out” factor of any language I have found, and until another language comes along that does better on this scale, I will be using PHP for many more years to come.

Update: Discussion on Hacker News, reddit and Google+