Archive for April, 2005

The definitive Python rant

Unsurprisingly, there have been quite a few reactions to my small Python rant
earlier.  It’s good to see people stand up for their language, that’s what
makes our profession so unique.

Here are a few comments and my reactions to them:

Complete object orientation was built in from the start

That’s not true, see below for more on Python’s history.

It’s true that it allows you to mix object-oriented and script-style code
but I really don’t see what’s wrong with that, since you get the best of
both worlds.

Not always.  I think languages such as Ruby and Groovy definitely enable
both types of programming and offer the best of both worlds, but Python has made
a lot of compromises when adopting various styles of programming, and my
experience with Python has been less about choosing between two great
alternatives than picking the least of two evils.

By the way your code has 4 "it" and I’ll let you count the different
semantics to yourself.

I already did:  there are two.  One to declare the parameter and
the other one to use it inside the closure.  And that’s how it should be.

I think it’s just a matter of what you’re used to. For me as a Python and
Java programmer, the Ruby code you show looks just as unnatural and weird as the
Python code looks to you.  But maybe that’s just because I’m not used to
seeing Ruby, just as you’re not used to seeing Python code.

Fair enough.  For the record, I read a couple of Python books and I do
see a decent amount of Python code every day, so I’m certainly used to reading
it (not as much writing it, though).  But my dislike for
Python is caused by more than the odd for syntax I commented on in my previous
entry, so let me elaborate on that.

C offers a for loop similar to Python’s (a bit less powerful actually) which
is so flexible that pretty much any language that appeared after C provided a
similar construct.  Having said that, I still think that C is more elegant than
Python because it’s more consistent.

The problem with Python is that it came out at a time that turned out to be
pivotal in software history.  The software world was slowly realizing the
power of several programming paradigms (imperative, functional, declarative,
etc…) and set out to explore them all through different languages (C++,
Modula, Eiffel, Haskell, Prolog, etc…).  Python started as a very basic
script language that mimicked the already bare-bone syntax of C while doing away
with most of its type safety (a layer that was pretty thin to start with).

Even though object-oriented concepts were slowly emerging as a must-have for
industrial software programming, the idea of including them in Python made as
much sense as it would to add templates to Ruby, so creating Python without any
of these advanced features made perfect sense (Python didn’t even have support
for static initially!).

But things changed and some of these advanced features turned out to be not
only necessary but essential for any language that claimed to be of industrial
caliber.  So Python tried to adapt and started to incorporate a mishmash of
features from various origins.

Unfortunately, Python wasn’t designed to grow.  It didn’t follow the
recipes laid out by Guy Steele’s seminal paper Growing a Language, and as such,
the inclusion of all these features took a heavy toll on Python’s syntax
(sometimes acceptable) and Python’s consistency (much worse).

For example,
static was retrofitted in Python and must now be achieved like this:

class X:
    def say_hello(cls):
        print X.message
    say_hello = classmethod(say_hello)

That’s right:  not only do you make a method static by invoking a
magic function on it, you actually need to reassign the method to the value
returned by this magic method.

I’m sure there are very good technical reasons
for this kind of wart, but that’s exactly the problem I have with Python: 
these reasons were obviously motivated by the complexity that Guido van Rossum
had to face in order to incorporate these new features, and not really aimed at
making the language simpler for Python users.  I can’t imagine there would
be any other reason (readability?) than Guido having a problem adding a keyword
to his language, or coming up with another less awkward syntax (Ruby does it in
a creative way, but I still prefer the static keyword approach).

And this was just to add static.  Retrofitting
more complex object-oriented language features to Python has been even more problematic, and
the ubiquitous and useless self keyword is just the tip of the iceberg
(take a look at how Python implements private/protected or how
accessors work).

The same can be said about Python’s mixed support for features coming from the
functional world.  Generators, lambdas, closures, continuations, etc… 
are all implemented with strange restrictions that make guessing the right
behavior or syntax almost impossible (to such an extent that some of these
features are actually being considered for removal, which is probably the worst
thing you can do to a language).

If you are interested in more details about my feelings about Python, here
are a few older entries I wrote on this topic.

Anyway.

At the end of the day, elegance is something that just cannot
be argued because everyone has different criteria to define it.  Throughout
the years and after studying quite a few languages, I have reached a point where
languages need to possess a certain set of features in terms of syntax and
semantics to get my attention.  Over and over again, I have tried hard to
like Python because, frankly speaking, its momentum is undeniable.  But it
just never clicked, while my attraction to languages such as Lisp, Java, Ruby
and, more recently, Groovy developed within a matter of hours of tinkering.

Wherever your preferences lie, keep your mind open and learn at
least one new language every year.  It will make you a better developer.

Python keeps rubbing me the wrong way

In response to Luke Hutteman’s post on continuations, someone offered the
following Python snippet to illustrate filtering:

values = range(10)
for nr in (nr for nr in values if nr%2==0):
    print nr

There are several things that bother me with this code:

  • I count four different meanings for the variable nr, each with different semantics.  You should not need more than two (one as a parameter to for and the other one used inside the body).
  • It mixes procedural and inverted styles and doesn’t contain anything that is object-oriented.  Python does support some object
    orientation, so for more complex pieces of code, you will find yourself constantly mixing three different styles of programming.
  • It mixes filtering (nr % 2 == 0) and business logic (print), making it
    hard to parameterize either.

Here is a Ruby way of doing the same thing:

(0...10)
  .find_all { |it|
    it % 2 == 0
  }
  .each { |it|
    puts it
  }

What I like about it:

  • It’s object-oriented, which makes it regular:  a message is sent to each
    object and the result is then piped into the next treatment.
  • Filtering and business logic are clearly separated, so you can improve this
    example by substituting blocks (strategy pattern).
  • It minimizes the number of intermediate variables (only one:  it, which is only used as a parameter and inside the body of the closure).

The latter solution feels much more natural and intuitive to me, whereas most of my time in
Python is typically spent:

  • Filtering out the omnipresent and totally useless keyword self.
  • Wondering if the object I am staring at responds to a method or if I need to call a global function on it
    (such as range in the example above).

I guess I’m too young for Python and spoiled by modern languages 🙂

More on continuations

Don Box offered a few answers to my questions in his latest post, but I am
still unsatisfied.  Here’s why.

First of all, the various comments on our respective blog entries taught me
that C# doesn’t really support continuations, which explains the awkward
yield return syntax.  This particular C# feature is probably closer to
generators, which are themselves a more restricted form of continuations (but
still more useful than anything we have in Java).

Don’s answer to the question "how useful are continuations?" is the
following:

I’ve been programming in C# 2.0 for over a year now. I regularly find myself
using the following two pre-defined delegates from mscorlib in my programs:

namespace System {
    public delegate bool Predicate<T>(T item);
    public delegate void Action<T>(T item);
}

I am quite puzzled by this code snippet which doesn’t seem to have much to do
with continuations.  Don’s remaining text is a pretty convincing argument
of the usefulness of delegates in C#, something I am in full agreement with. 
Years ago, when Microsoft came up with its own JVM and IDE, my fondness for
delegates was immediate.  They appeared to me as a typesafe and reasonably
object-oriented way to provide method callbacks.

Back then, I spent hours on emails and discussions trying to convince Java
developers around me (including at JavaSoft, and more particularly on the Swing
team) that we should have delegates in Java.  But
my words fell on deaf ears, creating cascades of flame wars that never
accomplished anything.  The problem
was, of course, that the reactions were clouded by politics and not driven by
pragmatism.  So Java drifted away from delegates and will probably never
recover.  This is quite sad and probably the cause for thousands of wasted
objects every day as we create and destroy interfaces for every click of a
button.

Anyway.

Don shows a pretty good example of predicates and first-order logic that
allows him to neatly isolate logic between callers and callees. It’s a fairly
common idiom that you also find in many places in Java (comparators, file
filters, etc…).  If you are not familiar with this kind of trick, I strongly
encourage you to read up on the STL
(Standard Template Library) which, even though it is written in C++ and makes heavy use of
advanced template techniques, contains a lot of very important concepts that you
will undoubtedly find useful in your daily Java programming.
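
As a rough illustration of that idiom in Java, here is a minimal sketch that keeps the filtering logic and the action on the caller's side. It assumes a Java version that ships java.util.function (Predicate and Consumer play roughly the role of the C# Predicate<T> and Action<T> delegates above); the class and method names PredicateDemo, findAll and forEach are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Predicate;

class PredicateDemo {
    // Returns the items accepted by the predicate; the caller supplies the filter.
    static <T> List<T> findAll(Iterable<T> items, Predicate<T> filter) {
        List<T> result = new ArrayList<>();
        for (T item : items) {
            if (filter.test(item)) {
                result.add(item);
            }
        }
        return result;
    }

    // Applies the action to every item; the caller supplies the business logic.
    static <T> void forEach(Iterable<T> items, Consumer<T> action) {
        for (T item : items) {
            action.accept(item);
        }
    }

    public static void main(String[] args) {
        List<Integer> values = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            values.add(i);
        }
        Predicate<Integer> isEven = n -> n % 2 == 0;
        // Prints the even numbers between 0 and 9, one per line.
        forEach(findAll(values, isEven), n -> System.out.println(n));
    }
}
```

Neither findAll nor forEach knows anything about even numbers or printing, so either piece can be swapped without touching the other.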

So I’m back to my original question:  how useful are continuations,
really?  I can’t shake off the idea that they are nothing more than a
glorified way to do goto.  A cleaner way, sure, since it remembers contexts
and frames, but I am still looking for that one example where
continuation-based code is more readable than loop-based code.

Will anyone take up the challenge?

 

Are you saying you’re lazy?

It’s not very often that Scott Adams makes a factual mistake, so the
opportunity is too good to pass up.

Here is today’s cartoon:

Of course, the number of possible combinations for twenty-five numbers is not
25*25 but 25! (factorial of 25).

A friend pointed out that 625 is the right number if you need to try these
combinations in pairs, so I’ll let Scott off easy this time.
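
To give an idea of the gap between the two numbers, here is a small sketch that computes both, assuming that "combinations" means the number of ways to order the twenty-five numbers. The class name FactorialDemo is made up for this example; BigInteger is needed because 25! does not fit in a long.

```java
import java.math.BigInteger;

class FactorialDemo {
    // Computes n! exactly; 25! is far too large for a long.
    static BigInteger factorial(int n) {
        BigInteger result = BigInteger.ONE;
        for (int i = 2; i <= n; i++) {
            result = result.multiply(BigInteger.valueOf(i));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println("25 * 25 = " + (25 * 25));      // 625
        System.out.println("25!     = " + factorial(25));  // 15511210043330985984000000
    }
}
```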

 

The Return of the AOP Caching Challenge!

An interesting article on caching with Aspect-Oriented Programming was just
published on TheServerSide, and while it does a decent job of benchmarking and
describing the infrastructure, I have a few issues with some of the
aspect-related material it covers.

Here are a few comments:

it’s not easy to turn caching on or off dynamically when it’s part of your business logic

It should be configurable externally.  You don’t need AOP to branch
conditionally and disable (or alter) your caching logic at runtime.  Most
of the EJB and web containers that I know have been providing this kind of
functionality in XML files for quite a while.

it’s not easy to turn caching on or off dynamically when it’s part of your business logic

True, so it’s quite surprising that Srini’s own solution still falls into this
trap anyway (see below).

The cached data is released from memory (purged), by implementing a
pre-defined eviction policy, when the data is no longer needed.

I disagree with the "pre-defined" (sic) part.  Eviction policies should
absolutely be configurable at runtime, even more so than caching activation
itself.  Adjusting the eviction policy is a big part of fine-tuning and
optimizing an application, and you need as much flexibility in terms of
strategies (round-robin, last used first, timeouts, evict biggest first, etc…)
as possible.
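
As a minimal sketch of what "configurable at runtime" could look like, the cache below delegates the choice of the entry to evict to a strategy object that can be replaced while the cache is in use. Everything here is hypothetical (ConfigurableCache, EvictionPolicy and the two sample policies are names made up for this example), and it assumes a maximum size of at least one; it only relies on LinkedHashMap's access-order mode from the JDK.

```java
import java.util.LinkedHashMap;

class ConfigurableCache<K, V> {
    // Hypothetical strategy interface: chooses which key to evict.
    interface EvictionPolicy<K, V> {
        K selectVictim(LinkedHashMap<K, V> entries);
    }

    // Evicts the entry that was used least recently (first in access order).
    static <K, V> EvictionPolicy<K, V> leastRecentlyUsed() {
        return entries -> entries.keySet().iterator().next();
    }

    // Evicts the entry that was used most recently (last in access order).
    static <K, V> EvictionPolicy<K, V> mostRecentlyUsed() {
        return entries -> {
            K last = null;
            for (K key : entries.keySet()) {
                last = key;
            }
            return last;
        };
    }

    // accessOrder = true: iteration runs from least to most recently used.
    private final LinkedHashMap<K, V> entries = new LinkedHashMap<>(16, 0.75f, true);
    private final int maxSize;
    private EvictionPolicy<K, V> policy;

    ConfigurableCache(int maxSize, EvictionPolicy<K, V> policy) {
        this.maxSize = maxSize;
        this.policy = policy;
    }

    // The policy can be swapped while the cache is in use.
    synchronized void setPolicy(EvictionPolicy<K, V> policy) {
        this.policy = policy;
    }

    synchronized V get(K key) {
        return entries.get(key);
    }

    synchronized boolean contains(K key) {
        return entries.containsKey(key);
    }

    // Makes room with the current policy before inserting a new key.
    synchronized void put(K key, V value) {
        if (!entries.containsKey(key) && entries.size() >= maxSize) {
            entries.remove(policy.selectVictim(entries));
        }
        entries.put(key, value);
    }

    public static void main(String[] args) {
        ConfigurableCache<String, Integer> cache =
                new ConfigurableCache<>(2, ConfigurableCache.<String, Integer>leastRecentlyUsed());
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");
        cache.put("c", 3);  // the least recently used entry, "b", is evicted
        System.out.println("b cached: " + cache.contains("b"));  // false

        cache.setPolicy(ConfigurableCache.<String, Integer>mostRecentlyUsed());
        cache.get("a");
        cache.put("d", 4);  // the most recently used entry, "a", is evicted
        System.out.println("a cached: " + cache.contains("a"));  // false
    }
}
```

A real container would read the policy name from its configuration and call something like setPolicy() when the administrator changes it; timeouts or size-based eviction would be further implementations of the same interface.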

Except for these points, Srini does a good job of framing the overall problem
and he makes a convincing case for using AOP for caching.  However, caching
with AOP is a very complicated thing to achieve, and a couple of
years ago, I offered an AOP caching challenge that turned out to be much harder
to solve than everybody thought initially (including myself).

Srini’s pointcut is the following:

List around(String productGroup) : getInterestRates(productGroup) {

The problem with this approach is that it explicitly
references a method in the business code.

Not only is this dangerous
because you are increasing the coupling in your code (and I’m assuming that
refactoring will take care of modifying the aspect, should you decide to rename
or modify the getInterestRates() method), but it’s actually impossible to scale. 
As the number of methods you want to cache increases, you need to remember to
update the pointcut to include the newcomers, and this will clearly fall apart
very quickly.

Srini is falling into the same trap that the people who tried to solve the AOP
Caching Challenge fell into:  not enough abstraction, too much coupling.

As Srini said himself above, caching is completely independent of domains,
and this fact should be reflected in the pointcuts you use.  The above
pointcut is not independent from the domain model it applies to.

You should be
able to determine a trait that "methods that can be cached" share and use this
as your pointcut.  I can think of two ways to solve this problem:

  • Decide that any method that takes a string as a key and returns a value
    can be cached (potentially dangerous since you could get false positives,
    but this could be alleviated with naming conventions).
  • Use annotations to indicate when a method can be cached.

I think the annotation-based solution is the best compromise in this case,
since it makes you independent of naming conventions and doesn’t require any
modification of your pointcuts as your code base grows.  Also, the burden
on developers is minimal since all they need to remember is to add an annotation
whenever a method can be cached.

You can also imagine more annotation schemes that would allow for a better
partitioning of your caching:

@Cacheable(category = "datasources")
public DataSource getDataSource(String driverName);
@Cacheable(category = "db.accounts")  // "use the cache for rows in table ACCOUNTS"
public Account findAccount(String customerName);
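
To make the idea concrete without an AOP framework, here is a rough sketch that uses a plain JDK dynamic proxy instead of an AspectJ pointcut. The interception code only looks for the annotation and never names a business method. All the names (CacheableDemo, RateService, SlowRateService, CachingHandler, withCaching) are hypothetical, and the @Cacheable annotation is declared locally for the purpose of the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class CacheableDemo {
    // Hypothetical annotation, modeled on the @Cacheable used in the post.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Cacheable {
        String category() default "default";
    }

    // Hypothetical business interface: only the annotated method is cached.
    interface RateService {
        @Cacheable(category = "rates")
        String getInterestRate(String productGroup);

        String ping(String label);
    }

    static class SlowRateService implements RateService {
        int calls = 0;  // counts the invocations that reach the real object

        public String getInterestRate(String productGroup) {
            calls++;
            return "rate-for-" + productGroup;
        }

        public String ping(String label) {
            calls++;
            return "pong " + label;
        }
    }

    // Generic handler: it never names a business method, it only checks the annotation.
    static class CachingHandler implements InvocationHandler {
        private final Object target;
        private final Map<List<Object>, Object> cache = new HashMap<>();

        CachingHandler(Object target) {
            this.target = target;
        }

        @Override
        public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
            try {
                Cacheable cacheable = method.getAnnotation(Cacheable.class);
                if (cacheable == null) {
                    return method.invoke(target, args);  // not cacheable: pass through
                }
                List<Object> arguments =
                        args == null ? Collections.emptyList() : Arrays.asList(args);
                // The key combines the category, the method name and the arguments.
                List<Object> key =
                        Arrays.<Object>asList(cacheable.category(), method.getName(), arguments);
                if (!cache.containsKey(key)) {
                    cache.put(key, method.invoke(target, args));
                }
                return cache.get(key);
            } catch (InvocationTargetException e) {
                throw e.getCause();  // rethrow what the target itself threw
            }
        }
    }

    static <T> T withCaching(T target, Class<T> type) {
        Object proxy = Proxy.newProxyInstance(
                type.getClassLoader(), new Class<?>[] { type }, new CachingHandler(target));
        return type.cast(proxy);
    }

    public static void main(String[] args) {
        SlowRateService real = new SlowRateService();
        RateService service = withCaching(real, RateService.class);
        service.getInterestRate("mortgage");
        service.getInterestRate("mortgage");  // served from the cache
        service.ping("x");
        service.ping("x");                    // not annotated, reaches the target twice
        System.out.println("calls reaching the real service: " + real.calls);  // 3
    }
}
```

Adding a new cacheable method only requires annotating it; the handler (or, in the AOP version, the pointcut) stays untouched as the code base grows.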

Jonas and Alex, from AspectWerkz, and Ramnivas Laddad, the author of "AspectJ
in Action", have published a series of articles on annotation-based AOP with
AspectJ/AspectWerkz, which I strongly recommend.

Regardless, this is an interesting contribution to the problem of AOP-based
caching in general, but it goes to prove — again — that even two years later,
we still haven’t quite figured out how to solve this problem optimally.

Continuations: still not quite convinced


Sam Ruby and Don Box have posted a couple of interesting articles on
continuations.  Sam’s article is a good explanation of what continuations
are for "old-timers" and gives a few examples in various domains.  However,
I was more intrigued by Don Box’s post because it taught me that C# supports
continuations, which I didn’t know.

It’s quite refreshing to see a language take a few risks and implement
innovative features.  Time will tell if these features will find their
place in developers’ toolboxes, but right now, I will take this opportunity to
express a few doubts on the concept.

Don gives two examples, one written with continuations and one using
anonymous classes (delegates, actually, since C# supports those).  And the
first thing I notice is that both examples are about the same size and equally
readable to me.  This is not good for continuations, since I’m a firm
believer that if you introduce a new feature in a language, it needs to improve
at least one aspect of that language (readability, concision, performance,
etc…) radically.  If the new feature fails to achieve this goal, you now
have two slightly different ways to achieve the same thing, and Perl has already
taught us that this is the path to madness.

Something else that distresses me about continuations is that they are often
illustrated with Fibonacci or Web flow control.  These examples are quickly
turning into what logging is to AOP:  the quintessential example that
everybody understands but nobody can apply to their day job.

But here is the main problem I have with continuations:  how do you
debug them?

Out of curiosity, here is how you could implement Fibonacci in Java. 
First as an iterator:

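import java.util.Iterator;

class FibonacciIterator implements Iterator<Integer> {
    private int m_previous0 = 0;
    private int m_previous1 = 1;

    public void remove() {
        throw new UnsupportedOperationException();  // not implemented
    }

    // Returns the next Fibonacci number and advances the two-value state
    public Integer next() {
        int result = m_previous0 + m_previous1;
        m_previous0 = m_previous1;
        m_previous1 = result;
        return result;
    }

    // The sequence is infinite; callers decide when to stop
    public boolean hasNext() {
        return true;
    }
}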

Wrapping this Iterator in an Iterable makes it possible to use it in the
new for loop:

public class FibonacciContinuation implements Iterable<Integer> {
    public Iterator<Integer> iterator() {
        return new FibonacciIterator();
    }
}

such as:

public static void main(String[] argv) {
    int n = 10;
    FibonacciContinuation fib = new FibonacciContinuation();
    for (Object o : fib) {
        System.out.println(o);
        if (n-- <= 0) break;
    }
}

which outputs:

1
2
3
5
8
13
21
34
55
89
144

This example is admittedly a little bit more verbose than Don’s, but it’s
because I wanted to make it fancy, and I contend that it has a big advantage
over a continuation-based implementation:  it can be debugged.

Imagine that your Fibonacci code has a bug and starts producing bogus values
after iteration 1057.  How do you trace the program there and how do you
inspect it, since all the state is implicitly maintained by the JVM (or whatever
runtime you are using)?

With the field-based approach to continuations, I get to decide and to define
what the state is, so that I can inspect it (and future readers of my code will
as well).

Does this mean that continuations are useless?  I wouldn’t go that far,
but it’s clear to me that Fibonacci is not the right way to advocate this
feature.  There have to be better examples where maintaining the state
explicitly like I did above would be more complex than the alternative (letting
the runtime do it for you) while still allowing you to debug easily through it.

I really want to like continuations.  Can somebody convince me with a good
example?

The Perils of Duck Typing

The idea behind "Duck Typing", which has recently been made popular again by Ruby and other scripting languages, is to make the concept of types less restrictive.

Consider the following:

public interface ILifeCycle {
    public void onStart();
    public void onStop();
    public void onPause();
}
// ...
public void runObject(ILifeCycle object) {
    object.onStart();
    // ...
    object.onStop();
}

Faced with this kind of construct, some languages decide that the existence and even the name of the interface ILifeCycle is unimportant.  The only thing that really matters is the fact that runObject() needs the methods onStart() and onStop() to exist on the parameter, and that’s all.

In short, it boils down to:

public void runObject("any object that responds to the methods onStart and onStop" object) {
// ...
}

Late-binding languages are actually even less restrictive than that, since the verification that the object does respond to such methods is not made when the object is passed as a parameter to the method, but on the invocation of the said methods, which explains why parameters to methods are usually not typed at all.

In a way that’s typical for dynamically typed languages, the error will therefore only appear at runtime and only if such code gets run.
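
This late lookup can be simulated in Java with reflection. The sketch below is only an approximation of what a late-bound language does, and every name in it (DuckTypingDemo, Service, HalfService, send) is made up for the example: the parameter is typed as Object, each method is looked up by name at the moment it is invoked, and a missing method only surfaces once execution reaches that call.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class DuckTypingDemo {
    // Hypothetical class that has both methods but implements no interface.
    public static class Service {
        public String onStart() { return "service started"; }
        public String onStop() { return "service stopped"; }
    }

    // Hypothetical class that only has onStart().
    public static class HalfService {
        public String onStart() { return "half started"; }
    }

    // Looks the method up by name at call time, the way a late-bound language would.
    static Object send(Object target, String methodName) {
        try {
            Method method = target.getClass().getMethod(methodName);
            return method.invoke(target);
        } catch (NoSuchMethodException e) {
            throw new IllegalStateException(
                    target.getClass().getSimpleName() + " does not respond to " + methodName + "()", e);
        } catch (IllegalAccessException | InvocationTargetException e) {
            throw new IllegalStateException(e);
        }
    }

    // The parameter is untyped: nothing is checked when the object is passed in.
    static void runObject(Object object) {
        System.out.println(send(object, "onStart"));
        // ...
        System.out.println(send(object, "onStop"));
    }

    public static void main(String[] args) {
        runObject(new Service());
        try {
            runObject(new HalfService());  // onStart() runs, then onStop() fails
        } catch (IllegalStateException e) {
            System.out.println("runtime failure: " + e.getMessage());
        }
    }
}
```

Note that HalfService gets halfway through runObject() before anything goes wrong, which is exactly the kind of failure a compiler would have reported up front with the interface-based version.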

First of all, let’s get a frequently asked question out of the way:  if two interfaces have the same methods, are they semantically equivalent?  Isn’t there a risk of passing an object that is totally wrong for this method, yet will work because it responds to the right methods?

I don’t have a clear answer to that, but my experience is that such a thing is very unlikely.  This kind of argument is a bit similar to the fear we all felt in the beginning of Java when we realized that containers are not typed:  ClassCastExceptions end up being much more rare than we all thought.

Duck Typing is a big time saver when you write code, but is it worth it?  Don’t you pay for this ease of development much later in the development cycle?  Isn’t there a risk that you might be shipping code that is broken?

The answer is obviously yes.

The proponents of Duck Typing are usually quick to point out that it should never happen if you write your tests correctly.  This is a fair point, but we all know how hard it is to guarantee that your tests cover 100% of the functional aspects of your application.

Another danger in the Duck Typing approach is that it makes it really hard to see what the contract is between callers and callees.

As you can see in the code above, you need to actually understand the entirety of the method to realize that the parameter passed to the method needs to respond to onStart() and onStop().  But the worst part is:  the code is lying to you!

The method is also relying on onPause(), except that this method is not used in this particular runObject().  But it is used in execute() in a different class.  How would you realize that runObject() and execute() work on objects of the same type?  With Duck Typing, it’s extremely hard to tell and it requires a detailed read of the code of these methods.

If you wanted to use runObject() from your own code, you would make the flawed assumption that all your object needs to do is respond to onStart() and onStop(), and chaos will ensue if/when the implementation is upgraded to invoke onPause() as well.  At least, with the typed approach, the contract is obvious and you are guaranteed that it can’t be changed from under you (the provider of this interface can’t add a method to ILifeCycle without breaking everything, so they will probably provide an ILifeCycle2 interface or something similar to guarantee backward
compatibility).

I am all in favor of anything that makes the development process more agile, but if I can ship code that contains errors when these errors could have been caught by the compiler before my code even gets a chance to run, I will seriously consider leveraging this support as much as I can.

Duck Typing is dangerous and should only be used for quick prototyping.  Once you switch to production coding, I strongly encourage everyone to make their code as statically typed as possible.

This is one of the great things in Groovy:   it is late-bound but still supports static (strong) typing.  Not only is the interface approach shown in the first code snippet above fully supported in Groovy, it is actually quite encouraged and it doesn’t make your code any less Groovy.

Use Duck Typing for prototyping, but program to interfaces for anything else.

 

Announcing TestNG 2.3

The TestNG team is happy to announce the availability of TestNG 2.3.

The version is available at http://beust.com/weblog/testng as well as the new
documentation, which has been considerably improved (highlighted code snippets,
detailed DTD, ant task and description of all the new features).

What’s new:

  • beforeSuite, afterSuite, beforeTest,
    afterTest
  • Revamped ant task with haltonfailure and other helpful flags
  • Better stack traces and improved level control for verbosity
  • Better syntax for including and excluding methods in testng.xml
  • Test classes can be invoked on the command line
  • … and many bug fixes.

For Eclipse users, a new version (1.1.1) of the Eclipse plug-in that includes
this new TestNG version is available on the remote update site or for direct
download.

Also, TestNG has joined OpenSymphony (big thanks to Patrick and Hani for
setting this up).  As a consequence of this move, there is now a TestNG users
forum as well as a Wiki and JIRA for issue tracking.

The users mailing-list has been moved to Google Groups and is connected to the
forum, so you only need to subscribe to one.

Try it and let us know what you think!

Class-level injection

My post on dependency injection in tests has generated a lot of very
interesting comments and email.

Eugene noted that:

However the huge disadvantage of such approach is that you have to repeat
these declarations for each and every test method. From this point it is better
to have dependencies as fields, because you declare them only once (less coding
and easier to change/refactor).

Very true.  Obviously, both method parameters and fields serve a purpose, but it occurred
to me that TestNG didn’t really help you with fields.  Since I
still have some reluctance to the idea of a container altering my private fields, I tried hard to
come up with a solution that would solve both problems, and it occurred
to me that a natural extension to TestNG would be to allow parameters at the
class level.  These parameters would then be passed in the constructor when
TestNG is creating an instance of your test class:

@Test(parameters = { "xml-file" })
public class Test1 {

  public Test1(String xmlFile) {
    // …
  }
}

This mechanism is already in place for test methods, so TestNG users are
already familiar with it.  With this construct, the developer is then free
to do whatever they want with the parameters passed in the constructor, the most
likely approach being to store them in fields for later use in the test methods.

This approach addresses Eugene’s remark by enabling both "class-level
injection" and "test method parameter injection", but a few hours later, Eugene
offered further refinement on the mailing-list:

Better approach will be to have these "parameters" (actually
dependencies) declared in the constructor. Something like:

@Test
public class Test1 {

  public Test1( @Dependency( "xml-file") String xmlFile) {
    // …
  }
}

Indeed, this is even better, but the problem here is that I want to keep
supporting JDK 1.4 for a little while and QDox doesn’t support parameter
annotations.  But at some point in the future, this is most likely how this
feature will be implemented.

Dependency injection in tests

I came across this old entry from Ara about dependency injection in tests.

The idea is to define your beans in XML with a framework like Spring and then
use his decorator to inject the beans inside your tests.  The problem with
this approach can be summed up in three words:  "too much magic".

Ara’s solution uses reflection to enumerate the fields in your test class and
match them against the name of the bean as declared in your XML file. 
Another problem with this approach is that you need to declare this field inside
your class whereas only a few methods might need it, but I agree that JUnit
doesn’t leave you much choice there.

I believe a better solution is simply to pass the resolved bean as a
parameter to the test method.

Ara’s test case can then simply be rewritten like this:

public void testSomething(UserDAO userDao) throws Exception {
  userDao.createAdmin();
}

The advantages of this approach are:

  • No more reflection magic and mysterious naming.
  • userDao is scoped to the method that uses it, which makes for better
    isolation.
  • Uses the standard Java way to pass parameters.
  • No need to declare it as a field.

Now, how do we get the testing framework to pass this parameter to the
method?

It’s pretty easy to do with TestNG, but as of today, passing parameters is
limited to primitive types (no XML bean support such as in Spring), so TestNG
only solves half of this problem.
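
To illustrate the general mechanism (this is not how TestNG is implemented, just a minimal sketch), a test runner can inspect the parameter types of a test method through reflection and look each one up in a registry of available objects before invoking the method. The names ParameterInjectionDemo, UserTest, runTest and the registry keyed by type are all assumptions made for this example; a real framework would more likely resolve parameters by name, as in the XML shown further down.

```java
import java.lang.reflect.Method;
import java.util.HashMap;
import java.util.Map;

class ParameterInjectionDemo {
    // Hypothetical dependency, standing in for the UserDAO of the post.
    public static class UserDAO {
        int admins = 0;

        public void createAdmin() {
            admins++;
        }
    }

    // Hypothetical test class: the dependency arrives as a plain method parameter.
    public static class UserTest {
        public void testSomething(UserDAO userDao) {
            userDao.createAdmin();
        }
    }

    // Finds the named public method, resolves each parameter by its declared type
    // from the registry, and invokes the method with the resolved values.
    static void runTest(Object testInstance, String methodName, Map<Class<?>, Object> registry) {
        for (Method method : testInstance.getClass().getMethods()) {
            if (!method.getName().equals(methodName)) {
                continue;
            }
            Class<?>[] types = method.getParameterTypes();
            Object[] arguments = new Object[types.length];
            for (int i = 0; i < types.length; i++) {
                arguments[i] = registry.get(types[i]);
                if (arguments[i] == null) {
                    throw new IllegalArgumentException("Nothing registered for " + types[i].getName());
                }
            }
            try {
                method.invoke(testInstance, arguments);
            } catch (ReflectiveOperationException e) {
                throw new RuntimeException("Test " + methodName + " failed", e);
            }
            return;
        }
        throw new IllegalArgumentException("No public method named " + methodName);
    }

    public static void main(String[] args) {
        UserDAO dao = new UserDAO();
        Map<Class<?>, Object> registry = new HashMap<>();
        registry.put(UserDAO.class, dao);

        runTest(new UserTest(), "testSomething", registry);
        System.out.println("admins created: " + dao.admins);  // 1
    }
}
```

The test class itself contains no field and no lookup code: the only thing it declares is what it needs, in its method signature.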

In the future, I am definitely considering adding support for Spring’s bean
factory so that the limitation to primitive types can be entirely lifted. 
Then we could have:

@Test(parameters = { "user-dao" })
public void testSomething(UserDAO userDao) throws Exception {
  userDao.createAdmin();
}

and in testng.xml:

<parameter name="user-dao" spring-bean-name="user-dao-bean" />

The good thing about this approach is that it leverages a well-known and
robust framework, but we now have two indirections (one Java file and two XML
files), so another possibility would be to offer bean support in testng.xml itself:

<bean name="user-dao-bean">
    <property name="userName" value="Cedric" />
</bean>
<parameter name="user-dao" bean-name="user-dao-bean" />

Whatever solution we eventually support, I think that passing parameters to
test methods is a very important feature that has been overlooked for too long.