Archive for January, 2005

Open letter to James about Groovy

Mike just posted what appears to be a death sentence for the Groovy project,
and it was a very sad read for me because, while I like Groovy very much, I
can’t do anything but agree with his assessment.

First of all, please note that Mike put his money where his mouth is:  since
his previous rant on Groovy about a month ago, he has been very active on the
Groovy mailing-list and he has tried hard to piece together a decent set of
documentation for Groovy Classic.  I suppose that his post today is the
observation that this effort failed and the confirmation of his worst fears
about the future of Groovy.

But you know what, James?  There is still hope.  There is one very
simple way you can prove Mike and countless other disillusioned Groovy fans
wrong about their fears:

Announce a date by which you will ship Groovy 1.0

It’s that simple.  Really.  Everything else will fall into place.

Once you have a ship date, you will start looking at your work on Groovy very
differently.  Everything will become a matter of compromises between the
importance of the feature and the necessity to hit the deadline.  The
roadmap will also appear to you much more clearly, starting backwards: 
plan a beta one month before the deadline, an alpha two months before, a few
milestones here and there (not indispensable but I’ve found that milestones keep
you honest and give you a good idea of your velocity).  You will also be able
to make the best use of the various volunteers who offer their help.

Alright James, it’s your turn now.  Pick a date.  Any date.

Groovy deserves it.

 

Why I prefer SAX to parse XML

There are numerous ways to parse XML in Java but they are all based on one of
the two technologies:

  • DOM
  • SAX

I’m not going to explain exactly what these two APIs do; there are plenty of
articles on the subject.  In a nutshell, DOM gives you a tree view of your
XML document, which you can then navigate by moving from one node to the other,
while SAX is event-driven and will call your code whenever it encounters a tag.
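
To make the contrast concrete, here is a minimal sketch that reads the same attribute once through DOM and once through SAX, using only the JDK's built-in parsers.  The class name DomVsSaxSketch and its two helper methods are made up for this illustration, and the tiny document is an assumption as well.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class DomVsSaxSketch {
  // Small illustrative document (an assumption, not taken from a real schema).
  static final String XML =
      "<person>"
      + "<first-name value=\"Cedric\"/>"
      + "<last-name value=\"Beust\"/>"
      + "</person>";

  /** DOM: build the whole tree, then walk from the root to the node we want. */
  static String firstNameWithDom(String xml) {
    try {
      Document doc = DocumentBuilderFactory.newInstance()
          .newDocumentBuilder()
          .parse(new InputSource(new StringReader(xml)));
      NodeList children = doc.getDocumentElement().getChildNodes();
      for (int i = 0; i < children.getLength(); i++) {
        Node child = children.item(i);
        if (child instanceof Element && "first-name".equals(child.getNodeName())) {
          return ((Element) child).getAttribute("value");
        }
      }
      return null;
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse document", e);
    }
  }

  /** SAX: no tree is built; the parser calls us back for each opening tag. */
  static String firstNameWithSax(String xml) {
    final String[] result = new String[1];
    try {
      SAXParserFactory.newInstance().newSAXParser().parse(
          new InputSource(new StringReader(xml)),
          new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                Attributes attributes) {
              if ("first-name".equals(qName)) {
                result[0] = attributes.getValue("value");
              }
            }
          });
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse document", e);
    }
    return result[0];
  }

  public static void main(String[] args) {
    System.out.println("DOM: " + firstNameWithDom(XML));
    System.out.println("SAX: " + firstNameWithSax(XML));
  }
}
```

Both methods return the same value; the difference is that the DOM version has to know where the node sits in the tree, while the SAX version only has to recognize the tag name.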

Over the years, I have come to develop a strong liking for SAX despite its
apparent limitations.  It has reached the point where I haven’t needed to
resort to DOM for a long time, and here is why.

The thing I like most about SAX is that it allows you to ignore all the
portions of your XML document that you don’t care about, making it not only
trivial to only pick the information you are interested in, but also easier to
migrate your schema over time, should you decide to do so.

Consider the following XML document:

<person>
  <first-name value="Cedric"></first-name>
  <last-name value="Beust"></last-name>
</person>

Extracting the first and last names is straightforward:

  public void startElement(String uri, String localName, String qName,
      Attributes attributes)
    throws SAXException
  {
    // Both tags carry their content in the "value" attribute
    String name = attributes.getValue("value");
    if ("first-name".equals(qName)) {
      System.out.println("First name:" + name);
    }
    else if ("last-name".equals(qName)) {
      System.out.println("Last name:" + name);
    }
  }

Note that the code above completely ignores the <person> tag and focuses
exclusively on the content we are interested in.  If we have reached this
point in the code (which is defined in a ContentHandler), the parser has
already checked that the document is well-formed up to this tag (and valid so
far, if validation is turned on).
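
For readers who want to run this, here is a self-contained sketch of the same handler wired to the JDK's SAX parser.  The class name PersonNamesSketch, the extractNames helper and the extra <contact> and <email> tags are assumptions added for illustration; they are there to show that tags the handler does not know about are simply skipped.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class PersonNamesSketch {
  // The <person> document plus two hypothetical additions (<contact> and
  // <email>) that the handler knows nothing about.
  static final String XML =
      "<person>"
      + "<first-name value=\"Cedric\"/>"
      + "<contact><email value=\"someone@example.com\"/></contact>"
      + "<last-name value=\"Beust\"/>"
      + "</person>";

  /** Collects one line per name tag, in document order. */
  static List<String> extractNames(String xml) {
    final List<String> lines = new ArrayList<>();
    DefaultHandler handler = new DefaultHandler() {
      @Override
      public void startElement(String uri, String localName, String qName,
          Attributes attributes) {
        String name = attributes.getValue("value");
        if ("first-name".equals(qName)) {
          lines.add("First name:" + name);
        } else if ("last-name".equals(qName)) {
          lines.add("Last name:" + name);
        }
        // Every other tag (<person>, <contact>, <email>) falls through untouched.
      }
    };
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse document", e);
    }
    return lines;
  }

  public static void main(String[] args) {
    extractNames(XML).forEach(System.out::println);
  }
}
```

Adding the <contact> element to the document required no change to the handler, which is the schema-migration benefit mentioned earlier.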

Of course, this code won’t work if the same tags appear several times in the
document:

<project name="TestNG">
  <members>
    <person>
      <first-name value="Cedric"></first-name>
      <last-name value="Beust"></last-name>
    </person>
    <person>
      <first-name value="Alexandru"></first-name>
      <last-name value="Popescu"></last-name>
    </person>
  </members>
</project>

or, even more tricky, if these tags have different parents:

<project name="TestNG">
  <members>
    <vampire-slayer>
      <first-name value="Buffy"></first-name>
      <last-name value="Sommers"></last-name>
    </vampire-slayer>
    <vampire>
      <first-name value="Angel"></first-name>
      <last-name value="Angelus"></last-name>
    </vampire>
  </members>
</project>

A typical way to solve this is to keep track of the parent tag:

private VampireSlayer m_vampireSlayer = null;
private Vampire m_vampire = null;

 public void startElement(String uri, String localName, String qName, Attributes attributes)
    throws SAXException
  {
    String name = attributes.getValue("value");
    if ("vampire-slayer".equals(qName)) {
      m_vampireSlayer = new VampireSlayer();
    }
    else if ("first-name".equals(qName)) {
      if (null != m_vampireSlayer) {
        m_vampireSlayer.setFirstName(name);
      }
      else if (null != m_vampire) {
        m_vampire.setFirstName(name);
      }
    }
// …

Don’t forget to "pop out the context" when you exit the tag:

  public void endElement(String uri, String localName, String qName)
    throws SAXException
  {
    if ("vampire".equals(qName)) {
      // store the vampire somewhere, then
      m_vampire = null;
    }
    else if ("vampire-slayer".equals(qName)) {
      // store the vampire slayer somewhere, then
      m_vampireSlayer = null;
    }
  }

However, the problem with this approach is that the business logic attached
to a certain tag is now scattered in two different places, which makes the
code hard to maintain, so I have adopted the following rule:  whenever I
need to run code both at the start and at the end of a tag, I move the business
logic in a method that takes a boolean indicating if we are opening or closing
the tag:

 public void startElement(String uri, String localName, String qName,
Attributes attributes)
    throws SAXException
  {
    String name = attributes.getValue("value");
    if ("vampire-slayer".equals(qName)) {
      xmlVampireSlayer(true /* start */);
    }
// …

 public void endElement(String uri, String localName, String qName)
    throws SAXException
  {
    if("vampire-slayer".equals(qName)) {
      xmlVampireSlayer(false /* start */);
    }
// …

  /**
   * @param start If true, we are looking at an opening tag (e.g. <foo>),
   * otherwise, we are looking at a closing tag (</foo>)
   */
  private void xmlVampireSlayer(boolean start) {
    if (start) {
      m_vampireSlayer = new VampireSlayer();
    }
    else {
      // store the vampire slayer somewhere, then
      m_vampireSlayer = null;
    }
  }

And now we have the best of both worlds: code that is not only easier to read
but also quite robust in the face of schema changes.
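
Putting the pieces together, here is one way the complete handler could look.  This is only a sketch: the Named, VampireSlayer and Vampire classes, the two result lists, the current() helper and the parse method are all assumptions, since "store it somewhere" is left open above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class VampireHandlerSketch extends DefaultHandler {
  // Hypothetical model classes; they only hold the two names.
  static class Named {
    String firstName;
    String lastName;
    @Override public String toString() { return firstName + " " + lastName; }
  }
  static class VampireSlayer extends Named {}
  static class Vampire extends Named {}

  static final String XML =
      "<project name=\"TestNG\"><members>"
      + "<vampire-slayer><first-name value=\"Buffy\"/>"
      + "<last-name value=\"Sommers\"/></vampire-slayer>"
      + "<vampire><first-name value=\"Angel\"/>"
      + "<last-name value=\"Angelus\"/></vampire>"
      + "</members></project>";

  // "Somewhere" to store finished objects (an assumption for this sketch).
  final List<VampireSlayer> slayers = new ArrayList<>();
  final List<Vampire> vampires = new ArrayList<>();

  // Context: at most one of these is non-null while we are inside its tag.
  private VampireSlayer m_vampireSlayer = null;
  private Vampire m_vampire = null;

  @Override
  public void startElement(String uri, String localName, String qName,
      Attributes attributes) {
    String name = attributes.getValue("value");
    if ("vampire-slayer".equals(qName)) {
      xmlVampireSlayer(true /* start */);
    } else if ("vampire".equals(qName)) {
      xmlVampire(true /* start */);
    } else if ("first-name".equals(qName) && current() != null) {
      current().firstName = name;
    } else if ("last-name".equals(qName) && current() != null) {
      current().lastName = name;
    }
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if ("vampire-slayer".equals(qName)) {
      xmlVampireSlayer(false /* start */);
    } else if ("vampire".equals(qName)) {
      xmlVampire(false /* start */);
    }
  }

  /** All the logic for <vampire-slayer> lives here, for both open and close. */
  private void xmlVampireSlayer(boolean start) {
    if (start) {
      m_vampireSlayer = new VampireSlayer();
    } else {
      slayers.add(m_vampireSlayer);
      m_vampireSlayer = null;
    }
  }

  /** Same pattern for <vampire>. */
  private void xmlVampire(boolean start) {
    if (start) {
      m_vampire = new Vampire();
    } else {
      vampires.add(m_vampire);
      m_vampire = null;
    }
  }

  /** Returns the object whose tag we are currently inside, or null. */
  private Named current() {
    return m_vampireSlayer != null ? m_vampireSlayer : m_vampire;
  }

  static VampireHandlerSketch parse(String xml) {
    VampireHandlerSketch handler = new VampireHandlerSketch();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse document", e);
    }
    return handler;
  }

  public static void main(String[] args) {
    VampireHandlerSketch handler = parse(XML);
    System.out.println("Slayers: " + handler.slayers);
    System.out.println("Vampires: " + handler.vampires);
  }
}
```

The startElement and endElement methods are reduced to dispatching on the tag name, and each tag's behavior can be read in a single method.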

Now, imagine a more complex situation where
your XML file can have tags nested six or seven levels deep.  One day, you
need to add a new tag.  With DOM, you would have to locate the code that is
walking this particular area of the tree, and even with typed tree-based
solutions such as XMLBeans, locating and modifying code is never easy.

With SAX, all you need to do is two things:

  • See if the name of this tag is unique within your file (if not, you will
    need to disambiguate it with the context approach shown above).
  • Implement the method xmlTagName(boolean start) and put all the handling
    for that tag inside it.
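
As an illustration of these two steps, suppose a <weapon> tag is added several levels down.  The tag, the <equipment> and <weapons> wrappers, the m_attributes field and the WeaponTagSketch class are all hypothetical; the sketch only assumes that the tag name is unique within the file.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

class WeaponTagSketch extends DefaultHandler {
  // Hypothetical document: <weapon> sits six levels below the root.
  static final String XML =
      "<project name=\"TestNG\"><members><vampire-slayer>"
      + "<first-name value=\"Buffy\"/>"
      + "<equipment><weapons>"
      + "<weapon value=\"stake\"/><weapon value=\"crossbow\"/>"
      + "</weapons></equipment>"
      + "</vampire-slayer></members></project>";

  final List<String> weapons = new ArrayList<>();

  // Attributes of the tag being opened; only meaningful during startElement.
  private Attributes m_attributes;
  private String m_weapon = null;

  @Override
  public void startElement(String uri, String localName, String qName,
      Attributes attributes) {
    m_attributes = attributes;
    if ("weapon".equals(qName)) {
      xmlWeapon(true /* start */);
    }
    // No code is needed for <equipment> or <weapons>: depth does not matter.
  }

  @Override
  public void endElement(String uri, String localName, String qName) {
    if ("weapon".equals(qName)) {
      xmlWeapon(false /* start */);
    }
  }

  /** Everything the new tag needs, for both the opening and the closing tag. */
  private void xmlWeapon(boolean start) {
    if (start) {
      m_weapon = m_attributes.getValue("value");
    } else {
      weapons.add(m_weapon);
      m_weapon = null;
    }
  }

  static List<String> parse(String xml) {
    WeaponTagSketch handler = new WeaponTagSketch();
    try {
      SAXParserFactory.newInstance().newSAXParser()
          .parse(new InputSource(new StringReader(xml)), handler);
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse document", e);
    }
    return handler.weapons;
  }

  public static void main(String[] args) {
    System.out.println(parse(XML));
  }
}
```

Supporting the new tag took one dispatch line in each callback plus one method, and nothing in the handler depends on how deeply the tag is nested.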

How about you?  Do you prefer DOM over SAX?  Have you encountered
situations where DOM was a much better fit than SAX?

iPods may be hazardous to your health

I cracked a rib this past Sunday.  I was happily snowboarding on a
decent surface of snow which suddenly turned into hard-packed bumpy ice. 
The mix of shade and sun at this very location didn’t leave me any chance. 
My snowboard disappeared from under me and I fell flat on my chest. 
Pretty hard.

I made the mistake of playing a squash league match the next evening, which
aggravated the injury.  By the end of the match, the simplest twisting
motion of my upper body left me with a gripping pain that took a few minutes to
recede.

Looking back, it occurred to me that the real reason for my injury is
probably…  my iPod.

I usually listen to music when I snowboard but only recently did I realize
that when people tell you that snowboarding with headsets on might be hazardous,
they are simply missing the point.  What’s really dangerous is anything
that protrudes sharply from your body.

Am I going to stop snowboarding while listening to music?  Hell no, it’s
just too good.  And the iPod clearly proved that it is up to the task.

But for now, please don’t make me laugh.

 

XML is not human-editable


This post brings up a few interesting comments about XML:

XML seems to me overhyped. it is *just* a container for data structuring it
in most cases.

Saying that XML is overhyped is a bit like saying that text files are
overhyped.  The thing is, before XML became a standard, we had a flurry of
text formats used to contain various external information that programs need
(take a look at sendmail.cf or the hundreds of different configuration files
used by any UNIX installation for example).  You never knew what to expect
if you tried to read or edit one.

At least, XML gave us a suite of tools that make editing and reading such
files easier.  It also gave us a wealth of APIs in all languages so that you
don’t have to write your own parser, another Good Thing.

In that same comment, Klaus adds:

Besides that it is in fact *not really* human-editable.

… and this I fully agree with.  This is a message I have been pushing for
years but I’ve had a very hard time convincing people.  My early dislike for
XML as an editable format was what prompted me to create EJBGen in the first
place, which was one of the very early tools that used annotations to replace
XML (EJBGen started in early 2001, and its immediate success showed that I was
not the only one having a problem with XML as an editable format).

It is pretty obvious to me now that as soon as you are creating more than
just a toy program and your code needs to store data outside the code, XML
is the only sane way to go.

So the question is not really "Why is TestNG using XML?" but "Why is TestNG
using an external file to configure its tests?", as opposed to JUnit where this
is done in code.

The answer is that I make a clear distinction between the static (the
business logic) and the dynamic (what tests are being run) parts of your tests. 
I believe JUnit makes the mistake of conflating the two, which forces you to
recompile your code when you decide to run a different set of tests.

If you picture a team of ten programmers, each of them will want to run a
different set of tests as part of their day, and all the time spent recompiling
their test suites is wasted.  Not to mention that they are modifying code that
they need to remember not to submit to the source control system, since it only
runs a subset of all the tests.

Bunny suicides

Even bunnies get tired of life

Outlook finally virus free

I can’t begin to describe how happy this exquisite dialog made me feel this
morning.  I was synchronizing my Nokia 6620 with Outlook (more on this phone
very soon) and I was wondering why the process was taking so long, until I
realized that Outlook was waiting for me to grant access to my address book.

It’s taken a while, but I have to hand this one to Microsoft.  Now, if
only Windows required a password before installing any new application, I could
finally ditch all my anti-junkware programs (and if you are an Apple fan and you
want to point out that MacOS already does this, please don’t).

Disposable email addresses

Given the amount of spam I receive every day, I am extremely reluctant to give
away my email address to untrusted parties, especially when I’m pretty sure
these people should only ever use that email address once (to send me an
activation code, for example).  Therefore, I was absolutely delighted when
the first disposable email address service appeared (SpamGourmet) and especially
when it was followed by two more (Mailinator and DodgeIt).  Here is a quick
review of these three services.

  • SpamGourmet allows you to define an
    email address @spamgourmet.com with
    the syntax aWord.aNumber.yourUserName
    The word will typically be used to identify the service that will be trying
    to email you, the number is the number of times emails sent to that address
    will be delivered until they start bouncing, and your user name is… well,
    your own identifier.

    This solution works well but is a bit heavy.  First of all, I don’t
    really care how many times this email address works, most of the time, once
    is enough and the rest is up to the server.  Second, I need to log in
    with a user name and a password, and again, I don’t really see the need.
     

  • Enter Mailinator.  Mailinator
    takes the concept one step further by offering you passwordless email
    addresses.  You type in the user name, and you are automatically taken
    to the mailbox.  Obviously, you should never use this inbox for
    anything confidential, and if you can live with that, it’s certainly a
    better solution than SpamGourmet.  It suffers from two shortcomings,
    though:  1) you can’t easily bookmark the inbox, you need to go through
    the front page and then submit your name and 2) there is no easy way to be
    notified when the expected email has arrived.
     
  • DodgeIt to the rescue!  With its
    googleish minimalistic interface, DodgeIt appealed to me right away, and the
    fact that it doesn’t use any form allows you to bookmark any inbox (example). 
    But DodgeIt goes further by giving you an RSS feed to the
    mailbox.  This is the ultimate luxury in disposable addresses, since
    you obviously don’t want to be notified by email.  Pick a carefully
    selected login name (you don’t want your reader to bother you if others
    happen to use the same inbox), point your RSS reader to it and voila!, you
    will never have to reload a browser waiting for an activation code.

Can anybody top DodgeIt?

Update from the comments:  There is also ipoo as a good alternative to DodgeIt.

 

iProduct

Jamie Zawinski’s take on
Apple’s latest announcements…

 

TestNG article on DeveloperWorks

Filippo Diotalevi has just posted an article on TestNG on DeveloperWorks. 
It’s fairly short but gives a very good overview of the features of the
framework.

 

Massive spam attack

I have just been hit by the nastiest spam attack yet.  When I got up
this morning, I found more than nine thousand (9000!) emails in my Inbox. 
They all follow the same pattern:

  • Each one comes from a different email address (different domain even).
  • The wording is slightly modified from one email to the other (they are
    selling medicines).
  • They are sent to a randomly-generated email address at my domain.
  • They only contain three lines, so my spam filter was unfortunately
    unable to flag them as spam.
  • The web site they point to seems to be randomly generated but it does
    indeed work and point to a Canadian drug firm:  basdf kjlke.com
    (space inserted on purpose) which I hope will be shut down by the time you
    read this.

I am having a hard time believing this kind of flooding is even effective at
all, but fortunately, it didn’t take me more than ten minutes to clean up my
Inbox.  What a waste of time.