Archive for February, 2005

Annotation design patterns (part 2)

In a previous entry, I discussed an annotation design pattern called "Annotation
Inheritance".  Here is another annotation design pattern that I have found quite useful.

Class-Scoped Annotations

This design pattern is very interesting because it doesn’t have any
equivalent in the Java World.

Imagine that you are creating a class that contains a lot of methods with a
similar annotation.  It could be @Test with TestNG, @Remote if you are using
some kind of RMI tool, etc…

Adding these annotations to all your methods is not only tedious, it
decreases the readability of your code and it’s also quite error prone (it’s
very easy to create a new method and forget to add the annotation).

The idea is therefore to declare this annotation at the class level:

@Test
public class DataBaseTest {
  public void verifyConnection() { … }
  public void insertOneRecord() { … }
}

In this example, the tool will first look at each individual method to see if it
has an @Test annotation and, if it doesn’t, look up the same annotation
on the declaring class.  In the end, it will act as if @Test was
found on both verifyConnection() and insertOneRecord().
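Here is a minimal sketch of that lookup using plain reflection.  The @Test annotation below is a hypothetical stand-in with runtime retention, not TestNG’s actual annotation, and the helper names (findTest, testMethodNames, MixedTest) are mine:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class ClassScopedLookup {
  // Hypothetical stand-in for a tool's annotation; not TestNG's real @Test.
  @Retention(RetentionPolicy.RUNTIME)
  @Target({ElementType.TYPE, ElementType.METHOD})
  @interface Test {}

  @Test
  static class DataBaseTest {
    public void verifyConnection() {}
    public void insertOneRecord() {}
  }

  // No class-level annotation here: only the annotated method qualifies.
  static class MixedTest {
    @Test public void annotated() {}
    public void helper() {}
  }

  // Look on the method first, then fall back to its declaring class.
  static Test findTest(Method method) {
    Test onMethod = method.getAnnotation(Test.class);
    return onMethod != null
        ? onMethod
        : method.getDeclaringClass().getAnnotation(Test.class);
  }

  // Sorted names of the declared methods that carry an effective @Test.
  static List<String> testMethodNames(Class<?> clazz) {
    List<String> names = new ArrayList<>();
    for (Method m : clazz.getDeclaredMethods()) {
      if (!m.isSynthetic() && findTest(m) != null) {
        names.add(m.getName());
      }
    }
    Collections.sort(names);
    return names;
  }

  public static void main(String[] args) {
    System.out.println(testMethodNames(DataBaseTest.class)); // [insertOneRecord, verifyConnection]
    System.out.println(testMethodNames(MixedTest.class));    // [annotated]
  }
}
```

Note that this sketch applies the class-level annotation to every declared method, which is only one of the possible choices.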

The question now is:  how will the tool determine which methods the
class annotation should apply to?

There are three strategies we can consider:

  1. Apply the annotations on all the methods (private, protected,
    public).
    Probably not the most intuitive way.
     
  2. Apply the annotations only on the public methods.
    This seems fairly intuitive to me, you just need to be careful what
    methods you declare public.
     
  3. Apply the annotations on a set of methods picked differently.
    An interesting approach discussed further below.

Of course, we should also add another dimension to this matrix:  should
the methods under consideration be only on the current class or also inherited
from superclasses?  To keep things simple, I’ll assume the former for now,
but the latter brings some interesting possibilities as well, at the price of
complexity.
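To make the first two strategies and the inheritance question concrete, here is a rough sketch of how a tool could pick the candidate methods with reflection.  The classes and helper names are invented for the illustration:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class MethodSelection {
  static class BaseDbTest {
    public void inheritedCheck() {}
  }

  static class DataBaseTest extends BaseDbTest {
    public void verifyConnection() {}
    protected void openSocket() {}
    private void log() {}
  }

  // Strategy 1: every method declared on the class, whatever its visibility.
  static List<String> allDeclared(Class<?> c) {
    return names(c.getDeclaredMethods(), false);
  }

  // Strategy 2: only the public methods declared on the class itself.
  static List<String> publicDeclared(Class<?> c) {
    return names(c.getDeclaredMethods(), true);
  }

  // Variation: public methods including inherited ones (java.lang.Object excluded).
  static List<String> publicWithInherited(Class<?> c) {
    return names(c.getMethods(), true);
  }

  private static List<String> names(Method[] methods, boolean publicOnly) {
    List<String> result = new ArrayList<>();
    for (Method m : methods) {
      boolean visible = !publicOnly || Modifier.isPublic(m.getModifiers());
      if (visible && !m.isSynthetic() && m.getDeclaringClass() != Object.class) {
        result.add(m.getName());
      }
    }
    Collections.sort(result);
    return result;
  }

  public static void main(String[] args) {
    System.out.println(allDeclared(DataBaseTest.class));         // [log, openSocket, verifyConnection]
    System.out.println(publicDeclared(DataBaseTest.class));      // [verifyConnection]
    System.out.println(publicWithInherited(DataBaseTest.class)); // [inheritedCheck, verifyConnection]
  }
}
```

The difference between the last two calls is simply getDeclaredMethods() versus getMethods(), which is where the extra complexity of the inherited variant starts.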

Using visibility as a means to select the methods might be seen as a hack, a
way to hijack a Java feature for a purpose different than what it was designed
for.  Fair enough.  Then how could we tell the tool which methods the
class-level annotation should apply to?

An antiquated way of doing it is using syntactical means:  a regular
expression in the class-level annotation that identifies the names of the
methods it should apply to:

@Test(appliesToRegExp = "test.*")
public class DataBaseTest {
  public void testConnection() { … } // will receive the @Test annotation
  public void testInsert() { … } // ditto
  public void delete() { … } // but not this one
}

I call this method "antiquated" because that’s how we used
to do it in Java pre-JDK5.  This approach has a few significant flaws:

  • It forces you to obey a naming convention.
  • It makes refactoring difficult (IDEs don’t know much about the
    meaning of the string "test.*").
  • It is not type safe (if the regular expression changes, you need to
    remember to rename your methods).
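For reference, the regular expression approach could be implemented along these lines.  This is only a sketch: the annotation is a hypothetical one that reuses the appliesToRegExp attribute name from the example, and selectedMethods is an invented helper:

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;

class RegExpSelection {
  // Hypothetical annotation mirroring the appliesToRegExp attribute shown above.
  @Retention(RetentionPolicy.RUNTIME)
  @interface Test {
    String appliesToRegExp();
  }

  @Test(appliesToRegExp = "test.*")
  static class DataBaseTest {
    public void testConnection() {}
    public void testInsert() {}
    public void delete() {}
  }

  // Sorted names of the declared methods whose name matches the class-level pattern.
  static List<String> selectedMethods(Class<?> clazz) {
    List<String> names = new ArrayList<>();
    Test test = clazz.getAnnotation(Test.class);
    if (test == null) {
      return names;
    }
    Pattern pattern = Pattern.compile(test.appliesToRegExp());
    for (Method m : clazz.getDeclaredMethods()) {
      if (!m.isSynthetic() && pattern.matcher(m.getName()).matches()) {
        names.add(m.getName());
      }
    }
    Collections.sort(names);
    return names;
  }

  public static void main(String[] args) {
    System.out.println(selectedMethods(DataBaseTest.class)); // [testConnection, testInsert]
  }
}
```

Renaming testInsert() to insert() would silently drop it from the result, which is exactly the refactoring problem listed above.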

A cleaner, more modern way to do this is to use annotations:

@Test(appliesToMethodsTaggedWith = Tagged.class)
public class DataBaseTest {
  @Tagged
  public void verifyConnection() { … }

  @Tagged
  public void insertOneRecord() { … }
}

Of course, this solution is precisely what we wanted to avoid in the first
place:  having to annotate each method separately, so it’s not buying us
much (it’s actually more convoluted than the very first approach we started
with).
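For completeness, here is how a tool might resolve the tagged variant.  Again this is a sketch under assumptions: both annotations are hypothetical, and the attribute is typed as Class<? extends Annotation> so that the compiler checks the tag:

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class TaggedSelection {
  @Retention(RetentionPolicy.RUNTIME)
  @interface Tagged {}

  // Hypothetical annotation; the attribute names the marker annotation to look for.
  @Retention(RetentionPolicy.RUNTIME)
  @interface Test {
    Class<? extends Annotation> appliesToMethodsTaggedWith();
  }

  @Test(appliesToMethodsTaggedWith = Tagged.class)
  static class DataBaseTest {
    @Tagged public void verifyConnection() {}
    @Tagged public void insertOneRecord() {}
    public void helper() {}
  }

  // Sorted names of the declared methods carrying the tag named by the class annotation.
  static List<String> selectedMethods(Class<?> clazz) {
    List<String> names = new ArrayList<>();
    Test test = clazz.getAnnotation(Test.class);
    if (test == null) {
      return names;
    }
    Class<? extends Annotation> tag = test.appliesToMethodsTaggedWith();
    for (Method m : clazz.getDeclaredMethods()) {
      if (!m.isSynthetic() && m.isAnnotationPresent(tag)) {
        names.add(m.getName());
      }
    }
    Collections.sort(names);
    return names;
  }

  public static void main(String[] args) {
    System.out.println(selectedMethods(DataBaseTest.class)); // [insertOneRecord, verifyConnection]
  }
}
```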

So it looks like we’re back to square one:  class-level annotations
applying to public methods seem to be the most useful and most intuitive way to
apply this pattern, and as a matter of fact, TestNG users have taken quite a
liking to it.

Can you think of a better way?

Game of the week… kind of

I can’t pass this up… Jon was inspired by the import and macrodef features of ant and he wrote… this.

This is the funniest hack I have seen in a while.

Annotation design patterns

Throughout my work with EJBGen, EJB3 and TestNG, I have identified a couple
of annotation-related patterns that have proven to be quite powerful.  They
are called "Annotation inheritance" and "Class-scoped annotations".

Annotation Inheritance

The idea is simply to extend the familiar inheritance mechanisms to
annotations.  Consider the following annotation:

public @interface Author {
  // Defaults allow a use site to specify only some of the attributes.
  public String lastName() default "";
  public String date() default "";
}

And an example use:

@Author(lastName = "Beust", date = "February 25th, 2005")
public class BaseTest {
// …
}

public class Test extends BaseTest {
// …
}

If you try to look up annotations on the Test class using the
standard reflection API, you will find none, since JSR-175 does not inherit
annotations by default (I submitted the idea but it was decided to keep the specification
simple and leave this kind of behavior to tools, which is exactly what we are
doing right now).  The @Inherited meta-annotation only covers class-level
annotations and never merges attributes.

A tool using this pattern would therefore see an @Author annotation
on both BaseTest and Test.

Since we follow the same overriding rules as Java, inheritance would also
work on methods that have identical names and signatures.
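A tool could implement the class-level part of this lookup by walking up the superclass chain.  The following is a minimal sketch, assuming @Author is given runtime retention; findInherited is an invented helper, not part of any library:

```java
import java.lang.annotation.Annotation;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;

class AnnotationInheritance {
  @Retention(RetentionPolicy.RUNTIME) // assumption: needed for reflection to see it
  @interface Author {
    String lastName();
    String date();
  }

  @Author(lastName = "Beust", date = "February 25th, 2005")
  static class BaseTest {}

  static class Test extends BaseTest {}

  // Walk up the superclass chain and return the first annotation found.
  static <A extends Annotation> A findInherited(Class<?> clazz, Class<A> type) {
    for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
      A found = c.getDeclaredAnnotation(type);
      if (found != null) {
        return found;
      }
    }
    return null;
  }

  public static void main(String[] args) {
    // The standard API sees nothing on the subclass...
    System.out.println(Test.class.getAnnotation(Author.class)); // null
    // ...while the tool-level lookup finds the one on BaseTest.
    System.out.println(findInherited(Test.class, Author.class).lastName()); // Beust
  }
}
```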

Where things get interesting is when you start considering "partial
inheritance" (or partial overriding).  Consider the following:

@Author(lastName = "Beust", date = "February 25th, 2005")
public class BaseTest {
// …
}

@Author(date = "February 26th, 2005")
public class Test extends BaseTest {
// …
}

This time, the class Test is overriding the @Author
annotation but only partially.  Obviously, the date attribute in
the @Author annotation will return "February 26th, 2005", but what is
the value of lastName?  Should it be null or "Beust"?

My experience seems to indicate that while not necessarily the most
intuitive, the latter form (partial overriding) is the one that is the most
powerful.  Partial overriding is a very effective way to implement
"Programming by Defaults", which is a way of saying that you provide code with
defaults that do the right thing for 80% of the cases. 

Basically, all you need to do to provide these defaults is to store these
annotations on a base class and require client code to extend these base
classes.  Clients are then free to either override already-defined
attributes or add their own, and the tool will gather all the attributes by
collecting them throughout the inheritance hierarchy, starting from the subclass
and working its way up to the base classes.
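Here is one way the attribute gathering could look.  It is a sketch under two assumptions of mine: the attributes are given an empty-string default (annotation attributes cannot be null), and an empty string is treated as "not specified":

```java
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.util.Map;
import java.util.TreeMap;

class PartialOverriding {
  @Retention(RetentionPolicy.RUNTIME)
  @interface Author {
    String lastName() default ""; // assumption: "" means "not specified"
    String date() default "";
  }

  @Author(lastName = "Beust", date = "February 25th, 2005")
  static class BaseTest {}

  @Author(date = "February 26th, 2005")
  static class Test extends BaseTest {}

  // Collect attributes from the subclass up; the most specific value wins.
  static Map<String, String> resolveAuthor(Class<?> clazz) {
    Map<String, String> merged = new TreeMap<>();
    for (Class<?> c = clazz; c != null; c = c.getSuperclass()) {
      Author author = c.getDeclaredAnnotation(Author.class);
      if (author == null) {
        continue;
      }
      if (!author.lastName().isEmpty()) {
        merged.putIfAbsent("lastName", author.lastName());
      }
      if (!author.date().isEmpty()) {
        merged.putIfAbsent("date", author.date());
      }
    }
    return merged;
  }

  public static void main(String[] args) {
    System.out.println(resolveAuthor(Test.class));
    // {date=February 26th, 2005, lastName=Beust}
    System.out.println(resolveAuthor(BaseTest.class));
    // {date=February 25th, 2005, lastName=Beust}
  }
}
```

Because the walk starts at the subclass and uses putIfAbsent, a value set lower in the hierarchy is never overwritten by a base class.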

In the next entry, I will describe the Class-Scoped Annotation pattern, and
more importantly, how it can be combined with Annotation Inheritance to create
some very elegant constructs.

 

PHP confessions from a Java fiend (part 2)

Read part 1.

When you are putting together a Web site, there are two things you need from
a language:

  • Database access.
  • Web support.

As far as I can tell, PHP’s support for the former is adequate but the Web is
definitely its forte.

I can only talk about PHP’s support for MySQL, but support for other
databases is probably not very different.  As a friend of mine told me not
long ago, "there are not a hundred ways you can retrieve rows from a database".

The PHP-MySQL pair is actually so popular that if your
ISP supports PHP, they very likely installed the MySQL extensions with it.  A
quick way of telling is to invoke phpinfo() and look for "MySQL" in the
result page.

MySQL support is pretty much identical to JDBC:  very low level, you
name columns directly and you reference results by ordinal number.  And
just like JDBC, you need to remember to close the connection when you’re done:

$result = mysql_query($query);
$rowCount = mysql_num_rows($result);
for ($i = 0; $i < $rowCount; $i++) {
  // Fetch each column of row $i by name.
  $name = mysql_result($result, $i, "name");
  $date = mysql_result($result, $i, "date");
}
mysql_close(); // closes the most recently opened connection

I am sure there are numerous packages built on top of this simple abstraction
but I haven’t done any research yet, and I am purposely trying to keep things
very basic with my code (hence no class or other object-oriented features of PHP
for now, although just using classes would already help separate neatly the
various layers of my application).

The only principle I have found helpful so far is to centralize all the
database-oriented code in one single file and to avoid using hardcoded strings
to reference anything in my schemas.  Having said that, I can already
envision some future maintenance nightmare…

Let’s turn to Web support now, which is where PHP really shines.

There are three areas of particular interest to Web developers:

  • Forms
  • Cookies
  • Sessions

And in all three areas, PHP is a model of simplicity.

Consider the following form:

<form action="post.php" method="post">
  <input type="text" name="date" />
</form>

You collect the value entered in the text field in post.php like this:

$date = $_POST["date"];

Of course, you would use $_GET if that’s the method your form uses instead.

Cookies follow a similar pattern:

setcookie("user", "cedric");

// …

if (isset($_COOKIE["user"])) {
  $user = $_COOKIE["user"];
}

Sessions are stored in an array called, unsurprisingly, $_SESSION.  You
can have one started automatically by PHP or do this explicitly with
session_start().  Of course, the same warnings as in J2EE apply, such
as making sure you keep the number of variables in your session to a minimum
(you can unregister variables with session_unregister()).

If you can put aside the mildly annoying asymmetry in the API (sometimes you
invoke a function, other times it’s a global array), PHP puts a lot of power in
your hands with these simple APIs:  a change that touches the schema, the
accompanying business logic and the HTML can often be made in less than ten
minutes.

The next task I’d like to tackle is to research a higher level of abstraction
than what I have been looking at so far, such as template frameworks and
database abstractions.

 

Game of the week

See how many levels you can finish before your eyes blow up…

Play!

PHP vulnerability

Frank Bolander posted a thoughtful comment on my previous PHP entry:

Its allure as an alternative/proxy to ASP/JSP makes everyone blinded IMO
just because of GPL. It’s pretty sad when a server side scripting engine
will allow Perl statements to be injected in GET parameters and cause major
damage after all the years of use and hype.

I am well aware of the scalability issues of a 1-tier solution and of PHP’s
security risks, which, as Frank points out, have made the news recently. 
I’m not particularly worried about the Web site I’ve been working on, which
receives very little traffic, but I started wondering.

What if I renamed all the pages ".asp" instead of ".php"?

Basically, the question I’m asking is:  how do hackers target PHP sites? 
Is there any other means to guess that a page is generated by PHP except for its
suffix?  Are there any HTML formatting rules that give away the CGI
language in which this page was generated?

Or do hackers just slam random pages with well-known GET and POST exploits
and see what happens?

 

PHP confessions from a Java fiend

I spent the last few days revamping a Web site and I took this opportunity
to learn PHP, which has been an interesting experience.

This Web site contains about a thousand different HTML pages which I wanted
to store in a database in order to make it easier to browse.  My first task
was therefore to scrape this HTML in order to extract its meaningful content and
then to store it in a database.

When I started this Web site six years ago, I had no idea I would ever need
to do something like this but I still followed the convention of surrounding the
information of importance with <span> tags.  This turned out to be of
critical importance.  I wrote a short Ruby script that did the parsing and
extracted the data into a canonical format that I later used as the central
repository from which to populate the database.

The next step was to set up Apache and MySQL to my liking, which turned out
to be a little more challenging than I had anticipated, because what I have
access to on my development machine is different from what my ISP lets me
modify.  But I’ll save that for a future entry if there’s interest and I’ll
focus on PHP for now.

Picking PHP was a no-brainer.  First because it is supported by my ISP
but also because I had always wanted to learn it and find out what all the buzz
was about.  I expected the experience to be painless and… 
surprisingly, it was.  Way beyond my expectations.

Here are a few thoughts from the perspective of a Java programmer who has been
heavily exposed to J2EE for almost five years now.  Since these reflections are
based on a PHP experience that is barely a few days old, they will most likely contain
inaccuracies that you should feel free to point out in the comments.

PHP is a very simple imperative language with an impressive number of libraries. 
Even though it possesses a few object-oriented attributes, I chose to ignore
this aspect of the language in order to see what the code would look like if I
didn’t try to be too fancy, a habit that’s shockingly hard to shake off after so
many years of J2EE work.

PHP’s main strength is its very regular syntax and a few details that make it
extremely well suited for the Web, among which:

  • Strings can contain newlines, so you can embed big pieces of HTML into
    your code (not the most readable way to proceed, but awesome to reach a
    working prototype very fast).
  • Strings can be delimited with either double quotes or single quotes, and
    of course, the latter should be preferred since double quotes tend to come
    up quite often in well-formed HTML.

Not surprisingly, developing with PHP is very similar to JSP:  you end
up concatenating pieces of static HTML with dynamic PHP and this speeds up
prototyping quite a bit.  The problem is that once it works, you tend to
think twice before refactoring it because errors with missing or extra
delimiters are quite common, so in order to make it easy to debug, make sure you
set display_errors = true in your php.ini.

There are two PHP idiosyncrasies that Java programmers will most likely trip
upon:

  • Variables need to start with a dollar sign.
  • Globals are not available by default inside functions.

The first point was actually pretty easy to get used to, but globals still trip me up now and then.  For example:

$URL = "http://a.com";

function foo() {
  echo $URL;
}

will print an empty string.  Yup, not even an error (maybe this is
configurable in php.ini, I didn’t check).  The correct
code is:

$URL = "http://a.com";

function foo() {
  global $URL;
  echo $URL;
}

This idiom will look familiar to those of you who used to program in TCL,
which had even more nebulous scoping rules.

Another thing I found out the hard way is that PHP doesn’t have any notion of
name space, so it took me quite a while to figure out why the following code
didn’t work:

function log($msg) {
   echo "[LOG] $msg";
}

The reason is that this function collides with the log function from the
standard library and that not only does PHP decide to favor the other one, it
also won’t let you know of such a collision.  This was a clear message to
me that I should invent my own namespace, and I therefore decided to prefix all
my methods with "cb" (I’m still unclear on which style is the best: 
cbConnectToDataBase() or cb_connectToDataBase()).

In the next installment, I will discuss the PHP MySQL API and how
ten years of good software and OO practices are hard to shake off, even though
they’re not exactly easy to apply with PHP.

 

The old emacs/vi debate

I’m a fan of both vi and emacs (and Eclipse too), which I still use on a
daily basis.  I’ve been using emacs for more than fifteen years so I know
it pretty much inside and out and I fall back to it to edit anything that is not
Java and more than a few dozen lines. 
This posting about vi reminded me that in terms of macros, emacs still has a
formidable edge over vi (and pretty much any editor I can think of, actually).

But instead of empty words, here is a concrete task I had to do recently.

I have a bunch of HTML files named 100.html, 101.html…  199.html. 
They all contain something that looks like this:

<span class="number">
100
</span>

<span class="author">
Buffy Sommers
</span>

<span class="text">
Arbitrary HTML
that can span several lines.
</span>

I want to extract the content of the span tags and put them into a file in a
canonical format (eventually a .ddl file for insertion in a database, but it
doesn’t really matter).

Emacs allowed me to accomplish this task within a few minutes with macros.  Can you
think of another tool that will let you do that?  (without writing a
script, which takes more than a few minutes anyway).

 

Ant and Maven

                                

I was going to post a comment on Dion’s blog about his entry on Maven
when I realized that Mike posted it for me…

In short, ant’s <import> and <macrodef> are absolute life savers.  They
have brought a lot of sanity into my build files, which I thought were already
pretty lean and mean:

  • All my build files have shrunk a lot thanks to <macrodef> ("extract
    method").
  • All the targets contain fewer commands, and they clearly state their
    intent thanks to well-chosen names ("keep methods short").
  • The macros contain a set of sensible defaults that make most of their
    invocation straightforward and with rarely more than a couple of parameters
    ("program for the default case").

These rules of thumb coupled with the following simple guidelines:

  • Everything that may vary from one installation to another should go into
    a build.properties, all the rest goes into your build file.
  • Break down your build files logically (e.g. I like to have all the
    macros defined in build-macros.xml and <import> it everywhere I need it).

… give me a feeling of empowerment and control over my infrastructure.

Another important point is that I only need to know two languages to find my
way in ant (Java and ant’s XML) while Maven requires me to
learn four different languages:  Java, ant’s XML, Maven’s XML
and…  gulp… Jelly.

I still like the idea behind Maven but even today, when I see that the same
criticisms we were hearing two years ago still crop up on a regular basis, it
doesn’t make me very confident in my ability to diagnose Maven meltdowns.

Announcing TestNG 2.1

I am happy to announce the availability of TestNG 2.1.  Some of the new
features include:

  • invocationCount and successPercentage, which I
    described in a previous entry, and which allow you to invoke a test method a
    certain number of times while allowing some of these invocations to fail.  If
    the number of failures is under a certain threshold, the test is still
    considered a success.
     
  • timeOut is now applicable to all test methods. 
    Whether you are running your tests in parallel or not, you can specify a
    time-out for your test method and if it fails to complete within the given
    amount of time, TestNG will mark it as a failure.
     
  • dependsOnMethods was the most requested feature. 
    You can now specify dependencies on a per-method basis (no need to specify
    a group if your dependency graph is simple).  You can even mix
    dependsOnMethods and dependsOnGroups.
     
  • … and of course, numerous bug fixes and other additions.
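
To illustrate the idea behind invocationCount and successPercentage, here is a small self-contained sketch of the pass/fail rule.  It is not TestNG’s implementation: the class, the passes() and failsEvery() helpers, and the "at least successPercentage percent of invocations succeed" reading are all assumptions made for the example:

```java
class InvocationPolicy {
  // Runs the test invocationCount times and reports success when at least
  // successPercentage percent of the invocations completed without throwing.
  static boolean passes(Runnable test, int invocationCount, int successPercentage) {
    int successes = 0;
    for (int i = 0; i < invocationCount; i++) {
      try {
        test.run();
        successes++;
      } catch (RuntimeException | AssertionError e) {
        // A thrown exception counts as one failed invocation.
      }
    }
    return successes * 100 >= successPercentage * invocationCount;
  }

  // Deterministic "flaky" test for the demo: every n-th call throws.
  static Runnable failsEvery(int n) {
    int[] calls = {0};
    return () -> {
      if (++calls[0] % n == 0) {
        throw new IllegalStateException("simulated failure");
      }
    };
  }

  public static void main(String[] args) {
    // 10 invocations, 2 failures: 80% of them succeed.
    System.out.println(passes(failsEvery(5), 10, 80)); // true
    System.out.println(passes(failsEvery(5), 10, 90)); // false
  }
}
```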

A special thanks to Alexandru Popescu who has pulled all-nighters to make
this release happen!

We have an exciting list of new features lined up for our next version, among
which a plug-in API, but in the meantime, enjoy TestNG 2.1,