Archive for September, 2003

The problem with “Half Beans”

In an entry titled "Functional style in Java", Brian Slesinsky presents a
technique he calls "half beans".

Never mind that I can’t really see the connection between this and
functional programming; I find his recommendations quite questionable.

Brian is trying to solve the problems of objects that are not quite
completely initialized.  In order to make sure an object is fully
initialized, he forces his class (Album) to have just one constructor that
accepts only one parameter, of type AlbumBuilder.  AlbumBuilder is the
helper class in charge of making sure the object will be fully initialized.

There are a number of issues with his points.

  • First of all, the name.  I initially assumed that "half bean" meant
    that the pattern is supposed to avoid those "half beans", those Java Beans
    that are not completely initialized.  Well, no.  Half in this case
    means that the getters are in the main class and the setters and no-arg
    constructor are in the Builder class.  To support his point, Brian draws the
    following parallel:

    The getters go into one class and the no-arg constructor and setters go in
    the other. If it looks familar, you’re right – it’s basically the same pattern
    as String/StringBuffer.

    String and StringBuffer certainly have no such connection.  They
    are two very separate classes and one just happens to use the other for its
    internal implementation, but that’s just a detail.  They could happily live
    separately and would serve a purpose on their own, as opposed to the Album/AlbumBuilder
    duo.
     

  • Second, all the AlbumBuilder class does is postpone the problem. 
    Instead of making sure that Album is fully initialized, you must now put this
    logic in AlbumBuilder.
     
  • And finally, this technique introduces a lethal weakness in the application
    by coupling the classes Album and AlbumBuilder in ways that are not only
    invisible to the programmer but invisible to the compiler as well.  For
    example, if one day, the design mandates the addition of a field "Producer" to
    the Album class, I have to remember to update AlbumBuilder or everything will
    break.

Overall, this technique doesn’t help you address the main problem which, in
my opinion, is not where you should signal the error but rather how
you should handle it.  It is much more important to decide whether such an
error should be an AssertionError, or an IllegalArgumentException or something else,
and more importantly, whether it should be recoverable or not.

On a related note, the initial motivation that made Brian come up with this
design was the absence of named parameters in Java:

Some languages solve this problem with keyword arguments, but Java doesn’t
have them, so we need another solution.

It is actually quite easy to emulate named parameters in Java.  For
example:

new Album().title("The Wall").band("Pink Floyd");

This is very much un-Java, but it’s there if you decide you need it one day.
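The chaining works because each setter returns the object itself (in Java, each method would end with return this).  Here is a minimal sketch of the mechanism, written in Ruby to match the other examples in this archive.  The Album class and its reader names are made up for illustration.

```ruby
# Hypothetical Album class: each "setter" returns self so calls can be chained.
class Album
  attr_reader :album_title, :album_band

  def title(value)
    @album_title = value
    self   # returning self is what makes the chaining possible
  end

  def band(value)
    @album_band = value
    self
  end
end

album = Album.new.title("The Wall").band("Pink Floyd")
puts "#{album.album_title} by #{album.album_band}"
```

Each call reads like a named argument, but nothing forces the caller to supply all of them, which is exactly the half-initialized object problem discussed above.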

Forthcoming talk on Aspects

I will be making a presentation about "Aspect-Oriented Programming in Java
and J2EE" at the next SF Java Users Group on Wednesday, October 15th.

Also, my friend and colleague Chris Fry will talk about StAX, the streaming
XML parser, which has recently been reviewed at xml.com.

Stop by and say hi!

Schwartz and numbers

From Jonathan Schwartz’ interview:

Schwartz: I think it can definitely change our dialogue with our
customers. If you look at our top 65 accounts, there’s 10 million people there.
At $100 each that’s a billion dollars. So I think it certainly gives us a
broader market opportunity

Mmmh… That’s about 150,000 employees per company on average.

Someone needs to brush up on their arithmetic.

but I’m not a good prognosticator about our revenue streams.

You don’t say.
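For the record, the division is easy to check.  A quick sketch, using only the figures from the quote (the variable names are mine):

```ruby
people   = 10_000_000   # "there's 10 million people there"
accounts = 65           # "our top 65 accounts"
price    = 100          # "At $100 each"

puts people / accounts  # average head count per account (integer division)
puts people * price     # total revenue at $100 per person
```

The multiplication does give a billion dollars.  It is the head count per account, a bit over 150,000, that looks implausible.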

Checked exceptions and virtuality

I was reading the ongoing interviews of Anders Hejlsberg and James Gosling
over the weekend and I had several thoughts.

If you are a regular reader of this weblog, you know that I have great respect
for Anders Hejlsberg and his work (current and past).  Overall, his stance
on various issues is very pragmatic and fairly well articulated.  However,
I find myself disagreeing on two of the issues discussed in the latest parts of
these interviews, namely:

  • Why C# methods are not virtual by default.
     
  • Why C# doesn’t support checked exceptions.

What strikes me in Anders’ interviews is that while he gives numerous
technical reasons for these choices, he omits to mention what I think
is the principal motivation:  the necessity for cross-language
compatibility.

C# runs on the CLR and therefore has to obey constraints that are sometimes
non-negotiable.  The CLR was built to be cross-language, and as such, it
also has to support C++ and Visual Basic, neither of which supports checked
exceptions.  I don’t really understand why Anders never mentioned that
these requirements weighed heavily on these two design decisions.

As for the age-old debate "checked exceptions versus runtime
exceptions", I refer you to the
current TheServerSide thread which contains a
lot of interesting articles (especially Mike Spille’s).

As for the question "should methods be virtual by default?",
it’s close to a religious issue but I’ll share a few thoughts.

The way I see it, code can be extensible in two ways:  "by
design" and "technically".

Being "technically extensible" means that the language and tools
you are using give you the power to extend the code without any workaround. 
Languages that are on the "virtual by default" side tend to produce code that is
more technically extensible than others.

If the code is extensible "by design", the extension points and
their contracts have been thoroughly tested and documented.

It’s very rare to find code that is extensible both technically
and by design, but in my experience, at least if it is technically extensible, I can
find ways to work around the absence of design.  If you mark your method
private or final, I am left with no options at all.

As for James Gosling’s interview, I highly recommend it:  it’s
filled with very sensible advice about why checked exceptions are a good thing. 
My favorite part is:

It really is a culture thing. When you go through college and
you’re doing assignments, they just ask you to code up the one true path. I
certainly never experienced a college course where error handling was at all
discussed.

James talks about the "creepy feeling" you should have when you
write code that downplays the importance of errors and argues that with checked
exceptions, you can’t escape that feeling:  you have to face it.

And I think it’s a good thing.

The ultimate View technology

As you probably know, there are a lot of "template view" technologies out
there.  The most popular, and also a J2EE standard, is JSP.  Another
one is Velocity.  One could also name XML+XSL.

All these technologies have pros and cons, but they all have one thing in
common:  they mix two different languages in a single source.  The amount of
interleaving varies with the technology you use and it’s sometimes hard to
draw the line.  Is a JSP page an HTML or a Java source?

These technologies also have different ways to solve the "escape in / escape
out" problem.  In JSP, you use special markers to insulate the Java code
from HTML.  In Velocity, you use # and $ to refer to Velocity code. 
None of these methods are real showstoppers but they can end up producing some fairly
ugly templates.

Is there no really clean way to solve this problem?

Well, there is.

Ruby.

In a previous entry, I used a mix of two features in Ruby
(closures and dynamicity, and more particularly, method_missing) to solve the
following three problems in one fell swoop:

  • Make sure each open tag gets closed automatically.
  • Automatically indent.
  • Integrate cleanly with the Ruby syntax.

Let me give a quick example from the LogAnalyzer utility I have been
discussing these past days.  This piece of code generates the HTML to
display a list of referrers:


def createMiddleReport
  xml = XML.new
  @referrers.referrers.each { |date, refs|
    xml.p {
      xml.b("Referrers for #{date}")
    }
    xml.table {
      refs.each { |ref, ref2|
        xml.tr {
          xml.td {
            xml.a({ "href" => ref }) {
              xml.append(ref)
            }
          }
        }
      }
    }
  }

  xml.to_xml
end

Notice how this code happily mixes templating and logic.  I iterate
whenever I have to and I insert the content into the HTML string when the time
is right.  Other technologies will subtly impose their programming model on
your code, for example by making you compute the generated code and store it in
a HashMap, or assign it to a variable which you can then use.

With this method, there is no need to escape in and out of HTML:  it is
automatically covered by the method invocations on the XML instance.

Here is another example:


xml = XML.new
xml.html {
  xml.head {
    xml.title("Statistics for http://beust.com/weblog")
  }
  xml.body {
    xml.table {
      xml.tr {
        xml.td({ "valign" => "top" }) {
          xml.append(v.createLeftSideReport)
        }
        xml.td({ "valign" => "top" }) {
          xml.append(v.createMiddleReport)
        }
      }
    }
  }
}

File.open(OUTPUT_MAIN, "w") { |f|
  f << xml.to_xml
}

If you are curious, you can take a look at the final result, and also
download xml.rb.

Alright, let’s be a little bit more serious now.

Of course, saying that Ruby is the ultimate template view technology should be
taken with a grain of salt.  Obviously, this is not an option if you are a
Java programmer.  You should also note that Ruby happens to have two very
handy features that make this trick possible:  closures and method_missing,
the latter being due more to Ruby’s dynamic typing than to anything else. 
Also, if your language of choice doesn’t support closures, you will be reduced
to something like XMLStringBuffer, which I described in a previous entry. 
It is not as pretty as what you just saw, but it fits the bill pretty well.

 

Log analyzer in Ruby

Here is the problem I am trying to solve:  all the statistics for my Web
site are stored by my ISP in a directory, one per day.  Each file is
compressed and called, for example, www.20030915.gz.

I want to write a Log analyzer that will make it easy for me to collect
various statistics and still be extensible so that I can add more monitoring
objects as time goes by.  Right now, here are some examples of the numbers
I’d like to see:

  • Number of hits on my site.
  • For my weblog, number of HTML and RSS hits.
  • The list of referrers for, say, the past three days.
  • The number of EJBGen downloads each day.
  • The keywords typically used on search engines to reach my site.

Of course, it should be as easy to obtain totals per month or even per year
if needed.

The idea is the following:  when the script is run, it should run
through all the compressed files and build an object representation of each file
and line.  Then it will invoke each listener with two pieces of
information, Date and LogLine.  Each listener is then free to compute its
statistics and store them for the next phase.

Once the data gathering is complete (back-end), it’s time to present the
information.  There are several possibilities to achieve that goal but for
now, I’ll just make sure that back-end and front-end are decoupled.  I
envision one class, View, to be passed all the gathered information and generate
the appropriate HTML.

So first of all, we have the class LogDir, which encapsulates the directory
where my log files are stored.  Using the convenient "backtick" operator,
it is fairly easy to invoke gzip on each file and store each file in a LogFile
object, which in turn contains a list of LogLines.

When it’s done, LogDir then calls all the listeners with the following method:


def processLogFiles
  @files.each { |fileName|
    sf = LogFile.new(fileName)
    sf.logLines.each { |l|
      @lineListeners.each { |listener|
        listener.processLine(fileNameToDate(fileName), l)
      }
    }
  }
end # processLogFiles
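The fileNameToDate helper is not shown here.  A minimal sketch of what it could look like, assuming the file names always embed an eight-digit YYYYMMDD stamp as in www.20030915.gz (the regular expression and the error handling are my guesses, not the actual code):

```ruby
require 'date'

# Hypothetical helper: pulls the YYYYMMDD stamp out of a name such as
# "www.20030915.gz" and turns it into a Date.
def fileNameToDate(fileName)
  match = /(\d{4})(\d{2})(\d{2})/.match(File.basename(fileName))
  raise ArgumentError, "no date in #{fileName}" unless match
  Date.new(match[1].to_i, match[2].to_i, match[3].to_i)
end

puts fileNameToDate("www.20030915.gz")
```

Returning a real Date rather than the raw string makes it straightforward to group the statistics by month or year later on.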

The main loop is fairly simple:


ld = LogDir.new(LOG_DIR)
ld.addLineListener(ejbgenListener = EJBGenListener.new)
ld.addLineListener(weblogListener = WeblogListener.new)
ld.addLineListener(referrerListener = ReferrerListener.new)
ld.addLineListener(searchEngineListener = SearchEngineListener.new)
ld.processLogFiles

The last line is what causes LogDir to start and invoke all the listeners.

For example, here is the EJBGenListener.  All it needs to do is see if
the HTTP request includes "ejbgen-dist.zip" and increment a counter if it does. 
The overall result is a Hash of counts indexed by date (the string form of
the Date object):


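class EJBGenListener
  def initialize
    # Missing keys default to 0, so counters need no explicit setup.
    @ejbgenCounts = Hash.new(0)
  end

  # Called once per log line; counts requests for the EJBGen distribution.
  def processLine(date, line)
    if line.command =~ /ejbgen-dist\.zip/
      key = date.to_s
      n = @ejbgenCounts[key]
      n = n + 1
      @ejbgenCounts[key] = n
    end
  end

  # Download counts, keyed by date string.
  def stats
    @ejbgenCounts
  end
end # EJBGenListener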

The only thing worth noticing is that
the Hash constructor can take a parameter which represents the default value of
each bucket (0 in this case).
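Here is a small standalone illustration of that default value, using made-up dates:

```ruby
counts = Hash.new(0)          # missing keys read as 0
counts["2003-09-15"] += 1     # no need to initialize the bucket first
counts["2003-09-15"] += 1
counts["2003-09-16"] += 1

p counts["2003-09-15"]        # a bucket that was incremented twice
p counts["2003-09-17"]        # a date that was never seen
p counts.key?("2003-09-17")   # reading a missing key does not create it

plain = Hash.new              # without a default, missing keys read as nil
p plain["2003-09-15"]
```

With a plain Hash, the increment in processLine would fail on the first hit of each day, because nil has no + method.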

Ruby’s terseness is a real pleasure to work
with.  For example, I need to run some listeners on the three most recent
files of the directory (which obviously change every day).  Here is the
relevant Ruby code:


Dir.new(dir).entries.sort.reverse.delete_if { |x| ! (x =~ /gz$/) }[0..2].each { |f|
  # do something with f
}

Compare this with the number of lines needed in Java…
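The same selection can be spelled a little more readably with grep and first.  This is a sketch that runs against a throwaway directory; the recent_logs name and the sample file names are mine, and it relies on the date-stamped names sorting chronologically:

```ruby
require 'tmpdir'

# Returns the names of the `count` most recent .gz files in dir.
def recent_logs(dir, count = 3)
  Dir.entries(dir).grep(/\.gz\z/).sort.reverse.first(count)
end

Dir.mktmpdir do |dir|
  names = %w[www.20030913.gz www.20030914.gz www.20030915.gz www.20030916.gz notes.txt]
  names.each { |name| File.write(File.join(dir, name), "") }
  p recent_logs(dir)
end
```

Anchoring the pattern on ".gz" at the very end of the name avoids picking up unrelated files that merely end in "gz".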

So far, the code is mundane and very straightforward, not very different from
how you would program it in Java.  In the next entry, I will tackle the
front-end (HTML generation) because this is really the point I am trying to make
with this series of articles.

Open Source and Documentation

Ted and a few other people (see the comments) are complaining about the
quality of Open Source documentation.  They are not alone.

Here is a typical example.

On a regular basis, I see an announcement for a new utility show up on
JavaBlogs or some other news source.  I immediately click on it and very
often, the link merely takes me to the home page of the project on
SourceForge.  That’s already a bit frustrating, but okay, fine.  My
reflex then is not to click on the "files" link, nor "lists", nor to
check out the CVS repository.

I click on "Documentation".

And 99% of the time, that page is empty.

At this point, I just close the tab and move on, and you have just lost a
potential user.

If you are going to post an announcement for your project, you need to take
some time off coding and write up a document.  It doesn’t need to be
extensive, it doesn’t need to be perfect, but just like Jason, Ted and others, I
don’t have the time to read your source code.  I will be very happy to have
it handy if I need to debug something in your code one day, but until that day
happens, your documentation is all I need.

Explain what problem you are trying to solve, how you solve it and how to use
your software.

But there is more to writing documentation.  To me, a developer who
spends some time trying to communicate her work other than through code shows
that she has some perspective.  She is not just "all code".  She
understands users are a different breed and that you need to interact with them
if you are really trying to solve their problem, as opposed to just "scratching
a technical itch" because it’s fun and then pretending you have a product.

Admittedly, documentation written by developers is rarely good, and after a
certain point, you do need technical writers.  But for the SourceForge kind
of project, it’s more than enough and it shows the world that you are not just a
hacker:  you are a developer, and you remember who you are working for.

Generating XML in Ruby

I have been running my weblog on Movable Type for about a month now and I have
to say I am really impressed.  For a collection of scripts put together,
Movable Type is an impressive piece of software, both powerful and
intuitive.  I expected it to be a challenging installation, especially
since I am not running my blog on my home machine but at an ISP, but it turned
out to be remarkably painless.

Having said that, I have one big complaint:  no support for referrer logs. 
I couldn’t find any way to get quick access to my referrer log anywhere in
the Movable Type distribution.  A quick Google query turned up several
packages implemented in various languages.  I tried a lot of them but I could
never quite reach the result I was looking for, so I decided to write my own.

My ISP conveniently stores the logs for my Web site every night in a
well-defined directory, following a standard naming notation for each day. 
I decided it would be easier to compute my referrer log from these logs
instead of embedding scripting information in my main index file, since the
updates don’t really need to be more frequent than once a day.

Finally, I had to choose a language.  Since I opted for the static approach, I
am not limited to the languages that my ISP supports for CGI programming (PHP
and Perl).  The obvious choice was Ruby, which excels at this kind of
processing with its native support for regular expressions and its easy
invocation of external commands.  It is also object-oriented from the ground
up, which gives me a lot of flexibility in writing a utility that will be easy
to extend for my future log parsing needs.

Since I was going to have to generate HTML, I thought I would port a small
Java class that I have been using to generate XML in EJBGen called
XMLStringBuffer.  The idea is simply to not have to worry about indentation
and closing the tags.  With this class, generating XML is as simple as:


XMLStringBuffer xsb = new XMLStringBuffer();
xsb.push("person");
xsb.addRequired("last-name", m_lastName);
xsb.addOptional("first-name", m_firstName);
xsb.pop("person");

Note that I don’t really need to specify the closing tag in the pop() call, but
it makes debugging easier since the XMLStringBuffer maintains an internal stack
of the tags and can therefore tell me right away if my push/pop get out of
sync.
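To make the stack check concrete, here is a stripped-down sketch of the push/pop part in Ruby.  It is not the EJBGen class:  it leaves out addRequired and addOptional, and the error messages and two-space indentation are my own choices.

```ruby
# Hypothetical, reduced port of the push/pop idea; not the real XMLStringBuffer.
class XMLStringBuffer
  def initialize
    @lines = []
    @stack = []   # currently open tags, innermost last
  end

  # Opens a tag at the current depth and remembers it.
  def push(tag)
    @lines << "#{'  ' * @stack.size}<#{tag}>"
    @stack.push(tag)
  end

  # Closes the innermost tag; the optional argument is only a sanity check.
  def pop(tag = nil)
    open_tag = @stack.pop
    raise "pop(#{tag.inspect}) without a matching push" if open_tag.nil?
    raise "expected pop(#{open_tag.inspect}), got pop(#{tag.inspect})" if tag && tag != open_tag
    @lines << "#{'  ' * @stack.size}</#{open_tag}>"
  end

  def to_xml
    @lines.join("\n") + "\n"
  end
end

xsb = XMLStringBuffer.new
xsb.push("person")
xsb.push("name")
xsb.pop("name")
xsb.pop("person")
puts xsb.to_xml
```

Calling pop("person") while "name" is still open raises immediately, which is the early warning described above.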

It quickly occurred to me that I could make this class even fancier in Ruby
thanks to two features that are sadly absent from Java:  closures and method_missing (really dynamic typing).

The idea is to use closures to simulate indentation, and method_missing to make
the XML class allow invocations on any method.  If the said method
is unknown, it is simply turned into an XML tag.

Here is a piece of code that will make it all clearer:


xml = XML.new

xml.html {
  xml.head {
  }
  xml.body {
    xml.table {
      xml.tr {
        xml.td({ "valign" => "top" }, "Content1") {
        }
        xml.td {
          xml.append("Content2")
        }
      }
    }
  }
}

As you can see, each new closure (pairs of { }) starts a new tag and an
indentation level, and the tag is properly closed when the block is exited. 
Note also that every tag can be passed a Hash that will be turned into
attributes.  You can also specify the content of the tag either inline or
later in the closure with the append() method.  The generated XML is as
follows:


<html>
  <head>
  </head>
  <body>
    <table>
      <tr>
        <td valign="top">Content1</td>
        <td>
          Content2
        </td>
      </tr>
    </table>
  </body>
</html>

The XML class is about forty lines, including comments.
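For readers who want to see the mechanism without downloading anything, here is a sketch of how such a class could be written.  It is my reconstruction from the behavior described above, not the actual xml.rb:  the handling of to_* methods and the lack of escaping are simplifications I chose.

```ruby
# A sketch only; the real xml.rb may differ.
class XML
  def initialize
    @lines = []
    @depth = 0
  end

  # Adds a line of text at the current indentation.
  def append(text)
    @lines << ("  " * @depth) + text.to_s
  end

  def to_xml
    @lines.join("\n") + "\n"
  end

  # Any unknown method name becomes a tag.  A Hash argument supplies the
  # attributes, any other argument is inline content, and a block is the body.
  def method_missing(tag, *args)
    return super if tag.to_s.start_with?("to_")   # leave implicit conversions alone
    attributes = args.find { |a| a.is_a?(Hash) } || {}
    content = args.find { |a| !a.is_a?(Hash) }
    attr_text = attributes.map { |k, v| %( #{k}="#{v}") }.join
    if content
      append("<#{tag}#{attr_text}>#{content}</#{tag}>")
    else
      append("<#{tag}#{attr_text}>")
      @depth += 1
      yield if block_given?
      @depth -= 1
      append("</#{tag}>")
    end
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?("to_") || super
  end
end

xml = XML.new
xml.tr {
  xml.td({ "valign" => "top" }, "Content1")
  xml.td { xml.append("Content2") }
}
puts xml.to_xml
```

The block passed to xml.tr is handed to method_missing along with the method name, so yield runs the nested calls while the depth counter is raised.  Note that this sketch does no escaping of text or attribute values.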

In the next entry, I will give more details about the logging utility itself.

More on components and classes

Anthony offered some interesting thoughts on classes and components, but I
believe he is not pushing the idea far enough, which leads to some very
impractical considerations, such as:

Thus methods should not return values because it is the events which are used
to determine the results

This is clearly at odds with the way we program today, and probably will be for many years to come.

The solution out of this dilemma is to think of classes and components as
orthogonal features.  You don’t need to compromise one to get the other. 
They complement each other very nicely.  I believe Anthony’s misdirected
view comes from the fact that he makes a one-to-one relationship between a class
and a component (and even a one-to-one relationship between a method and an
event).  Things do not need to be coupled that finely, although it can
occasionally happen.

Just like a component can span several classes, an event doesn’t
necessarily map to one single method.

The way I see it, which is very reminiscent of the way COM developers have
been programming these past years, you can start by writing your application the
normal way, and then identify components and events as you go.  Of
course, you can also identify these components and events in the design phase,
as long as you don’t make the mistake of tying them tightly to the way you are
going to design your classes.

Once you have your application, you can start sprinkling event firings
wherever you see fit.  You can also notice that a set of classes taken
together can form a component, hence mimicking the metaphor of the integrated
circuit.  Then you can choose to formalize this component using your
favorite component model (whether it already exists or will be invented one of
these days).

When you look at your application from "above", it doesn’t matter if a
component uses one or several classes or if methods return values.  These
are implementation details that are inside your integrated circuit.  You
need to understand them if you want to modify the behavior of your component,
but they are of no use to you if all you need is the component itself, with its
attributes and its events.
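To make this less abstract, here is a toy sketch in Ruby.  All the names are invented and this is not a proposal for a component model:  it only shows two ordinary classes whose methods return values, wrapped by a component that exposes a property and an event.

```ruby
# Two ordinary classes; their methods return values as usual.
class Parser
  def parse(text)
    text.split
  end
end

class Counter
  def count(words)
    words.size
  end
end

# The component: one public face over both classes, plus an event.
class WordCountComponent
  attr_reader :total   # a property

  def initialize
    @parser = Parser.new
    @counter = Counter.new
    @handlers = Hash.new { |hash, event| hash[event] = [] }
    @total = 0
  end

  # Registers a block to run when the named event fires.
  def on(event, &handler)
    @handlers[event] << handler
  end

  def feed(text)
    @total += @counter.count(@parser.parse(text))
    fire(:updated, @total)
  end

  private

  def fire(event, *args)
    @handlers[event].each { |handler| handler.call(*args) }
  end
end

component = WordCountComponent.new
component.on(:updated) { |total| puts "total is now #{total}" }
component.feed("classes and components")
component.feed("are orthogonal")
```

From the outside, only total, feed and the :updated event are visible.  Whether the work is done by one class or two, and whether their methods return values, stays inside the integrated circuit.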

This vision is relatively simple and it’s too bad that the only component
model that might come close to allowing us to achieve it is JavaBeans, because
this specification is very primitive and encourages bad component programming
practices, such as implying an isomorphism between classes and components.

I have very high hopes that a better component model leveraging JSR 175 will
emerge and will allow us to achieve a similar degree of component reuse as the
one we see in COM today.

From classes to components

After Holub’s nonsensical article about getters and setters, it’s quite a
relief to read Anders Hejlsberg’s interview, especially since the C# architect
is basically saying the exact opposite of what Holub tried to say.

Anders emphasizes the fact that nowadays, developers need to think less in
terms of classes and more in terms of components (another debate that has
recently flared up in the Java community).

The way I see it, Components are a superset of Classes.  What exactly
differentiates components from classes is open for interpretation, but I like
Anders’ simplification:  while classes are about properties and methods
(PM), components are defined by PME:  properties, methods and events.

This observation makes both Properties and Events first-class citizens of the
Component programming model, which explains why C# supports both of them
natively, while Java achieves this through interfaces (another language that has
native support for accessors but not events is Ruby).

I, for one, really wish that Java had native support at least for accessors,
so that we can finally drop the confusing "A read-write property is defined if
the Java class has two methods, getFoo() and setFoo().  In this case, the
name of the property is that of the method where you remove "get" and lowercase
the first letter of the remaining name".  Yikes.
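For comparison, this is what native accessor support looks like in Ruby, the other language mentioned above.  The Person class is a made-up example:

```ruby
class Person
  attr_accessor :name   # generates name and name=
  attr_reader :id       # generates id only, so the property is read-only

  def initialize(id, name)
    @id = id
    @name = name
  end
end

person = Person.new(1, "Anders")
person.name = "James"            # calls the generated name= method
puts person.name
puts person.respond_to?(:id=)    # no setter was generated for id
```

The property is declared once, under its own name, and there is no naming convention for tools to reverse-engineer.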

Another topic that Anders discusses in this interview is delegates.

Ever since Microsoft’s initial attempts to add delegates to its own Java
Virtual Machine, Java developers have had a very strong bias against this
concept.  While creating an incompatible JVM is indeed something that
should be fiercely condemned, it’s a shame that the concept of delegates was
thrown away with it, because it makes a lot of sense in a language such as Java.

Anders gives several reasons why delegates are a good idea, but to me, the
one that’s most important is that delegates allow you to keep the number of
classes and interfaces to a reasonable level.

Every Java developer who has written Swing applications (or any GUI, for that
matter) knows how Action objects quickly proliferate, making the whole
architecture hard to follow, not to mention the number of objects that get
created just so that a callback method can be invoked.

Delegates allow you to tie an Action to one single method.  No new
interface or new class is needed.

Delegates go one step further:  they don’t require type conformance but
only signature conformance.  This design choice reopens the age-old
debate about static versus dynamic binding, and more particularly, raises the
following question:  if two methods have the exact same signature but
belong to two different classes, are they semantically equivalent?

My experience is that in practice, it’s something that doesn’t really cause
problems (I usually make a similar observation about untyped Collections and the
fact that in practice, the downcast is rarely a source of ClassCastExceptions).
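Ruby has no delegates as such, but its Method objects give a rough feel for the idea.  The sketch below is only an analogy with invented class names, and since Ruby checks nothing at compile time, "signature conformance" here just means both methods happen to accept one argument:

```ruby
# Hypothetical classes, for illustration only.
class Button
  def initialize
    @handlers = []
  end

  # Accepts anything that responds to call: a Method object, a lambda, a proc.
  def on_click(handler)
    @handlers << handler
  end

  def click
    @handlers.map { |handler| handler.call("click") }
  end
end

class Editor
  def save(event)
    "saved after #{event}"
  end
end

class AuditTrail
  def record(event)
    "recorded #{event}"
  end
end

button = Button.new
button.on_click(Editor.new.method(:save))        # no Action class needed
button.on_click(AuditTrail.new.method(:record))  # unrelated class, same shape
p button.click
```

Editor and AuditTrail share no interface or superclass beyond Object, yet both methods can be attached to the same button.  Whether save and record are semantically interchangeable is exactly the question raised above.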

Anders makes some other excellent points, such as the performance gain of a
delegate versus a direct method invocation, or his interesting take on what he
calls "simplexity".  Read the interview for more details.