Archive for category Ruby

Ruby on Rails and validation

Ruby on Rails is radically changing the way developers look at Web
applications, and it’s great to see all the excitement that surrounds the many
innovative features it offers.  One of these features is in-model
validation, which is understandably seen as an improvement over validation tied
to the presentation framework.


In this entry, Todd Huss shows a convincing example of this approach:

class Entry < ActiveRecord::Base

  # Relationships
  belongs_to :bliki

  # Validation
  validates_length_of :name, :within => 6..100
  validates_uniqueness_of :name
  validates_format_of :name, :with => /^\w+$/,
                      :message => "cannot contain whitespace"
  validates_length_of :content, :minimum => 10
  validates_associated :bliki
end

This kind of code is so familiar to any regular Ruby on Rails developer that
it’s easy to overlook that there is something very wrong with it:  there is
no easy way to capture this validation logic for reuse elsewhere.

I could move this validation to a common class but it’s not good enough, for
two reasons:

I might still want to fine-tune a few parameters here and there (sometimes
the length of the name must be within 6..100, other times within 5..10, while
the other constraints stay in place).

Not all my ActiveRecord objects extend the same class.

Has anyone tried to use a mix-in to achieve this?   Any other ideas?
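One shape such a mix-in could take is sketched below. It is only a guess at an answer and has not been run against Rails: NameValidation and validates_name are made-up names, and FakeRecord is a stand-in for ActiveRecord::Base that merely records which validation macros were called, so that the sketch runs on its own.

```ruby
# Hypothetical mix-in: the shared name rules live here, and only the
# length range changes from one model to the next.
module NameValidation
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    def validates_name(options = {})
      range = options.fetch(:within, 6..100)
      validates_length_of :name, :within => range
      validates_uniqueness_of :name
      validates_format_of :name, :with => /^\w+$/,
                                 :message => "cannot contain whitespace"
    end
  end
end

# Stand-in for ActiveRecord::Base (an assumption, so no Rails is needed):
# each macro just records its name and arguments on the calling class.
class FakeRecord
  def self.recorded
    @recorded ||= []
  end

  [:validates_length_of, :validates_uniqueness_of, :validates_format_of].each do |macro|
    define_singleton_method(macro) { |*args| recorded << [macro, *args] }
  end
end

class Entry < FakeRecord
  include NameValidation
  validates_name :within => 6..100
end

class Tag < FakeRecord
  include NameValidation
  validates_name :within => 5..10
end
```

Because the rules are attached through include rather than inheritance, the models do not need a common superclass, and each one can still pass its own length range.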

Flaws in Ruby

I am in general very fond of Ruby. It’s a very appealing language allowing all kinds of object-oriented designs not available in Java and other traditional languages. However, there is no such thing as a perfect language, and there are a few details in Ruby that bother me. For example:

  • The Perl heritage. Ruby uses variables named $`, $', etc… This hacker parlor trick is the worst outrage you can inflict on a program. You are guaranteed to confuse anyone who isn’t intimately familiar with Perl if you use these variables. Luckily, a module called “English.rb” allows you to use more meaningful names, but not everyone uses it.
  • The end keyword. I am a big fan of meaningful indenting, such as in Python. It bothers me when I read source code and suddenly see five “end” words next to each other in decreasing indentation. This is visually unpleasant and clutters the code. And if meaningful indentation is not an option, at least “}” is not as verbose as “end”, which I just can’t help spelling in my mind when I read it, even though it doesn’t add much to the semantics of the code.
  • No overloading. That’s right. If you want overloaded methods, you need to declare one method with a varargs signature and decide what code to invoke based on the types of the objects that were passed. This omission boggles my mind, but it’s perfectly in line with the philosophy of Matz, who is a strong opponent of orthogonal features because they tend to “explode into complexity”.
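Two of these points are easy to illustrate. The sketch below first uses the English library's $PREMATCH and $POSTMATCH aliases in place of $` and $', then emulates overloading with a single varargs method that dispatches on its arguments. The helper names split_around and area are hypothetical, chosen only for this example.

```ruby
require 'English'  # standard library: readable aliases for the Perl-style globals

# Returns the text before and after the first match, or nil if nothing matches.
def split_around(text, pattern)
  return nil unless text =~ pattern
  [$PREMATCH, $POSTMATCH]   # same values as $` and $'
end

# Emulated overloading: one varargs method that inspects what it was given.
def area(*args)
  if args.size == 1 && args[0].is_a?(Numeric)
    args[0] * args[0]       # area(side): a square
  elsif args.size == 2 && args.all? { |a| a.is_a?(Numeric) }
    args[0] * args[1]       # area(width, height): a rectangle
  else
    raise ArgumentError, "unsupported arguments: #{args.inspect}"
  end
end

p split_around("hello world", /o w/)
p area(3)
p area(2, 5)
```

The dispatch logic that a compiler would otherwise derive from the signatures has to be written and maintained by hand, which is the cost the last bullet complains about.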

Matz does have a point with the exploding complexity of orthogonal features. I believe this is one of the main reasons why C++ became so unbelievably complex, both in syntax and in semantics. For example, templates were initially introduced using the “<” and “>” characters. It didn’t take long before somebody realized that this new notation would conflict with the stream operator “>>”, thereby forcing you to close nested templates with “> >” instead of “>>”.

However, I believe that in the particular case of overloading, Matz is mistaken. This is one of the few features whose combination with other features is pretty well understood and still easy to read. The only problem I can think of is when you try to mix overloaded methods with default arguments. The ambiguity of this particular case led to the rule that default arguments can only be specified at the end of the signature (okay, there is another reason for this constraint that has to do with the way parameters are pushed to the stack, but I won’t go there).

Matz himself is the first to say that “there is no perfect language”. Or rather, his perfect language is not my perfect language. Fair enough.

Ruby is still a joy to program.

Log analyzer in Ruby

Here is the problem I am trying to solve:  all the statistics for my Web
site are stored by my ISP in a directory, one file per day.  Each file is
compressed and called, for example,

I want to write a log analyzer that will make it easy for me to collect
various statistics and still be extensible so that I can add more monitoring
objects as time goes by.  Right now, here are some examples of the numbers
I’d like to see:

  • Number of hits on my site.
  • For my weblog, number of HTML and RSS hits.
  • The list of referrers for, say, the past three days.
  • The number of EJBGen downloads each day.
  • The keywords typically used on search engines to reach my site.

Of course, it should be as easy to obtain totals per month or even per year
if needed.

The idea is the following:  when the script is run, it should run
through all the compressed files and build an object representation of each file
and line.  Then it will invoke each listener with two pieces of
information, Date and LogLine.  Each listener is then free to compute its
statistics and store them for the next phase.

Once the data gathering is complete (back-end), it’s time to present the
information.  There are several possibilities to achieve that goal but for
now, I’ll just make sure that back-end and front-end are decoupled.  I
envision one class, View, to be passed all the gathered information and generate
the appropriate HTML.

So first of all, we have the class LogDir, which encapsulates the directory
where my log files are stored.  Using the convenient "backtick" operator,
it is fairly easy to invoke gzip on each file and store each file in a LogFile
object, which in turn contains a list of LogLines.

When it’s done, LogDir then calls all the listeners with the following method:

def processLogFiles
  @files.each { |fileName|
    sf =
    sf.logLines.each { |l|
      @lineListeners.each { |listener|
        listener.processLine(fileNameToDate(fileName), l)
      }
    }
  }
end # processLogFiles
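The same dispatch can be sketched in a form that runs on its own. The pieces below are assumptions standing in for the real classes, which are not shown in full here: LogLine is reduced to a struct with a single field, the files are passed in as an in-memory table instead of being read through gzip, and RecordingListener is a made-up listener that only remembers what it was given.

```ruby
require 'date'

# Hypothetical stand-in for the real LogLine class.
LogLine = Struct.new(:command)

class LogDir
  # files maps a date string to the already-parsed lines of that day's log
  def initialize(files)
    @files = files
    @lineListeners = []
  end

  def addLineListener(listener)
    @lineListeners << listener
  end

  # Hands every line of every file to every listener, with the file's date.
  def processLogFiles
    @files.each { |fileName, logLines|
      date = Date.parse(fileName)
      logLines.each { |l|
        @lineListeners.each { |listener| listener.processLine(date, l) }
      }
    }
  end
end

# Made-up listener: remembers each (date, request) pair it receives.
class RecordingListener
  attr_reader :seen

  def initialize
    @seen = []
  end

  def processLine(date, line)
    @seen << [date.to_s, line.command]
  end
end
```

Any object with a processLine(date, line) method can be registered, which is what keeps new statistics cheap to add.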

The main loop is fairly simple:

ld =
ld.addLineListener(ejbgenListener =
ld.addLineListener(weblogListener =
ld.addLineListener(referrerListener =
ld.addLineListener(searchEngineListener =

The last line is what causes LogDir to start and invoke all the listeners.

For example, here is the EJBGenListener.  All it needs to do is see if
the HTTP request includes "" and increment a counter if it does.
The overall result is a Hash of counts keyed by the date (converted to a string):

class EJBGenListener
  def initialize
    @ejbgenCounts = Hash.new(0)
  end

  def processLine(date, line)
    if line.command =~ /
      key = date.to_s
      n = @ejbgenCounts[key]
      n = n + 1
      @ejbgenCounts[key] = n
    end
  end

  def stats
  end
end # EJBGenListener

The only thing worth noticing is that
the Hash constructor can take a parameter which represents the default value of
each bucket (0 in this case).
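A runnable sketch of such a counting listener follows. The names are assumptions for illustration: DownloadListener and DOWNLOAD_PATTERN are made up (the pattern is not the one used in the original script), and LogLine is reduced to a struct holding only the request.

```ruby
require 'date'

# Assumptions: a one-field LogLine and an invented pattern to look for.
LogLine = Struct.new(:command)
DOWNLOAD_PATTERN = /download\.zip/

class DownloadListener
  def initialize
    @counts = Hash.new(0)   # missing keys read as 0, so no nil check is needed
  end

  def processLine(date, line)
    @counts[date.to_s] += 1 if line.command =~ DOWNLOAD_PATTERN
  end

  def stats
    @counts
  end
end
```

Reading a key that was never written returns the default without storing it, so days with no matching request do not appear in the result.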

Ruby’s terseness is a real pleasure to work
with.  For example, I need to run some listeners on the three most recent
files of the directory (which obviously change every day).  Here is the
relevant Ruby code:

{ |x| ! (x =~ /gz$/) }[0..2].each {
  # do something with f
}
Compare this with the number of lines needed in Java…
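For reference, here is one self-contained way to pick the three most recent compressed files. It is a sketch under stated assumptions: the helper name recent_logs is made up, and the file names are assumed to sort chronologically (for example, dates written as YYYYMMDD), which may not match the ISP's real naming scheme.

```ruby
require 'tmpdir'
require 'fileutils'

# Returns the names of the `count` most recent .gz files in `dir`,
# newest first, relying on names that sort in date order.
def recent_logs(dir, count = 3)
  Dir.entries(dir).select { |x| x =~ /gz$/ }.sort.reverse[0...count]
end

# Usage on a throwaway directory with made-up file names.
Dir.mktmpdir do |dir|
  %w[20040101.gz 20040102.gz 20040103.gz 20040104.gz notes.txt].each do |name|
    FileUtils.touch(File.join(dir, name))
  end
  recent_logs(dir).each { |f| puts f }
end
```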

So far, the code is mundane and very straightforward, not very different from
how you would program it in Java.  In the next entry, I will tackle the
front-end (HTML generation) because this is really the point I am trying to make
with this series of articles.

Generating XML in Ruby

I have been running my weblog on Movable Type for about a month now and I have
to say I am really impressed.  For a collection of scripts put together,
Movable Type is an impressive piece of software, both powerful and
intuitive.  I expected it to be a challenging installation, especially
since I am not running my blog on my home machine but at an ISP, but it turned
out to be remarkably painless.

Having said that, I have one big complaint:  no support for referrer logs. 
I couldn’t find any way to get quick access to my referrer log anywhere in
the Movable Type distribution.  A quick Google query turned up several packages implemented in various
languages.  I tried a lot of them but I could never quite reach the result
I was looking for, so I decided to write my own.

My ISP conveniently stores the logs for my Web site every night in a
well-defined directory, following a standard naming notation for each day. 
I decided it would be easier to compute my referrer log from these logs
instead of embedding scripting information in my main index file, since the
updates don’t really need to be more frequent than once a day.

Finally, I had to choose a language.  Since I opted for the static approach, I
am not limited to the languages that my ISP supports for CGI programming (PHP
and Perl).  The obvious choice was Ruby, which excels at this kind of
processing thanks to its native support for regular expressions and its easy
invocation of external commands.  It is also object-oriented from the ground
up, which gives me a lot of flexibility in writing a utility that will be easy
to extend for my future log parsing needs.

Since I was going to have to
generate HTML, I thought I would port a small Java class that I have been using
to generate XML in EJBGen called XMLStringBuffer.  The idea is simply to
not have to worry about indentation and closing the tags.  With this class,
generating XML is as simple as:

XMLStringBuffer xsb = new XMLStringBuffer();
xsb.addRequired("last-name", m_lastName);
xsb.addOptional("first-name", m_firstName);

Note that I don’t really need to specify the closing tag in the pop() call, but
it makes debugging easier since the XMLStringBuffer maintains an internal stack
of the tags and can therefore tell me right away if my push/pop get out of
sync.
It quickly occurred to me that I could make this class even fancier in Ruby
thanks to two features that are sadly absent from Java:  closures and method_missing (really dynamic typing).

The idea is to use closures to simulate indentation, and method_missing to make
the XML class allow invocations on any method.  If the said method
is unknown, it is simply turned into an XML tag.

Here is a piece of code that will make it all clearer:

xml =

xml.html {
  xml.head {
  xml.body {
    xml.table { {{ "valign"
=> "top"}, "Content1"){
        } {

As you can see, each new
closure (pairs of { }) starts a new tag and will cause an indentation and the
proper tag to be closed when the block is exited.  Note also that every tag
can be passed a Hash that will be turned into attributes if found.  You can
also specify the content of the tag either inline or later in the closure with
the append() method.  The generated XML is as follows:

        <td valign="top">Content1</td>

The XML class is about forty lines, including comments.
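A builder along these lines can be sketched as follows. This is not the original class, only a minimal reconstruction of the idea: the two-space indentation, the exact behavior of append, and the decision to refuse tag names starting with "to_" (so that Ruby's implicit conversions such as to_ary are not mistaken for tags) are all assumptions.

```ruby
class XML
  def initialize
    @buffer = String.new
    @depth = 0
  end

  # Adds text content at the current indentation level.
  def append(text)
    @buffer << ("  " * @depth) << text.to_s << "\n"
    self
  end

  # Any unknown method becomes a tag. An optional Hash supplies attributes,
  # an optional String supplies inline content, and a block supplies children.
  def method_missing(tag, *args, &block)
    return super if tag.to_s.start_with?("to_")
    attrs   = args.find { |a| a.is_a?(Hash) } || {}
    content = args.find { |a| a.is_a?(String) }
    open_tag = "<#{tag}" + attrs.map { |k, v| " #{k}=\"#{v}\"" }.join + ">"
    indent = "  " * @depth
    if block
      @buffer << indent << open_tag << "\n"
      @depth += 1
      append(content) if content
      block.call
      @depth -= 1
      @buffer << indent << "</#{tag}>\n"   # closed when the block exits
    else
      @buffer << indent << open_tag << content.to_s << "</#{tag}>\n"
    end
    self
  end

  def respond_to_missing?(name, include_private = false)
    !name.to_s.start_with?("to_") || super
  end

  def to_s
    @buffer
  end
end

xml = XML.new
xml.html {
  xml.body {
    xml.table {
      xml.tr {
        xml.td({ "valign" => "top" }, "Content1")
      }
    }
  }
}
puts xml.to_s
```

The block plays the role of the push/pop pair in the Java version: the closing tag is emitted when the block returns, so the tags cannot get out of balance.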

In the next entry, I will give more details about the logging utility itself.