I spent the last few days revamping a Web site, and I took the opportunity
to learn PHP, which has been an interesting experience.

This Web site contains about a thousand different HTML pages, which I wanted
to store in a database in order to make them easier to browse.  My first task
was therefore to scrape this HTML, extract its meaningful content, and
store that content in a database.

When I started this Web site six years ago, I had no idea I would ever need
to do something like this, but I still followed the convention of surrounding the
important information with <span> tags.  This turned out to be
critical.  I wrote a short Ruby script that did the parsing and
extracted the data into a canonical format that I later used as the central
repository from which to populate the database.
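
The Ruby script itself isn't shown here, but the idea is easy to sketch in PHP. Everything in the snippet below is an assumption made up for illustration: the class="title" attribute, the sample markup, and the cb_extract_spans helper are not the conventions actually used on the site. A regular expression is also a fragile way to parse HTML; it only works here because the markup is regular.

```php
<?php
// Hypothetical helper: returns the text inside every <span class="..."> element
// whose class matches $class. Assumes the class attribute comes first in the tag.
function cb_extract_spans($html, $class) {
    $pattern = '/<span\s+class="' . preg_quote($class, '/') . '"[^>]*>(.*?)<\/span>/is';
    preg_match_all($pattern, $html, $matches);
    // $matches[1] holds the contents captured by the first group.
    return array_map('trim', $matches[1]);
}

// Made-up sample page.
$page = '<p><span class="title">First entry</span> some filler ' .
        '<span class="other">ignored</span> ' .
        '<span class="title"> Second entry </span></p>';

print_r(cb_extract_spans($page, 'title'));
```

The returned array can then be written out in whatever intermediate format is convenient before loading it into the database.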

The next step was to set up Apache and MySQL to my liking, which turned out
to be a little more challenging than I had anticipated, because what I have
access to on my development machine is different from what my ISP lets me
modify.  But I’ll save that for a future entry if there’s interest and I’ll
focus on PHP for now.

Picking PHP was a no-brainer.  First because it is supported by my ISP
but also because I had always wanted to learn it and find out what all the buzz
was about.  I expected the experience to be painless and… 
surprisingly, it was.  Way beyond my expectations.

Here are a few
thoughts from the perspective of a Java programmer who has been heavily exposed
to J2EE for almost five years now.  Since these reflections are based on a
PHP experience that is barely a few days old, they will most likely contain
inaccuracies that you should feel free to point out in the comments.

PHP is a very simple imperative language with an impressive number of libraries.
Even though it has a few object-oriented features, I chose to ignore
this aspect of the language in order to see what the code would look like if I
didn’t try to be too fancy, a habit that’s shockingly hard to shake off after so
many years of J2EE work.

PHP’s main strength is its very regular syntax and a few details that make it
extremely well suited for the Web, among which:

  • Strings can contain newlines, so you can embed big pieces of HTML into
    your code (not the most readable way to proceed, but awesome to reach a
    working prototype very fast).
  • Strings can be delimited with either double quotes or single quotes, and
    of course, the latter should be preferred since double quotes tend to come
    up quite often in well-formed HTML.
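
To make the two points above concrete, here is a small sketch. The variable names and the HTML fragment are invented for the example.

```php
<?php
$title = 'My page';

// Single quotes: the string spans several lines, and the double quotes
// around HTML attributes need no escaping. Variables are NOT substituted.
$header = '<div class="header">
  <a href="index.php">Home</a>
</div>';

// Double quotes: variables are substituted, but every inner double quote
// has to be escaped with a backslash.
$heading = "<h1 class=\"title\">$title</h1>";

echo $header, "\n", $heading, "\n";
```

The trade-off is that single-quoted strings keep the HTML readable, while double-quoted strings are handy when a value needs to be dropped into the markup.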

Not surprisingly, developing with PHP is very similar to JSP:  you end
up concatenating pieces of static HTML with dynamic PHP and this speeds up
prototyping quite a bit.  The problem is that once it works, you tend to
think twice before refactoring it because errors with missing or extra
delimiters are quite common.  To make these easy to debug, make sure you
set display_errors = On in your php.ini.
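
If php.ini is off limits (on a hosted account, for example), the same settings can usually be switched on at the top of a script. This is only a sketch, and it assumes the host allows ini_set() to change these values.

```php
<?php
// Report every category of error, notices included.
error_reporting(E_ALL);

// Send error messages to the page instead of hiding them.
ini_set('display_errors', '1');
```

One caveat: a syntax error in the same file prevents the script from running at all, so these two lines never execute and the error stays hidden. For that case the php.ini setting is still the one that matters.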

There are two PHP idiosyncrasies that Java programmers will most likely trip
over:

  • Variables need to start with a dollar sign.
  • Globals are not available by default inside functions.

The first point was actually pretty easy to get used to, but globals still trip me up now and then.  For example:

$URL = "http://a.com";

function foo() {
  echo $URL;  // $URL here is a new, undefined local variable
}

foo();

will print an empty string.  Yup, not even a fatal error:  at most, PHP
reports an undefined variable, and only if the error_reporting level in
php.ini is high enough to include it.  The correct
code is:

$URL = "http://a.com";

function foo() {
  global $URL;  // bring the global variable into the function's scope
  echo $URL;
}

foo();

This idiom will look familiar to those of you who used to program in TCL,
which had even more nebulous scoping rules.
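
For completeness, there are two other ways to reach the same value without declaring it inside the function. The function names below are made up for the example.

```php
<?php
$URL = "http://a.com";

// Option 1: read the variable through the $GLOBALS superglobal array,
// which is visible in every scope.
function foo_globals() {
    return $GLOBALS['URL'];
}

// Option 2: avoid the global altogether and pass the value in.
function foo_param($url) {
    return $url;
}

echo foo_globals(), "\n";
echo foo_param($URL), "\n";
```

Passing the value as a parameter is the closest to what a Java programmer would do anyway, and it keeps the function independent of whatever happens to be defined at the top level.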

Another thing I found out the hard way is that PHP doesn’t have any notion of
namespaces, so it took me quite a while to figure out why the following code
didn’t work:

function log($msg) {
   echo "[LOG] $msg";
}

The reason is that this function collides with the log function from the
standard library.  PHP refuses to redeclare a built-in function, and unless
errors are displayed, the resulting "Cannot redeclare" fatal error never
shows up on the page.  This was a clear message to
me that I should invent my own namespace, and I therefore decided to prefix all
my functions with "cb" (I’m still unclear on which style is the best:
cbConnectToDataBase() or cb_connectToDataBase()).
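
A quick way to find out whether a name is already taken is function_exists(), which also knows about built-in functions. The sketch below uses it together with the "cb" prefix; cb_log is a hypothetical name chosen for the example.

```php
<?php
// log() is a built-in math function, so the name is already taken.
var_dump(function_exists('log'));

// Hypothetical prefixed version of the logging helper. The guard makes it
// safe to include this file more than once.
if (!function_exists('cb_log')) {
    function cb_log($msg) {
        echo "[LOG] $msg\n";
    }
}

cb_log('connected');
```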

In the next installment, I will discuss the PHP MySQL API and how ten years
of good software and OO practices are hard to shake off, even though
they’re not exactly easy to apply in PHP.