zippedFiles = Dir.new(dir).entries.sort.reverse.delete_if { |x| ! (x =~ /gz$/) }
’nuff said.
Okay, there is more to say.
First of all, what does this line of code do? It goes through every file in the given directory and sorts them in reverse order, while excluding any file that doesn’t end in “.gz”.
This code ported to Java is quite intimidating:
List<String> result = new ArrayList<String>();
File f = new File(directory);
for (String fileName : f.list()) {
  if (fileName.endsWith(".gz")) {
    result.add(fileName);
  }
}
Collections.sort(result, new Comparator<String>() {
  public int compare(String o1, String o2) {
    return o2.compareTo(o1);
  }
  public boolean equals(Object o) {
    return super.equals(o);
  }
});

This code comes from a log analyzer utility that I wrote some time ago. It goes through the Apache log of my Web server and allows me to easily plug in listeners to collect various kinds of statistics. This utility has provided me with a flexible log analyzer framework into which I have plugged various additional loggers these past months.
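The analyzer itself is not shown in this post, so the following is only a hypothetical sketch of what "plugging in listeners" could look like in Ruby. The names LogAnalyzer, StatusCounter and on_line are made up for illustration, and the sample log lines are invented.

```ruby
# Hypothetical sketch: LogAnalyzer, StatusCounter and the on_line callback are
# illustrative names, not taken from the original utility.
class LogAnalyzer
  def initialize
    @listeners = []
  end

  # Register any object that responds to on_line(line).
  def add_listener(listener)
    @listeners << listener
    self
  end

  # Feed every line of the input to every registered listener.
  def run(lines)
    lines.each do |line|
      @listeners.each { |l| l.on_line(line) }
    end
  end
end

# Example listener: counts requests per HTTP status code.
class StatusCounter
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def on_line(line)
    # In Apache's common log format the status code follows the quoted request.
    status = line[/" (\d{3}) /, 1]
    @counts[status] += 1 if status
  end
end

# Invented sample lines in common log format.
sample = [
  '127.0.0.1 - - [16/Sep/2004:14:49:00 -0700] "GET / HTTP/1.1" 200 512',
  '127.0.0.1 - - [16/Sep/2004:14:50:00 -0700] "GET /missing HTTP/1.1" 404 128',
  '127.0.0.1 - - [16/Sep/2004:14:51:00 -0700] "GET /feed HTTP/1.1" 200 2048'
]

counter = StatusCounter.new
LogAnalyzer.new.add_listener(counter).run(sample)
p counter.counts
```

Adding a new statistic under this design would mean writing one more small class with an on_line method and registering it, without touching the loop that reads the log.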
Since I hadn’t taken a look at this code in a few months, I was quite happy to realize that it passes the “six-month readability test”. A language that has never passed this test for me is Perl. Perl might be a powerful language but if you stop using it for six months, you will need a book to reread your own code and a personal trainer just to modify it.
So I was quite happy to understand my old Ruby code right away, even in the most idiomatic sections such as the one I pasted above. The code carries its intent clearly thanks to aptly named methods, and closures are, as usual, as pleasant to use as they are powerful.
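For readers less used to chained calls, here is the same one-liner unrolled one step at a time, as a sketch. The method name zipped_files and the file names in the scratch directory are assumptions made for the example. Note that the regular expression /gz$/ keeps any name ending in "gz", not strictly ".gz".

```ruby
require "tmpdir"
require "fileutils"

# Hypothetical helper name; returns the entries of dir that end in "gz",
# in reverse-sorted order.
def zipped_files(dir)
  entries  = Dir.new(dir).entries            # every name in the directory, including "." and ".."
  sorted   = entries.sort                    # ascending lexicographic order
  reversed = sorted.reverse                  # descending order
  reversed.delete_if { |x| !(x =~ /gz$/) }   # drop names that do not end in "gz"
end

# Try it on a throwaway directory with made-up file names.
Dir.mktmpdir do |dir|
  %w[access.1.gz access.2.gz access.log notes.txt].each do |name|
    FileUtils.touch(File.join(dir, name))
  end
  p zipped_files(dir)
end
```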
There is a problem with my log analyzer, though, which is the reason why I am revisiting it today: it’s pretty slow. It takes about five minutes to run through a month of logs, which I find unacceptable. Therefore, I want to port it to a different language.
While I love Ruby, I have to say I like Groovy even more, because it gives me the same flexibility as Ruby with the familiar Java syntax on top of it. However, I have had some bad experiences with the current versions of Groovy and as far as I can tell from the mailing-list, the stability of the compiler still leaves a lot to be desired.
Exit Groovy (for now). So it will probably be Java or C#. I am hoping the poor performance comes from the Ruby interpreter and not from my code, but I will find out soon enough.
#1 by Eoin on September 16, 2004 - 2:49 pm
String already implements Comparable? Collections.sort(result) should be sufficient?
#2 by Cedric on September 16, 2004 - 2:55 pm
The problem is I need to sort them in reverse order…
The STL handled this kind of problem pretty well, but I can’t think of a way to do this in Java without supplying my own Comparator.
Anyone?
#3 by Doug L. on September 16, 2004 - 3:06 pm
FWIW, zsh: ls -rd *.gz
Not that this helps you very much …
#4 by Scott on September 16, 2004 - 3:07 pm
Collections.sort(result, Collections.reverseOrder());
#5 by Sam Pullara on September 16, 2004 - 3:23 pm
Here is a groovier version:
gzippedFiles = new java.io.File(args[0]).listFiles().toList().sort().reverse().findAll { it =~ "gz$" }
#6 by Dmitri Colebatch on September 16, 2004 - 5:49 pm
What is it about stuffing as much code into one line as possible that seems attractive? I’ve only dabbled in Ruby, but agree that it’s damn nice (really should have another look at what support is around in IDEs, I don’t like typing).
You’re also forgetting java’s FilenameFilter:
List files = Arrays.asList(new File(directory).listFiles(new FilenameFilter()
{
public boolean accept(File dir, String name)
{
return name.endsWith(".gz");
}
}));
Collections.sort(files, Collections.reverseOrder());
#7 by Dmitri Colebatch on September 16, 2004 - 6:05 pm
I meant to add – that the only nice thing you’re demonstrating there are closures, which yes – are damn nice (o:
#8 by Jonathan O'Connor on September 17, 2004 - 2:43 am
If you only run this stuff once a month, and it takes 5 minutes, then that’s an hour for a whole year. How much time are you going to waste optimizing this? More than an hour, I guess 🙂
#9 by Peter Reilly on September 17, 2004 - 4:04 am
Pretty neat ruby and groovy examples.
I would like to include them in the
ant script manual page.
#10 by Cedric on September 17, 2004 - 5:25 am
Ah, good point, Jonathan.
Actually, I run this script every day and it gives me statistics on the past month.
Still not that much of a big deal, but as you know, developers never need a sound reason to rewrite something from scratch 🙂
#11 by Cedric on September 17, 2004 - 5:25 am
Peter, you are very welcome to include these examples in the manual. Let me know where to find it, by the way…
#12 by Peter Reilly on September 17, 2004 - 5:56 am
Ta,
the manual page is in CVS:
http://cvs.apache.org/viewcvs.cgi/ant/docs/manual/OptionalTasks/script.html
#13 by Scott Ganyo on September 17, 2004 - 6:07 am
FWIW, it would be slightly more efficient to sort after select, like this (groovy syntax):
gzippedFiles = new java.io.File(args[0]).listFiles().toList().findAll { it =~ "pdf" }.sort().reverse()
Scott
#14 by Scott Ganyo on September 17, 2004 - 6:09 am
Bah. Ignore that “pdf”. It should’ve been:
gzippedFiles = new java.io.File(args[0]).listFiles().toList().findAll { it =~ "gz$" }.sort().reverse()
Sigh
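The same ordering point carries over to the Ruby original. As a small sketch with made-up names standing in for a directory listing, filtering first yields the same result while handing fewer elements to the sort:

```ruby
# Made-up names standing in for a directory listing.
names = %w[c.gz readme.txt a.gz b.log b.gz]

sort_first   = names.sort.reverse.grep(/gz$/)  # sorts all five names, then filters
filter_first = names.grep(/gz$/).sort.reverse  # filters, then sorts only three names

p sort_first == filter_first
p filter_first
```

Both orders agree because filtering never changes the relative order of the names that survive it.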
#15 by Chris Thiessen on September 17, 2004 - 7:26 am
You might want to check out the Kataba libraries for Java (www.kataba.com). They provide closures, collections for all types, and a simplified I/O model, but they focus on reducing verbosity. They were actually inspired partly by Ruby and Python. Anyway, here’s the code:
List_o zippedFiles
=Colls.list(Files.nameMatches(Files.listDir(dir),"gz$")).sort().reverse();
You’d need a couple of imports:
import com.kataba.io.*;
import com.kataba.coll.*;
I actually used it to write a log analyzer, so I know it works well for that. 🙂
-Chris (author of Kataba Dynamics)
#16 by Luke Hutteman on September 17, 2004 - 9:06 am
Even simpler (and more performant) than ” Collections.sort(files, Collections.reverseOrder());” :
Collections.reverse(result);
#17 by Jonas Galvez on September 17, 2004 - 10:36 am
The Python version is pretty much a one-liner too:
files = filter(lambda s: re.match("gz$", s), sorted(os.listdir(dir))[::-1])
#18 by Jonas Galvez on September 17, 2004 - 10:40 am
Err, that should be re.search.
#19 by Joel VanderWerf on September 17, 2004 - 12:16 pm
A shorter way in ruby:
zipped = Dir.entries(dir).sort.reverse.grep(/gz$/)
#20 by Martin DeMello on September 17, 2004 - 12:21 pm
Shorter Ruby way:
Dir.new(dir).entries.grep(/gz$/).sort.reverse
#21 by Daniel Berger on September 17, 2004 - 12:32 pm
Is the sort really necessary? Aren’t they already going to be returned in sorted order by default, ala ‘ls’?
Also, as Joel indicates, Dir.new(dir).entries can be reduced to Dir.entries(dir), unless you really want a Dir object. You don’t in your example.
#22 by Bill Guindon on September 17, 2004 - 2:07 pm
Why not just:
Dir[dir + '*.gz'].sort.reverse
#23 by Jonas Galvez on September 17, 2004 - 3:03 pm
Here’s another Python version:
files = glob.glob("*.gz")
files.sort()
files.reverse()
In Python 2.4, it could be:
files = sorted(glob.glob("*.gz"), reverse=True)
Pretty damn simple, I say 🙂
#24 by botp on September 17, 2004 - 7:11 pm
Dir.glob("*.gz").sort.reverse
#25 by botp isbotp on September 17, 2004 - 7:17 pm
Dir.glob("*.rb").sort.reverse
#if it looks and acts like ruby, then it is ruby
#26 by Thien on September 17, 2004 - 9:08 pm
Dir['*.gz'].sort.reverse
#27 by Robert on September 19, 2004 - 4:48 am
Another solution that saves 1 intermediate array:
dir = … # the dir you want to explore
Dir[File.join( dir, "*.gz" )].sort {|a,b| b <=> a}
#28 by Moritz Petersen on September 20, 2004 - 4:52 am
Here’s my Java version:
Collections.reverse(Arrays.asList(new File(dir).listFiles(new FileFilter(){public boolean accept(File f){return f.getName().matches(".*\\.gz$");}})));
😉
#29 by Sam Pullara on September 22, 2004 - 11:00 am
Here is a slightly more efficient java version:
Arrays.sort(new File(dir).list(new FilenameFilter() {public boolean accept(File f, String n) {return n.endsWith(".gz");}}), Collections.reverseOrder());
#30 by Sam Pullara on September 22, 2004 - 11:31 am
Just for fun, here are some benchmarks on a 659 file directory:
Sam’s Java Version: 2.7 ms per list
Sam’s Groovy Version: 15.8 ms per list
Moritz Petersen’s Java Version: 9.6 ms per list
Cedric’s Java Version: 3.1 ms per list
I’d love to test the other ones on my machine (powermac g4 1.4/1.4) if you send me an easy to run benchmark that measures them.
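For the Ruby variants posted in this thread, a small self-contained harness along these lines might do. It is only a sketch: the helper name time_listing, the 200-file scratch directory, and the iteration count are arbitrary assumptions, and the timings it prints will vary by machine.

```ruby
require "benchmark"
require "tmpdir"
require "fileutils"

# Hypothetical harness: runs the given block `iterations` times and
# reports (and returns) the average time per run in milliseconds.
def time_listing(label, iterations)
  seconds = Benchmark.realtime { iterations.times { yield } }
  ms = seconds * 1000.0 / iterations
  puts format("%-22s %.3f ms per list", label, ms)
  ms
end

Dir.mktmpdir do |dir|
  # Populate a scratch directory with a mix of .gz and non-.gz files.
  200.times do |i|
    ext = i.even? ? "gz" : "log"
    FileUtils.touch(File.join(dir, format("access.%03d.%s", i, ext)))
  end

  time_listing("entries + delete_if", 50) do
    Dir.new(dir).entries.sort.reverse.delete_if { |x| !(x =~ /gz$/) }
  end
  time_listing("entries + grep", 50) do
    Dir.entries(dir).grep(/gz$/).sort.reverse
  end
  time_listing("glob", 50) do
    Dir[File.join(dir, "*.gz")].sort.reverse
  end
end
```

Note that the glob variant returns full paths rather than bare file names, so the three blocks are not strictly interchangeable.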
#31 by Moritz Petersen on October 18, 2004 - 11:57 pm
Damn. I lost 😉
#32 by `kill -3 thoughtProcess` on October 21, 2004 - 8:23 am
Agile language dilemma
I have always felt that a successful developer needs to be proficient in at least one scripting language. It is essential since it
allows one to prototype and test rapidly.
helps you get out of tricky situations where you really don’t want to start
#33 by Jacklyn on October 2, 2010 - 7:16 am
Couldn’t agree more, “kill -3”. If a developer masters 1 scripting language, half the battle is won.
#34 by Jacklyn on October 2, 2010 - 7:18 am
Sorry, forgot to add: but it’s the same with mastering different languages: add 1 additional language – scripting or spoken – and you leave another 25% of the competition behind you.
#35 by Anuj on March 6, 2012 - 8:38 am
(New to the blog, so exploring).
With LINQ support in C# you could get this kind of syntactic sugar too:
var zippedFiles = new DirectoryInfo("path").GetFiles("*.gz").OrderByDescending(f => f.Name); // assuming you’re ordering by name