Can you spot why the following program:
    public class Split {
        public static void main(String[] argv) {
            String STRING = "foo  bar";
            String[] s = STRING.split(" ");
            for (int i = 0; i < s.length; i++) {
                System.out.println(i + " '" + s[i] + "'");
            }
        }
    }
displays:
    0 'foo'
    1 ''
    2 'bar'
The reason is that split() works a little differently from StringTokenizer:
it treats its argument as a regular expression and splits at every match. In
the code above, I define this regular expression as " " (one space character),
but the input string contains two spaces in a row, so split() finds an empty
string between them. We can solve this problem by using " +" as the regular
expression ("at least one space character").
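To make the difference between the two patterns concrete, here is a minimal sketch. The class and method names are made up for illustration; only String.split() and Arrays.toString() come from the JDK.

```java
import java.util.Arrays;

class SplitOnSpaces {
    // Splits at every single space, so two adjacent spaces yield an empty string.
    static String[] splitOnSingleSpace(String input) {
        return input.split(" ");
    }

    // Splits at runs of one or more spaces, so adjacent spaces act as one separator.
    static String[] splitOnSpaceRuns(String input) {
        return input.split(" +");
    }

    public static void main(String[] args) {
        String input = "foo  bar"; // two spaces between the words
        System.out.println(Arrays.toString(splitOnSingleSpace(input))); // [foo, , bar]
        System.out.println(Arrays.toString(splitOnSpaceRuns(input)));   // [foo, bar]
    }
}
```

Note that " +" only merges separators in the middle of the string. If the input starts with spaces, the result still begins with an empty string, so calling trim() on the input first is probably a good idea.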
Still, the fact that split() can return empty strings is surprising,
especially if you are converting your code from StringTokenizer, which never
returns empty tokens.
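For comparison, this is roughly what the same input looks like when it goes through StringTokenizer. It is only a sketch: the class name and the tokenize() helper are invented for this example. Keep in mind that StringTokenizer's second argument is a set of delimiter characters, not a regular expression.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerVsSplit {
    // Collects the tokens produced by StringTokenizer, which skips empty tokens.
    static List<String> tokenize(String input, String delimiters) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(input, delimiters);
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "foo  bar"; // two spaces between the words
        System.out.println(tokenize(input, " "));    // [foo, bar]
        System.out.println(input.split(" ").length); // 3
    }
}
```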
There are a couple of good things about this behavior, though:
- You can reconstitute the original string if you need to.
- It makes it easier to parse strings with records that can be empty, such as
  lines from a log file.
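Here is a small sketch of the first point. One caveat: by default, split() drops trailing empty strings, so a faithful round trip needs the two-argument form with a negative limit. The class and method names are hypothetical, the separator is assumed to be a non-empty literal string (hence Pattern.quote()), and String.join() requires Java 8 or later.

```java
import java.util.regex.Pattern;

class SplitRoundTrip {
    // Splits on a literal separator and joins the pieces back together.
    // The limit of -1 tells split() to keep trailing empty strings.
    static String roundTrip(String input, String separator) {
        String[] parts = input.split(Pattern.quote(separator), -1);
        return String.join(separator, parts);
    }

    public static void main(String[] args) {
        String input = "a,,b,,";
        System.out.println(roundTrip(input, ","));                 // a,,b,,
        // Without the limit, the trailing empty strings are lost:
        System.out.println(String.join(",", input.split(",")));    // a,,b
    }
}
```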
Can you think of any other use?
#1 by tjansen on August 14, 2004 - 12:18 pm
You can use it for other separators with which empty tokens may make sense. For instance, you can use split(",") to parse a CSV-formatted line. For CSV data it’s important to get empty strings, as the row-number is usually needed.
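A rough illustration of the CSV case, under the assumption that the data has no quoted fields or escaped commas (real CSV usually needs a proper parser). The class and method names are made up. The negative limit is there because plain split(",") silently drops empty fields at the end of the line, which would shift the field count.

```java
class SimpleCsvLine {
    // Naive field split: assumes no quoting and no escaped commas.
    // The limit of -1 keeps empty fields at the end of the line.
    static String[] parseFields(String line) {
        return line.split(",", -1);
    }

    public static void main(String[] args) {
        String line = "alice,,42,";
        String[] fields = parseFields(line);
        System.out.println(fields.length);          // 4
        System.out.println("'" + fields[1] + "'");  // ''
        System.out.println(fields[2]);              // 42
        System.out.println(line.split(",").length); // 3
    }
}
```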
#2 by Frank Nestel on August 16, 2004 - 5:38 am
The behaviour of StringTokenizer has made some of my colleagues crazy, because it was not well documented in Java (I don’t know if it is now). The point is that with separators like " " people would expect the StringTokenizer behaviour, while with separators like "," or ";" they would expect the split() behaviour. But making it depend on the split character would make a really messy contract.
#3 by James A. Hillyerd on August 17, 2004 - 9:58 am
In this case the empty tokens make sense to me. What I don’t like is that:
"abc".split("")
returns
{ "", "a", "b", "c" }
I understand why it works that way from an implementation standpoint, but I prefer the way Perl’s split works. Since my app lets users configure the split regex, I now have to check for the "" special case and handle it differently.
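One possible shape for such a special-case check is sketched below. This is not the commenter's actual code: the class and method names are hypothetical, and the empty-regex branch simply returns one string per character, which is what Perl's split does with an empty pattern. As far as I know, Java 8 and later no longer produce the leading empty string for a zero-width match at the start of the input, so the check mainly matters on older runtimes.

```java
import java.util.Arrays;

class ConfigurableSplitter {
    // Splits input with a user-supplied regex. An empty regex is handled
    // separately and yields one string per character, with no empty strings.
    static String[] split(String input, String userRegex) {
        if (userRegex.isEmpty()) {
            String[] chars = new String[input.length()];
            for (int i = 0; i < input.length(); i++) {
                chars[i] = String.valueOf(input.charAt(i));
            }
            return chars;
        }
        return input.split(userRegex);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("abc", "")));  // [a, b, c]
        System.out.println(Arrays.toString(split("a,b", ","))); // [a, b]
    }
}
```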