Hidden evils of Java's String.split() and replace()
If you code in Java, you have inevitably used the String.split() and String.replace*() methods.
And why wouldn't you? They are much more convenient than using the Java Regular Expressions API, where you need to create a Pattern object, and possibly a Matcher, and then call methods on those.
However, all convenience comes at a price!
The Evil Inside
In this case, the String.split() and String.replace*() methods (with the sole exception of String.replace(char, char)) internally use the regular expression APIs themselves, which can result in performance issues for your application.
Here is the String.split() implementation:
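As a rough sketch, paraphrased from how the method looked in Java 6 and earlier rather than copied from any particular JDK build, the delegation amounts to the following. The class name SplitSketch and the method splitLikeOldJdk are hypothetical, introduced only for illustration. Later JDK releases add a fast path for some single-character delimiters, so treat this as illustrative rather than as what your JDK does today.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the JDK.
class SplitSketch {

    // Paraphrase of the older String.split(regex, limit) behaviour:
    // a new Pattern is compiled on every single call.
    static String[] splitLikeOldJdk(String input, String regex, int limit) {
        return Pattern.compile(regex).split(input, limit);
    }

    public static void main(String[] args) {
        String[] parts = splitLikeOldJdk("a b c", " ", 0);
        System.out.println(parts.length); // prints 3
    }
}
```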
Notice that each call to String.split() creates and compiles a new Pattern object. The same is true for the String.replace() methods. Compiling a pattern on every call can cause performance issues in your program if you call the split() or replace() functions in a tight loop.
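For the replace side, one way to avoid the repeated compilation is to hold the Pattern in a constant and go through a Matcher. The Javadoc for String.replaceAll(regex, replacement) describes it as equivalent to Pattern.compile(regex).matcher(str).replaceAll(replacement), so the sketch below only hoists the compile step out of the call. The class name ReplaceSketch, the method collapseWhitespace and the regular expression are assumptions chosen for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical illustration class; names and regex are assumptions.
class ReplaceSketch {

    // Compiled once and reused. Pattern instances are immutable and
    // safe to share between threads; Matcher instances are not.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Same result as input.replaceAll("\\s+", " "), minus the per-call compile.
    static String collapseWhitespace(String input) {
        return WHITESPACE.matcher(input).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(collapseWhitespace("a  b \t c")); // prints "a b c"
    }
}
```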
I tried a very simple test case to see how much the performance is affected. The first case used String.split() a million times:
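A minimal sketch of such a loop, assuming a short space-separated input string. The input, the class name SplitLoopTiming and the method countTokensWithSplit are hypothetical. The token count is accumulated only so the loop body cannot be optimized away. Timings from a hand-rolled loop like this vary with JDK version, warm-up and hardware, so no expected numbers are given.

```java
// Hypothetical timing harness; the input string and names are assumptions.
class SplitLoopTiming {

    // Calls String.split(" ") once per iteration and sums the token counts.
    static long countTokensWithSplit(String line, int iterations) {
        long tokens = 0;
        for (int i = 0; i < iterations; i++) {
            tokens += line.split(" ").length;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "the quick brown fox"; // assumed sample input
        long start = System.nanoTime();
        long tokens = countTokensWithSplit(line, 1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("tokens=" + tokens + ", String.split(): " + elapsedMs + " ms");
    }
}
```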
In the second case, I just changed the loop to use a precompiled Pattern object:
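A sketch of what the precompiled variant might look like, assuming a short space-separated input string. The class name PatternLoopTiming, the method countTokensWithPattern and the input are hypothetical. The only change from a plain String.split(" ") loop is that Pattern.compile runs once, outside the loop.

```java
import java.util.regex.Pattern;

// Hypothetical timing harness; the input string and names are assumptions.
class PatternLoopTiming {

    // Compiled a single time, then reused for every iteration.
    private static final Pattern SPACE = Pattern.compile(" ");

    // Calls Pattern.split(line) once per iteration and sums the token counts.
    static long countTokensWithPattern(String line, int iterations) {
        long tokens = 0;
        for (int i = 0; i < iterations; i++) {
            tokens += SPACE.split(line).length;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "the quick brown fox"; // assumed sample input
        long start = System.nanoTime();
        long tokens = countTokensWithPattern(line, 1_000_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("tokens=" + tokens + ", precompiled Pattern: " + elapsedMs + " ms");
    }
}
```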
Here are the average results of 6 test runs:
Time taken with String.split(): 1600 ms
Time taken with precompiled Pattern object: 1195 ms
Note that I used an extremely simple regular expression here, consisting of just a single space character, and the String.split() version still took about 34% longer (put the other way, precompiling the pattern cut the run time by about 25%).
A longer, more complex expression would take longer to compile and thus make the loop containing the split() method even slower compared to its counterpart.
Lesson learned: It is good to know the internals of the APIs you use. Sometimes the convenience comes at the price of a hidden evil which may come to bite you when you are not looking.