How the system locale can break your Java application's character encoding

Today my colleague ran into a weird character encoding problem in a Java application. The application runs without any problem on a Mac, but shows garbled characters when it runs on Ubuntu.

After some investigation, we found that the locale on the Ubuntu machine was set incorrectly. But how can the system locale influence how Java encodes or decodes text? It turns out that some Java APIs (e.g., the InputStreamReader constructor that takes only an InputStream, or String.getBytes() with no arguments) have overloads that accept a character encoding. When you call the version without one, the API falls back to the JVM's default charset, which comes from the file.encoding property. That property gets its default value from the JVM locale, and the JVM locale is initialized from the system locale when the JVM starts up.
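To make the chain above concrete, here is a minimal sketch. The class name DefaultCharsetDemo and the helper roundTrip are made up for illustration. It prints the values the JVM picked up at startup, then simulates two machines disagreeing on the charset by encoding with one and decoding with another.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetDemo {

    // Hypothetical helper: encode text with one charset and decode the bytes
    // with another, the way a misconfigured machine would.
    static String roundTrip(String text, Charset writer, Charset reader) {
        byte[] bytes = text.getBytes(writer);
        return new String(bytes, reader);
    }

    public static void main(String[] args) {
        // What the JVM resolved at startup; these vary by machine and JDK version.
        System.out.println("file.encoding   = " + System.getProperty("file.encoding"));
        System.out.println("default charset = " + Charset.defaultCharset());

        String text = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues

        // Same charset on both sides: the text survives.
        String ok = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // Mismatched charsets: the two UTF-8 bytes of 'é' are decoded as two Latin-1 characters.
        String garbled = roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);

        System.out.println("matching charsets preserved the text: " + ok.equals(text));
        System.out.println("mismatched charsets preserved the text: " + garbled.equals(text));
    }
}
```

One caveat, as far as I know: since JDK 18 (JEP 400) the default charset is UTF-8 regardless of the locale, so the locale-dependent behavior described here mostly applies to older JDKs, or to newer ones started with -Dfile.encoding=COMPAT.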

The lesson we learned here is that explicit is better than implicit. Default parameters can be convenient, but they can also set your hair on fire.
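In practice, "explicit" means always passing the charset yourself. Here is a small sketch of what that could look like; the class name ExplicitCharsetDemo and the helper readFirstLine are made up for illustration, and UTF-8 is only an assumption about what the input really is.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ExplicitCharsetDemo {

    // Hypothetical helper: reads the first line of a stream as UTF-8,
    // no matter what the JVM's default charset happens to be.
    static String readFirstLine(InputStream in) {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return reader.readLine();
        } catch (IOException e) {
            // Rethrown unchecked to keep the example short.
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String original = "gr\u00fc\u00df"; // "grüß"

        // Encode explicitly as well, instead of calling getBytes() with no arguments.
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);

        String line = readFirstLine(new ByteArrayInputStream(utf8));
        System.out.println(line.equals(original)); // prints "true"
    }
}
```

With both sides pinned to the same charset, the result no longer depends on the locale of the machine the code runs on.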