Unicode in JavaScript: Closure Compiler blunders

> Uncaught SyntaxError: Unexpected token ,

When coding in JavaScript as an English speaker, Unicode is rarely something I think about. Recently, though, I attempted to upgrade an Underscore package to version 1.3.3. It didn’t go as well as I had hoped: I was immediately met with JavaScript errors caused by running Closure Compiler with a UTF-8 charset. Underscore could avoid this in its code, but it’s not entirely to blame. The problem has more to do with re-encoding a JavaScript file that has Unicode escape sequences in its string literals. When that happens, a \u escape can be replaced by the raw character it stands for, which can break your code, as I will demonstrate.

It begins with Underscore 1.3.3. Around lines 929-937 in the developer version is this code:

var escapes = {
  '\\': '\\',
  "'": "'",
  'r': '\r',
  'n': '\n',
  't': '\t',
  'u2028': '\u2028',
  'u2029': '\u2029'
};

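When the file is read exactly as written, this code is fine. However, once the file has been re-encoded, the '\u2028' escape is replaced by the raw character it names, U+2028 LINE SEPARATOR (and '\u2029' by U+2029 PARAGRAPH SEPARATOR). ECMAScript 5 treats both as line terminators, which may not appear unescaped inside a string literal, so the object literal no longer parses; in this case the browser tells you there is an unexpected ','.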
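
The difference between the escape and the character it produces can be seen in a few lines. This is a minimal sketch rather than code from Underscore, and the variable names are only illustrative. It shows that the escape yields a single character at runtime, and that the language classes that character as a line terminator (the regular expression `.` never matches one).

```javascript
// Written as an escape, the source file itself stays pure ASCII.
var sep = '\u2028';

// At runtime the string holds exactly one character: U+2028 LINE SEPARATOR.
var isSingleChar = sep.length === 1;
var codePoint = sep.charCodeAt(0).toString(16);

// "." matches any character except a line terminator, so it rejects U+2028.
var dotMatchesSeparator = /^.$/.test(sep);
var dotMatchesLetter = /^.$/.test('a');

console.log(isSingleChar, codePoint, dotMatchesSeparator, dotMatchesLetter);
```
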
For me, this showed up in my build process. I tend to concatenate my JS and serve it as a single file, which cuts down on HTTP requests and speeds up page loads. For this I use Google’s Closure Compiler. I also like to compile my language files into the mix, and these, by definition, need to be in Unicode. However, something curious occurs when running the following command.

java -jar compiler.jar --js underscore.js --js_output_file underscore-compressed.js --charset utf-8

You guessed it: you no longer have valid JavaScript!
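
One way to confirm that this is what happened to a compiled file is to scan it for raw separator characters. The sketch below is an assumption-laden illustration: `findRawSeparators` is a hypothetical helper, not part of Underscore or Closure Compiler, and the two sample strings merely simulate intact and re-encoded output.

```javascript
// Hypothetical helper: report every raw U+2028 / U+2029 found in a source string.
function findRawSeparators(source) {
  var hits = [];
  for (var i = 0; i < source.length; i++) {
    var code = source.charCodeAt(i);
    if (code === 0x2028 || code === 0x2029) {
      hits.push({ index: i, codePoint: 'U+' + code.toString(16).toUpperCase() });
    }
  }
  return hits;
}

// Simulated compiler output: the escape has been replaced by the raw character.
var broken = "var escapes = {'u2028': '" + String.fromCharCode(0x2028) + "'};";
// Simulated intact source: the escape is still spelled out in ASCII.
var intact = "var escapes = {'u2028': '\\u2028'};";

var brokenHits = findRawSeparators(broken);
var intactHits = findRawSeparators(intact);

console.log(brokenHits.length, intactHits.length);
```

In Node, the source string could come from `fs.readFileSync('underscore-compressed.js', 'utf8')`; any hit means a string literal in the output contains a raw separator.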

Avoiding this is easy, and I suggested as much to Underscore directly, though strictly speaking it’s not something they are required to fix. Rather than writing Unicode escape sequences (\u) inside string literals, where a compiler may re-encode them, build those characters at runtime:

var escapes = {
  '\\': '\\',
  "'": "'",
  'r': '\r',
  'n': '\n',
  't': '\t',
  'u2028': String.fromCharCode(0x2028),
  'u2029': String.fromCharCode(0x2029)
};
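
As a quick sanity check, the two spellings should produce identical values at runtime. The following is a small sketch with illustrative variable names, not code from Underscore; it only compares the resulting strings.

```javascript
// The original spelling: escapes inside string literals.
var fromEscape = {
  'u2028': '\u2028',
  'u2029': '\u2029'
};

// The suggested spelling: characters built at runtime from their code points.
var fromCharCode = {
  'u2028': String.fromCharCode(0x2028),
  'u2029': String.fromCharCode(0x2029)
};

// Both objects hold the same one-character strings.
var sameValues =
  fromEscape.u2028 === fromCharCode.u2028 &&
  fromEscape.u2029 === fromCharCode.u2029;

console.log(sameValues);
```

The behavior of the lookup table is unchanged; the only difference is that the second form contains no separator inside a string literal for a tool to rewrite.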

A bug report has been filed with Closure Compiler, but I’m not sure it’s something the Closure team will want to deal with. Bug Report
Update: This bug has been fixed.

Sources:

1. https://github.com/documentcloud/underscore/issues/579
2. http://documentcloud.github.com/underscore/
3. https://developers.google.com/closure/compiler/
4. http://code.google.com/p/closure-templates/issues/detail?id=52