Avoid backslashes anywhere in Java code (Java error “illegal unicode escape”)

Did you know you can insert unicode-escaped characters, anywhere in a Java program?

Most of us are familiar with using unicode escapes like this:

String pound = "\u00A3";

but in fact constructs like \u00A3 can go anywhere, including in a comment.

This is all fine so long as they’re valid, but what if you’re generating Java code without due care and attention?

And what if you’re inserting file paths into your generated code?

And what if one of your directories has a name starting with a “u”?

Then you get code like this (edited example from our real project!):

// DO NOT EDIT!  This file was generated from:
// C:\usr\foo.xml

And, only on the machine with the dir called “usr”, we got a compile error like this:

MyClass.java:14: illegal unicode escape
// C:\usr\foo.xml
       ^
1 error

Which took me a while to track down.

Reference: JLS-3.3 Unicode Escapes.

Note this only applies to unicode escapes, not others like \n or \t – they are only processed within character or string literals (JLS-3.10.6).

Bash arrays

Bash arrays are a lot like Bash Associative Arrays, but with numbers as keys.

Here’s a quick reference.

Basics

$ declare -a MYARR  # Create an array
$ MYARR[3]=foo      # Put a value into an array
$ echo ${MYARR[3]}  # Get a value out of an array
foo
$ echo MYARR[3]     # WRONG
MYARR[0]
$ echo $MYARR[3]]   # WRONG
[3]

Creating, adding

$ declare -a MYARR    # Explicitly declare
$ MYARR[3]=foo        # Or this line implicitly makes it an array
$ MYARR[4]=bar        # Can add values one by one
$ declare -a MYARR=(a b c)   # Initialise all at once
$ echo ${MYARR[0]}
a
$ echo ${MYARR[1]}
b
$ echo ${MYARR[2]}
c
$ declare -a MYARR   # Or declare separately
$ MYARR=(a b c)      # Then initialise
$ echo ${MYARR[0]}
a
$ echo ${MYARR[1]}
b
$ echo ${MYARR[2]}
c
$ declare -a MYARR=(a b c)
$ MYARR=("${MYARR[@]}" d)  # Add an element
$ echo ${MYARR[@]}
a b c d
$ declare -a MYARR2=(e f g)
$ MYARR=("${MYARR[@]}" "${MYARR2[@]}")  # Concatenate arrays
$ echo ${MYARR[@]}
a b c d e f g

Keys/Indices

$ declare -a MYARR
$ MYARR[3]=foo
$ echo ${MYARR[0]}  # Unassigned values are empty

$ echo ${MYARR[4]}  # Unassigned values are empty

$ MYARR[seven]=bar     # A text index is treated as 0
$ echo ${MYARR[0]}
bar
$ echo ${MYARR[seven]} # A text index is treated as 0
bar
$ K=3
$ MYARR[$K]=baz      # Variables containing numbers work like numbers
$ echo ${MYARR[$K]}
baz
$ echo ${MYARR[3]}   # Obviously the value is accessible via the actual index
baz
$ K=foo
$ MYARR[$K]=bash     # Variables containing text are treated as 0
$ echo ${MYARR[0]}
bash

Length

$ declare -a MYARR=(a b c)
$ echo ${#MYARR[@]}  # Length of an array
3
$ echo $#MYARR[@]  # WRONG
0MYARR[@]
$ echo ${#MYARR}   # WRONG
1
$ MYARR[7]=x
$ echo ${#MYARR[@]}  # Only existing indices count in the length
4
$ declare -a MYARR=(a bb ccc)
$ echo ${#MYARR[0]}   # Length of an individual element
1
$ echo ${#MYARR[1]}
2
$ echo ${#MYARR[2]}
3

Looping

$ declare -a MYARR=("a 1" b c)
$ # Loop through array values
$ for V in "${MYARR[@]}"; do echo $V; done
a 1
b
c
$ for V in ${MYARR[@]}; do echo $V; done  #WRONG
a
1
b
c
$ echo "${!MYARR[@]}"  # Print all indices - quoted, but quotes removed by echo
0 1 2
$ echo "${MYARR[@]}"   # Print all values - quoted, but quotes removed by echo
a 1 b c

Clearing

$ declare -a MYARR
$ MYARR[3]=x

$ echo ${MYARR[3]}
x
$ unset MYARR
$ declare -a MYARR
$ echo ${MYARR[3]}

Deleting

$ MYARR[2]=foo
$ echo ${MYARR[2]}
foo
$ unset ${MYARR[2]} # WRONG
$ echo ${MYARR[2]}
foo
$ unset MYARR[2]    # To delete from an array, use "unset" with similar syntax to assigning
$ echo ${MYARR[2]}

$ MYARR[3]=quux
$ echo ${MYARR[3]}
quux
$ K=3
$ unset MYARR[$K]   # Can unset using a variable for the key too
$ echo ${MYARR[3]}

$ declare -a MYARR=(a b c d e f)
$ MYARR=("${MYARR[@]:0:3}" "${MYARR[@]:4}")  # Remove element 3, leaving no gap
$ echo ${MYARR[@]}

Cool stuff

$ declare -a MYARR=(a b c d e f g)
$ echo ${MYARR[@]:2:3}              # Extract a sub-array
c d e
$ declare -a MYARR=(a b c d e f g)
$ echo ${MYARR[@]/d/FOO}            # Replace elements that match
a b c FOO e f g

Scope

$ unset MYARR
$ function createmap() { MYARR[5]=bar; }  # Implicit creation puts it in the global scope
$ echo ${MYARR[5]}

$ createmap
$ echo ${MYARR[5]}
bar
$ unset MYARR
$ function createmaplocal() { declare -a MYARR; MYARR[3]=bar; }  # Explicit creation puts it in the local scope
$ echo ${MYARR[3]}

$ createmaplocal
$ echo ${MYARR[3]}

Links