Is Unicode in the code taboo?

MzScheme supports UTF-8 encoded source files. Combine that with DrScheme, which makes it fairly easy to type Unicode symbols, and suddenly, somewhat surprisingly, you can use characters in your code beyond the 95 printable ASCII characters that we all know and love. What are the implications for you as a programmer?
The simplest implication is that you can now work with more than 100,000 characters. If you felt limited by being unable to use the script of your native language, such as Tamil, or a writing system such as Braille, you are restricted no more. Scientific programmers may enjoy using Greek letters, and who among us wouldn’t like to use the letter π to represent pi? Schemers might write → rather than ->.
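Here is an illustration outside Scheme. Python 3 also accepts many non-ASCII letters in identifiers (PEP 3131), so π can name a constant. Its identifier rules are stricter than Scheme’s symbol syntax, though. → is a math symbol rather than a letter, so Python rejects it as an identifier, and `str.isidentifier()` lets us check this. The function `area` is only an example.

```python
import math

# Greek letters are Unicode letters, so Python 3 accepts them in identifiers.
π = math.pi


def area(r: float) -> float:
    """Area of a circle of radius r (example function)."""
    return π * r ** 2


if __name__ == "__main__":
    print(area(1.0) == math.pi)  # True
    print("π".isidentifier())     # True: π is a letter
    print("→".isidentifier())     # False: → is a math symbol, not a letter
```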
The common theme among these atypical examples is that they facilitate communication. I won’t get into deep theories about the value of communication or how the limitations of a language affect it. Instead, here is a quote about how we, as programmers, communicate with each other through our code:

Let us change our traditional attitude to the construction of programs: Instead of imagining that our main task is to instruct a computer what to do, let us concentrate rather on explaining to human beings what we want a computer to do.

— Donald Knuth
It seems like Unicode might facilitate that communication, but Unicode in code today is not at all common. Why might that be?
The usual suspect is our tooling (IDEs and text-processing tools), which supposedly doesn’t support Unicode well. Perhaps that is the case, but I find it hard to believe that people wouldn’t simply add Unicode support to their tools if they thought it had any value. Fonts seem to be the biggest practical issue. When no installed font covers a character, you usually get an ugly box in its place. In the end, I suspect the worst culprit is simply that Unicode in code is taboo. People won’t give it a try until a thought leader or two shows how Unicode can help us communicate.
Until then, you will have to be happy with Emacs and DrScheme.

3 thoughts on “Is Unicode in the code taboo?”

zimbatm says:

2008-12-31 at 07:56

Even if using Unicode doesn’t necessarily guarantee better communication, as you imply, I believe the biggest obstacle to adoption is developer input. I am not willing to scroll through a 100,000+ character table or buy an APL keyboard to enter those symbols.
Unless the editor supports some kind of text-to-symbol transformation, Unicode will remain confined to comments and user input.
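One form this text-to-symbol transformation could take is TeX-style abbreviations. You type an ASCII name such as `\lambda`, and the editor replaces it with λ. The Python sketch below, written in Python rather than Scheme for illustration, applies a hypothetical abbreviation table to a line of text. The table contents and the `expand_abbreviations` name are assumptions and do not describe any particular editor’s feature. Unknown names are left as typed.

```python
import re

# Hypothetical abbreviation table: ASCII name -> Unicode character.
ABBREVIATIONS = {
    "lambda": "\u03bb",  # λ
    "pi": "\u03c0",      # π
    "to": "\u2192",      # →
    "heart": "\u2665",   # ♥
}

# A backslash followed by one or more ASCII letters, e.g. \lambda.
_ABBREV_RE = re.compile(r"\\([A-Za-z]+)")


def expand_abbreviations(text: str) -> str:
    r"""Replace each known \name with its symbol; keep unknown ones as typed."""
    def replace(match: re.Match) -> str:
        return ABBREVIATIONS.get(match.group(1), match.group(0))
    return _ABBREV_RE.sub(replace, text)


if __name__ == "__main__":
    print(expand_abbreviations(r"(\lambda (r) (* \pi r r))"))
    # (λ (r) (* π r r))
```

The regex consumes a whole run of letters, so `\tosymbol` is treated as one unknown name and left alone instead of becoming `→symbol`. A real editor would more likely expand on an explicit keystroke than rewrite text silently.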

Grant says:

2008-12-31 at 11:17

Hi zimbatm:
While you probably wouldn’t want to work with all 100,000+ symbols at once, input is still an issue. Assuming that Unicode does add value, the tooling must be there. APL keyboards and 100,000-cell character choosers are not going to work. I haven’t thought through what the typical developer’s use case would be, but for the sake of conversation I would assume that one would use a few symbols here and there, chosen piecemeal from the Unicode charts (this describes my own usage). With that in mind, here is how you can handle it in DrScheme.
In the DrScheme IDE, it is easy to turn a Unicode code point into its symbol. In the Interactions window (a.k.a. the REPL), you can enter the string “\u2665” and have ♥ displayed as the result. Once I cared about more than a few symbols, I would write a plugin that makes it easy to choose among them.
Perhaps a better place to start would be to create some keybindings, or even to write a simple plugin that converts Unicode escape codes typed into the source into their symbols.
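Here is a minimal sketch of the conversion such a plugin would perform, written in Python rather than Scheme for illustration. It assumes escape codes are typed in the same `\uXXXX` form (exactly four hex digits) as the REPL example. The function name `expand_unicode_escapes` is hypothetical. The standard `unicodedata` module is used only to confirm which character U+2665 is.

```python
import re
import unicodedata

# A backslash, a "u", then exactly four hex digits, e.g. \u2665.
_ESCAPE_RE = re.compile(r"\\u([0-9A-Fa-f]{4})")


def expand_unicode_escapes(source: str) -> str:
    r"""Replace every \uXXXX escape in source text with the character it names.

    Hypothetical helper: it does not special-case a doubled backslash
    such as \\u2665, which a real plugin would need to leave alone.
    """
    return _ESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), source)


if __name__ == "__main__":
    converted = expand_unicode_escapes(r'(define suit "\u2665")')
    print(converted)                      # (define suit "♥")
    print(unicodedata.name(chr(0x2665)))  # BLACK HEART SUIT
```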

Robert Fisher says:

2009-01-13 at 13:12

Actually, most of my tools are Unicode-savvy enough. I have fonts with the characters I’d like to use. There are programs that convert specific key sequences into specific characters for me.
The things that hold us back are (1) people, who tend to move slower than technology, and (2) the fact that, although all the pieces may be there, the edifice you construct out of them may still be a bit fragile.
