So a few weeks ago, I was reading this essay by Ramsey Nasser. I’ve debated back and forth several times as to whether Smile should keep its strings and identifiers as 8-bit characters, or whether they should be upgraded to full Unicode. It’s a tough question.
On the one hand, I’m trying to build a language that will be able to grow well with the needs of the future, and the future argues for Unicode. There are a lot of people out there, and not all of them speak English. Or read or write English. As Nasser notes, Arabic is poorly supported by, well, everything, and there’s only a few bajillion people out there speaking Arabic. (It so happens that Arabic is possibly a pathologically bad worst case for programming language support, too, since it’s a proportional cursive writing system and not fixed-width, with concepts like initial, medial, and final forms, instead of a single letterform per phoneme, and for the icing on the cake, it writes in the dead opposite direction of most other languages on planet Earth.) And beyond Arabic there’s Chinese and Russian and Devanagari and Japanese and a thousand others, and a really good future-proof programming language ought to be able to support all that in a very natural, native kind of way.