https://modelviewculture.com/pieces/i-can-text-you-a-pile-of-poo-but-i-cant-write-my-name
By Aditya Mukerjee
(…)
"The founders and current members of the Unicode Consortium are well-credentialed. Collins has an M.A. from Columbia University in East Asian languages and cultures. Becker’s biography notes that he “speaks survival-level Chinese, French, German, Japanese, and Russian, and has forgotten Latin.” (source)
"Despite this, we can’t ignore the composition of the Consortium’s members, directors, and officers, the people who define the everyday writing systems of all languages across the globe. They are comprised largely of white men (and a few white women) whose first language was either English or another European language.
"Many of them work for one of the nine organizations that hold full membership in the Consortium. Seven of these nine are US-based technology companies: Adobe, Apple, Google, IBM, Microsoft, Oracle, and Yahoo. One (SAP) is a German technology company. These companies have, by their very own admission, workforces that are overwhelmingly white, and leadership and tech teams that are even less represented by racial minorities. These reports lump Indians together under the broad umbrella “Asians”, so while it’s impossible to know exactly how badly various ethnic groups or native languages are actually represented, the data is far from encouraging.
"The only other full member from the rest of the world is Oman’s Ministry of Endowment and Religious Affairs. Bizarrely, they are represented in the Consortium not by an Omani man, but by a Dutch man with only “professional working proficiency” in Arabic and a “full professional proficiency” in English, in addition to his native Dutch.
"My family’s native language, which I grew up speaking, is far from a niche language. Bengali is the seventh most common native language in the world, sitting ahead of the eighth (Russian) by a wide margin, with as many native speakers as French, German, and Italian combined.
"And yet, on the Internet, Bengali is very much a second-class citizen – as are Arabic (#5), Hindi (#4), and Mandarin (#1) – any language which is not written with the Latin alphabet.
"The very first version of the Unicode standard did include Bengali. However, it left out a number of important characters. Until 2005, Unicode did not have one of the characters in the Bengali word for “suddenly”. Instead, people who wanted to write this everyday word had to combine three separate, unrelated characters. For English-speaking teenagers, combining characters in unexpected ways, like writing ‘w’ as ‘//’, used to be a way of asserting technical literacy through “l33tspeak” – a shibboleth for nerds that derives its name from the word “elite”. But Bengalis were forced to make similar orthographic contortions just to write a simple email: ত + ্ + = ৎ (the third character is the invisible “zero width joiner”).
"Even today, I am forced to do this when writing my own name. My name is not only a common Indian name, but one of the top 1,000 names in the United States as well. But the final letter has still not been given its own Unicode character, so I have to use a substitute.
"A few other characters that were more common historically, though still used today, were also missing for the first decade of Bengali’s existence in Unicode. It’s tempting to argue that historical characters have no place in a character set intended for computers. On the contrary, this makes their inclusion even more vital: rendering historical texts accurately is key to ensuring their survival in the transition to the age of digital media.
"Furthermore, these characters are still common enough that they were printed in the Bengali reading textbooks and workbooks that we used growing up. Omitting them literally ensures that existing materials for learning to read Bengali will not be universally accessible.
"Without that argument, one might appeal to the limited space in the Unicode character set. Even if we take for granted the somewhat arbitrary maximum of 1,114,112 codepoints, the other alphabets included speak for themselves. The most recent update to the Unicode standard included the entire alphabet of Linear B, an ancient Mycenaean script that was not deciphered in the modern era until the 1950s. Nor does alleged scarcity explain the inclusion of Linear A, a Minoan script so arcane, scholars disagree on what language it even represented, let alone how to read the script.
"The East Asian Problem
"I am not the only one who has trouble writing their name correctly in Unicode. Linguistically, East Asian languages such as Chinese, Japanese, and Korean have distinct writing systems. Some (but not all) of the characters trace their lineage back to a common set, but even these characters, known as Han characters, began to diverge and evolve independently over two thousand years ago.
"The Unicode Consortium has launched a very controversial project known as Han Unification: an attempt to create a limited set of characters that will be shared by these so-called “CJK languages.” Instead of recognizing these languages as having their own writing systems that share some common ancestry, the Han unification process views them as mere variations on some “true” form.
"To help English readers understand the absurdity of this premise, consider that the Latin alphabet (used by English) and the Cyrillic alphabet (used by Russian) are both derived from Greek. No native English speaker would ever think to try “Greco Unification” and consolidate the English, Russian, German, Swedish, Greek, and other European languages’ alphabets into a single alphabet. Even though many of the letters look similar to Latin characters used in English, nobody would try to use them interchangeably. ҭЋаt ωoulδ βε σutragєѳuѕ.
"Even though our language is exempt from this effort, Han unification is particularly troubling for Bengali speakers to hear about. The rhetoric is a blast from our own colonial past, when the British referred to Indian languages pejoratively as “dialects”. Depriving their colonial subjects of distinct linguistic identities was a key tactic in justifying their brutal rule over an “uncivilized” people…. (((etc etc)))