It is an accident of history that most information technology, from Morse code to the Internet, was developed in and for English-speaking countries. English, with just 26 letters and no accents or diacritical marks, meant that everything from keyboards to displays to internal character codes could be simple, deceptively so, because almost no other language makes life that easy. As a result, developers adopted solutions which have bedeviled information technology in other languages ever since. If developers were being honest, they would probably admit that solutions for languages from French and Russian to Arabic and Thai are (to use the technical term) kludged-up versions of products first designed for English.
Chinese, with its tens of thousands of characters, is however a case apart. How one might type, display or store them in computer systems is a fascinating question, one that takes us down to the basics of what language is, but it is also of tremendous practical importance in our technically integrated world. The challenges of contemporary data communications in Chinese are where Jing Tsu’s Kingdom of Characters ends up after a journey that begins in the twilight years of the Qing dynasty.
Tsu starts with the story of how modern “Chinese” was constructed alongside the Chinese nation-state through a deliberate national standardization effort, one that produced Putonghua, simplified Chinese characters and pinyin romanization and that didn’t conclude until well into the second half of the 20th century. This particular story is to some extent told better in David Moser’s A Billion Voices: China’s Search for a Common Language, but Tsu is setting up a discussion of technology. She begins back in the 19th century with telegraphy, a tale that is at times almost surreal:
International telegraphy recognized only Roman alphabet letters and Arabic numerals … which meant that Chinese, too, had to be via letters and numbers… Every Chinese character was transmitted as a string of six numbers, each of which cost more than a letter. The assigned code for a Chinese character first had to be looked up in a codebook before being converted to the dots and dashes of Morse code.
Chinese telegraphy was both troublesome and expensive, but protecting revenue rather than efficiency seems to have been the International Telegraphic Union’s main priority. Tsu recounts that since abbreviations cost the telegraphy companies money, they started pricing by the word; users, as a consequence, started running words together or using code words, practices the ITU tried to stamp out. China’s first foray into international standards diplomacy nevertheless resulted in it being deemed a special case, a partial accommodation but at least a partial success.
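The pipeline Tsu describes, a codebook lookup followed by Morse transmission of the resulting digits, can be sketched in a few lines. Here is a minimal illustration in Python, with made-up codebook entries rather than the historical table:

```python
# Illustrative sketch of the telegraph pipeline: each character is looked up
# in a codebook, then the digits of its code are spelled out in Morse.
# The codebook entries below are invented for illustration only.

MORSE_DIGITS = {
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}

CODEBOOK = {"中": "0022", "文": "2429"}   # hypothetical character-to-number table

def to_morse(text: str) -> str:
    """Look up each character's numeric code, then spell its digits in Morse."""
    words = []
    for ch in text:
        code = CODEBOOK[ch]                                   # codebook lookup
        words.append(" ".join(MORSE_DIGITS[d] for d in code)) # digits to Morse
    return " / ".join(words)

print(to_morse("中文"))
```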
Tsu also runs through the by now reasonably well-known story of the Chinese typewriter (told, for example, by Thomas S Mullaney), which, as technological quests go, has more than a whiff of the quixotic about it, with an equally interesting detour through typesetting and photo-typesetting before arriving at computers.
The remainder of the book is largely the story of (Greater) China wresting back control of the standards process for Chinese character-set coding. After discussing input methods and the incompatible coding standards of different machines, Tsu ends with a description of her attendance at Chinese character standards meetings, which sound simultaneously fascinating and tedious.
The basic problem these meetings tackle is whether variations of a character are the same character or different ones: think “Æ” vs “AE”. This sounds trivial until one considers such problems as printing (where one wants as many distinct variants as possible) and searching (where only meaningful differences should matter). Ironically, one result of the “Simplified Character” standardization process was a rift between “Traditional Characters” (still used in Taiwan and Hong Kong) and the characters now used in the PRC. Are 國 and 国 the same character, or not? Is the Japanese Kanji character 国 the “same” as the Chinese one? The answer may depend on whether one is a librarian or a typographer. Hence the committee meetings.
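The question is visible at the level of code points. A small Python sketch (the traditional-to-simplified folding table below is purely illustrative, not any standard):

```python
# These "same" characters occupy distinct Unicode code points, so software
# must decide when to treat them as equivalent and when to keep them apart.
for ch in ["國", "国", "Æ"]:
    print(ch, f"U+{ord(ch):04X}")
# 國 U+570B, 国 U+56FD, Æ U+00C6

# A search index might fold traditional and simplified forms together,
# while a font or a library catalogue would preserve the distinction.
FOLD = {"國": "国"}   # illustrative one-entry table, not a real standard

def fold(text: str) -> str:
    return "".join(FOLD.get(c, c) for c in text)

print(fold("中國") == fold("中国"))  # True under this folding
```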
While at one level arcane, these are endlessly fascinating questions which cut to the core of what language and communication are. The fuzziness of language is not entirely suited to the binary nature of computerized data.
This interesting and very readable book is however colored by a political framing. Today, Tsu writes, China “is poised to create its own Han script sphere of influence…”, and that:
Beyond culture and tradition, Chinese script has been sharpened and upgraded into a technology that is intended to be a first step, a foundation for building an entire ecology of Chinese digital nationalism. China is aiming to reshape global standards, from supply chains to 5G.
This imbues the subject with a significance it doesn’t warrant: regardless of what one may think of “Chinese digital nationalism”, a Chinese-specific character-set standard is far narrower in scope than a global telecoms standard like 5G. Information compatibility, as those who work in the field know, is a function of format (Word vs PDF, for example) as well as character set: one cannot just reach into a file and read it without first knowing what sort of file it is. And as AI advances, the need for character-set standards may diminish: if Google Translate can recognize the input language without being told, computers can also recognize the character set.
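That last point is easy to illustrate. A rough Python sketch of guessing a character set by trial decoding (a crude heuristic; real detectors also use byte-frequency statistics, and the candidate list and its order here are arbitrary):

```python
# Guess a byte stream's character set by trying candidate decodings in turn.
# Broad encodings such as GB18030 accept almost any byte sequence, so the
# order of candidates matters; this is a heuristic, not a real detector.
CANDIDATES = ["utf-8", "big5", "gb18030"]

def guess_encoding(data: bytes) -> str | None:
    for enc in CANDIDATES:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return None

sample = "漢字".encode("big5")    # bytes arriving with no declared encoding
print(guess_encoding(sample))     # likely "big5" for this sample
```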
Occam’s razor might indicate that the century-old discussions at the International Telegraphic Union probably had less to do with Chinese “sovereignty” than with the vested financial interests of the main telegraphy companies. And while it is indeed the case that the American Standard Code for Information Interchange or
ASCII was never meant to accommodate non-Western script systems containing thousands of ideographic characters …
this was not
due either to its designers’ limited worldview or their failure to imagine the code’s wild success beyond the Western alphabetic world …
ASCII, which dates from the 1960s, wasn’t even “meant to accommodate” French.
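The limitation is mechanical: ASCII has only 128 code points, none of them accented. A quick illustration in Python:

```python
# ASCII covers code points 0-127; even a French accented letter falls outside it.
print("cafe".encode("ascii"))        # b'cafe'
try:
    "café".encode("ascii")
except UnicodeEncodeError:
    print("é is U+00E9, beyond ASCII")   # ord("é") == 0xE9 == 233
```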
Information coding is a far more interesting subject than it would first appear. Ideographic writing systems like Chinese are particularly fascinating … and relevant to anyone who uses an electronic device (emojis—characters with meaning independent of pronunciation—operate not unlike Chinese characters).
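In Unicode terms the parallel is direct; a quick Python check:

```python
import unicodedata

# Like a Chinese character, an emoji occupies its own code point and carries
# meaning independent of any pronunciation.
for ch in ["😀", "国"]:
    print(ch, f"U+{ord(ch):04X}", unicodedata.name(ch))
# 😀 U+1F600 GRINNING FACE
# 国 U+56FD CJK UNIFIED IDEOGRAPH-56FD
```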
Tsu dispels much of the opaqueness of the subject by embedding it in a story of language and characters, and in some particularly fascinating tales of pre-computer technology.