Post by Stefan Monnier
> Clearly this problem is not specific to Emacs, so what do people do?
> Hold on to iso-2022 for as long as they can (like we do in Emacs)?
> Give up on these "details" of rendering for files using a mix of C, J, and K?
> Rely on higher-level info (XML tags and friends) to carry the charset info?
For most uses, people typically just use UTF-8 and give up on the
details, which tend to be in areas that many users don't care much about
anyway. In practice, if (say) a Japanese reader sees a Chinese quotation
in a page of Japanese text, there's an excellent chance the reader won't
much mind that the Chinese characters are rendered in Japanese style, as
this has long been common practice in Japanese printing anyway.
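To make concrete which "details" get dropped, here is a minimal sketch, assuming Python's standard `iso2022_jp_2` codec (a multilingual ISO 2022 variant that accepts both the Japanese JIS X 0208 and the Chinese GB 2312 designations). It takes 直 (U+76F4), a character whose customary glyph differs between Japanese and Chinese typography, tags it once as Japanese and once as Chinese with ISO 2022 escape sequences, and shows that both decode to the same Unicode code point and hence the same UTF-8 bytes. The helper names `to_7bit` and `tagged` are hypothetical, made up for this illustration.

```python
# Sketch (illustrative only): the same Han character tagged as Japanese
# (JIS X 0208) or Chinese (GB 2312) in ISO 2022 decodes to one code point.

CHAR = "\u76f4"  # 直: glyph style differs between Japanese and Chinese fonts

ESC_JIS_X_0208 = b"\x1b$B"  # ISO 2022 designation for JIS X 0208 (Japanese)
ESC_GB_2312 = b"\x1b$A"     # ISO 2022 designation for GB 2312 (Chinese)
ESC_ASCII = b"\x1b(B"       # switch back to ASCII

def to_7bit(euc_bytes: bytes) -> bytes:
    """Hypothetical helper: clear the high bit of EUC bytes, giving the
    7-bit form that ISO 2022 uses after a designation escape."""
    return bytes(b & 0x7F for b in euc_bytes)

def tagged(designation: bytes, euc_codec: str) -> bytes:
    """Hypothetical helper: encode CHAR in one national charset and wrap it
    in the matching ISO 2022 escape sequences."""
    return designation + to_7bit(CHAR.encode(euc_codec)) + ESC_ASCII

japanese = tagged(ESC_JIS_X_0208, "euc_jp")
chinese = tagged(ESC_GB_2312, "gb2312")

# In ISO 2022 the byte streams differ: the escape records which national
# charset (and so which glyph tradition) the writer intended.
print(japanese != chinese)

# After decoding, that distinction is gone: both become U+76F4.
ja_text = japanese.decode("iso2022_jp_2")
zh_text = chinese.decode("iso2022_jp_2")
print(ja_text == zh_text == CHAR)
print(ja_text.encode("utf-8"))
```

Once the text is UTF-8, nothing in the bytes says which glyph style was meant; if it matters, that has to come from outside the plain text (a font choice or a language tag in markup).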
There are of course exceptions where it really matters which font you
use, such as the Wikipedia page on Chinese character variants that
Clément mentioned. But these are rare, and are typically handled by
means other than plain text. It's like the Wikipedia page on kerning,
which uses images rather than plain UTF-8 text to illustrate how to kern.
I mildly prefer multilingual text to be rendered in a consistent style
for my language, as opposed to having it rendered separately for readers
of each of its component languages, as this makes the text a bit easier
for me to read (which is the point of text, isn't it?). But this of
course is merely a style preference.
For what it's worth, the April 2018 w3techs.com numbers say that UTF-8
is used by 91.3% of websites whose character encoding they know, and
that this number is steadily growing (it was 88.9% a year ago). In
contrast, ISO 2022 usage is declining steadily. Of course, the web is not
the entire universe; still, it's pretty clear which way the world is going.