setCharSet( string )

Sets the current character encoding for output to the encoding named by the specified string. The default character encoding for output is ISO-8859-1. The current character encoding can be retrieved with the getCharSet function.

:-) setCharSet( "UTF-8" );
There are 0 results
      

Supported Encodings

The available character encodings depend on the version of Java you are running MillScript on. The following table lists the character encodings supported when using Java 1.4.2. When setting the character encoding, you can use the canonical name or any of its aliases. When determining the current output character encoding via getCharSet, the canonical name is always returned.

Canonical Name  Aliases
Big5            csBig5
Big5-HKSCS      big5-hkscs, Big5_HKSCS, big5hkscs
EUC-JP          eucjis, x-eucjp, csEUCPkdFmtjapanese, eucjp, Extended_UNIX_Code_Packed_Format_for_Japanese, x-euc-jp, euc_jp
EUC-KR          ksc5601, 5601, ksc5601_1987, ksc_5601, ksc5601-1987, euc_kr, ks_c_5601-1987, euckr, csEUCKR
GB18030         gb18030-2000
GBK             windows-936, CP936
ISO-2022-JP     jis, jis_encoding, csjisencoding, csISO2022JP, iso2022jp
ISO-2022-KR     ISO2022KR, csISO2022KR
ISO-8859-1      iso-ir-100, 8859_1, ISO_8859-1, ISO8859_1, 819, csISOLatin1, IBM-819, ISO_8859-1:1987, latin1, cp819, ISO8859-1, IBM819, ISO_8859_1, l1
ISO-8859-13     iso8859_13
ISO-8859-15     8859_15, csISOlatin9, IBM923, cp923, 923, L9, IBM-923, ISO8859-15, LATIN9, ISO_8859-15, LATIN0, csISOlatin0, ISO8859_15_FDIS, ISO-8859-15
ISO-8859-2      l2, iso-ir-101, ISO_8859-2:1987, ISO_8859-2, latin2, csISOLatin2, iso8859_2
ISO-8859-3
ISO-8859-4      iso-ir-110, l4, latin4, csISOLatin4, iso8859_4, ISO_8859-4:1988, ISO_8859-4
ISO-8859-5      cyrillic, iso8859_5, ISO_8859-5, iso-ir-144, csISOLatinCyrillic
ISO-8859-6
ISO-8859-7      greek8, ECMA-118, sun_eu_greek, ELOT_928, ISO_8859-7:1987, iso-ir-126, ISO_8859-7, iso8859_7, greek, csISOLatinGreek
ISO-8859-8
ISO-8859-9      iso-ir-148, latin5, l5, ISO_8859-9, ISO_8859-9:1989, csISOLatin5, iso8859_9
JIS_X0201       JIS_X0201, X0201, JIS0201, csHalfWidthKatakana
JIS_X0212-1990  jis_x0212-1990, iso-ir-159, x0212, JIS0212, csISO159JISX02121990
KOI8-R          koi8, cskoi8r
Shift_JIS       shift-jis, x-sjis, ms_kanji, shift_jis, csShiftJIS, sjis, pck
TIS-620
US-ASCII        ISO646-US, IBM367, ASCII, cp367, ascii7, ANSI_X3.4-1986, iso-ir-6, us, 646, iso_646.irv:1983, csASCII, ANSI_X3.4-1968, ISO_646.irv:1991
UTF-16          UTF_16
UTF-16BE        X-UTF-16BE, UTF_16BE, ISO-10646-UCS-2
UTF-16LE        UTF_16LE, X-UTF-16LE
UTF-8           UTF8
windows-1250    cp1250
windows-1251    cp1251
windows-1252    cp1252
windows-1253    cp1253
windows-1254    cp1254
windows-1255
windows-1256
windows-1257    cp1257
windows-1258
windows-31j     csWindows31J, windows-932, MS932
x-EUC-CN        gb2312, EUC_CN, euccn, euc-cn, gb2312-80, gb2312-1980
x-euc-jp-linux  euc_jp_linux, euc-jp-linux
x-EUC-TW        cns11643, euc_tw, EUC-TW, euctw
x-ISCII91       iscii, ST_SEV_358-88, iso-ir-153, csISO153GOST1976874, ISCII91
x-JIS0208       JIS_C6626-1983, JIS0208, csISO87JISX0208, x0208, JIS_X0208-1983, iso-ir-87
x-Johab         johab, ms1361, ksc5601-1992, ksc5601_1992
x-MS950-HKSCS   MS950_HKSCS
x-mswin-936     ms936, ms_936
x-windows-949   windows949, ms_949, ms949
x-windows-950   windows-950, ms950
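
The way an alias resolves to its canonical name can be checked against the Java runtime directly. The following is a minimal sketch in plain Java rather than MillScript, assuming a current JDK. The class name CharsetNames and the helper canonicalName are hypothetical names introduced for illustration. Only java.nio.charset.Charset, which the table is derived from, is taken from this page.

```java
import java.nio.charset.Charset;

class CharsetNames {
    // Resolve a canonical name or an alias to the canonical charset name.
    // Charset.forName throws UnsupportedCharsetException if the running
    // JVM does not provide the requested encoding.
    public static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        // A canonical name and three aliases taken from the table above.
        String[] names = { "ISO-8859-1", "latin1", "UTF8", "ASCII" };
        for (String n : names) {
            System.out.println(n + " -> " + canonicalName(n));
        }
    }
}
```

Lookup is case-insensitive, and the set of encodings (and of aliases) may differ from the Java 1.4.2 list above depending on the JVM in use.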

Notes

The default character encoding of ISO-8859-1 was chosen in part on the belief that it is the default HTML character encoding. It is not: it is actually named as the default character encoding in the HTTP specification, Hypertext Transfer Protocol -- HTTP/1.1.

ISO-8859-1 was a sensible choice for the types and languages of documents we produced at the time. It could easily be argued that the default should be changed to UTF-8 or another Unicode character encoding.
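
One practical reason behind that argument is character coverage. The sketch below, in plain Java rather than MillScript, checks whether a string can be represented in a given encoding. The class name EncodingCoverage and the helper canEncode are hypothetical names introduced for illustration, and a current JDK is assumed.

```java
import java.nio.charset.Charset;

class EncodingCoverage {
    // True if every character of text can be represented in the named charset.
    public static boolean canEncode(String charsetName, String text) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String cafe = "caf\u00e9";   // e-acute (U+00E9) is part of ISO-8859-1
        String price = "\u20ac10";   // the euro sign (U+20AC) is not

        System.out.println(canEncode("ISO-8859-1", cafe));   // true
        System.out.println(canEncode("ISO-8859-1", price));  // false
        System.out.println(canEncode("UTF-8", price));       // true
    }
}
```

Text outside the ISO-8859-1 repertoire cannot be written with the default output encoding, whereas a Unicode encoding such as UTF-8 covers it.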

The following section of code can be, and indeed was, used to generate the contents of the supported encoding table.

:-) var availableCharsets = javaFunction( "java.nio.charset.Charset", "availableCharsets" );
:-) var aliases = javaFunction( "java.nio.charset.Charset", "aliases" );
for name & charset in availableCharsets() do
  <tr>
    <td>name</td>
    <td>
      for alias in charset.aliases; c from 1 do
        alias,
        if charset.aliases.length /= c then
          ", "
        endif
      endfor
    </td>
  </tr>
endfor;
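
For comparison, the same enumeration can be sketched in plain Java. This is not the code that produced the table: the class name CharsetTable, the helper row, and the tab-separated output are assumptions made for illustration, and String.join requires Java 8 or later. Only Charset.availableCharsets and Charset.aliases are taken from the MillScript code above.

```java
import java.nio.charset.Charset;
import java.util.Map;

class CharsetTable {
    // Format one table row: canonical name, a tab, then the aliases
    // separated by ", " (empty when the charset has no aliases).
    public static String row(Charset charset) {
        return charset.name() + "\t" + String.join(", ", charset.aliases());
    }

    public static void main(String[] args) {
        // availableCharsets() returns a sorted map keyed by canonical name.
        for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet()) {
            System.out.println(row(entry.getValue()));
        }
    }
}
```

Running it on a different JVM version is a quick way to see how the supported encodings differ from the Java 1.4.2 table.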