Sets the current character encoding for output to the encoding named by the given string. The default output character encoding is ISO-8859-1. The current character encoding can be discovered by using the getCharSet function.
:-) setCharSet( "UTF-8" );
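To make concrete what changing the output encoding does, the following Java sketch encodes the same string under two encodings and compares the resulting bytes. It assumes, without confirmation from this page, that MillScript's output encoding behaves like a `java.nio.charset.Charset` applied to the output text. The class and method names are hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

public class OutputEncodingDemo {
    // Returns the bytes produced when text is encoded with the named charset.
    // Hypothetical helper: not part of MillScript.
    static byte[] encode(String text, String charsetName) {
        ByteBuffer buf = Charset.forName(charsetName).encode(text);
        byte[] bytes = new byte[buf.remaining()];
        buf.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        String text = "caf\u00e9"; // "café", written with an escape to avoid source-encoding issues
        // ISO-8859-1 stores the accented e in one byte (0xE9).
        System.out.println(encode(text, "ISO-8859-1").length);
        // UTF-8 needs two bytes (0xC3 0xA9) for the same character.
        System.out.println(encode(text, "UTF-8").length);
    }
}
```

The same text produces 4 bytes in ISO-8859-1 and 5 in UTF-8, which is why the reader of the output must be told which encoding was used.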
The available character encodings depend on the version of Java that MillScript is running on. The following table lists the character encodings supported when using Java 1.4.2. When setting the character encoding, you can use either the canonical name or any of its aliases. When determining the current output character encoding via getCharSet, the canonical name is always returned.
| Canonical Name | Aliases |
|---|---|
| Big5 | csBig5 |
| Big5-HKSCS | big5-hkscs, Big5_HKSCS, big5hkscs |
| EUC-JP | eucjis, x-eucjp, csEUCPkdFmtjapanese, eucjp, Extended_UNIX_Code_Packed_Format_for_Japanese, x-euc-jp, euc_jp |
| EUC-KR | ksc5601, 5601, ksc5601_1987, ksc_5601, ksc5601-1987, euc_kr, ks_c_5601-1987, euckr, csEUCKR |
| GB18030 | gb18030-2000 |
| GBK | windows-936, CP936 |
| ISO-2022-JP | jis, jis_encoding, csjisencoding, csISO2022JP, iso2022jp |
| ISO-2022-KR | ISO2022KR, csISO2022KR |
| ISO-8859-1 | iso-ir-100, 8859_1, ISO_8859-1, ISO8859_1, 819, csISOLatin1, IBM-819, ISO_8859-1:1987, latin1, cp819, ISO8859-1, IBM819, ISO_8859_1, l1 |
| ISO-8859-13 | iso8859_13 |
| ISO-8859-15 | 8859_15, csISOlatin9, IBM923, cp923, 923, L9, IBM-923, ISO8859-15, LATIN9, ISO_8859-15, LATIN0, csISOlatin0, ISO8859_15_FDIS, ISO-8859-15 |
| ISO-8859-2 | l2, iso-ir-101, ISO_8859-2:1987, ISO_8859-2, latin2, csISOLatin2, iso8859_2 |
| ISO-8859-3 | |
| ISO-8859-4 | iso-ir-110, l4, latin4, csISOLatin4, iso8859_4, ISO_8859-4:1988, ISO_8859-4 |
| ISO-8859-5 | cyrillic, iso8859_5, ISO_8859-5, iso-ir-144, csISOLatinCyrillic |
| ISO-8859-6 | |
| ISO-8859-7 | greek8, ECMA-118, sun_eu_greek, ELOT_928, ISO_8859-7:1987, iso-ir-126, ISO_8859-7, iso8859_7, greek, csISOLatinGreek |
| ISO-8859-8 | |
| ISO-8859-9 | iso-ir-148, latin5, l5, ISO_8859-9, ISO_8859-9:1989, csISOLatin5, iso8859_9 |
| JIS_X0201 | JIS_X0201, X0201, JIS0201, csHalfWidthKatakana |
| JIS_X0212-1990 | jis_x0212-1990, iso-ir-159, x0212, JIS0212, csISO159JISX02121990 |
| KOI8-R | koi8, cskoi8r |
| Shift_JIS | shift-jis, x-sjis, ms_kanji, shift_jis, csShiftJIS, sjis, pck |
| TIS-620 | |
| US-ASCII | ISO646-US, IBM367, ASCII, cp367, ascii7, ANSI_X3.4-1986, iso-ir-6, us, 646, iso_646.irv:1983, csASCII, ANSI_X3.4-1968, ISO_646.irv:1991 |
| UTF-16 | UTF_16 |
| UTF-16BE | X-UTF-16BE, UTF_16BE, ISO-10646-UCS-2 |
| UTF-16LE | UTF_16LE, X-UTF-16LE |
| UTF-8 | UTF8 |
| windows-1250 | cp1250 |
| windows-1251 | cp1251 |
| windows-1252 | cp1252 |
| windows-1253 | cp1253 |
| windows-1254 | cp1254 |
| windows-1255 | |
| windows-1256 | |
| windows-1257 | cp1257 |
| windows-1258 | |
| windows-31j | csWindows31J, windows-932, MS932 |
| x-EUC-CN | gb2312, EUC_CN, euccn, euc-cn, gb2312-80, gb2312-1980 |
| x-euc-jp-linux | euc_jp_linux, euc-jp-linux |
| x-EUC-TW | cns11643, euc_tw, EUC-TW, euctw |
| x-ISCII91 | iscii, ST_SEV_358-88, iso-ir-153, csISO153GOST1976874, ISCII91 |
| x-JIS0208 | JIS_C6626-1983, JIS0208, csISO87JISX0208, x0208, JIS_X0208-1983, iso-ir-87 |
| x-Johab | johab, ms1361, ksc5601-1992, ksc5601_1992 |
| x-MS950-HKSCS | MS950_HKSCS |
| x-mswin-936 | ms936, ms_936 |
| x-windows-949 | windows949, ms_949, ms949 |
| x-windows-950 | windows-950, ms950 |
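Because MillScript runs on Java, the alias-to-canonical-name behaviour can be illustrated with `java.nio.charset.Charset` directly. `Charset.forName` accepts a canonical name or any alias, and `Charset.name()` returns the canonical name. That getCharSet reports names in exactly this way is an assumption based on the description above. `CharsetAliasDemo` and `canonicalName` are hypothetical names. The aliases used here all appear in the table above.

```java
import java.nio.charset.Charset;

public class CharsetAliasDemo {
    // Resolves a canonical name or alias to the charset's canonical name.
    // Hypothetical helper: mirrors what getCharSet is described as returning.
    static String canonicalName(String nameOrAlias) {
        return Charset.forName(nameOrAlias).name();
    }

    public static void main(String[] args) {
        System.out.println(canonicalName("UTF8"));   // UTF-8
        System.out.println(canonicalName("latin1")); // ISO-8859-1
        System.out.println(canonicalName("cp1252")); // windows-1252
    }
}
```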
The default character encoding of ISO-8859-1 was chosen partly in the belief that it is the default HTML character encoding. It isn't! It is actually the HTTP specification, Hypertext Transfer Protocol -- HTTP/1.1, that gives ISO-8859-1 as the default character encoding.
ISO-8859-1 was a sensible choice for the types of document, and the languages, that we produced at the time. It could easily be argued that the default should be changed to UTF-8 or another Unicode character encoding.
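One concrete argument for a Unicode default is repertoire: ISO-8859-1 cannot represent many characters at all. The following Java sketch uses `CharsetEncoder.canEncode` to check whether the euro sign (U+20AC) can be encoded in each charset. The euro sign is absent from ISO-8859-1 but present in ISO-8859-15. This is a Java illustration, not something MillScript provides, and the class and method names are hypothetical.

```java
import java.nio.charset.Charset;

public class RepertoireDemo {
    // True if every character of text can be encoded in the named charset.
    // A fresh encoder is created per call, so canEncode is never invoked mid-operation.
    static boolean canEncode(String text, String charsetName) {
        return Charset.forName(charsetName).newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String euro = "\u20ac"; // EURO SIGN
        System.out.println(canEncode(euro, "ISO-8859-1"));  // false
        System.out.println(canEncode(euro, "ISO-8859-15")); // true
        System.out.println(canEncode(euro, "UTF-8"));       // true
    }
}
```

A Unicode encoding such as UTF-8 can represent every character, so output can never contain unmappable text. A single-byte encoding always has some characters it must replace or reject.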
The following section of code can be, and indeed was, used to generate the contents of the supported encoding table.
:-) var availableCharsets = javaFunction( "java.nio.charset.Charset", "availableCharsets" );
:-) var aliases = javaFunction( "java.nio.charset.Charset", "aliases" );
for name & charset in availableCharsets() do
<tr>
<td>name</td>
<td>
for alias in charset.aliases; c from 1 do
alias,
if charset.aliases.length /= c then
", "
endif
endfor
</td>
</tr>
endfor;
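For comparison, here is a rough Java equivalent of the generator above, calling `Charset.availableCharsets()` and `Charset.aliases()` directly. It is a sketch, not the code that produced the table. It assumes Java 8 or later (for `String.join`), and the class name `CharsetTable` is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Map;
import java.util.SortedMap;

public class CharsetTable {
    // Builds one <tr> per available charset: the canonical name in the first
    // cell, then its aliases separated by ", " (no trailing separator).
    static String buildRows() {
        StringBuilder out = new StringBuilder();
        SortedMap<String, Charset> charsets = Charset.availableCharsets();
        for (Map.Entry<String, Charset> entry : charsets.entrySet()) {
            out.append("<tr>\n<td>").append(entry.getKey()).append("</td>\n<td>");
            out.append(String.join(", ", entry.getValue().aliases()));
            out.append("</td>\n</tr>\n");
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(buildRows());
    }
}
```

On a modern JVM this lists considerably more encodings than the Java 1.4.2 table above. Because `aliases()` returns a `Set`, the order of aliases within a row is not guaranteed.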