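Python ships with built-in codecs for a large number of character encodings. Some are implemented in C, and others are implemented as mapping tables (dictionaries). The table below lists the codecs by name. Many codecs can also be looked up under alternative names (aliases); for example, utf-8 can also be found under the name utf_8. CPython differs from other implementations here: it optimizes a small set of encoding names, and using an encoding outside that set may be slower. The optimized names are: utf-8, utf8, latin-1, latin1, iso-8859-1, mbcs (Windows only), ascii, utf-16, and utf-32. Some character sets cover several languages, while others serve a single language.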
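The alias lookup described above can be tried out with the standard `codecs` and `encodings.aliases` modules. This is only a minimal sketch: the helper names `canonical_name` and `aliases_of` are made up for this illustration, and the exact alias list printed depends on the Python version.

```python
import codecs
import encodings.aliases


def canonical_name(label: str) -> str:
    """Return the name of the codec that Python resolves `label` to."""
    return codecs.lookup(label).name


def aliases_of(codec: str) -> list[str]:
    """List the registered aliases that map to the codec module `codec`."""
    table = encodings.aliases.aliases  # dict: alias -> codec module name
    return sorted(alias for alias, target in table.items() if target == codec)


# Different spellings of the same encoding resolve to one codec.
for label in ("utf-8", "utf_8", "UTF8", "U8"):
    print(f"{label!r} -> {canonical_name(label)!r}")

# Aliases are stored in lower case with underscores.
print(aliases_of("utf_8"))

# An unknown name raises LookupError instead of returning None.
try:
    codecs.lookup("no-such-codec")
except LookupError as exc:
    print("lookup failed:", exc)
```

When speed matters on CPython, it is probably safest to spell the encoding with one of the optimized names listed above, such as `utf-8`.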
Codec | Aliases | Languages |
ascii | 646, us-ascii | English |
big5 | big5-tw, csbig5 | Traditional Chinese |
big5hkscs | big5-hkscs, hkscs | Traditional Chinese |
cp037 | IBM037, IBM039 | English |
cp273 | 273, IBM273, csIBM273 | German (new in version 3.4) |
cp424 | EBCDIC-CP-HE, IBM424 | Hebrew |
cp437 | 437, IBM437 | English |
cp500 | EBCDIC-CP-BE, EBCDIC-CP-CH, IBM500 | Western Europe |
cp720 | | Arabic |
cp737 | | Greek |
cp775 | IBM775 | Baltic languages |
cp850 | 850, IBM850 | Western Europe |
cp852 | 852, IBM852 | Central and Eastern Europe |
cp855 | 855, IBM855 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
cp856 | | Hebrew |
cp857 | 857, IBM857 | Turkish |
cp858 | 858, IBM858 | Western Europe |
cp860 | 860, IBM860 | Portuguese |
cp861 | 861, CP-IS, IBM861 | Icelandic |
cp862 | 862, IBM862 | Hebrew |
cp863 | 863, IBM863 | Canadian |
cp864 | IBM864 | Arabic |
cp865 | 865, IBM865 | Danish, Norwegian |
cp866 | 866, IBM866 | Russian |
cp869 | 869, CP-GR, IBM869 | Greek |
cp874 | | Thai |
cp875 | | Greek |
cp932 | 932, ms932, mskanji, ms-kanji | Japanese |
cp949 | 949, ms949, uhc | Korean |
cp950 | 950, ms950 | Traditional Chinese |
cp1006 | | Urdu |
cp1026 | ibm1026 | Turkish |
cp1125 | 1125, ibm1125, cp866u, ruscii | Ukrainian (new in version 3.4) |
cp1140 | ibm1140 | Western Europe |
cp1250 | windows-1250 | Central and Eastern Europe |
cp1251 | windows-1251 | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
cp1252 | windows-1252 | Western Europe |
cp1253 | windows-1253 | Greek |
cp1254 | windows-1254 | Turkish |
cp1255 | windows-1255 | Hebrew |
cp1256 | windows-1256 | Arabic |
cp1257 | windows-1257 | Baltic languages |
cp1258 | windows-1258 | Vietnamese |
cp65001 | | Windows only: Windows UTF-8 (CP_UTF8) (new in version 3.3) |
euc_jp | eucjp, ujis, u-jis | Japanese |
euc_jis_2004 | jisx0213, eucjis2004 | Japanese |
euc_jisx0213 | eucjisx0213 | Japanese |
euc_kr | euckr, korean, ksc5601, ks_c-5601, ks_c-5601-1987, ksx1001, ks_x-1001 | Korean |
gb2312 | chinese, csiso58gb231280, euc-cn, euccn, eucgb2312-cn, gb2312-1980, gb2312-80, iso-ir-58 | Simplified Chinese |
gbk | 936, cp936, ms936 | Unified Chinese |
gb18030 | gb18030-2000 | Unified Chinese |
hz | hzgb, hz-gb, hz-gb-2312 | Simplified Chinese |
iso2022_jp | csiso2022jp, iso2022jp, iso-2022-jp | Japanese |
iso2022_jp_1 | iso2022jp-1, iso-2022-jp-1 | Japanese |
iso2022_jp_2 | iso2022jp-2, iso-2022-jp-2 | Japanese, Korean, Simplified Chinese, Western Europe, Greek |
iso2022_jp_2004 | iso2022jp-2004, iso-2022-jp-2004 | Japanese |
iso2022_jp_3 | iso2022jp-3, iso-2022-jp-3 | Japanese |
iso2022_jp_ext | iso2022jp-ext, iso-2022-jp-ext | Japanese |
iso2022_kr | csiso2022kr, iso2022kr, iso-2022-kr | Korean |
latin_1 | iso-8859-1, iso8859-1, 8859, cp819, latin, latin1, L1 | Western Europe |
iso8859_2 | iso-8859-2, latin2, L2 | Central and Eastern Europe |
iso8859_3 | iso-8859-3, latin3, L3 | Esperanto, Maltese |
iso8859_4 | iso-8859-4, latin4, L4 | Baltic languages |
iso8859_5 | iso-8859-5, cyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
iso8859_6 | iso-8859-6, arabic | Arabic |
iso8859_7 | iso-8859-7, greek, greek8 | Greek |
iso8859_8 | iso-8859-8, hebrew | Hebrew |
iso8859_9 | iso-8859-9, latin5, L5 | Turkish |
iso8859_10 | iso-8859-10, latin6, L6 | Nordic languages |
iso8859_13 | iso-8859-13, latin7, L7 | Baltic languages |
iso8859_14 | iso-8859-14, latin8, L8 | Celtic languages |
iso8859_15 | iso-8859-15, latin9, L9 | Western Europe |
iso8859_16 | iso-8859-16, latin10, L10 | South-Eastern Europe |
johab | cp1361, ms1361 | Korean |
koi8_r | | Russian |
koi8_u | | Ukrainian |
mac_cyrillic | maccyrillic | Bulgarian, Byelorussian, Macedonian, Russian, Serbian |
mac_greek | macgreek | Greek |
mac_iceland | maciceland | Icelandic |
mac_latin2 | maclatin2, maccentraleurope | Central and Eastern Europe |
mac_roman | macroman, macintosh | Western Europe |
mac_turkish | macturkish | Turkish |
ptcp154 | csptcp154, pt154, cp154, cyrillic-asian | Kazakh |
shift_jis | csshiftjis, shiftjis, sjis, s_jis | Japanese |
shift_jis_2004 | shiftjis2004, sjis_2004, sjis2004 | Japanese |
shift_jisx0213 | shiftjisx0213, sjisx0213, s_jisx0213 | Japanese |
utf_32 | U32, utf32 | all languages |
utf_32_be | UTF-32BE | all languages |
utf_32_le | UTF-32LE | all languages |
utf_16 | U16, utf16 | all languages |
utf_16_be | UTF-16BE | all languages |
utf_16_le | UTF-16LE | all languages |
utf_7 | U7, unicode-1-1-utf-7 | all languages |
utf_8 | U8, UTF, utf8 | all languages |
utf_8_sig | | all languages |
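To see how the codecs in the table differ in practice, the following sketch encodes one short string with several of them and checks that each result decodes back to the original. The sample text and the choice of codecs are arbitrary and made only for illustration.

```python
import codecs

text = "中文"  # sample text chosen for this illustration

# Encode the same text with several codecs from the table.
codec_names = ("gbk", "big5", "utf_8", "utf_8_sig", "utf_16_le")
encoded = {name: text.encode(name) for name in codec_names}

for name, data in encoded.items():
    # Each codec must decode its own output back to the original text.
    assert data.decode(name) == text
    print(f"{name:10} {len(data)} bytes  {data!r}")

# utf_8_sig is UTF-8 with a byte order mark (BOM) prepended.
print(encoded["utf_8_sig"] == codecs.BOM_UTF8 + encoded["utf_8"])

# A codec that does not cover the text raises UnicodeEncodeError.
try:
    text.encode("ascii")
except UnicodeEncodeError as exc:
    print("ascii cannot encode it:", exc.reason)
```

The byte sequences differ from codec to codec, so data must be decoded with the same encoding that produced it.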
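Cai Junsheng (蔡军生), QQ: 9073204, Shenzhen
Copyright notice: this is an original article by the author and may not be reproduced without the author's permission.
Date: 2024-10-06 16:27:30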