High Performance MySQL, Third Edition
by Baron Schwartz, Peter Zaitsev, and Vadim Tkachenko
http://dev.mysql.com/doc/refman/5.7/en/charset-general.html
DROP TABLE IF EXISTS `w_ci_bin_cs`;
CREATE TABLE `w_ci_bin_cs` (
  `pkey` int(11) NOT NULL AUTO_INCREMENT,
  `w` char(255) NOT NULL DEFAULT 'W',
  `w_ci` char(255) NOT NULL DEFAULT 'a',
  `w_bin` char(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT 'bin',
  `w_ci_bin` char(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT 'ci_bin',
  `w__bin` char(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT '__bin',
  PRIMARY KEY (`pkey`)
) ENGINE=MyISAM AUTO_INCREMENT=3 DEFAULT CHARSET=utf8;

-- ----------------------------
-- Records of w_ci_bin_cs
-- ----------------------------
INSERT INTO `w_ci_bin_cs` VALUES ('1', 'w', 'a', 'bin', 'ci_bin', '__bin');
INSERT INTO `w_ci_bin_cs` VALUES ('2', 'W', 'A', 'BIN', 'CI_BIN', 'BIN');
mysql> SELECT * FROM w_ci_bin_cs;
+------+---+------+-------+----------+--------+
| pkey | w | w_ci | w_bin | w_ci_bin | w__bin |
+------+---+------+-------+----------+--------+
|    1 | w | a    | bin   | ci_bin   | __bin  |
|    2 | W | A    | BIN   | CI_BIN   | BIN    |
+------+---+------+-------+----------+--------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM w_ci_bin_cs WHERE w='w';
+------+---+------+-------+----------+--------+
| pkey | w | w_ci | w_bin | w_ci_bin | w__bin |
+------+---+------+-------+----------+--------+
|    1 | w | a    | bin   | ci_bin   | __bin  |
|    2 | W | A    | BIN   | CI_BIN   | BIN    |
+------+---+------+-------+----------+--------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM w_ci_bin_cs WHERE w_ci='a';
+------+---+------+-------+----------+--------+
| pkey | w | w_ci | w_bin | w_ci_bin | w__bin |
+------+---+------+-------+----------+--------+
|    1 | w | a    | bin   | ci_bin   | __bin  |
|    2 | W | A    | BIN   | CI_BIN   | BIN    |
+------+---+------+-------+----------+--------+
2 rows in set (0.00 sec)

mysql> SELECT * FROM w_ci_bin_cs WHERE w_bin='BIN';
+------+---+------+-------+----------+--------+
| pkey | w | w_ci | w_bin | w_ci_bin | w__bin |
+------+---+------+-------+----------+--------+
|    2 | W | A    | BIN   | CI_BIN   | BIN    |
+------+---+------+-------+----------+--------+
1 row in set (0.00 sec)
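The session above behaves this way because of each column's collation: `w` and `w_ci` inherit the table default (for `utf8` with no explicit `COLLATE`, that is the case-insensitive `utf8_general_ci`), while `w_bin` is declared `utf8_bin`. The following is a minimal sketch, assuming the `w_ci_bin_cs` table exists in the current database, of how one might inspect those collations and force a case-sensitive match for a single query.

```sql
-- Show each column's collation (see the "Collation" column of the output).
SHOW FULL COLUMNS FROM w_ci_bin_cs;

-- Override the column's collation for this comparison only.
-- Under utf8_bin, 'w' and 'W' are different strings, so only the
-- lowercase row (pkey = 1) is expected to match.
SELECT * FROM w_ci_bin_cs WHERE w COLLATE utf8_bin = 'w';
```

The `COLLATE` clause changes only this one expression; the column definition itself is untouched.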
11.1.1 Character Sets and Collations in General
A character set is a set of symbols and encodings. A collation is a set of rules for comparing characters in a character set. Let's make the distinction clear with an example of an imaginary character set.
Suppose that we have an alphabet with four letters: A, B, a, b. We give each letter a number: A = 0, B = 1, a = 2, b = 3. The letter A is a symbol, the number 0 is the encoding for A, and the combination of all four letters and their encodings is a character set.
Suppose that we want to compare two string values, A and B. The simplest way to do this is to look at the encodings: 0 for A and 1 for B. Because 0 is less than 1, we say A is less than B. What we've just done is apply a collation to our character set. The collation is a set of rules (only one rule in this case): “compare the encodings.” We call this simplest of all possible collations a binary collation.
But what if we want to say that the lowercase and uppercase letters are equivalent? Then we would have at least two rules: (1) treat the lowercase letters a and b as equivalent to A and B; (2) then compare the encodings. We call this a case-insensitive collation. It is a little more complex than a binary collation.
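The binary and case-insensitive collations of the imaginary alphabet have direct counterparts in MySQL. The sketch below assumes a server where the `utf8` character set and its `utf8_bin` and `utf8_general_ci` collations are available, as in MySQL 5.7. The `_utf8` introducer is used so that the literals do not depend on the connection character set; the column aliases are illustrative names only.

```sql
SELECT
  -- Binary collation: compare the encodings, so 'a' and 'A' differ.
  _utf8'a' COLLATE utf8_bin        = _utf8'A' AS bin_equal,       -- 0
  -- Case-insensitive collation: 'a' is treated as equivalent to 'A'.
  _utf8'a' COLLATE utf8_general_ci = _utf8'A' AS ci_equal,        -- 1
  -- Binary ordering puts every uppercase letter before any lowercase one.
  _utf8'B' COLLATE utf8_bin        < _utf8'a' AS bin_B_before_a,  -- 1
  -- Case-insensitive ordering compares 'B' with 'A', so 'B' sorts after 'a'.
  _utf8'B' COLLATE utf8_general_ci < _utf8'a' AS ci_B_before_a;   -- 0
```

As in the imaginary alphabet, the uppercase letters have lower encodings than the lowercase ones, which is why the binary ordering places `B` before `a`.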
In real life, most character sets have many characters: not just A and B but whole alphabets, sometimes multiple alphabets or eastern writing systems with thousands of characters, along with many special symbols and punctuation marks. Also in real life, most collations have many rules, not just for whether to distinguish lettercase, but also for whether to distinguish accents (an “accent” is a mark attached to a character as in German Ö), and for multiple-character mappings (such as the rule that Ö = OE in one of the two German collations).
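The two German collations can be compared directly. This sketch assumes the `latin1` character set with its `latin1_german1_ci` (dictionary order, where Ö is treated as O) and `latin1_german2_ci` (phone-book order, where Ö is treated as OE) collations, as described in the MySQL manual. The hex literal `X'D6'` is the `latin1` encoding of Ö and is used so that the example does not depend on how the client encodes the character.

```sql
SELECT
  -- Dictionary rule: Ö compares as the single letter O.
  _latin1 X'D6' COLLATE latin1_german1_ci = _latin1'O'  AS din1_matches_o,
  -- Phone-book rule: Ö compares as the two letters OE.
  _latin1 X'D6' COLLATE latin1_german2_ci = _latin1'OE' AS din2_matches_oe;
```

Under those documented rules, both comparisons are expected to evaluate to true, even though the second one matches one character against two.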
MySQL can do these things for you:
- Store strings using a variety of character sets.
- Compare strings using a variety of collations.
- Mix strings with different character sets or collations in the same server, the same database, or even the same table.
- Enable specification of character set and collation at any level.
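The capabilities listed above can be sketched in a few statements. The database name `demo_cs`, the table `people`, and its columns are hypothetical and exist only for illustration; the character set and collation names assume a MySQL 5.7 server.

```sql
-- See what is available on this server.
SHOW CHARACTER SET LIKE 'latin1';
SHOW COLLATION LIKE 'latin1%';

-- Database level: default for tables created in this database.
CREATE DATABASE demo_cs CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Table level default, with column-level overrides in the same table.
CREATE TABLE demo_cs.people (
  id      INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name    VARCHAR(50) NOT NULL,  -- inherits the table default
  login   VARCHAR(50) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL,
  name_de VARCHAR(50) CHARACTER SET latin1 COLLATE latin1_german2_ci NOT NULL
) DEFAULT CHARSET = utf8 COLLATE = utf8_general_ci;

-- Expression level: override the collation for one ORDER BY.
SELECT name FROM demo_cs.people ORDER BY name COLLATE utf8_bin;
```

Each level falls back to the one above it when nothing is specified, so an explicit clause is needed only where a column or expression must differ from the default.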
To use these features effectively, you must know what character sets and collations are available, how to change the defaults, and how they affect the behavior of string operators and functions.
// Minimalism principle: KEEP IT SIMPLE
For sanity’s sake, it’s best to choose sensible defaults on the server level, and perhaps on the database level. Then you can deal with special exceptions on a case-by-case basis, probably at the column level.
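One way this advice might look in practice is sketched below. The option-file values and the database name `demo_cs` are assumptions chosen for illustration (replace `demo_cs` with an existing database); the `w_ci_bin_cs` table is the one defined at the top of these notes.

```sql
-- Inspect the current server, database, and connection defaults.
SHOW VARIABLES LIKE 'character_set%';
SHOW VARIABLES LIKE 'collation%';

-- Server-level defaults are usually set in the option file, for example:
--   [mysqld]
--   character-set-server = utf8
--   collation-server     = utf8_general_ci

-- Database-level default for tables created from now on
-- (demo_cs is a hypothetical database name).
ALTER DATABASE demo_cs CHARACTER SET utf8 COLLATE utf8_general_ci;

-- Column-level exception: make only column w case-sensitive.
ALTER TABLE w_ci_bin_cs
  MODIFY w CHAR(255) CHARACTER SET utf8 COLLATE utf8_bin NOT NULL DEFAULT 'W';
```

Changing a database default does not convert existing tables, which is one reason to settle the server and database defaults early and keep column-level overrides rare.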