mysql字符集 小结

起因

公司的某个业务需要用到emoji编码,但是很早线上跑的mysql数据库是mysql 5.7.12

模拟

mysql root@127.0.0.1:test_yl_audit> show create table `1_text`;  
+--------+-------------------------------------------------------+
| Table  | Create Table                                          |
+--------+-------------------------------------------------------+
| 1_text | CREATE TABLE `1_text` (                               |
|        |   `id` bigint(20) NOT NULL AUTO_INCREMENT,            |
|        |   `appid` varchar(255) NOT NULL,                      |
|        |   `rid` varchar(255) NOT NULL,                        |
|        |   `type` int(11) NOT NULL,                            |
|        |   `recheck` int(11) NOT NULL,                         |
|        |   `createopertime` int(11) NOT NULL,                  |
|        |   `lastopertime` int(11) NOT NULL,                    |
|        |   `extra` varchar(255) DEFAULT NULL,                  |
|        |   `uid` varchar(255) DEFAULT NULL,                    |
|        |   `cover_url` varchar(255) DEFAULT NULL,              |
|        |   `uname` varchar(255) DEFAULT NULL,                  |
|        |   `status` int(11) NOT NULL,                          |
|        |   `callbackurl` varchar(255) NOT NULL,                |
|        |   `txt` text NOT NULL,                                |
|        |   `title` varchar(255) DEFAULT NULL,                  |
|        |   `encoded` varchar(255) DEFAULT NULL,                |
|        |   `description` varchar(255) NOT NULL,                |
|        |   `module` varchar(255) NOT NULL,                     |
|        |   `oldtxt` text,                                      |
|        |   `createdAt` datetime NOT NULL,                      |
|        |   `updatedAt` datetime NOT NULL,                      |
|        |   PRIMARY KEY (`rid`),                                |
|        |   UNIQUE KEY `id` (`id`),                             |
|        |   UNIQUE KEY `1_text_id_unique` (`id`)                |
|        | ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 |
+--------+-------------------------------------------------------+
1 row in set  
Time: 0.013s  

注意其中txt字段,这个值是 text NOT NULL
首先先尝试 正常更新的数据

mysql root@localhost:test_yl_audit> update `1_text` set txt = "123" where rid = 'xlqiwo001';  
Query OK, 1 row affected  
Time: 0.017s  
mysql root@localhost:test_yl_audit> select * from `1_text`\G;  
***************************[ 1. row ]***************************
id             | 2  
appid          | 1  
rid            | xlqiwo001  
type           | 2  
recheck        | 0  
createopertime | 1502162791  
lastopertime   | 1502162791  
extra          | {"upload_method":2}  
uid            | <null>  
cover_url      | <null>  
uname          | <null>  
status         | 1  
callbackurl    | http://callbackurlresult.com/iwanttokonwresult  
txt            | 123  
title          | <null>  
encoded        | utf-8  
description    | 出世界的光辉1  
module         | MOPageIndex1001  
oldtxt         | <null>  
createdAt      | 2018-03-28 09:24:21  
updatedAt      | 2018-03-28 09:24:21  
1 row in set  
Time: 0.010s  
mysql root@localhost:test_yl_audit>  

当我们更新txt字段包含emoji表情的时候

mysql root@localhost:test_yl_audit> update `1_text` set txt = "123😀" where rid = 'xlqiwo001';  
(1366, u"Incorrect string value: '\\xF0\\x9F\\x98\\x80' for column 'txt' at row 1")

修改表

按照许多网上的做法,我们试试直接修改表的字符集为utf8mb4

// ALTER TABLE `1_text` DEFAULT CHARACTER SET utf8mb4; 不可用这条语句
alter table `1_text` convert to character set  utf8mb4;

// 查看一下表结构  
mysql root@localhost:test_yl_audit> show create table `1_text`;  
+--------+----------------------------------------------------------+
| Table  | Create Table                                             |
+--------+----------------------------------------------------------+
| 1_text | CREATE TABLE `1_text` (                                  |
|        |   `id` bigint(20) NOT NULL AUTO_INCREMENT,               |
|        |   `appid` varchar(255) NOT NULL,                         |
|        |   `rid` varchar(255) NOT NULL,                           |
|        |   `type` int(11) NOT NULL,                               |
|        |   `recheck` int(11) NOT NULL,                            |
|        |   `createopertime` int(11) NOT NULL,                     |
|        |   `lastopertime` int(11) NOT NULL,                       |
|        |   `extra` varchar(255) DEFAULT NULL,                     |
|        |   `uid` varchar(255) DEFAULT NULL,                       |
|        |   `cover_url` varchar(255) DEFAULT NULL,                 |
|        |   `uname` varchar(255) DEFAULT NULL,                     |
|        |   `status` int(11) NOT NULL,                             |
|        |   `callbackurl` varchar(255) NOT NULL,                   |
|        |   `txt` mediumtext NOT NULL,                             |
|        |   `title` varchar(255) DEFAULT NULL,                     |
|        |   `encoded` varchar(255) DEFAULT NULL,                   |
|        |   `description` varchar(255) NOT NULL,                   |
|        |   `module` varchar(255) NOT NULL,                        |
|        |   `oldtxt` mediumtext,                                   |
|        |   `createdAt` datetime NOT NULL,                         |
|        |   `updatedAt` datetime NOT NULL,                         |
|        |   PRIMARY KEY (`rid`),                                   |
|        |   UNIQUE KEY `id` (`id`),                                |
|        |   UNIQUE KEY `1_text_id_unique` (`id`)                   |
|        | ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 |
+--------+----------------------------------------------------------+
1 row in set  
Time: 0.010s  

此时我们再次更新数据库

mysql root@localhost:test_yl_audit> update `1_text` set txt = "123😀" where rid = 'xlqiwo001';  
(1366, u"Incorrect string value: '\\xF0\\x9F\\x98\\x80' for column 'txt' at row 1")

还是报错

理解一下这里面的问题,我们其实对于表的修改,如果当初建表的时候,设置了编码,字段默认编码是跟随表的;
数据库来说也是一样的

对于字段来说,还是一样的问题

mysql root@localhost:test_yl_audit> ALTER TABLE `1_text` change `txt` `txt` TEXT CHARACTER SET utf8mb4;  
Query OK, 1 row affected  
Time: 0.184s  
mysql root@localhost:test_yl_audit> show create table `1_text`;  
+--------+----------------------------------------------------------+
| Table  | Create Table                                             |
+--------+----------------------------------------------------------+
| 1_text | CREATE TABLE `1_text` (                                  |
|        |   `id` bigint(20) NOT NULL AUTO_INCREMENT,               |
|        |   `appid` varchar(255) NOT NULL,                         |
|        |   `rid` varchar(255) NOT NULL,                           |
|        |   `type` int(11) NOT NULL,                               |
|        |   `recheck` int(11) NOT NULL,                            |
|        |   `createopertime` int(11) NOT NULL,                     |
|        |   `lastopertime` int(11) NOT NULL,                       |
|        |   `extra` varchar(255) DEFAULT NULL,                     |
|        |   `uid` varchar(255) DEFAULT NULL,                       |
|        |   `cover_url` varchar(255) DEFAULT NULL,                 |
|        |   `uname` varchar(255) DEFAULT NULL,                     |
|        |   `status` int(11) NOT NULL,                             |
|        |   `callbackurl` varchar(255) NOT NULL,                   |
|        |   `txt` text,                                            |
|        |   `title` varchar(255) DEFAULT NULL,                     |
|        |   `encoded` varchar(255) DEFAULT NULL,                   |
|        |   `description` varchar(255) NOT NULL,                   |
|        |   `module` varchar(255) NOT NULL,                        |
|        |   `oldtxt` mediumtext,                                   |
|        |   `createdAt` datetime NOT NULL,                         |
|        |   `updatedAt` datetime NOT NULL,                         |
|        |   PRIMARY KEY (`rid`),                                   |
|        |   UNIQUE KEY `id` (`id`),                                |
|        |   UNIQUE KEY `1_text_id_unique` (`id`)                   |
|        | ) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8mb4 |
+--------+----------------------------------------------------------+
1 row in set  
Time: 0.022s  

其实上述的错误不是出在数据库这一层,而是终端的编码
终端(client)并不支持 utf8mb4,只支持utf8

// mycli
mysql root@localhost:test_yl_audit> show VARIABLES like '%character%'  
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8mb4                    |
| character_set_connection | utf8mb4                    |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8mb4                    |
| character_set_server     | utf8mb4                    |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set  
Time: 0.011s

// 看一下navicat 的 
character_set_client    utf8  
character_set_connection    utf8  
character_set_database    utf8  
character_set_filesystem    binary  
character_set_results    utf8  
character_set_server    utf8mb4  
character_set_system    utf8  
character_sets_dir    /usr/share/mysql/charsets/  

也就是说客户端不同,意味着客户端编码是有不同的

客户端编码

主要看一下

character_set_client    utf8  
character_set_connection    utf8  
character_set_results    utf8  

What character set is the statement in when it leaves the client?

The server takes the charactersetclient system variable to be the character set in which statements are sent by the client.

What character set should the server translate a statement to after receiving it?

For this, the server uses the charactersetconnection and collationconnection system variables. It converts statements sent by the client from charactersetclient to charactersetconnection, except for string literals that have an introducer (for example, _utf8mb4 or _latin2). collationconnection is important for comparisons of literal strings. For comparisons of strings with column values, collation_connection does not matter because columns have their own collation, which has a higher collation precedence.

What character set should the server translate to before shipping result sets or error messages back to the client?

The charactersetresults system variable indicates the character set in which the server returns query results to the client. This includes result data such as column values, and result metadata such as column names and error messages.

也就说

  1. MySQL Server收到请求时将请求数据从charactersetclient转换为charactersetconnection;
  2. 进行内部操作前将请求数据从charactersetconnection转换为内部操作字符集,其确定方法如下:
    • 使用每个数据字段的CHARACTER SET设定值;
    • 若上述值不存在,则使用对应数据表的DEFAULT CHARACTER SET设定值(MySQL扩展,非SQL标准);
    • 若上述值不存在,则使用对应数据库的DEFAULT CHARACTER SET设定值;
    • 若上述值不存在,则使用charactersetserver设定值。
  3. 将操作结果从内部操作字符集转换为charactersetresults。

网上误区

  1. show variabels like '%chara%' 其实查看的是当前client的"连接配置"

  2. 注意这里官方文档docs

set names utf8 的作用等价于

SET character_set_client = charset_name;  
SET character_set_results = charset_name;  
SET character_set_connection = charset_name;  

跟参考第一篇blog的三端一致,并不是完全一个意思
blog的三端 主要是指
client(connection潜意识认为是一样)、result、[server,dbcharacter、columncharacter、dbcharacter]

mysql root@localhost:test_yl_audit> status  
--------------
mycli 1.12.1, running on CPython 2.7.12

+----------------------+----------------------------------+
|                      |                                  |
+----------------------+----------------------------------+
| Connection id:       | 21                               |
| Current database:    | test_yl_audit                    |
| Current user:        | root@localhost                   |
| Current pager:       | System default                   |
| Server version:      | 5.7.19-0ubuntu0.16.04.1 (Ubuntu) |
| Protocol version:    | 10                               |
| Connection:          | localhost via TCP/IP             |
| Server characterset: | utf8mb4                          |
| Db characterset:     | utf8                             |
| Client characterset: | utf8mb4                          |
| Conn. characterset:  | utf8mb4                          |
| TCP port:            | 3306                             |
| Uptime:              | 2 hours 58 min 59 sec            |
+----------------------+----------------------------------+

Connections: 3  Queries: 95  Slow queries: 0  Opens: 146  Flush tables: 1  Open tables: 65  Queries per second avg: 0.000  
--------------
Time: 0.030s  

参考的几个中文blog
http://cenalulu.github.io/mysql/mysql-mojibake/
http://cenalulu.github.io/linux/character-encoding/
http://mysql.taobao.org/monthly/2015/05/07/