MySQL Backup mydumper-白红宇

Published: 2019-06-08

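In our production environment, the daily mysqldump backup of one instance was taking 2 hours 53 minutes, close to 3 hours, and that does not include the time needed to archive the backup files. That is rather long for a logical backup. To make logical backups faster, I plan to replace mysqldump with mydumper.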

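Compared with mysqldump, mydumper has the following features:

Multi-threaded backup

Faster backup execution

Compression of backup files

Row-level chunking of table data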

For more information about mydumper, see the project's official GitHub repository.

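Installation

When I tested an earlier version of mydumper, I installed it by compiling from source. mydumper is written in C, and the build ran into a series of dependency problems. To avoid such problems, the project has shipped prebuilt packages since version 0.9.3, and installing from the RPM package is the recommended route. The package is also available from the official repository.

The RPM package used in this article is: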

mydumper-0.9.5-2.el7.x86_64.rpm

Verify the installation:

# rpm -qa |grep mydumper
mydumper-0.9.5-2.x86_64
# rpm -ql mydumper-0.9.5-2.x86_64
/usr/bin/mydumper
/usr/bin/myloader

mydumper: backs up data.

myloader: restores data.

This article focuses on mydumper. Check the version:

# mydumper -V
mydumper 0.9.5, built against MySQL 5.7.21-21
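Before scripting any backups it can be handy to confirm that both binaries are actually on the PATH. The following is only a minimal sketch; `require_tools` is a hypothetical helper name made up for this example and is not part of mydumper.

```shell
# Hypothetical helper: report every named command that is missing from PATH.
# Returns 0 when all commands are found, 1 otherwise.
require_tools() {
    rt_missing=0
    for rt_tool in "$@"; do
        if ! command -v "$rt_tool" >/dev/null 2>&1; then
            echo "missing: $rt_tool" >&2
            rt_missing=1
        fi
    done
    return "$rt_missing"
}

# Typical use after installing the RPM:
#   require_tools mydumper myloader && mydumper -V
```

Checking both tools together means a missing myloader is noticed at install time rather than at the first restore.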

Main options

# mydumper --help
Usage:
  mydumper [OPTION?] multi-threaded MySQL dumping

Help Options:
  -?, --help                  Show help options

Application Options:
  -B, --database              Database to dump
  -T, --tables-list           Comma delimited table list to dump (does not exclude regex option)
  -O, --omit-from-file        File containing a list of database.table entries to skip, one per line (skips before applying regex option)
  -o, --outputdir             Directory to output files to
  -s, --statement-size        Attempted size of INSERT statement in bytes, default 1000000
  -r, --rows                  Try to split tables into chunks of this many rows. This option turns off --chunk-filesize
  -F, --chunk-filesize        Split tables into chunks of this output file size. This value is in MB
  -c, --compress              Compress output files
  -e, --build-empty-files     Build dump files even if no data available from table
  -x, --regex                 Regular expression for 'db.table' matching
  -i, --ignore-engines        Comma delimited list of storage engines to ignore
  -N, --insert-ignore         Dump rows with INSERT IGNORE
  -m, --no-schemas            Do not dump table schemas with the data
  -d, --no-data               Do not dump table data
  -G, --triggers              Dump triggers
  -E, --events                Dump events
  -R, --routines              Dump stored procedures and functions
  -W, --no-views              Do not dump VIEWs
  -k, --no-locks              Do not execute the temporary shared read lock.  WARNING: This will cause inconsistent backups
  --no-backup-locks           Do not use Percona backup locks
  --less-locking              Minimize locking time on InnoDB tables.
  -l, --long-query-guard      Set long query timer in seconds, default 60
  -K, --kill-long-queries     Kill long running queries (instead of aborting)
  -D, --daemon                Enable daemon mode
  -I, --snapshot-interval     Interval between each dump snapshot (in minutes), requires --daemon, default 60
  -L, --logfile               Log file name to use, by default stdout is used
  --tz-utc                    SET TIME_ZONE='+00:00' at top of dump to allow dumping of TIMESTAMP data when a server has data in different time zones or data is being moved between servers with different time zones, defaults to on use --skip-tz-utc to disable.
  --skip-tz-utc
  --use-savepoints            Use savepoints to reduce metadata locking issues, needs SUPER privilege
  --success-on-1146           Not increment error count and Warning instead of Critical in case of table doesn't exist
  --lock-all-tables           Use LOCK TABLE for all, instead of FTWRL
  -U, --updated-since         Use Update_time to dump only tables updated in the last U days
  --trx-consistency-only      Transactional consistency only
  --complete-insert           Use complete INSERT statements that include column names
  -h, --host                  The host to connect to
  -u, --user                  Username with the necessary privileges
  -p, --password              User password
  -a, --ask-password          Prompt For User password
  -P, --port                  TCP/IP port to connect to
  -S, --socket                UNIX domain socket file to use for connection
  -t, --threads               Number of threads to use, default 4
  -C, --compress-protocol     Use compression on the MySQL connection
  -V, --version               Show the program version and exit
  -v, --verbose               Verbosity of output, 0 = silent, 1 = errors, 2 = warnings, 3 = info, default 2
  --defaults-file             Use a specific defaults file
  --ssl                       Connect using SSL
  --key                       The path name to the key file
  --cert                      The path name to the certificate file
  --ca                        The path name to the certificate authority file
  --capath                    The path name to a directory that contains trusted SSL CA certificates in PEM format
  --cipher                    A list of permissible ciphers to use for SSL encryption

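-B, --database
Database to dump.

-T, --tables-list
Tables to dump, comma-separated (does not exclude the regex option).

-O, --omit-from-file
File listing database.table entries to skip, one per line. These skips are applied before the regex option.

-o, --outputdir
Directory in which the dump files are saved.

-s, --statement-size
Attempted size of each generated INSERT statement, in bytes; the default is 1000000.

-r, --rows
Try to split each table into chunks of this many rows. This option turns off --chunk-filesize.

-F, --chunk-filesize
Split tables into chunks of this output file size, in MB. In my tests the effective size appeared to be the given value plus 1 MB: to get files of about 3 MB each, specify -F 2, and with -F 1 no splitting took place. I do not know why it behaves this way.

-c, --compress
Compress the output files.

-e, --build-empty-files
Create dump files even for tables that contain no data.

-x, --regex
Regular expression matched against 'db.table'.

-i, --ignore-engines
Storage engines to ignore, comma-separated.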

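-N, --insert-ignore
Dump rows as INSERT IGNORE statements instead of plain INSERT.

-m, --no-schemas
Do not dump table schemas; the dump files contain only table data.

-d, --no-data
Do not dump table data; the dump files contain only table schemas.

-G, --triggers
Dump triggers.

-E, --events
Dump events.

-R, --routines
Dump stored procedures and functions.

-W, --no-views
Do not dump views.

-k, --no-locks
Do not take the temporary shared read lock. Warning: this can produce inconsistent backups.

--no-backup-locks
Do not use Percona backup locks.

--less-locking
Minimize locking time on InnoDB tables.

-l, --long-query-guard
Long-query threshold in seconds; the default is 60.

-K, --kill-long-queries
Kill long-running queries instead of aborting the dump.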

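-D, --daemon
Run in daemon mode.

-I, --snapshot-interval
Interval between dump snapshots, in minutes. Requires --daemon; the default is 60.

-L, --logfile
Name of the log file; by default output goes to stdout.

--tz-utc
Puts SET TIME_ZONE='+00:00' at the top of the dump so that TIMESTAMP data can be dumped when a server holds data in different time zones or when data is moved between servers in different time zones. Enabled by default; use --skip-tz-utc to disable it.

--skip-tz-utc
Disables --tz-utc (see above).

--use-savepoints
Use savepoints to reduce metadata locking issues; requires the SUPER privilege.

--success-on-1146
When a table does not exist (error 1146), do not increment the error count, and log a Warning instead of a Critical.

--lock-all-tables
Use LOCK TABLE on all tables instead of FTWRL.

-U, --updated-since
Use Update_time to dump only tables updated within the last U days.

--trx-consistency-only
Transactional consistency only.

--complete-insert
Use complete INSERT statements that include the column names.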

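-h, --host
Host to connect to.

-u, --user
User to connect as; it needs the necessary privileges.

-p, --password
Password of the user.

-a, --ask-password
Prompt for the user password.

-P, --port
TCP/IP port to connect to.

-S, --socket
UNIX domain socket file to use for the connection.

-t, --threads
Number of dump threads; the default is 4.

-C, --compress-protocol
Use compression on the MySQL connection.

-V, --version
Show the program version and exit.

-v, --verbose
Verbosity of the output: 0 = silent, 1 = errors, 2 = warnings, 3 = info; the default is 2.

--defaults-file
Use a specific defaults file.

--ssl
Connect using SSL.

--key
Path to the key file.

--cert
Path to the certificate file.

--ca
Path to the certificate authority file.

--capath
Path to a directory containing trusted SSL CA certificates in PEM format.

--cipher
List of ciphers permitted for SSL encryption.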

Backup workflow

The MySQL version used for testing is the official Community Edition 5.7.24.

(root@localhost) [test] > select version();
+------------+
| version()  |
+------------+
| 5.7.24-log |
+------------+
1 row in set (0.00 sec)

To see what mydumper does during a backup, enable the MySQL general log and watch it.

Enable the general log

(root@localhost) [(none)] > show global variables like '%general%';
+------------------+---------------------------------+
| Variable_name    | Value                           |
+------------------+---------------------------------+
| general_log      | OFF                             |
| general_log_file | /data/mysql/3306/data/dbabd.log |
+------------------+---------------------------------+
2 rows in set (0.00 sec)

(root@localhost) [(none)] > set global general_log = 1;
Query OK, 0 rows affected (0.00 sec)

(root@localhost) [(none)] > show global variables like '%general%';
+------------------+---------------------------------+
| Variable_name    | Value                           |
+------------------+---------------------------------+
| general_log      | ON                              |
| general_log_file | /data/mysql/3306/data/dbabd.log |
+------------------+---------------------------------+
2 rows in set (0.01 sec)

Back up the test database

# mydumper -h 192.168.58.3 -u root -a -P 3306 -B test -o /data/test/

Layout of the backup files (using table test.t1 as the example):

# ll /data/test/
total 66728
-rw-r--r--. 1 root root      136 Dec 27 16:02 metadata
-rw-r--r--. 1 root root       63 Dec 27 16:02 test-schema-create.sql
-rw-r--r--. 1 root root      278 Dec 27 16:02 test.t1-schema.sql
-rw-r--r--. 1 root root 18390048 Dec 27 16:02 test.t1.sql

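As shown above, all backup files are stored in a single directory, which can be specified with -o. If no path is given, mydumper creates a new directory under the directory where the command was run, named export-yyyymmdd-HHMMSS. The main files produced in each backup directory are: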

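The metadata file

metadata: backup metadata. It contains the start and end times of the backup, together with the MASTER LOG FILE and MASTER LOG POS. When the backup is taken on a replica, it records the master binlog file and position that the replica has replicated up to, as reported by SHOW SLAVE STATUS.

# cat metadata
Started dump at: 2018-12-27 16:02:06
SHOW MASTER STATUS:
        Log: mysql-bin.000034
        Pos: 154
        GTID:
Finished dump at: 2018-12-27 16:02:35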
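The Log and Pos values are what one would need to attach a replica to a copy restored from this backup. As a rough sketch, the awk filter below pulls them out of a metadata file laid out like the one above and prints a CHANGE MASTER TO statement. `metadata_to_change_master` is a hypothetical helper written for this example, and the "SHOW SLAVE STATUS:" section heading it guards against is my assumption about how a replica-side metadata file is labelled.

```shell
# Hypothetical helper: print a CHANGE MASTER TO statement built from the
# SHOW MASTER STATUS section of a mydumper metadata file ($1).
metadata_to_change_master() {
    awk '
        # Only read values inside the SHOW MASTER STATUS section.
        /^SHOW MASTER STATUS:/ { in_master = 1; next }
        /^SHOW SLAVE STATUS:/  { in_master = 0 }
        in_master && $1 == "Log:" { log_file = $2 }
        in_master && $1 == "Pos:" { log_pos = $2 }
        END {
            # Fail if either value was not found.
            if (log_file == "" || log_pos == "") exit 1
            # \047 is the octal escape for a single quote.
            printf "CHANGE MASTER TO MASTER_LOG_FILE=\047%s\047, MASTER_LOG_POS=%s;\n", log_file, log_pos
        }
    ' "$1"
}

# Example:
#   metadata_to_change_master /data/test/metadata
```

For the metadata file shown above this would print a statement pointing at mysql-bin.000034, position 154. Host and credentials still have to be supplied separately.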

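Database creation statement file

test-schema-create.sql: the CREATE DATABASE statement for the test database.

# cat test-schema-create.sql
CREATE DATABASE `test` /*!40100 DEFAULT CHARACTER SET utf8 */;

Two backup files per table

test.t1-schema.sql: the CREATE TABLE statement for table t1.

test.t1.sql: the data file for table t1, stored as INSERT statements.

When a large table is dumped in chunks, there will be several data files for that table.

Inspect the general log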

-- Main thread connects to the database and sets session-level parameters
 7   Connect   admin@dbabd on test using TCP/IP
 7   Query     SET SESSION wait_timeout = 2147483
 7   Query     SET SESSION net_write_timeout = 2147483
 7   Query     SHOW PROCESSLIST
-- Main thread runs FTWRL to take the global read lock, starts a consistent-snapshot transaction, and records the current binlog file and position
 7   Query     FLUSH TABLES WITH READ LOCK
 7   Query     START TRANSACTION /*!40108 WITH CONSISTENT SNAPSHOT */
 7   Query     /*!40101 SET NAMES binary*/
 7   Query     SHOW MASTER STATUS
 7   Query     SHOW SLAVE STATUS
-- Four worker threads are created; each sets its session isolation level to REPEATABLE READ, and the four threads dump concurrently
 8   Connect   admin@dbabd on  using TCP/IP
 8   Query     SET SESSION wait_timeout = 2147483
 8   Query     SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
 8   Query     START TRANSACTION /*!40108 WITH CONSISTENT SNAPSHOT */
 8   Query     /*!40103 SET TIME_ZONE='+00:00' */
 8   Query     /*!40101 SET NAMES binary*/
 9   Connect   admin@dbabd on  using TCP/IP
 9   Query     SET SESSION wait_timeout = 2147483
 9   Query     SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
 9   Query     START TRANSACTION /*!40108 WITH CONSISTENT SNAPSHOT */
 9   Query     /*!40103 SET TIME_ZONE='+00:00' */
 9   Query     /*!40101 SET NAMES binary*/
10   Connect   admin@dbabd on  using TCP/IP
10   Query     SET SESSION wait_timeout = 2147483
10   Query     SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
10   Query     START TRANSACTION /*!40108 WITH CONSISTENT SNAPSHOT */
10   Query     /*!40103 SET TIME_ZONE='+00:00' */
10   Query     /*!40101 SET NAMES binary*/
11   Connect   admin@dbabd on  using TCP/IP
11   Query     SET SESSION wait_timeout = 2147483
11   Query     SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ
11   Query     START TRANSACTION /*!40108 WITH CONSISTENT SNAPSHOT */
11   Query     /*!40103 SET TIME_ZONE='+00:00' */
11   Query     /*!40101 SET NAMES binary*/
-- Main thread fetches the database creation statement and the table status
 7   Init DB   test
 7   Query     SHOW TABLE STATUS
 7   Query     SHOW CREATE DATABASE `test`
-- The four worker threads dump all tables in the database
 8   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='course' and extra like '%GENERATED%'
11   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='t' and extra like '%GENERATED%'
10   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='t2' and extra like '%GENERATED%'
 9   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='t1' and extra like '%GENERATED%'
 8   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`course`
 7   Query     UNLOCK TABLES /* FTWRL */
 7   Quit
11   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`t`
 9   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`t1`
10   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`t2`
11   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='t3' and extra like '%GENERATED%'
11   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`t3`
11   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='teacher' and extra like '%GENERATED%'
11   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`teacher`
 8   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='teachercard' and extra like '%GENERATED%'
 8   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`teachercard`
 8   Query     SHOW CREATE TABLE `test`.`course`
 8   Query     SHOW CREATE TABLE `test`.`t`
 8   Query     SHOW CREATE TABLE `test`.`t1`
 8   Query     SHOW CREATE TABLE `test`.`t2`
 8   Query     SHOW CREATE TABLE `test`.`t3`
 8   Query     SHOW CREATE TABLE `test`.`teacher`
 8   Query     SHOW CREATE TABLE `test`.`teachercard`
 8   Query     SHOW CREATE TABLE `test`.`v9_pic_tag_content`
11   Query     select COLUMN_NAME from information_schema.COLUMNS where TABLE_SCHEMA='test' and TABLE_NAME='v9_pic_tag_content' and extra like '%GENERATED%'
 8   Quit
11   Query     SELECT /*!40001 SQL_NO_CACHE */ * FROM `test`.`v9_pic_tag_content`
 9   Quit
10   Quit
11   Quit
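To see at a glance how the tables were spread across the worker threads, the data SELECTs can be filtered out of the general log. This is only a sketch: `tables_by_thread` is a hypothetical helper, and it assumes each log entry sits on its own line with the connection id immediately before the word Query, as in the excerpt above. A real general log line usually starts with a timestamp, which is why the helper searches for the Query field instead of relying on a fixed column.

```shell
# Hypothetical helper: for each full-table SELECT in a general log file ($1),
# print the connection (thread) id and the table that thread dumped.
tables_by_thread() {
    awk '
        /SQL_NO_CACHE/ {
            for (i = 2; i <= NF; i++) {
                if ($i == "Query") {
                    # The field before "Query" is the connection id;
                    # the last field is the quoted db.table name.
                    print $(i - 1), $NF
                    break
                }
            }
        }
    ' "$1"
}

# Example:
#   tables_by_thread /data/mysql/3306/data/dbabd.log
```

On the excerpt above this would show thread 8 dumping course and teachercard, and thread 11 picking up t, t3, teacher and v9_pic_tag_content, while threads 9 and 10 handle t1 and t2.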

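To summarize the mydumper workflow:

The main thread connects to MySQL and checks the current server threads (SHOW PROCESSLIST) to decide whether to abort the dump or kill long-running queries;

It acquires the global read lock with FTWRL to keep the dump consistent, starts a consistent-snapshot transaction, and writes the current binlog information to the metadata file;

It creates the worker threads (4 by default); each sets its session transaction isolation level to REPEATABLE READ and starts a consistent-snapshot transaction;

The worker threads dump tables on non-transactional engines (non-InnoDB tables);

Once the worker threads have finished the non-transactional tables, the main thread runs UNLOCK TABLES to release the global read lock;

The worker threads dump tables on transactional engines (InnoDB tables);

If there are any, the worker threads dump functions, stored procedures, triggers and views;

The dump ends.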

Usage examples

Back up all databases

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -o /data/backupdir
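When -o is given explicitly, writing each run into its own timestamped directory keeps successive backups apart. The sketch below builds a name in the same export-yyyymmdd-HHMMSS pattern that mydumper uses when no output directory is specified. `BACKUP_ROOT` is a hypothetical variable chosen for this example.

```shell
# Build a per-run directory name in the export-yyyymmdd-HHMMSS pattern.
# BACKUP_ROOT is a hypothetical base path used only in this example.
BACKUP_ROOT=/data/backupdir
outdir="$BACKUP_ROOT/$(date +export-%Y%m%d-%H%M%S)"
echo "$outdir"

# The name could then be passed to mydumper, for example:
#   mydumper -h 192.168.58.3 -u admin -a -P 3306 -o "$outdir"
```

Because the timestamp sorts lexicographically, a plain directory listing shows the backups in chronological order.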

Back up a single database

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -o /data/test/

Back up several databases (a regex can be used)

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -x '^(test\.|test2\.)' -o /data/

Exclude one or more databases

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -x '^(?!(mysql\.|sys\.))' -o /data/
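Since the -x pattern is matched against names of the form db.table, it is worth previewing a pattern before running a long dump. The sketch below uses GNU grep with -P (Perl-compatible syntax, which supports the negative lookahead used above) on a made-up list of names. Both the table names and the assumption that grep -P treats these patterns the same way mydumper does are mine.

```shell
# Hypothetical db.table names, used only for this illustration.
names='mysql.user
sys.sys_config
test.t1
test2.t3'

# Include only the test and test2 databases.
printf '%s\n' "$names" | grep -P '^(test\.|test2\.)'

# Exclude the mysql and sys databases with a negative lookahead.
printf '%s\n' "$names" | grep -P '^(?!(mysql\.|sys\.))'
```

With this particular list both commands select test.t1 and test2.t3. On a real server the second pattern would also let through every other database that is not mysql or sys.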

Back up one or more tables

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -T test.t1,test2.t3 -o /data/

Exclude one or more tables

Using a regex

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -x '^(?!test\.t2$)' -o /data/test/

Using the -O, --omit-from-file option

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -O nodump.file -o /data/test/
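The file passed to -O holds one database.table entry per line. A minimal sketch of creating such a file follows; the two table names are just examples of tables to leave out.

```shell
# Write the skip list: one database.table entry per line.
# test.t2 and test.t3 are example tables to leave out of the dump.
cat > nodump.file <<'EOF'
test.t2
test.t3
EOF

cat nodump.file
```

Keeping the list in a file makes it easier to review and version than a long regex on the command line.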

Split table data files by number of rows per file

Table test.t2 has 1,000,000 rows:

(root@localhost) [test] > select count(*) from t2;
+----------+
| count(*) |
+----------+
|  1000000 |
+----------+
1 row in set (0.45 sec)

Now back up test.t2, splitting it into chunks of 100,000 rows each:

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -T test.t2 -r 100000 -o /data/test/

List the backup files for the table:

# ls /data/test/
metadata           test.t2.00001.sql  test.t2.00003.sql  test.t2.00005.sql  test.t2.00007.sql  test.t2.00009.sql
test.t2.00000.sql  test.t2.00002.sql  test.t2.00004.sql  test.t2.00006.sql  test.t2.00008.sql  test.t2-schema.sql
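The ten data files (00000 through 00009) match what simple arithmetic predicts: 1,000,000 rows divided into chunks of 100,000. A small sketch of that estimate follows. `expected_chunks` is a hypothetical helper, and because -r only tries to split at the requested row count, the real number of files may differ from the estimate.

```shell
# Hypothetical helper: estimate the number of chunk files for a table.
# $1 = total rows, $2 = rows per chunk; integer ceiling division.
expected_chunks() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

expected_chunks 1000000 100000   # prints 10
```

Rounding up matters when the row count is not an exact multiple of the chunk size, since the remainder still needs a file of its own.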

Split table data files by file size

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -T test.t2 -F 2 -o /data/test/

List the backup files for the table:

# ll -h /data/test/
total 18M
-rw-r--r--. 1 root root  141 Dec 27 16:32 metadata
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00001.sql
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00002.sql
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00003.sql
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00004.sql
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00005.sql
-rw-r--r--. 1 root root 2.9M Dec 27 16:32 test.t2.00006.sql
-rw-r--r--. 1 root root 381K Dec 27 16:32 test.t2.00007.sql
-rw-r--r--. 1 root root  278 Dec 27 16:32 test.t2-schema.sql

Compress the backup files

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -c -o /data/test/

Size without compression:

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -o /data/test/
# du /data/test  --max-depth=1 -h
53M     /data/test

Size with compression:

# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -c -o /data/test/
# du /data/test/  --max-depth=1 -h
22M     /data/test/
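Once files are compressed they can no longer be inspected with a plain head or cat. The helper below is a small sketch for peeking at a dump file either way. `peek_dump` is a hypothetical name, and the code assumes that files written with -c carry a .gz suffix and are ordinary gzip streams.

```shell
# Hypothetical helper: print the first N lines (default 5) of a dump file,
# whether or not it was written with -c.
# Assumes compressed files end in .gz and are regular gzip streams.
peek_dump() {
    case "$1" in
        *.gz) gzip -dc "$1" | head -n "${2:-5}" ;;
        *)    head -n "${2:-5}" "$1" ;;
    esac
}

# Example (file name assumed):
#   peek_dump /data/test/test.t1.sql.gz 3
```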

Also generate data files for empty tables
```
# mydumper -h 192.168.58.3 -u admin -a -P 3306 -B test -e -o /data/test/
```
With this option, even an empty table gets a table.sql data file in addition to its table-schema.sql file.