Monday, July 21, 2008

Some quirks of circular and row-based replication

One of the new features introduced by MySQL 5.1 is row-based replication.

Unlike the classic statement-based replication, used in MySQL up to version 5.0, row-based replication transfers the data instead of the statement used to create it.

If you want to have a taste of row-based replication, you can do some experiments with MySQL Sandbox.

First, we create a sandbox of circular replication with MySQL 5.0

./make_replication_sandbox --topology=circular --how_many_nodes=3 \
/path/to/mysql-5.0.51a-YOUR_OS.tar.gz

cd $HOME/sandboxes/rcsandbox_5.0.51
./n1 -e "create table test.t1(i int)"
./n3 -e "insert into test.t1 values (@@server_id)"
./use_all "select * from test.t1"
# server: 1:
i
101
# server: 2:
i
102
# server: 3:
i
103

This is statement-based replication at its best.

A look at the binary log confirms that this is the case.

 $ ./node3/my sqlbinlog node3/data/mysql-bin.000001 |tail
#080720 8:57:59 server id 103 end_log_pos 1958 Query thread_id=8 exec_time=0 error_code=0
SET TIMESTAMP=1216569479/*!*/;
SET @@session.auto_increment_increment=3, @@session.auto_increment_offset=3/*!*/;
insert into test.t1 values (@@server_id)/*!*/;

It may be desirable to preserve the different variables in every slave, and I know of a few schemes where it is useful, but in most cases a user wants the data on master and slaves to be always the same (think of backup slaves, for instances), and therefore the statement-based replication should be considered faulty in this respect.

Let's do the same experiment with the latest 5.1

./make_replication_sandbox --topology=circular --how_many_nodes=3 \
/path/to/mysql-5.1.26-rc-YOUR_OS.tar.gz

cd $HOME/sandboxes/rcsandbox_5.0.51
./n1 -e "create table test.t1(i int)"
./n3 -e "insert into test.t1 values (@@server_id)"
./use_all "select * from test.t1"
# server: 1:
i
103
# server: 2:
i
103
# server: 3:
i
103

Now, that's different. We can see row-based replication in action. And we can see that the binary log looks a lot different.

 $ ./node3/my sqlbinlog node3/data/mysql-bin.000001 |tail
#080720 9:05:00 server id 103 end_log_pos 3247 Query thread_id=25 exec_time=0 error_code=0
SET TIMESTAMP=1216569900/*!*/;
SET @@session.auto_increment_increment=3, @@session.auto_increment_offset=3/*!*/;
BEGIN
/*!*/;
# at 3247
# at 3288
#080720 9:05:00 server id 103 end_log_pos 3288 Table_map: `test`.`t1` mapped to number 19
#080720 9:05:00 server id 103 end_log_pos 3322 Write_rows: table id 19 flags: STMT_END_F

BINLOG '
LGKDSBNnAAAAKQAAANgMAAAAABMAAAAAAAAABHRlc3QAAnQxAAEDAAE=
LGKDSBdnAAAAIgAAAPoMAAAQABMAAAAAAAEAAf/+ZwAAAA==
'/*!*/;
# at 3322
#080720 9:05:00 server id 103 end_log_pos 3392 Query thread_id=25 exec_time=0 error_code=0
SET TIMESTAMP=1216569900/*!*/;
COMMIT

So far, all is as expected (if you have been keeping up with the documentation, of course). The default binlog_format in MySQL 5.1 is mixed, meaning that the server will use statement-based as much as possible, switching to row-based when there is the risk of incorrect replication, like in this case.

Users can change binlog_format at will, thus influencing how replication is performed. However, in the case of circular replication, you must be extra careful, because you don't have control on how the other nodes are behaving.

For example.

$ ./n1 -e "truncate test.t1"
$ ./n3 -e "set binlog_format='statement';insert into test.t1 values (@@server_id)"
$ ./use_all "select * from test.t1"
# server: 1:
i
101
# server: 2:
i
101
# server: 3:
i
103

WTF? Why server 1 and server 3 have inserted their corresponding server ids, and server 2 has not?

Let's do some debugging.
  1. Server 3 inserts @@server_id with statement-based replication, overriding the defaults.
  2. Server 3's slave, server 1, executes the statement. But its default binlog_format is still "row", not "statement".
  3. Server 2, slave of server 1, receives a row-based chunk from the binary log, and therefore inserts the value of its master, because the statement was not passed along.
Using row-based format explicitly would not cause any side effect.

$ ./n1 -e "truncate test.t1"
$ ./n3 -e "set binlog_format='row';insert into test.t1 values (@@server_id)"
$ ./use_all "select * from test.t1"
# server: 1:
i
103
# server: 2:
i
103
# server: 3:
i
103
Looking for really useful uses of row-based replication, we can experience the greatest advantage (compared to statement-based) when inserting the result of an expensive calculation. Let's try. First of all, we force row-based replication on all nodes permanently.

$ ./use_all 'set global binlog_format="row"'
Then we execute an expensive query, and see the results.
$ ./n3 -e "insert into test.t1 select count(*) from information_schema.columns"
$ ./use_all "show global status like 'opened_tables' "
# server: 1:
Variable_name Value
Opened_tables 17
# server: 2:
Variable_name Value
Opened_tables 17
# server: 3:
Variable_name Value
Opened_tables 41
If you know how information_schema works in MySQL, you know that if you want to get statistics on the number of columns, the server has to open all the tables and count them. Server 3 has done exactly that, and the number of opened tables is more than double than its fellows servers.

This is a useful operation to perform when you know that the master is inserting the result of an expensive calculation, and you don't want to delay the slaves.

Switching to row-based replication permanently would be a mistake, though. Consider a simple update like this one:

UPDATE huge_table set some_column = 'some value' where some_other_column = 'X';

This would be relatively inexpensive in statement-based replication. If the table has 200 columns, though, using row-based format your binary log will pass the whole record to the slaves.
Used with care, row-based replication can deliver good results.

No comments: