MySQL replication woes
10 March 2005 15:48
I've been fighting with a MySQL replication pair for a shared-hosting environment (multiple users each with their own database on the machine). The problem is that the replication keeps stopping because of regular old user errors such as trying to create a table that already exists, insert statements with fields that don't exist and so forth. So a user tries to create a table that already exists on the master, the erroneous statement is propagated over to the slave which throws its hands up and refuses to do any further replication until an admin comes by to manually fix the problem. Uggh.
So I run on over to mysql.com and consult the docs to see if this is "normal" behavior. Looking at the docs on replication "features and known problems" (http://dev.mysql.com/doc/mysql/en/replication-features.html) I find the following:
"If a statement on the slave produces an error, the slave SQL thread terminates, and the slave writes a message to its error log. You should then connect to the slave manually, fix the problem (for example, a non-existent table), and then run START SLAVE."
One of the things that I really dislike about MySQL is how they tout these cool features, but once you get into the nitty-gritty, you always seem to come up with "features" like these that leave a sour taste in your mouth. Want to use their clustered database? Now we can compete with Oracle's Real Application Clusters, they say. Oh, but you have to store your entire database in memory across the cluster. Sour. To be fair I hear they're working on a system where such is not a requirement, but it's still frustrating to get excited about such a system, only to run into these limitations.
My guess is that in the case of replication this is a result of MySQL's replication being logical replication (i.e., just replaying a bunch of SQL statements on the slave) as opposed to a physical replication (keeping track of transactional changes, before and after data and applying those to the physical structures of the database. (I borrow the terms "logical" and "physical" in this instance from Oracle's hot-standby terminology). Working with Oracle databases in physical standby environments I have never run into a problem like this. It "just works". I wonder if this does turn out to be a problem in Oracle's logical standby mode, however. Knowing Oracle's flexibility, I'd wager you can configure whether or not to have the replication die on such a situation.
Bad MySQL, no biscuit :-(.
Update: After some more consultation with the docs it turns out you can use the "--slave-skip-errors= [err_code1,err_code2,... | all]" option to specify a list of errors the slave should skip should it encounter them. So that fixes my problem temporarily, as the number of distinct errors I'm running into aren't that numerous. But in enabling this option, it leaves me with the nagging question in the back of my head, are my databases really in sync? I'll probably end up writing some scripts to crawl through the (hundreds of) databases in the master and slave instances to see if any serious 'diffs' exist. I wish MySQL had a physical replication mode (as opposed to a logical mode) so I wouldn't have to deal with this kind of thing.
On 26 April 2005 16:39 Atle Veka
On 05 August 2009 17:20 Dan