max-lag-millis parameter description is misleading when running on master #1387
Comments
Created this PR for updating the documentation - #1388
@dontstopbelieveing could you elaborate on this point? To my knowledge
I'll add what we see in our test here. We land at the postpone cutover stage
And then once we delete the cutover flag
So far so good. At this point I would expect metadata locks to be released. But they don't get released and the log has these entries
This continued till we manually killed the gh-ost process. I am baffled about two issues -
The reason I said "rolling something back" is that the effect I see on MySQL is similar to what a long-running transaction causes when it rolls back. This might not be a rollback but gh-ost running something else.
This comes out of an issue we kept running into, causing great pain and outages.
We would run with
Then once row copy was complete, we would be in the migrating stage for a long time applying binlogs. At this point heartbeat lag would be 10-30 seconds. We thought that if we increased max-lag-millis from 1500 to 10000, this would give us less throttling and speed up binlog reading and applies (silly us!).
Heartbeat lag would drop below 10 seconds, we would remove the cutover file, and then run into "ERROR Timeout while waiting for events up to lock", which made sense since 10 seconds > cutover lock timeout of 6 seconds.
Our ask is that we edit the documentation to point out this important effect of the seemingly innocent parameter, as evident here: https://github.com/github/gh-ost/blob/master/go/logic/migrator.go#L504
I also have 2 questions,
For context, we are on AWS Aurora and the high heartbeat lag is a side effect of aurora_binlog_replication_max_yield_seconds being set to non-zero.