PT-2534 - fixed connection leak due to confusion on parent flag #1123
marcelohpf wants to merge 2 commits into
EvgeniyPatlan left a comment
I would suggest some changes. They are for the test scenario, but it is still better to follow Perl recommendations.
34e5001 to
d9a31b9
Introduces a new argument to Cxn to flag, via InactiveDestroy, that a connection should not be closed automatically. The code used the same argument for two different purposes (replication topology parent and process parent in fork scenarios), which led to a leak.
d9a31b9 to
5a9d429
I had to update the commit author due to the CLA agreement; sorry for the force push.
Thank you @marcelohpf for the contribution and for the quick fixes!
I'm pretty sure this is the issue I'm currently running into with the latest version (3.7.1). While pt-osc is running, it's creating a ton of connections on the replica to check its lag (like 1000+). Eventually, it hits
Introduces a new argument to Cxn to flag, via InactiveDestroy, that a connection should not be closed automatically, while keeping the "parent" argument for its current usage as topology parent (or connection source parent).
The code used the same argument, "parent", for two different purposes: topology parent and process parent in fork scenarios. This led to a leak when get_replicas was called multiple times (e.g. when waiting for replication delays).
None of the current Percona Toolkit bin/ scripts seems to use forking other than deadlock detection. None of the "parent => X" cases was actually used in a multiprocess scenario: even when the tool is daemonized, connections are created after the fork.
You can reproduce it by:
Then connect to one of the read-replicas and watch process list grow indefinitely:
The root cause of the issue is that InactiveDestroy prevents the dbh from closing its connection as part of the DESTROY workflow, which is triggered when the object's reference count drops to zero.
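
The mechanism can be illustrated with a small toy model. This is only a sketch: `ToyHandle` and `check_lag_once` are hypothetical names, not part of DBI, Cxn, or this patch, and a counter stands in for real server connections. It assumes only what is described above, namely that a handle flagged with InactiveDestroy skips the disconnect when DESTROY runs.

```perl
use strict;
use warnings;

# Toy stand-in for a database handle; NOT the real DBI or Cxn API.
package ToyHandle;

our $open_connections = 0;   # connections currently open "on the server"

sub new {
    my ($class, %args) = @_;
    $open_connections++;
    return bless { InactiveDestroy => $args{InactiveDestroy} ? 1 : 0 }, $class;
}

# Perl calls DESTROY when the last reference to the object goes away.
sub DESTROY {
    my ($self) = @_;
    return if $self->{InactiveDestroy};   # flag set: skip the disconnect
    $open_connections--;                  # normal case: close the connection
}

package main;

# Hypothetical helper: opens a handle, then lets it fall out of scope.
sub check_lag_once {
    my (%args) = @_;
    my $dbh = ToyHandle->new(%args);
    return;   # refcount of $dbh drops to zero here, so DESTROY runs
}

check_lag_once(InactiveDestroy => 0) for 1 .. 5;
print "without InactiveDestroy: $ToyHandle::open_connections open\n";   # 0 open

check_lag_once(InactiveDestroy => 1) for 1 .. 5;
print "with InactiveDestroy:    $ToyHandle::open_connections open\n";   # 5 open
```

In this model every repeated call made with the flag set leaves one more connection behind, which matches the growth seen in the replica's process list when the same flag is set on handles that no forked child will ever own.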