Conversation
go/binlog/gomysql_reader.go
Outdated
| return fmt.Errorf("Unknown DML type: %v", ev.Header.EventType) | ||
| } | ||
|
|
||
| // Convert schema and table names once per RowsEvent, not per row |
There was a problem hiding this comment.
maybe we don't need this specific comment
go/mysql/binlog_gtid.go
Outdated
| // WithPendingGTID returns coordinates for a transaction that has been announced | ||
| // (via GTIDEvent) but not yet committed. g.GTIDSet is aliased directly as the base | ||
| // without cloning; the Clone is deferred until the coordinates are actually compared | ||
| // or stringified. g must not be mutated after this call. |
There was a problem hiding this comment.
we can only truly guarantee g must not be mutated after this call by using a setter/getter with a fuse
There was a problem hiding this comment.
That's a good idea, I think we can make GTIDSet private and expose it through a constructor and getter. Not sure if a setter will be needed though 🤔
go/binlog/gomysql_reader.go
Outdated
| // lastCommittedCoords. lastCommittedCoords is subsequently used as the | ||
| // base inside WithPendingGTID: it is cloned there only if a comparison | ||
| // or string representation is actually requested, and never mutated. | ||
| // Any future code that modifies the set after this point must Clone first. |
There was a problem hiding this comment.
Any future code that modifies the set after this point must Clone first. I don't really like this being available for potential accidents in the future
go/mysql/binlog_gtid.go
Outdated
| // committed transaction; pendingSID:pendingGNO is the announced-but-not-yet- | ||
| // committed GTID. The expensive Clone is deferred until resolvedGTIDSet is called, | ||
| // which only happens when comparisons or string representations are needed — not on | ||
| // every row event in the hot path. |
go/mysql/binlog_gtid.go
Outdated
| if g.pendingGNO != 0 { | ||
| set := g.GTIDSet.Clone().(*gomysql.MysqlGTIDSet) | ||
| set.AddGTID(g.pendingSID, g.pendingGNO) | ||
| return set |
There was a problem hiding this comment.
since this routine is still being called many times should we look at caching the result of this if it's pending?
There was a problem hiding this comment.
We can wrap it in once 👍
1ce4642 to
361c913
Compare
Reduce allocations in GTID binlog streaming
Previously, every GTIDEvent cloned the full GTID set to build pending coordinates, and every row event cloned again via
GetCurrentBinlogCoordinates. With large GTID sets (100+ server UUIDs) this adds up across the life of a migration.Changes
Deferred + cached materialization.
GTIDBinlogCoordinatesnow has a pending mode. On a GTIDEvent we record(baseSet, sid, gno)without cloning — the full materialized set is only computed when needed for a comparison or string representation. The result is cached viasync.Onceso repeated calls on the same coordinate are free.Encapsulation.
GTIDSetis unexported with aGTIDSet()getter andNewGTIDBinlogCoordinatesFromSetconstructor. This enforces the aliasing contract inWithPendingGTID— external code cannot reassign the base set after it has been aliased by a pending child.Benchmark
Pre-existing issue (out of scope)
ConnectBinlogStreamerpassescoords.GTIDSet()directly toStartSyncGTID. go-mysql stores this asprevGsetand callsAddGTIDon it on every GTIDEvent, silently mutatinginitialBinlogCoordinatesin place. This predates this PR and is a candidate for a follow-up fix (coords.GTIDSet().Clone()at the call site).Full benchmark output before:
After: