Benchmark and optimize gtid #25

Open
grodowski wants to merge 7 commits into master from benchmark-gtid

@grodowski grodowski commented Mar 4, 2026

Reduce allocations in GTID binlog streaming

Previously, every GTIDEvent cloned the full GTID set to build pending coordinates, and every row event cloned it again via GetCurrentBinlogCoordinates. With large GTID sets (100+ server UUIDs), these clones add up over the life of a migration.

Changes

Deferred + cached materialization.
GTIDBinlogCoordinates now has a pending mode. On a GTIDEvent we record (baseSet, sid, gno) without cloning; the full set is materialized only when a comparison or string representation needs it. The result is cached via sync.Once, so repeated calls on the same coordinates do not clone again.
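
A minimal sketch of the deferred and cached materialization described above. It is not the PR's code: gtidSet, coords, withPending and resolve are hypothetical names, and the map-based set is a simplified stand-in for go-mysql's GTID set type.

```go
package main

import (
	"fmt"
	"sync"
)

// gtidSet is a simplified, hypothetical stand-in for a real GTID set:
// it maps a server UUID string to the transaction numbers seen for it.
type gtidSet map[string]map[int64]bool

// clone returns a deep copy; this is the expensive step being deferred.
func (s gtidSet) clone() gtidSet {
	out := make(gtidSet, len(s))
	for sid, gnos := range s {
		cp := make(map[int64]bool, len(gnos))
		for gno := range gnos {
			cp[gno] = true
		}
		out[sid] = cp
	}
	return out
}

// coords holds a shared base set plus one pending (sid, gno) pair.
type coords struct {
	base       gtidSet // aliased, never mutated through coords
	pendingSID string
	pendingGNO int64 // 0 means nothing is pending

	once     sync.Once
	resolved gtidSet
}

// withPending records the announced GTID without cloning the base set.
func withPending(base gtidSet, sid string, gno int64) *coords {
	return &coords{base: base, pendingSID: sid, pendingGNO: gno}
}

// resolve materializes base + pending on first use and caches the result.
func (c *coords) resolve() gtidSet {
	c.once.Do(func() {
		if c.pendingGNO == 0 {
			c.resolved = c.base
			return
		}
		set := c.base.clone()
		if set[c.pendingSID] == nil {
			set[c.pendingSID] = make(map[int64]bool)
		}
		set[c.pendingSID][c.pendingGNO] = true
		c.resolved = set
	})
	return c.resolved
}

func main() {
	base := gtidSet{"uuid-a": {1: true, 2: true}}
	c := withPending(base, "uuid-a", 3)
	// The resolved set has three entries; the shared base still has two.
	fmt.Println(len(c.resolve()["uuid-a"]), len(base["uuid-a"])) // prints "3 2"
}
```

Because coords embeds a sync.Once, the sketch only ever hands it out by pointer; copying the struct after the first resolve call would copy the cache state with it.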

Encapsulation.
The GTIDSet field is now unexported and is accessed through a GTIDSet() getter and a NewGTIDBinlogCoordinatesFromSet constructor. This enforces the aliasing contract in WithPendingGTID: external code cannot reassign the base set after a pending child has aliased it.
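
The shape of that encapsulation could look roughly like the sketch below. The constructor and getter names come from this description; simpleSet is a hypothetical placeholder for the real set type, and the actual signatures in the PR may differ.

```go
package main

import "fmt"

// simpleSet is a hypothetical placeholder for the real GTID set type.
type simpleSet struct{ uuids []string }

// GTIDBinlogCoordinates keeps its set in an unexported field, so code outside
// the package cannot swap the pointer after a pending child has aliased it.
type GTIDBinlogCoordinates struct {
	gtidSet *simpleSet
}

// NewGTIDBinlogCoordinatesFromSet is the only way to attach a set in this sketch.
func NewGTIDBinlogCoordinatesFromSet(set *simpleSet) *GTIDBinlogCoordinates {
	return &GTIDBinlogCoordinates{gtidSet: set}
}

// GTIDSet exposes the set for reading. It returns the shared pointer,
// so a caller that intends to mutate the set should copy it first.
func (c *GTIDBinlogCoordinates) GTIDSet() *simpleSet {
	return c.gtidSet
}

func main() {
	set := &simpleSet{uuids: []string{"uuid-a"}}
	c := NewGTIDBinlogCoordinatesFromSet(set)
	fmt.Println(c.GTIDSet() == set) // prints "true": the getter returns the same pointer
}
```

Unexporting the field prevents reassignment from other packages, but the getter still hands out the shared pointer, so mutation through it remains possible.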

Benchmark

Apple M3 Pro, 182 UUIDs, -benchtime=5s -count=3

            ns/op    B/op     allocs/op
master      ~417k    ~525k    ~6735
this PR     ~524k    ~187k    ~3074
delta       +26%     -64%     -54%
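
The delta row can be reproduced from the rounded figures in the table. A small sketch, where pctDelta is a hypothetical helper and the inputs are the approximate values above rather than raw benchmark output:

```go
package main

import (
	"fmt"
	"math"
)

// pctDelta returns the relative change from before to after, in whole percent.
func pctDelta(before, after float64) int {
	return int(math.Round((after - before) / before * 100))
}

func main() {
	fmt.Printf("ns/op     %+d%%\n", pctDelta(417_000, 524_000)) // +26%
	fmt.Printf("B/op      %+d%%\n", pctDelta(525_000, 187_000)) // -64%
	fmt.Printf("allocs/op %+d%%\n", pctDelta(6735, 3074))       // -54%
}
```

A positive ns/op delta means time per operation went up on this benchmark, while bytes and allocations per operation went down.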

Pre-existing issue (out of scope)

ConnectBinlogStreamer passes coords.GTIDSet() directly to StartSyncGTID. go-mysql stores this as prevGset and calls AddGTID on it on every GTIDEvent, silently mutating initialBinlogCoordinates in place. This predates this PR and is a candidate for a follow-up fix (coords.GTIDSet().Clone() at the call site).
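
To illustrate the aliasing problem and the proposed call-site fix, here is a self-contained sketch. It does not use go-mysql: trackedSet and fakeSyncer are hypothetical stand-ins for the GTID set and for a streamer that keeps the set it is given and updates it in place.

```go
package main

import "fmt"

// trackedSet is a hypothetical, simplified set: server UUID -> highest GNO seen.
type trackedSet struct {
	last map[string]int64
}

// clone returns an independent copy of the set.
func (s *trackedSet) clone() *trackedSet {
	cp := &trackedSet{last: make(map[string]int64, len(s.last))}
	for sid, gno := range s.last {
		cp.last[sid] = gno
	}
	return cp
}

// fakeSyncer stands in for a streamer that stores the set it receives
// and mutates it on every GTID event.
type fakeSyncer struct {
	prev *trackedSet
}

func (f *fakeSyncer) start(set *trackedSet) {
	f.prev = set
}

func (f *fakeSyncer) onGTID(sid string, gno int64) {
	f.prev.last[sid] = gno
}

func main() {
	initial := &trackedSet{last: map[string]int64{"uuid-a": 5}}

	// Passing the set directly shares it with the syncer.
	var aliased fakeSyncer
	aliased.start(initial)
	aliased.onGTID("uuid-a", 6)
	fmt.Println(initial.last["uuid-a"]) // prints 6: the initial set was mutated

	// Cloning at the call site keeps the initial set intact.
	initial.last["uuid-a"] = 5 // reset for the second run
	var isolated fakeSyncer
	isolated.start(initial.clone())
	isolated.onGTID("uuid-a", 6)
	fmt.Println(initial.last["uuid-a"]) // prints 5
}
```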


Full benchmark output before:

❯ go test ./go/binlog/ -run='^$' -bench=BenchmarkStreaming -benchmem -benchtime=5s -v
goos: darwin
goarch: arm64
pkg: github.com/github/gh-ost/go/binlog
cpu: Apple M3 Pro
BenchmarkStreamingGTID
GTID (182 UUIDs)               done (1 iters)
GTID (182 UUIDs)               done (100 iters)
GTID (182 UUIDs)               done (10000 iters)
GTID (182 UUIDs)               done (13946 iters)
BenchmarkStreamingGTID-11    	   13946	    433638 ns/op	  567269 B/op	    7360 allocs/op
BenchmarkStreamingFile
File                           done (1 iters)
File                           done (100 iters)
File                           done (10000 iters)
File                           done (13101 iters)
BenchmarkStreamingFile-11    	   13101	    455612 ns/op	  160392 B/op	    2734 allocs/op
PASS
ok  	github.com/github/gh-ost/go/binlog	21.718s

After:

❯ go test ./go/binlog/ -run='^$' -bench=BenchmarkStreaming -benchmem -benchtime=5s -v
goos: darwin
goarch: arm64
pkg: github.com/github/gh-ost/go/binlog
cpu: Apple M3 Pro
BenchmarkStreamingGTID
GTID (182 UUIDs)               done (1 iters)
GTID (182 UUIDs)               done (100 iters)
GTID (182 UUIDs)               done (10000 iters)
BenchmarkStreamingGTID-11    	   10000	    549252 ns/op	  188160 B/op	    3261 allocs/op
BenchmarkStreamingFile
File                           done (1 iters)
File                           done (100 iters)
File                           done (10000 iters)
File                           done (13216 iters)
BenchmarkStreamingFile-11    	   13216	    456533 ns/op	  159648 B/op	    2707 allocs/op
PASS
ok  	github.com/github/gh-ost/go/binlog	16.880s

return fmt.Errorf("Unknown DML type: %v", ev.Header.EventType)
}

// Convert schema and table names once per RowsEvent, not per row

maybe we don't need this specific comment

// WithPendingGTID returns coordinates for a transaction that has been announced
// (via GTIDEvent) but not yet committed. g.GTIDSet is aliased directly as the base
// without cloning; the Clone is deferred until the coordinates are actually compared
// or stringified. g must not be mutated after this call.

we can only truly guarantee that g is not mutated after this call by using a setter/getter with a fuse

Member Author

That's a good idea, I think we can make GTIDSet private and expose it through a constructor and getter. Not sure if a setter will be needed though 🤔

// lastCommittedCoords. lastCommittedCoords is subsequently used as the
// base inside WithPendingGTID: it is cloned there only if a comparison
// or string representation is actually requested, and never mutated.
// Any future code that modifies the set after this point must Clone first.

"Any future code that modifies the set after this point must Clone first." I don't really like this being available for potential accidents in the future

// committed transaction; pendingSID:pendingGNO is the announced-but-not-yet-
// committed GTID. The expensive Clone is deferred until resolvedGTIDSet is called,
// which only happens when comparisons or string representations are needed — not on
// every row event in the hot path.

nice

if g.pendingGNO != 0 {
	set := g.GTIDSet.Clone().(*gomysql.MysqlGTIDSet)
	set.AddGTID(g.pendingSID, g.pendingGNO)
	return set

since this routine is still being called many times should we look at caching the result of this if it's pending?

Member Author

We can wrap it in once 👍

@grodowski grodowski changed the title [dnm] benchmark and optimize gtid Benchmark and optimize gtid Apr 3, 2026