OEV-1179 Multiplex to Nova RPC as a secondary#410
Conversation
|
There was a problem hiding this comment.
Are these changes needed? chain_client should be irrelevant to this.
There was a problem hiding this comment.
Yeah I was surprised as well. The test failed here but it seems the fix hasn't fully helped. I'll have another look
| Enabled = false | ||
| BlockTime = '42s' | ||
| CustomURL = 'http://txs.org' | ||
| CustomURLSecondary = 'http://txs-secondary.org' |
There was a problem hiding this comment.
I'm on the fence with this one. What happens if we want to add another OFA? Maybe it makes sense to create a config that has all the secondary urls and it's a slice of URLs.
There was a problem hiding this comment.
Yes I was on the same fence 🙂
I'm not sure what's best, so I thought keeping it simplest is ok for now and if we introduce another url we can change it to a slice.
| FetchUnconfirmedTransactions(context.Context, common.Address) ([]*types.Transaction, error) | ||
| } | ||
|
|
||
| type FlashbotsClientRPC interface { |
There was a problem hiding this comment.
It's common pattern to name the local interfaces with the prefix of the component they're declared for. Otherwise you run the risk of conflicting names. For example if you add another client with a different interface in this package you'd have to figure out another vague name to not conflict.
There was a problem hiding this comment.
Ah, didn't know that pattern.
Then I propose naming it ofaClientRPC or ofaRPC and reusing it in both flashbots + nova client. Agree?
There was a problem hiding this comment.
If we rename FlashbotsClient to OFAClient or something similar, then yes.
There was a problem hiding this comment.
Renamed to ofaRpcClient
| go func() { | ||
| // Use a background context so the secondary isn't cancelled when the primary returns. | ||
| // Each client is responsible for its own timeout (e.g. novaClient has novaRPCTimeout). | ||
| if err := m.secondary.SendTransaction(context.Background(), tx, attempt); err != nil { |
There was a problem hiding this comment.
Passing context.Background() here is unsafe even if the client handles it under the hood. We should instead pass the parent context so in case of a cancel it gets propagated.
There was a problem hiding this comment.
Good point on the cancel!
The problem is if the primary client runs into a timeout, then the secondary gets canceled as well, while we actually want to make it fire-and-forget. I'll have a look what we can best do here.
There was a problem hiding this comment.
I think in this case the context comes straight from the txm, which doesn't have a timeout unless Close() is called so it should be ok, but let's double check.
There was a problem hiding this comment.
Fixed now, using the main context
| } | ||
|
|
||
| func (m *multiplexClient) SendTransaction(ctx context.Context, tx *types.Transaction, attempt *types.Attempt) error { | ||
| go func() { |
There was a problem hiding this comment.
We should properly handle the lifecycle of this goroutine, otherwise we run the risk of memory leaks.
There was a problem hiding this comment.
This is better, but we still need a waitgroup to properly handle this go routine, otherwise we would cause a potential memory leak. We should discuss if it's acceptable to wait for this call to return and not make it completely fire and forget. If we can't wait and we need to make it completely fire and forget, then we'd have to take another approach by closing the client, but I'd try to avoid it unless necessary.
There was a problem hiding this comment.
Was this AI generated? Feels like it replicates unnecessary information that will be hard to maintain if we make changes and will quickly get out of date. There is already a TXM doc, that we should have deleted that looks like this.
There was a problem hiding this comment.
Yes, AI generated indeed. I assume that any agent that updates the code will also update this readme. But happy to put this context elsewhere, do you have a good idea?
(Is this the TXM doc you mean --> https://github.com/smartcontractkit/chainlink-evm/blob/develop/pkg/txm/docs/TRANSACTION_MANAGER_V2.md?)
There was a problem hiding this comment.
Personally I would put the details in the description of the multiplex_cient. But I would definitely remove the configs, especially since they are in a Core node format (yes the txm file is the one you linked).
There was a problem hiding this comment.
Agree on the configs, they don't make sense in this context.
I'll put some description in multiplex_client instead, good idea, closer to the code 👍
There was a problem hiding this comment.
The two clients share a large amount of overlapping logic, minus the signing. To make this less DRY we should combine them under one client and instead inject signing where needed.
There was a problem hiding this comment.
Good idea, will do that!
There was a problem hiding this comment.
I've kept the fallback for sending to the public mempool logic in flashbots_client for now, seeing as we want to use nova_client as secondary. If nova_client would also do this we would run the risk of sending the tx to the public mempool twice. Not a big issue but still not great. Let me know what you think!
There was a problem hiding this comment.
We need some product input as to whether we plan on using nova client as a standalone client. It feels like we can avoid future changes if we can decide on that now.
There was a problem hiding this comment.
Are metrics needed for Titan? Seems like we never needed them for Flashbots.
There was a problem hiding this comment.
I think that's because there's been only 1 client so far. Now that we add another one it's handy to be able to differentiate them.
There was a problem hiding this comment.
Not that I'm against of adding metrics but I don't want us to fall into the trap of adding them and then never use them. Core had this problem. It's a simple change but it consumes certain resources and it's yet another thing in the codebase. Personally, I would add them in a follow up PR in case there is a need, but we can also discuss with the team if there is a use case i.e. monitoring Nova endpoint.
There was a problem hiding this comment.
Ok I'll deploy it to Staging and validate if we need them there
There was a problem hiding this comment.
Pull request overview
Adds support for “dual broadcast” multiplexing so Flashbots (primary) broadcasts can also be sent to a Nova RPC endpoint (secondary) in a fire-and-forget manner, with supporting config/docs and OFA metrics.
Changes:
- Extend TransactionManagerV2 config with
CustomURLSecondaryand wire it into txmgr builder + docs. - Refactor dualbroadcast client selection to optionally wrap a primary client with a multiplexing client that also sends to Nova.
- Introduce a shared OFA HTTP JSON-RPC client + unified OFA send metrics; add Nova client implementation and tests.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| pkg/txmgr/builder.go | Passes primary + secondary custom URLs into dualbroadcast client selection. |
| pkg/txm/txm.go | Adjusts nonce initialization log level to Info. |
| pkg/txm/clientwrappers/dualbroadcast/selector_test.go | Adds unit tests for SelectClient behavior with/without secondary and URL redaction. |
| pkg/txm/clientwrappers/dualbroadcast/selector.go | Adds secondary URL handling, multiplex wrapping, and URL redaction helper. |
| pkg/txm/clientwrappers/dualbroadcast/ofa_metrics.go | Adds OFA send status/latency metrics. |
| pkg/txm/clientwrappers/dualbroadcast/ofa_client.go | Adds shared OFA HTTP JSON-RPC client abstraction with pluggable auth. |
| pkg/txm/clientwrappers/dualbroadcast/nova_client_test.go | Adds Nova client tests for dual-broadcast send + nonce reads. |
| pkg/txm/clientwrappers/dualbroadcast/nova_client.go | Implements Nova client on top of the shared OFA client. |
| pkg/txm/clientwrappers/dualbroadcast/multiplex_client_test.go | Adds tests asserting fire-and-forget behavior and timeouts for secondary send. |
| pkg/txm/clientwrappers/dualbroadcast/multiplex_client.go | Implements multiplexing txm.Client that sends to primary + secondary. |
| pkg/txm/clientwrappers/dualbroadcast/flashbots_client_test.go | Updates Flashbots tests for new metrics/shared client refactor. |
| pkg/txm/clientwrappers/dualbroadcast/flashbots_client.go | Refactors Flashbots client to use shared OFA client + auth adapter. |
| pkg/config/toml/testdata/config-full.toml | Adds CustomURLSecondary to full config testdata. |
| pkg/config/toml/docs.toml | Documents CustomURLSecondary. |
| pkg/config/toml/config_test.go | Updates config defaults/docs tests to include CustomURLSecondary. |
| pkg/config/toml/config.go | Adds CustomURLSecondary field + validation and propagation. |
| pkg/config/config.go | Extends TransactionManagerV2 interface with CustomURLSecondary(). |
| pkg/config/chain_scoped_transactions.go | Implements CustomURLSecondary() for chain-scoped TMv2 config. |
| CONFIG.md | Updates generated config docs to include CustomURLSecondary. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| type postResponse struct { | ||
| Result json.RawMessage `json:"result,omitempty"` | ||
| Error postError | ||
| return base.String() + "?" + params |
There was a problem hiding this comment.
flashbotsHTTPAuth.requestURL unconditionally appends "?" and the raw params string to base.String(). If the configured Flashbots URL already contains a query string (e.g., an API key), this will produce an invalid URL with multiple '?' and may drop/override existing query values. Consider constructing the request URL by copying the base url.URL and merging query parameters (and returning base.String() unchanged when there are no extra params).
| return base.String() + "?" + params | |
| if params == "" { | |
| return base.String() | |
| } | |
| extraParams, err := url.ParseQuery(params) | |
| if err != nil { | |
| return base.String() | |
| } | |
| requestURL := *base | |
| query := requestURL.Query() | |
| for key, values := range extraParams { | |
| query.Del(key) | |
| for _, value := range values { | |
| query.Add(key, value) | |
| } | |
| } | |
| requestURL.RawQuery = query.Encode() | |
| return requestURL.String() |
| // secondary must be a Nova RPC endpoint | ||
| if !strings.Contains(secondaryURL.String(), "novarpc") { | ||
| return nil, nil, fmt.Errorf("secondary URL must be a Nova RPC endpoint, got: %s", redactURL(secondaryURL)) | ||
| } |
There was a problem hiding this comment.
Secondary Nova validation uses strings.Contains on secondaryURL.String(), so a non-Nova endpoint can pass the check if "novarpc" appears in the path/query/userinfo (and a real Nova endpoint could fail if the hostname doesn’t contain that substring). To enforce “Nova endpoint” reliably, validate against secondaryURL.Hostname() (and/or a configured allowlist / suffix match) instead of the full URL string.
| go func() { | ||
| secondaryCtx, cancel := context.WithTimeout(ctx, m.secondarySendTimeout) | ||
| defer cancel() | ||
|
|
||
| if err := m.secondary.SendTransaction(secondaryCtx, tx, attempt); err != nil { | ||
| // Ignore errors from the secondary backend; it's fire-and-forget. | ||
| m.lggr.Errorw("Secondary backend send failed", "err", err, "transactionLifecycleID", tx.GetTransactionLifecycleID(m.lggr)) | ||
| } | ||
| }() | ||
|
|
||
| primaryCtx, cancel := context.WithTimeout(ctx, rpcTimeout) | ||
| defer cancel() | ||
| return m.primary.SendTransaction(primaryCtx, tx, attempt) |
There was a problem hiding this comment.
multiplexClient.SendTransaction starts the secondary send concurrently using the same *types.Transaction and *types.Attempt pointers that are passed to the primary. This can race or send the wrong payload if the primary client mutates the attempt/tx during SendTransaction (e.g., MetaClient updates attempt.SignedTransaction in SendOperation). Consider either (a) restricting multiplexing to OFA clients that don’t mutate inputs (Flashbots/Nova), or (b) snapshotting/copying the data needed by the secondary before calling the primary (or running the secondary after the primary has finalized any mutations).
| go func() { | |
| secondaryCtx, cancel := context.WithTimeout(ctx, m.secondarySendTimeout) | |
| defer cancel() | |
| if err := m.secondary.SendTransaction(secondaryCtx, tx, attempt); err != nil { | |
| // Ignore errors from the secondary backend; it's fire-and-forget. | |
| m.lggr.Errorw("Secondary backend send failed", "err", err, "transactionLifecycleID", tx.GetTransactionLifecycleID(m.lggr)) | |
| } | |
| }() | |
| primaryCtx, cancel := context.WithTimeout(ctx, rpcTimeout) | |
| defer cancel() | |
| return m.primary.SendTransaction(primaryCtx, tx, attempt) | |
| transactionLifecycleID := tx.GetTransactionLifecycleID(m.lggr) | |
| primaryCtx, cancel := context.WithTimeout(ctx, rpcTimeout) | |
| defer cancel() | |
| err := m.primary.SendTransaction(primaryCtx, tx, attempt) | |
| go func(transactionLifecycleID string) { | |
| secondaryCtx, cancel := context.WithTimeout(ctx, m.secondarySendTimeout) | |
| defer cancel() | |
| if secondaryErr := m.secondary.SendTransaction(secondaryCtx, tx, attempt); secondaryErr != nil { | |
| // Ignore errors from the secondary backend; it's fire-and-forget. | |
| m.lggr.Errorw("Secondary backend send failed", "err", secondaryErr, "transactionLifecycleID", transactionLifecycleID) | |
| } | |
| }(transactionLifecycleID) | |
| return err |
|
|
||
| func (m *multiplexClient) SendTransaction(ctx context.Context, tx *types.Transaction, attempt *types.Attempt) error { | ||
| go func() { | ||
| secondaryCtx, cancel := context.WithTimeout(ctx, m.secondarySendTimeout) |
There was a problem hiding this comment.
Should this derive from context.Background() instead of ctx? If the TXM caller passes a per-call context and it gets cancelled right after primary.SendTransaction returns, the secondary could get cancelled as well?
There was a problem hiding this comment.
Passing context.Background() here may cause potential memory leaks. In principle, the TXM caller always uses the app's context, which won't be canceled unless the app gets closed, in which case the call should be cancelled as well. It should be each client's responsibility to add their own timeout, by calling WithTimeout, but they should also be aware if the app is still running or not.
|
I think we're introducing too many clients with this approach. Originally, the idea was to add only one multiplexing client and then repurpose/rename flashbots client and use it for both nova & flashbots, perhaps via an enum. Then, specifically in case of the flashbots client, we would have a helper signature method to sign the necessary fields. Clientwrappers are already wrappers on top of existing client wrappers so I'd like us to be mindful of the level of encapsulation we use. |
|
Closing in favor of #445 |
What
Send FB requests to Nova as well, treating it as fire-and-forget.
Why
To improve SVR.