ElasticSearch: Add timeout parameter to _search #28935

Open
lovasoa wants to merge 1 commit into trinodb:master from lovasoa:add-elasicsearch-timeout-parameter

Conversation

@lovasoa lovasoa commented Mar 30, 2026

Description

Elasticsearch _search requests now carry the configured request timeout, and Trino fails the query when Elasticsearch reports that the search timed out.

This makes Elasticsearch aware of the timeout, so that it cancels the query on its side even if the Trino node is too slow to close the HTTP connection in time.
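
A minimal sketch of the behavior described above, written against stand-in types rather than the real Elasticsearch client. The names `TimeoutAwareSearch`, `Request`, `Response`, and the `backend` function are hypothetical and only model the idea: the request carries the configured timeout, and the caller fails when the response is flagged as timed out. The exception type is also a placeholder for whatever the connector actually throws.

```java
import java.time.Duration;
import java.util.function.Function;

final class TimeoutAwareSearch
{
    // Hypothetical stand-ins for the client's request and response types.
    record Request(String query, Duration timeout) {}

    record Response(boolean timedOut, int hits) {}

    private TimeoutAwareSearch() {}

    // Attaches the configured timeout to the request and fails when the
    // backend reports that the search timed out.
    static Response search(String query, Duration requestTimeout, Function<Request, Response> backend)
    {
        Request request = new Request(query, requestTimeout);
        Response response = backend.apply(request);
        if (response.timedOut()) {
            throw new IllegalStateException("Elasticsearch query timed out");
        }
        return response;
    }

    public static void main(String[] args)
    {
        // A backend that completes within the timeout.
        Response ok = search("match_all", Duration.ofSeconds(10), request -> new Response(false, 3));
        System.out.println(ok.hits());

        // A backend that reports timed_out=true.
        try {
            search("match_all", Duration.ofSeconds(10), request -> new Response(true, 1));
        }
        catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Failing instead of returning the response means a partial result set is never passed on as if it were complete.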

Additional context and related issues

Related to #28927.

Release notes

Release notes are required, with the following suggested text:

## fixes
* elasticsearch connector: propagate query timeout information to elasticsearch ({issue}`28927`)

cla-bot Bot commented Mar 30, 2026

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@github-actions github-actions Bot added the elasticsearch Elasticsearch connector label Mar 30, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 1da2ecc3b0

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines 618 to +620

         try {
        -    return client.search(request);
        +    SearchResponse response = client.search(request);
        +    if (response.isTimedOut()) {

P2 Badge Clear scroll when aborting timed-out initial search

If client.search(request) returns timed_out=true, this branch throws before returning the SearchResponse. In the beginSearch path, that happens before ScanQueryPageSource.SearchHitIterator is constructed, so no caller can read the returned _scroll_id and invoke clearScroll. When timed-out scroll searches return a scroll ID, those contexts are left open until elasticsearch.scroll-timeout expires, which can accumulate under repeated timeouts and waste Elasticsearch resources.
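
One way to read this suggestion, as a hedged sketch: release the scroll context before throwing, because after the throw no caller holds the scroll ID. The `Client` interface and `Response` record below are hypothetical stand-ins, not the connector's or the Elasticsearch client's real types, and the example does not claim this is how the PR should resolve the comment.

```java
import java.util.ArrayList;
import java.util.List;

final class ScrollCleanupSketch
{
    // Hypothetical response: a timed-out flag plus an optional scroll ID.
    record Response(boolean timedOut, String scrollId) {}

    // Hypothetical client with only the two calls this sketch needs.
    interface Client
    {
        Response search(String query);

        void clearScroll(String scrollId);
    }

    private ScrollCleanupSketch() {}

    static Response beginSearch(Client client, String query)
    {
        Response response = client.search(query);
        if (response.timedOut()) {
            // Release the scroll context first: once we throw, no caller sees the scroll ID.
            if (response.scrollId() != null) {
                client.clearScroll(response.scrollId());
            }
            throw new IllegalStateException("Elasticsearch query timed out");
        }
        return response;
    }

    public static void main(String[] args)
    {
        List<String> cleared = new ArrayList<>();
        Client timingOut = new Client()
        {
            @Override
            public Response search(String query)
            {
                return new Response(true, "scroll-1");
            }

            @Override
            public void clearScroll(String scrollId)
            {
                cleared.add(scrollId);
            }
        };
        try {
            beginSearch(timingOut, "match_all");
        }
        catch (IllegalStateException e) {
            System.out.println("cleared: " + cleared);
        }
    }
}
```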

Author

@codex This is a good remark, but I think it's a separate issue. Whether we want to clear scrolls and when is not specific to the timeout case.

If your human masters want me to specifically add a clearScroll call before the throw here I'll do it though.

Author

lovasoa commented Mar 30, 2026

I signed the CLA and sent it to cla@trino.io.

        SearchSourceBuilder sourceBuilder = SearchSourceBuilder.searchSource()
                .query(query);

        sourceBuilder.timeout(new TimeValue(requestTimeout.toMillis(), MILLISECONDS));
Member

sourceBuilder is a builder, so there is no need for a separate call.

Also, MILLISECONDS is redundant. We can call the constructor without that parameter, or use the timeValueMillis helper method instead.

        SearchSourceBuilder sourceBuilder = SearchSourceBuilder.searchSource()
                .query(query)
                .timeout(timeValueMillis(requestTimeout.toMillis()));

Member

Additionally, do we need to configure the timeout for count queries as well?

Member

I was looking at the ES documentation. Setting this timeout doesn't guarantee that a query will be terminated, right?

Author

I moved the timeout to the builder chain, and used new TimeValue(requestTimeout.toMillis()) to match how the scroll timeout is defined.

Member

ebyhr commented Mar 31, 2026

Add ES timeout to _search

Could you change the commit to "Add timeout to _search in Elasticsearch"?

@Praveen2112
Member

@lovasoa But even setting the request timeout wouldn't help us much, right? It might return a partial result while the queries are still executing.
For example, from elastic/elasticsearch#47716:

We have a new way to cancel search requests efficiently from the client in 7.4 (by closing the underlying http channel) and we expose this in our official Java client so we don't need this option on the server if the goal is to trash any requests that take more than a configured threshold

Wasn't this targeting e-commerce scenarios? Shoppers have a limited attention span and there's a desire to serve something within a given time window.

I think even setting this property might not help us.

Author

lovasoa commented Mar 31, 2026

This PR is related to, but does not fix, #28927.

This PR prevents queries from executing after the HTTP timeout configured in the ES plugin settings when Trino does not close the connection quickly enough (because the node is overloaded, for instance).

We still need a separate fix to handle Trino-side cancellations (the issue in #28927).

lovasoa added a commit to lovasoa/trino that referenced this pull request Mar 31, 2026

cla-bot Bot commented Mar 31, 2026

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

@lovasoa lovasoa force-pushed the add-elasicsearch-timeout-parameter branch from e0bcc44 to 05c89eb Compare March 31, 2026 11:10

cla-bot Bot commented Mar 31, 2026

Thank you for your pull request and welcome to the Trino community. We require contributors to sign our Contributor License Agreement, and we don't seem to have you on file. Continue to work with us on the review and improvements in this PR, and submit the signed CLA to cla@trino.io. Photos, scans, or digitally-signed PDF files are all suitable. Processing may take a few days. The CLA needs to be on file before we merge your changes. For more information, see https://github.com/trinodb/cla

Author

lovasoa commented Mar 31, 2026

@cla-bot: no need to repost your message on every commit. I told you I already sent the signed CLA.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 05c89ebc31

Comment on lines +653 to +655

    if (response.isTimedOut()) {
        throw new TrinoException(ELASTICSEARCH_CONNECTION_ERROR, "Elasticsearch query timed out");
    }

P2 Badge Handle timed-out scroll pages without leaking contexts

Throwing immediately on response.isTimedOut() drops the new scroll ID returned by that page, because SearchHitIterator.reset(...) is only called after nextPage returns (ScanQueryPageSource.computeNext). In clusters where Elasticsearch rotates scroll IDs between pages, close() will clear only the stale prior ID, leaving the timed-out page's scroll context open until elasticsearch.scroll-timeout expires; under repeated timeouts this accumulates server-side resources.

@Praveen2112
Member

This pr prevents queries from executing after the http timeout configured in the es plugin settings when trino does not close the connection quick enough (because the node is overloaded for instance).

But setting this property doesn't ensure the query is cancelled, right? I'm trying to understand what we would gain here. If our primary goal is to support partial results, then yes; if not, doesn't closing the channel already help us here?

Author

lovasoa commented Mar 31, 2026

Closing the channel does the job. But the person responsible for the Elasticsearch cluster wants a guarantee that the query will not keep using resources on the Elasticsearch side, whatever happens on the client side. If the Trino node is overloaded, it may take several seconds to actually close the connection (and thus for the cancellation to propagate to Elasticsearch). This PR does not remove the HTTP timeout; it additionally propagates the timeout to Elasticsearch, so that the query is always cancelled after the timeout on the ES side, whatever happens on the Trino side.
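
To make the two positions in this exchange concrete, here is a toy model of a server-side time budget. It is an illustration under stated assumptions, not a description of how Elasticsearch is implemented: `ServerSideBudgetSketch`, `Result`, and the per-shard cost list are hypothetical. The server checks the budget between units of work, so it stops on its own without any signal from the client, but it can overshoot the budget and it returns whatever it gathered so far with a timed-out flag.

```java
import java.util.List;

final class ServerSideBudgetSketch
{
    // Hypothetical result: how many shards were searched and whether the budget ran out.
    record Result(int shardsSearched, boolean timedOut) {}

    private ServerSideBudgetSketch() {}

    // Searches shards one at a time using simulated costs. The budget is only
    // checked before starting a shard, so a shard in progress is not interrupted.
    static Result searchShards(List<Long> shardCostsMillis, long timeoutMillis)
    {
        long elapsedMillis = 0;
        int searched = 0;
        for (long cost : shardCostsMillis) {
            if (elapsedMillis >= timeoutMillis) {
                // Budget spent: stop and report a partial result.
                return new Result(searched, true);
            }
            elapsedMillis += cost;
            searched++;
        }
        return new Result(searched, false);
    }

    public static void main(String[] args)
    {
        List<Long> costs = List.of(400L, 400L, 400L);
        System.out.println(searchShards(costs, 500));
        System.out.println(searchShards(costs, 2000));
    }
}
```

In this model the first call stops after the second shard even though no client ever closed a connection, which is the guarantee asked for above. It also shows the reviewer's concern: the stop is not immediate, and the response is partial, which is why the client side treats a timed-out response as a failure.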

Author

lovasoa commented Mar 31, 2026

I opened a separate PR to fix #28927: #28941

@github-actions

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions Bot added the stale label Apr 22, 2026
@martint
Copy link
Copy Markdown
Member

martint commented May 8, 2026

@cla-bot check

@github-actions github-actions Bot added cla-signed and removed stale labels May 8, 2026

Labels

cla-signed elasticsearch Elasticsearch connector

4 participants