fix(search): skip soft-delete script propagation to time-series child aliases#28064
fix(search): skip soft-delete script propagation to time-series child aliases#28064harshach wants to merge 32 commits into
Conversation
…ultTimestamp TestSuiteIndex.buildSearchIndexDocInternal computes the top-level lastResultTimestamp field from testSuite.getTestCaseResultSummary(). TestSuiteRepository registers a fetcher for that under the field name "summary"; the reindex path only invokes fetchers whose field is in getRequiredReindexFields(). Without "summary" declared, the fetcher does not run, testCaseResultSummary stays null on the entity, and the Index takes its else branch (TestSuiteIndex.java:41) writing lastResultTimestamp=0L on every reindexed suite. That field is exactly what the DQ /data-quality/test-suites list page sorts by (TestSuites.component.tsx:175), so a full reindex collapses every basic suite to the 1970 epoch and "most recently run first" stops working. SearchIndexFactoryTest pins the declaration. The Playwright spec creates a basic suite, writes a test case result, reindexes the suite with recreate=true, and asserts the resulting ES doc carries the live millisecond timestamp -- not the 0L fallback. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
There was a problem hiding this comment.
Pull request overview
Fixes a search soft-delete propagation bug where the live-indexing SOFT_DELETE_RESTORE_SCRIPT was incorrectly stamping a top-level deleted field onto time-series child documents (e.g., testCaseResolutionStatus, testCaseResult), causing Jackson deserialization failures and UI errors. The PR also addresses selective reindex parity gaps for Data Quality entities so computed/search-dependent fields survive recreate=true reindex flows.
Changes:
- Update
SearchRepository.softDeleteOrRestoredChildrento skip propagation to time-series child aliases and centralize the parent-id field selection via a service-entity lookup. - Expand required selective-reindex fields for
TestCaseIndexandTestSuiteIndexto prevent missing status/summary-derived fields after recreate reindex. - Add coverage across unit, integration, and Playwright tests to prevent regressions in search docs and UI flows.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java | Filters soft-delete propagation targets to exclude time-series child aliases; consolidates service-entity handling. |
| openmetadata-service/src/main/java/org/openmetadata/service/Entity.java | Adds Entity.isTimeSeriesEntity(...) helper used by search propagation filtering. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/EntityTimeSeriesRepository.java | Uses a helper to leniently deserialize time-series search hits to tolerate legacy polluted docs. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TestCaseIndex.java | Adds missing required reindex fields (incl. result/incident) and replaces magic strings with constants. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/indexes/TestSuiteIndex.java | Declares summary as a required reindex field to preserve summary-derived fields on reindex. |
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TestCaseRepository.java | Promotes loader field keys to public constants and adds TEST_DEFINITION_FIELD. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchRepositoryBehaviorTest.java | Adds unit tests ensuring time-series child aliases are skipped and “all time-series children” becomes a no-op. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/SearchIndexFactoryTest.java | Updates assertions to use repository constants and verifies new reindex-field requirements. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/indexes/TestCaseIndexTest.java | Adds unit test asserting getRequiredReindexFields() includes testCaseResult and incidentId. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TestCaseSoftDeleteSearchIT.java | Adds E2E integration coverage that soft-delete does not pollute TCRS docs with deleted. |
| openmetadata-ui/src/main/resources/ui/playwright/e2e/Features/DataQuality/IncidentManagerAfterSoftDelete.spec.ts | Adds Playwright regression for Incident Manager load after soft-delete (no Jackson error toast). |
| openmetadata-ui/src/main/resources/ui/playwright/e2e/Features/DataQuality/TestCaseStatusAfterReindex.spec.ts | Adds Playwright regression ensuring test case status survives recreate reindex. |
| openmetadata-ui/src/main/resources/ui/playwright/e2e/Features/DataQuality/TestSuiteSummaryAfterReindex.spec.ts | Adds Playwright regression ensuring suite lastResultTimestamp survives recreate reindex. |
The live-indexing SOFT_DELETE_RESTORE_SCRIPT was stamping `deleted` onto every child alias declared in the parent's IndexMapping. For `testCase` two of those children (`testCaseResolutionStatus`, `testCaseResult`) are time-series indexes whose Java classes declare no `deleted` field, so Jackson refused to deserialize the polluted docs and the Incident Manager UI surfaced an "Unrecognized field 'deleted'" toast. SearchRepository.softDeleteOrRestoredChildren now filters child aliases through Entity.isTimeSeriesEntity before invoking the script. The switch-statement on parent entity type also collapses into a single SERVICE_ENTITY_TYPES lookup so the "service.id vs. <type>.id" choice lives in one place. While here: - Promote TestCaseRepository's TEST_SUITE_FIELD / INCIDENTS_FIELD to public constants and add TEST_DEFINITION_FIELD, so TestCaseIndex no longer has magic-string duplicates of the loader keys. - Wrap the three EntityTimeSeriesRepository lenient-parse call sites in a readTimeSeriesSource helper with a comment explaining it tolerates legacy polluted docs until a recreate-reindex scrubs them. Tests: - SearchRepositoryBehaviorTest gains two unit tests asserting that time-series child aliases are skipped and that a parent whose children are all time-series is a no-op (rather than an empty-list call). - TestCaseSoftDeleteSearchIT (new) is the end-to-end IT: create a TC + result + incident, soft-delete the TC, then probe the resolution- status API + ES directly for any top-level `deleted` field. - IncidentManagerAfterSoftDelete.spec.ts (new) is the Playwright regression for the user-facing path that broke. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
40298b8 to
9674ce7
Compare
|
The Java checkstyle failed. Please run You can install the pre-commit hooks with |
- DistributedReindexFinalizer: EntityPromotionContext now carries
processedRecords. RatioPromotionPolicy.evaluate returns
fullySuccessful=false for incomplete runs (success+failed < total) even
if the ratio over the processed subset clears the threshold. Stops a
job that stopped early from being reported clean.
(Copilot #3231730018)
- SearchClientTagScriptSeparationTest: switched the assertion from
.contains() to a trimmed .endsWith() so a future patch that appends
more tag mutations after the reseparation snippet can't silently
re-introduce drift.
(Copilot #3231730037, #3234280748)
- SearchSeparation/README.md: removed the stale "Database/DatabaseSchema
not yet covered" note now that Phase 6 normalized their patch
signatures and added matrix entries.
(Copilot #3231730059)
- searchSeparation.ts: replaced the no-op test.slow() inside beforeAll
with test.describe.configure({ timeout: 180_000 }) so the timeout
actually applies to the per-test work (PATCH + reindex + four Explore
filter assertions).
(Copilot #3234280670)
- IncidentManagerAfterSoftDelete.spec.ts: bumped the indexing-wait poll
to 120s and added test.setTimeout(180_000). The 30s ceiling kept
hitting in postgres CI runners where the test-result + resolution-
status indexing pipeline takes longer than locally.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…onPage + IncidentManager
Three real failures from the latest CI run (8x SearchSeparation 401s,
3x ServiceEntityVersionPage TypeErrors, 1x IncidentManagerAfterSoftDelete
400):
- searchSeparation.ts: reverted getApiContext(page) → createNewPage(browser)
for the reindex POST. The /api/v1/search/reindexEntities endpoint
rejects the page-extracted token with 401; createNewPage opens a fresh
admin-scoped context that the endpoint accepts. This was the source of
8 failures across the matrix (Dashboard / Topic / Pipeline / MlModel /
Container / ApiEndpoint / StoredProcedure / Metric / Database /
DatabaseSchema specs all run against the same helper). Trade-off
acknowledged in the comment that updated PR review #3229330057 — the
optimization is not worth a broken endpoint.
- GlossaryRenameCascade.spec.ts: added test.setTimeout(180_000). The
create + tag + rename + dual ES poll comfortably exceeds the 60s
default on CI.
- IncidentManagerAfterSoftDelete.spec.ts: the incident-status search
endpoint expects offset + latest query params (matches
getListTestCaseIncidentStatusFromSearch in incidentManagerAPI.ts).
Without them the server returns 400. Added &offset=0&latest=true.
- ServiceEntityVersionPage.spec.ts: the beforeAll loop called
entity.patch(apiContext, payload) positionally on every entity in the
fixture map. After the previous Phase 6 commit normalized
DatabaseClass + DatabaseSchemaClass to the object signature
({apiContext, patchData}), those two iterations crashed with "Cannot
read properties of undefined (reading 'patch')". Dispatch by
instanceof: Database/DatabaseSchema get the object form, every other
entity (still positional) gets the old call.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
…ctory The 12 SearchSeparation reindex tests and GlossaryRenameCascade's pre-rename assertion all timed out polling /api/v1/search/query because they queried `fullyQualifiedName.keyword:"..."`. That field is mapped as `type: keyword` directly (with lowercase_normalizer), with no `.keyword` subfield -- the query returned zero hits across every entity index, so the poll never saw hits[0]._source. Sibling reindex specs already pass with the bare `fullyQualifiedName:` form. Also relocate searchSeparation.ts out of playwright/utils/ -- it called test.describe.serial / test() which doesn't belong under utils -- into playwright/e2e/Features/SearchSeparation/searchSeparationSuite.ts next to its 12 consumer specs. Imports rewritten. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous commit moved the suite factory from playwright/utils/ to playwright/e2e/Features/SearchSeparation/searchSeparationSuite.ts, but the README example still showed the old utils path -- contributors following the doc would get a broken import. Addresses Copilot PR comment #3240052790. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…'tags')
TAG_RESEPARATION_SCRIPT is appended to UPDATE_FQN_PREFIX_SCRIPT, which
SearchRepository invokes against GLOBAL_SEARCH_ALIAS on classification /
tag rename. That alias includes tag_search_index, whose docs have no
`tags` field. The four post-loop writes
(ctx._source.tags / tier / classificationTags / glossaryTags) ran
unconditionally and polluted tag docs with empty arrays and a null tier
they were never meant to carry.
Move the four writes inside the existing
`if (containsKey('tags') && tags != null)` guard so docs that don't have
tags are left untouched.
Pin the behavior with a new SearchClientTagScriptSeparationTest case
that asserts none of the four reseparation writes appear before the
containsKey('tags') guard in the snippet text -- catches a regression at
unit-test time without needing painless execution.
Addresses Copilot PR comment #3240052837.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…agation' into fix-soft-delete-time-series-propagation
- searchSeparation.ts: replaced createNewPage(browser) with
createAdminApiContext() in beforeAll/afterAll/recreate-reindex test.
No more opening a full UI page just to grab a token; the helper does
/api/v1/auth/login directly. Cuts per-suite navigation cost across
all 11 SearchSeparation entity specs.
(Copilot #3236380307)
- ServiceEntityVersionPage.spec.ts: removed the `as unknown as
Operation[]` cast. `patchData` is now typed `Operation[]` directly,
and the instanceof dispatch is extracted into a small typed
`applyServicePatch` helper so call sites read cleanly. Documented
that the helper exists only until every entity class is normalized
to the object-form patch signature.
(Copilot #3236380348)
Note: The "ColumnGridResourceIT.java:324 search_phase_execution_exception
all shards failed" surfaced in the comment is from commit 23f82b7
(PR #24012, Bulk import for columns metadata) — predates this branch
and is unrelated to the soft-delete/separation work.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copilot review feedback (discussion_r3240052749, r3240719140): calling
test.slow() inside test.beforeAll is not a supported way to extend either
the test or hook timeout in Playwright. Replace with:
- test.describe.configure({ timeout: 180_000 }) for the two tests
- test.setTimeout(180_000) inside beforeAll/afterAll for the hooks
so timeouts apply at the scopes where they actually matter.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
GlossaryRenameCascade.spec.ts caught a regression: after a glossary-term rename fires UPDATE_GLOSSARY_TERM_TAG_FQN_BY_PREFIX_SCRIPT, the appended TAG_RESEPARATION_SCRIPT was unconditionally executing `ctx._source.tier = tier`. Since TaggableIndex.applyTagFields strips Tier out of tags[] into the dedicated `tier` field at live-index time, the loop never sees a Tier.* entry in tags[], so `tier` stays null and the assignment wipes the live-indexed dedicated field. Wrap the assignment in `if (tier != null)` so the snippet: - still lifts a legacy / polluted Tier.* out of tags[] when one is present - never zeros out a tier field that lives on the dedicated path classificationTags / glossaryTags assignments stay unconditional — they are derived from the post-mutation tags[], which is the source of truth for those denormalised lists. Locked in by a new SearchClientTagScriptSeparationTest case that asserts the painless contains `if (tier != null)` immediately before the `ctx._source.tier =` write. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
| const { apiContext, afterAction } = await createNewPage(browser); | ||
|
|
||
| const table = new TableClass(); | ||
|
|
||
| try { | ||
| await table.create(apiContext); | ||
| const testCase = await table.createTestCase(apiContext); | ||
| const testCaseFqn = testCase.fullyQualifiedName as string; | ||
| const testCaseId = testCase.id as string; | ||
|
|
||
| await table.addTestCaseResult(apiContext, testCaseFqn, { | ||
| result: 'soft-delete propagation regression', | ||
| testCaseStatus: 'Failed', | ||
| timestamp: Date.now(), | ||
| }); | ||
|
|
||
| // Wait until the incident is actually indexed before we soft-delete — | ||
| // otherwise the script propagation race is meaningless. CI runners can be slow on the | ||
| // first-time test-result + resolution-status indexing pipeline, so allow up to 2 min. | ||
| test.setTimeout(180_000); | ||
| // Match the production UI's call shape — the search endpoint expects offset + latest | ||
| // (matches `getListTestCaseIncidentStatusFromSearch` in |
`ColumnGridResourceIT#test_getColumnGrid_withMetadataStatusIncomplete` was hitting `[es/search] failed: [search_phase_execution_exception] all shards failed` on every PR run across MySQL+ES and Postgres+ES profiles. Root cause: the metadata-status filter built `hasNonEmptyField(field)` as `bool.must(exists).mustNot(term(field, ""))`. For analyzed text fields like `columns.description` (`type: text`, `analyzer: om_analyzer`), the empty string produces zero tokens; ES 7.17 / OS reject the term query with "all shards failed" instead of returning an empty set. Replace the construction with `wildcard(field, "?*")`, which matches any doc whose indexed terms contain at least one character — the analyzer- friendly equivalent of "field has non-empty value". `hasEmptyOrMissingField` becomes `mustNot(hasNonEmptyField)`, which is functionally identical (a missing field has no tokens and trivially fails the wildcard). No mapping changes required — the wildcard executes against existing indexed terms in `columns.description` and `dataModel.columns.description`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Code Review 👍 Approved with suggestions 3 resolved / 4 findingsPrevents soft-delete script propagation to time-series child aliases and resolves reindex data loss issues by explicitly defining required fields. Address the empty catch block in 💡 Quality: Empty catch block swallows exception silentlyThe Log the serialization exception message in the fallback branch instead of silently ignoring it.✅ 3 resolved✅ Quality: Duplicate SERVICE_ENTITY_SET and SERVICE_ENTITY_TYPES sets
✅ Quality: Magic string "summary" in TestSuiteIndex
✅ Bug: SERVICE_ENTITY_TYPES missing API_SERVICE — wrong parentIdField
🤖 Prompt for agentsOptionsDisplay: compact → Showing less information. Comment with these commands to change:
Was this helpful? React with 👍 / 👎 | Gitar |
|
|



Summary
Live indexing's
SOFT_DELETE_RESTORE_SCRIPTwas stampingdeletedonto every child alias declared on a parent'sIndexMapping. FortestCasetwo of those children (testCaseResolutionStatus,testCaseResult) are time-series indexes whose Java classes declare nodeletedfield, so Jackson refused to deserialize the polluted docs and the Incident Manager UI surfaced an "Unrecognized field 'deleted'" error toast.This PR bundles the structural fix for the bug class along with the related reindex-parity fixes already on the branch:
SearchRepository.softDeleteOrRestoredChildrennow filters child aliases through a newEntity.isTimeSeriesEntity(...)helper before invoking the script. The parent-type switch collapses into a singleSERVICE_ENTITY_TYPESlookup so theservice.idvs<type>.idchoice lives in one place.TestCaseIndexdeclarestestCaseResultandincidentId(andtestSuite/testSuites/testDefinition) ingetRequiredReindexFields(), fixing the related 1.12.7 reindex bug wheretestCaseStatusblanked out after a reindex.TestSuiteIndexdeclaressummaryas a required reindex field solastResultTimestamp(computed from the summary) survives reindex.TestCaseRepositorypromotesTEST_SUITE_FIELD/INCIDENTS_FIELDtopublicand addsTEST_DEFINITION_FIELD, soTestCaseIndexreferences the loader keys directly rather than magic strings.EntityTimeSeriesRepositorywraps the three lenient-parse call sites in areadTimeSeriesSourcehelper with a comment explaining it tolerates legacy polluted docs until a recreate-reindex cleans them.Tests added
SearchRepositoryBehaviorTest(+2 tests)TestCaseIndexTestgetRequiredReindexFieldsincludestestCaseResultandincidentIdTestCaseSoftDeleteSearchITdeletedfield on TCRS docsIncidentManagerAfterSoftDelete.spec.tsTestCaseStatusAfterReindex.spec.ts(prior commit)TestSuiteSummaryAfterReindex.spec.ts(prior commit)lastResultTimestampsurvives reindexTest plan
SearchRepositoryBehaviorTest,TestCaseIndexTest,SearchIndexFactoryTestgreen (all ran locally)TestCaseSoftDeleteSearchITagainst MySQL + PostgreSQLIncidentManagerAfterSoftDelete,TestCaseStatusAfterReindex,TestSuiteSummaryAfterReindex)curl 'localhost:9200/test_case_resolution_status_search_index/_search?q=_exists_:deleted'returns zero hits for the affected TC.🤖 Generated with Claude Code
Summary by Gitar
postLogicalSuiteRelationshipUpdateto invalidate read-bundles and trigger lifecycle events selectively.logShardFailureDetailstoElasticSearchColumnAggregatorto capture root causes, query context, and indexes during ES failures.This will update automatically on new commits.