feat(rdf): SPARQL playground, MCP knowledge-graph tools, insights endpoints #28042

Open
harshach wants to merge 1 commit into main from harshach/rdf-fidelity

@harshach harshach commented May 11, 2026

Describe your changes

This PR closes the RDF fidelity gaps in the knowledge-graph stack and removes the experimental R2RML row-materialization feature (the schema-level concept-graph is the strategy going forward; row triples don't scale and were the wrong target).

It adds:

  • a SPARQL playground UI
  • five MCP knowledge-graph tools (SparqlQueryTool, EntityNeighborhoodTool, FindByTagTool, OntologyDescribeTool, ShaclValidateTool)
  • insights endpoints (PageRank-based importance, Louvain communities, shortest-path explain-lineage, tag/glossary co-occurrence, recommendations)
  • an inference-rule registry with a starter pack (transitive lineage, schema-tag/domain inheritance, PII-propagation-via-lineage)
  • a federated-SPARQL allowlist
  • expanded SHACL shapes + REST validator
  • custom-ontology upload/extension
  • PROV-O activity mapper, DQV quality mapper, usage mapper
  • JSON-LD context coverage for AI/Automation/Governance entities

It deletes the R2RML mapping schema, validator, registry, applier, materializer workflow, OntologyEmitter sink, Linked-Data UI mode, and emitOntologyTriples profiler flag.

It also adds the OpenMetadata ontology TTL coverage (skos, dcat, prov, dprod, foaf) for the new entity types and a CHANGELOG.

Type of change

  • Improvement
  • New feature
  • Documentation

Tests

  • Unit + ITs added for every new component: SparqlQueryToolTest, EntityNeighborhoodToolTest, FindByTagToolTest, OntologyDescribeToolTest, ShaclValidateToolTest, CustomOntologyValidatorTest, SparqlFederationGuardTest, InferenceRuleValidatorTest, CentralityComputationTest, CommunityComputationTest, LouvainTest, PageRankTest, LineagePathBuilderTest, LineagePathFinderTest, ImportanceQueryBuilderTest, CoOccurrenceQueryBuilderTest, RecommendationsQueryBuilderTest, RdfShaclValidatorTest, OntologyDocumentTest, RdfUsageMapperTest, RdfJsonLdContextTest, expanded RdfPropertyMapperTest, and RdfResourceIT. SPARQL playground covered by SparqlPlayground.test.tsx.

UI screen recording / screenshots

SPARQL playground + KnowledgeGraph view: screenshot/recording to be attached on PR.

🤖 Generated with Claude Code


Summary by Gitar

  • Indexing Infrastructure:
    • Replaced Quartz-dependent indexing with a decoupled SearchIndexExecutor and IndexingPipeline for improved scalability.
    • Implemented RedisJobNotifier for cluster-wide reindex orchestration and added adaptive backoff logic.
  • RDF Fidelity and Insights:
    • Added RdfIriValidator, SparqlFederationGuard, and inference rule registry (InferenceRuleValidator) for compliant RDF querying.
    • Implemented analytical insights via CentralityComputation, CommunityComputation, and lineage path finding.
  • MCP & Integration:
    • Added five MCP knowledge-graph tools including SparqlQueryTool and EntityNeighborhoodTool.
  • Entity & Metadata:
    • Added CustomOntologyRegistry and OntologyDocument for custom-ontology extension and JSON-LD context coverage.
  • Database Tuning:
    • Removed legacy DbTune module and related diagnostic utilities.

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings May 11, 2026 18:29
@harshach harshach requested review from a team, akash-jain-10 and tutte as code owners May 11, 2026 18:29
@github-actions github-actions Bot added the backend and safe to test labels May 11, 2026
@github-actions
Contributor

✅ TypeScript Types Auto-Updated

The generated TypeScript types have been automatically updated based on JSON schema changes in this PR.

@github-actions
Contributor

The Python checkstyle failed.

Please run make py_format and py_format_check in the root of your repository and commit the changes to this PR.
You can also use pre-commit to automate the Python code formatting.

You can install the pre-commit hooks with make install_test precommit_install.

Copilot AI left a comment

Pull request overview

This PR expands OpenMetadata’s RDF/knowledge-graph capabilities end-to-end: UI support for running SPARQL, spec-level ontology/context updates, and service/MCP tooling for querying, validation (SHACL), federation-guarding, inference rules, and “insights” SPARQL query builders.

Changes:

  • Adds SPARQL playground client support + routing, plus supporting UI enums/interfaces and locale keys.
  • Introduces/extends RDF specs (ontology changelog, JSON-LD contexts, RDF configuration schema) and adds inference-rule + federation configuration shapes.
  • Adds service-side RDF utilities (ontology document serving, SHACL validator, usage/quality/activity mappers, insights query builders) and MCP tools that expose read-only SPARQL + graph utilities.

Reviewed changes

Copilot reviewed 97 out of 100 changed files in this pull request and generated 9 comments.

File Description
openmetadata-ui/src/main/resources/ui/src/rest/rdfAPI.ts Adds SPARQL playground REST helper (runSparqlQuery) and result typing.
openmetadata-ui/src/main/resources/ui/src/pages/SparqlPlayground/SparqlPlayground.interface.ts Adds local types/constants for saved queries and sample queries.
openmetadata-ui/src/main/resources/ui/src/locale/languages/en-us.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/zh-tw.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/zh-cn.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/tr-tr.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/th-th.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ru-ru.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/pt-pt.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/pt-br.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/pr-pr.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/nl-nl.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/mr-in.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ko-kr.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ja-jp.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/he-he.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/gl-es.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/fr-fr.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/es-es.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/de-de.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/locale/languages/ar-sa.json Adds SPARQL playground-related labels/messages; removes one label key.
openmetadata-ui/src/main/resources/ui/src/enums/codemirror.enum.ts Adds CodeMirror mode value for SPARQL.
openmetadata-ui/src/main/resources/ui/src/constants/constants.ts Adds route constant for SPARQL playground.
openmetadata-ui/src/main/resources/ui/src/components/KnowledgeGraph/KnowledgeGraph.tsx Refactors metadata-mode body rendering; removes a header label.
openmetadata-ui/src/main/resources/ui/src/components/AppRouter/AuthenticatedAppRouter.tsx Registers lazy-loaded SPARQL playground route.
openmetadata-spec/src/main/resources/rdf/ontology/openmetadata-prov.ttl Adds missing RDF prefix in PROV extension TTL.
openmetadata-spec/src/main/resources/rdf/ontology/CHANGELOG.md Introduces ontology changelog documenting fidelity changes.
openmetadata-spec/src/main/resources/rdf/contexts/lineage.jsonld Adjusts lineage JSON-LD mappings to avoid predicate collisions.
openmetadata-spec/src/main/resources/rdf/contexts/governance.jsonld Aligns governance mappings with SKOS predicates and adds new fields.
openmetadata-spec/src/main/resources/rdf/contexts/automation.jsonld Adds JSON-LD context for automation/workflow entities.
openmetadata-spec/src/main/resources/rdf/contexts/ai.jsonld Adds JSON-LD context for AI/MCP-related entities.
openmetadata-spec/src/main/resources/json/schema/api/configuration/rdfConfiguration.json Adds federation config block to RDF configuration schema.
openmetadata-spec/src/main/resources/json/schema/api/configuration/rdf/inferenceRule.json Adds schema for inference rule objects (CONSTRUCT/RDFS placeholder).
openmetadata-spec/src/main/resources/json/schema/api/configuration/rdf/customOntology.json Adds schema for custom ontology extension definitions.
openmetadata-service/src/test/java/org/openmetadata/service/resources/rdf/RdfShaclValidatorTest.java Adds SHACL validation tests for key shape constraints.
openmetadata-service/src/test/java/org/openmetadata/service/resources/rdf/OntologyDocumentTest.java Adds tests for ontology serving endpoint serialization.
openmetadata-service/src/test/java/org/openmetadata/service/rdf/translator/RdfUsageMapperTest.java Adds tests for RDF usage summary mapping.
openmetadata-service/src/test/java/org/openmetadata/service/rdf/insights/RecommendationsQueryBuilderTest.java Adds tests for recommendations SPARQL builder validation/shape.
openmetadata-service/src/test/java/org/openmetadata/service/rdf/insights/CoOccurrenceQueryBuilderTest.java Adds tests for co-occurrence/popularity/reach SPARQL builders.
openmetadata-service/src/main/resources/rdf/inference-rules/transitive-lineage-closure.json Adds starter inference rule JSON.
openmetadata-service/src/main/resources/rdf/inference-rules/schema-tag-inheritance.json Adds starter inference rule JSON.
openmetadata-service/src/main/resources/rdf/inference-rules/pii-propagation-via-lineage.json Adds starter inference rule JSON.
openmetadata-service/src/main/resources/rdf/inference-rules/domain-membership-inheritance.json Adds starter inference rule JSON.
openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfShaclValidator.java Adds SHACL shapes loader + validator helper.
openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/OntologyDocument.java Adds merged ontology document loader + serializer for multiple formats.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfUsageMapper.java Adds mapper emitting usage-summary triples.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfQualityMapper.java Adds mapper emitting DQV quality measurement resources.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/RdfActivityMapper.java Adds PROV activity mapping for pipeline runs.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/translator/JsonLdTranslator.java Loads new JSON-LD contexts and assigns column IDs for named column resources.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfUtils.java Adds RDF types for new entities and a helper to mint column URIs.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/RecommendationsQueryBuilder.java Adds recommendations SPARQL builder.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/ImportanceQueryBuilder.java Adds importance ranking SPARQL builder.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CoOccurrenceQueryBuilder.java Adds co-occurrence/popularity/reach SPARQL builders.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/inference/InferenceRuleValidator.java Adds strict validation for inference rule bodies.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/inference/InferenceRuleRegistry.java Adds in-memory inference rule registry + starter pack loader.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/federation/SparqlFederationGuard.java Adds SERVICE allowlist enforcement for federated SPARQL.
openmetadata-service/src/main/java/org/openmetadata/service/rdf/extension/CustomOntologyRegistry.java Adds in-memory registry for custom ontology extensions.
openmetadata-mcp/src/test/java/org/openmetadata/mcp/tools/OntologyDescribeToolTest.java Adds tests for ontology describe MCP tool.
openmetadata-mcp/src/test/java/org/openmetadata/mcp/tools/FindByTagToolTest.java Adds tests for find-by-tag MCP tool.
openmetadata-mcp/src/test/java/org/openmetadata/mcp/tools/EntityNeighborhoodToolTest.java Adds tests for entity neighborhood MCP tool.
openmetadata-mcp/src/main/resources/json/data/mcp/tools.json Registers new MCP tool schemas for KG/SPARQL features.
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SparqlQueryTool.java Adds read-only SPARQL MCP tool with federation guard + truncation.
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/ShaclValidateTool.java Adds SHACL validation MCP tool (entity-scoped or full graph).
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/OntologyDescribeTool.java Adds ontology describe MCP tool (full ontology or DESCRIBE).
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/FindByTagTool.java Adds MCP tool to find entities by tag/glossary FQN.
openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/DefaultToolContext.java Wires new MCP tools into tool dispatcher.
ingestion/src/metadata/workflow/profiler.py Adds optional ontology emission step to profiler workflow.
docker/development/docker-compose-postgres-fuseki.yml Adds RDF endpoint env var for Fuseki-based dev stack.
Comments suppressed due to low confidence (1)

openmetadata-service/src/main/java/org/openmetadata/service/rdf/RdfUtils.java:20

  • New RDF types were added for workflowinstance, agentexecution, mcpexecution, automation, etc., but PROV_ACTIVITY_TYPES was not updated. Since JsonLdTranslator.toRdf() relies on getProvType() to add prov:Activity typing (when the primary rdf:type is not already a PROV class), these new execution-like entities will miss rdf:type prov:Activity, reducing PROV-O compatibility and potentially breaking queries that filter by prov:Activity. Please extend the PROV type sets to cover the new entity types.
  private static final Set<String> PROV_ACTIVITY_TYPES =
      Set.of(
          "pipeline",
          "ingestionpipeline",
          "storedprocedure",
          "dbtpipeline",
          "workflow",
          "pipelinerun");

Comment thread ingestion/src/metadata/workflow/profiler.py Outdated

harshach added a commit that referenced this pull request May 11, 2026
…ntic correctness

Addresses the open review comments on PR #28042. Each fix is independent.

P0 — correctness / security
- ingestion/profiler.py: drop residual OntologyEmitter import (module removed
  in the R2RML pivot). Every pytest that imports profiler.py was failing with
  ModuleNotFoundError; reverts file to match main.
- RdfResource.validateGraph + ShaclValidateTool: replace `entityUri.replace(">", "")`
  with strict absolute-http(s)-IRI validation. Reject control chars, whitespace,
  quotes, and angle brackets up front. Closes the SPARQL-injection vector via
  newlines / # comments in the DESCRIBE template.
- EntityNeighborhoodTool.buildConstructQuery: rewrite to emit each path edge in
  its own UNION arm with the correct subject. The previous BIND(<entity> AS ?s)
  applied across all arms collapsed every multi-hop edge onto the start node.

P1 — semantic / robustness
- CommunityComputation.parseGraph: stop emitting both directions; the FILTER in
  the SPARQL canonicalises pairs and Louvain.addAllEdges symmetrises the
  adjacency internally. Double symmetrisation was doubling every edge weight.
- RdfActivityMapper: emit `prov:wasInformedBy` (Activity → Activity) instead of
  `prov:wasGeneratedBy` (Entity → Activity) for pipeline-run → pipeline. Closes
  the PROV-O domain/range inversion.
- RdfQualityMapper.measurementUri: deterministic URI built from subject + metric
  + timestamp. Random UUID was leaking orphan QualityMeasurement nodes on every
  re-emit because deletion only follows subject/object on the entity itself.
- SparqlQueryTool: byte-aware truncation. Previous substring-by-char cap could
  exceed maxBytes for multi-byte UTF-8 and never enforced a real byte limit.
- ShaclValidateTool: require explicit `fullGraph=true` to validate the entire
  triplestore; reject otherwise. Prevents accidental OOM on multi-GB graphs.
- RdfResource.getFederationGuard: synchronise the lazy-init path; the volatile
  field was double-checked without a lock.
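
The byte-aware truncation mentioned for SparqlQueryTool can be illustrated with a minimal sketch. This is not the PR's code: the class `Utf8TruncateSketch` and the method `truncateUtf8` are hypothetical names, and the approach shown (cut at `maxBytes`, then back up past UTF-8 continuation bytes) is one common way to enforce a byte limit without splitting a character.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of UTF-8 safe truncation; not the PR's SparqlQueryTool code. */
final class Utf8TruncateSketch {
  private Utf8TruncateSketch() {}

  /** Returns the longest prefix of text whose UTF-8 encoding fits in maxBytes. */
  static String truncateUtf8(String text, int maxBytes) {
    if (maxBytes < 0) {
      throw new IllegalArgumentException("maxBytes must be >= 0");
    }
    byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
    if (bytes.length <= maxBytes) {
      return text;
    }
    int end = maxBytes;
    // A continuation byte has the bit pattern 10xxxxxx. If the first dropped byte is one,
    // the cut would split a character, so back up to the start of that character.
    while (end > 0 && (bytes[end] & 0xC0) == 0x80) {
      end--;
    }
    return new String(bytes, 0, end, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    // "h\u00e9llo" encodes to 6 bytes; a 2-byte cap would split the 2-byte "\u00e9".
    System.out.println(truncateUtf8("h\u00e9llo", 2)); // prints "h"
  }
}
```

A plain `substring(0, maxBytes)` counts UTF-16 chars instead, so the encoded result can exceed the cap whenever the text contains multi-byte characters.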

P2 — code quality
- RdfResource.validateGraph: replace 14 inline FQN class names with proper
  imports (Model / ModelFactory / Lang / RDFDataMgr / RDFFormat / ValidationReport).
- RdfShaclValidator + OntologyDocument: catch RuntimeException (covers
  RiotException) in the static-init shapes/ontology load. A corrupt classpath
  resource now degrades to an empty model instead of failing class init and
  taking down /rdf/* and MCP describe.
- i18n: restore `label.view-mode` (still referenced by
  OntologyExplorer/FilterToolbar.tsx) and route the SparqlPlayground sample-query
  names through translation keys (`label.sparql-sample-*`). All 17 locales
  synced via `yarn i18n`.

Tests updated where the fix changes observable behaviour:
- EntityNeighborhoodToolTest: assert the new multi-hop subject preservation
  and the new depth-3 chain variable name.
- CommunityComputationTest: assert the directed-single-edge behaviour
  (Louvain symmetrises internally).
- RdfPropertyMapperTest: assert `prov:wasInformedBy` instead of
  `prov:wasGeneratedBy` for the pipeline activity.
- SparqlPlayground.test.tsx: drive the sample button by `nameKey`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings May 11, 2026 19:48
Comment thread openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/ShaclValidateTool.java Outdated
Copilot AI left a comment

Pull request overview

Copilot reviewed 95 out of 98 changed files in this pull request and generated 4 comments.

Comment thread openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/FindByTagTool.java Outdated
Copilot AI left a comment

Copilot wasn't able to review this pull request because it exceeds the maximum number of lines (20,000). Try reducing the number of changed lines and requesting a review from Copilot again.

…points

Closes the RDF fidelity gaps in the knowledge-graph stack and removes the
experimental R2RML row-materialization feature (schema-level concept-graph
is the strategy going forward; row triples don't scale).

Adds
- SPARQL playground page (read-only SELECT/ASK/CONSTRUCT/DESCRIBE), multiple
  result formats, inject-prefixes helper, save/reload queries, sample queries
  driven by i18n keys.
- Five MCP knowledge-graph tools: SparqlQueryTool (UTF-8 safe byte
  truncation), EntityNeighborhoodTool (per-arm correct subjects for multi-hop
  paths), FindByTagTool (matches Tag.tagFQN + GlossaryTerm.fullyQualifiedName),
  OntologyDescribeTool (full format→MIME mapping + IRI-validated resource),
  ShaclValidateTool (entity-scoped or explicit fullGraph=true).
- Insights endpoints under /v1/rdf/insights/: importance, communities,
  shortest-path, recommendations, centrality, co-occurrence.
- Inference-rule registry + starter pack, federated-SPARQL allowlist guard,
  expanded SHACL shapes + REST validation, custom-ontology upload/extension.
- PROV-O activity mapper (prov:wasInformedBy for run→pipeline; agent IRIs
  under the entity namespace, never the ontology prefix), DQV quality
  measurements with deterministic per-(subject, metric, timestamp) URIs,
  usage mapper, JSON-LD contexts for AI/Automation/Governance, full ontology
  TTL.
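
The deterministic per-(subject, metric, timestamp) measurement URI can be sketched as below. This is an illustration under assumptions, not the PR's RdfQualityMapper: the class name, the `BASE` namespace, and the use of a name-based UUID are all hypothetical choices. The point is only that the same inputs always produce the same URI, so a re-emit overwrites the earlier node instead of leaving an orphan.

```java
import java.nio.charset.StandardCharsets;
import java.util.UUID;

/** Hypothetical sketch of a deterministic measurement URI; not the PR's RdfQualityMapper. */
final class MeasurementUriSketch {
  // Assumed namespace, for illustration only.
  static final String BASE = "https://example.org/entity/qualityMeasurement/";

  private MeasurementUriSketch() {}

  static String measurementUri(String subjectUri, String metric, long timestampMillis) {
    // Join the identifying parts with a separator that cannot appear in a validated IRI.
    String key = subjectUri + "|" + metric + "|" + timestampMillis;
    // nameUUIDFromBytes derives a name-based (type 3) UUID, so equal keys give equal UUIDs.
    UUID id = UUID.nameUUIDFromBytes(key.getBytes(StandardCharsets.UTF_8));
    return BASE + id;
  }

  public static void main(String[] args) {
    String first = measurementUri("https://example.org/entity/table/1", "rowCount", 1700000000000L);
    String again = measurementUri("https://example.org/entity/table/1", "rowCount", 1700000000000L);
    System.out.println(first.equals(again)); // prints "true"
  }
}
```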

Deletes
- R2RML mapping schema/validator/registry/applier and REST endpoints.
- KnowledgeGraph LinkedData UI and rdfAPI.ts R2RML helpers.
- emitOntologyTriples flag on the profiler pipeline schema.

Security
- New shared RdfIriValidator: every SPARQL DESCRIBE path validates
  user-supplied IRIs as absolute http(s) and rejects control chars, newlines,
  # comments, quotes, and angle brackets before template interpolation.
- SparqlFederationGuard lazy-init synchronised.
- RdfShaclValidator + OntologyDocument catch RuntimeException on TTL load.

Tests
- New unit tests for every component; expanded RdfPropertyMapperTest and
  CommunityComputationTest. SparqlPlayground.test.tsx covers the UI.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@harshach harshach force-pushed the harshach/rdf-fidelity branch from 843894e to e932e1c Compare May 12, 2026 19:40
@gitar-bot

gitar-bot Bot commented May 12, 2026

Code Review ⚠️ Changes requested 6 resolved / 7 findings

Introduces a robust SPARQL playground, MCP knowledge-graph tools, and advanced insights endpoints while removing deprecated R2RML features. Please address the identified SPARQL UPDATE endpoint bypass vulnerability, which currently allows unauthorized query execution through malformed prefixes.

⚠️ Security: SPARQL UPDATE endpoint bypassable via PREFIX prefix

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java:1088-1098

The SPARQL UPDATE endpoint at /sparql/update validates the query type by checking if the uppercased trimmed body startsWith("INSERT"), etc. This is trivially bypassed with a query that starts with a PREFIX declaration or a comment:

PREFIX om: <...>
SELECT * WHERE { ?s ?p ?o }

A body that starts with PREFIX, BASE, or a comment never matches the INSERT/DELETE/etc. check, so the prefix test cannot tell what kind of statement follows. More importantly, legitimate UPDATE requests that start with PREFIX (standard SPARQL practice) are incorrectly rejected with a 400 error.

The correct approach is to parse the query with Jena's UpdateFactory and reject if parsing fails, rather than string-prefix checking.

Fix
// Replace prefix-based validation with proper parsing:
try {
  org.apache.jena.update.UpdateFactory.create(sparqlQuery.getQuery());
} catch (Exception e) {
  return Response.status(Response.Status.BAD_REQUEST)
      .entity(buildErrorResponse("Invalid SPARQL UPDATE: " + e.getMessage()))
      .build();
}
getRdfRepository().executeSparqlUpdate(sparqlQuery.getQuery());
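
The weakness of the string check can be reproduced without a triplestore. The sketch below assumes the shape of the check from the description above (trim, uppercase, `startsWith` against a keyword list); the class name, method name, and the exact keyword list are hypothetical, since the original code is not shown here.

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical reconstruction of a prefix-based UPDATE check, to show why it misfires. */
final class PrefixCheckSketch {
  // Assumed keyword list; the finding only names INSERT "etc.".
  private static final List<String> UPDATE_KEYWORDS =
      List.of("INSERT", "DELETE", "LOAD", "CLEAR", "CREATE", "DROP", "COPY", "MOVE", "ADD");

  private PrefixCheckSketch() {}

  static boolean looksLikeUpdate(String body) {
    String upper = body.trim().toUpperCase(Locale.ROOT);
    return UPDATE_KEYWORDS.stream().anyMatch(upper::startsWith);
  }

  public static void main(String[] args) {
    String bare = "INSERT DATA { <urn:a> <urn:b> <urn:c> }";
    String prefixed = "PREFIX om: <https://example.org/>\n" + bare;
    System.out.println(looksLikeUpdate(bare));     // prints "true"
    System.out.println(looksLikeUpdate(prefixed)); // prints "false": a valid update is rejected
  }
}
```

Parsing the request with a real SPARQL parser, as the fix above suggests, removes the dependence on what the first token happens to be.
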
✅ 6 resolved
Security: SPARQL injection via insufficient URI sanitization in validateGraph

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java:197-200 📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CommunityComputation.java:202-205 📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java:199 📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CommunityComputation.java:203
The validateGraph endpoint (line 199) sanitizes user-supplied entityUri by only stripping > characters before interpolating into a SPARQL query: String.format("DESCRIBE <%s>", entityUri.replace(">", "")). This is insufficient — an attacker can inject SPARQL via newlines, # comments, or other control characters. For example, entityUri = "http://x> . } DELETE WHERE { ?s ?p ?o } #" would close the angle bracket context on a newline. Since this is an admin-only endpoint the blast radius is somewhat limited, but it's still a direct injection vector against the triplestore.

The same pattern appears in CommunityComputation.persistCommunities (line 203) where member URIs from SPARQL results are directly interpolated into SPARQL UPDATE statements — if a malicious triple exists in the graph, it could inject arbitrary updates.
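
A strict check of the kind this finding calls for might look like the following. It is a minimal sketch using only `java.net.URI`, not the PR's RdfIriValidator; the class `IriCheckSketch` and the method `isSafeHttpIri` are hypothetical. The rejected character set follows the SPARQL grammar for IRI references, which excludes `<`, `>`, `"`, `{`, `}`, `|`, `^`, the backtick, the backslash, and everything at or below the space character; the single quote is rejected here as an extra precaution.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical sketch of a strict IRI check; not the PR's RdfIriValidator. */
final class IriCheckSketch {
  private IriCheckSketch() {}

  /** Returns true only for absolute http(s) IRIs that are safe to place inside <...>. */
  static boolean isSafeHttpIri(String candidate) {
    if (candidate == null || candidate.isEmpty()) {
      return false;
    }
    // Reject anything that could end the <...> token or break out onto a new line.
    for (int i = 0; i < candidate.length(); i++) {
      char c = candidate.charAt(i);
      if (Character.isISOControl(c) || Character.isWhitespace(c)
          || c == '<' || c == '>' || c == '"' || c == '\'' || c == '`'
          || c == '{' || c == '}' || c == '|' || c == '\\' || c == '^') {
        return false;
      }
    }
    try {
      URI uri = new URI(candidate);
      String scheme = uri.getScheme();
      boolean http = "http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme);
      return http && uri.getHost() != null;
    } catch (URISyntaxException e) {
      return false;
    }
  }

  public static void main(String[] args) {
    System.out.println(isSafeHttpIri("https://example.org/entity/table/123"));      // prints "true"
    System.out.println(isSafeHttpIri("http://x> . } DELETE WHERE { ?s ?p ?o } #")); // prints "false"
  }
}
```

Validating before interpolation means the query template never sees input that could change its structure, which is a stronger guarantee than stripping individual characters after the fact.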

Bug: CONSTRUCT query flattens multi-hop neighborhood onto single subject

📄 openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/EntityNeighborhoodTool.java:102-116
In EntityNeighborhoodTool.buildConstructQuery (line 102-121), the WHERE clause captures ?p ?o triples from 1-3 hops away, but then does BIND(<entityUri> AS ?s) and the CONSTRUCT template is { ?s ?p ?o }. This means ALL matched triples — even those 2-3 hops from the start entity — are emitted with the start entity as subject, producing an incorrect/flattened graph. For depth≥2, triples from intermediate nodes lose their actual subject and get attributed to the root entity.

The tool's purpose is to return the neighborhood subgraph, but consumers will see a star graph rather than the actual topology.
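
One way to keep the real subject of each edge is to give every hop its own UNION arm, with `?s` bound to the node that actually owns the triple. The sketch below is an assumption about the general shape of such a query, not the PR's EntityNeighborhoodTool: the class name and the variable names (`?via1`, `?n1`) are hypothetical, and it assumes `entityIri` has already passed a strict IRI check.

```java
/** Hypothetical sketch of a per-arm CONSTRUCT builder; not the PR's EntityNeighborhoodTool. */
final class NeighborhoodQuerySketch {
  private NeighborhoodQuerySketch() {}

  /** Builds a CONSTRUCT query with one UNION arm per hop; entityIri must be pre-validated. */
  static String buildConstructQuery(String entityIri, int depth) {
    if (depth < 1 || depth > 3) {
      throw new IllegalArgumentException("depth must be between 1 and 3");
    }
    StringBuilder where = new StringBuilder();
    for (int hop = 1; hop <= depth; hop++) {
      if (hop > 1) {
        where.append("  UNION\n");
      }
      where.append("  { ").append(arm(entityIri, hop)).append(" }\n");
    }
    return "CONSTRUCT { ?s ?p ?o }\nWHERE {\n" + where + "}";
  }

  /** Pattern for edges whose subject sits (hop - 1) steps away from the start entity. */
  private static String arm(String entityIri, int hop) {
    String start = "<" + entityIri + ">";
    if (hop == 1) {
      // Only in the first arm is the start entity itself the subject.
      return "VALUES ?s { " + start + " } ?s ?p ?o .";
    }
    StringBuilder path = new StringBuilder();
    String current = start;
    for (int i = 1; i < hop; i++) {
      // The last step of the path lands on ?s, the true subject of the emitted edge.
      String next = (i == hop - 1) ? "?s" : "?n" + i;
      path.append(current).append(" ?via").append(i).append(' ').append(next).append(" . ");
      current = next;
    }
    return path + "?s ?p ?o .";
  }

  public static void main(String[] args) {
    System.out.println(buildConstructQuery("https://example.org/entity/table/1", 2));
  }
}
```

With this shape, a depth-2 result contains the start entity's own triples plus the triples of its direct neighbours, each under its own subject, so consumers see the topology instead of a star.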

Performance: Double symmetrization quadruples edge weights in Louvain input

📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/CommunityComputation.java:161-162 📄 openmetadata-service/src/main/java/org/openmetadata/service/rdf/insights/Louvain.java:128-129
CommunityComputation.parseGraph (lines 161-162) already symmetrizes the adjacency list by adding weight in both directions for every SPARQL result row. Then Louvain.addAllEdges (lines 128-129) symmetrizes again by doing adj.get(src).merge(dst, w, Double::sum) AND adj.get(dst).merge(src, w, Double::sum). Since the input already has both A→B and B→A entries, the result is each edge getting 4× its original weight.

While this may not change community assignments (the ratio is preserved in the modularity gain formula), it inflates totalWeight and degree[] arrays by 4×, making the absolute modularity score meaningless for comparison with other runs or standard benchmarks. It also doubles memory usage for the adjacency structure.
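
The arithmetic can be checked with a small stand-in for the merge logic. The sketch below is hypothetical (the class `SymmetriseSketch` is not the PR's Louvain class) and reuses only the two `merge` calls quoted in the finding. For one undirected edge of weight w, a single symmetrisation stores w in each direction (2w in total); feeding both directions in stores 2w in each direction (4w in total), which is how the "doubling" in the commit message and the "4×" in this finding describe the same effect.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for the adjacency merge described in the finding. */
final class SymmetriseSketch {
  static final class Edge {
    final String src;
    final String dst;
    final double weight;

    Edge(String src, String dst, double weight) {
      this.src = src;
      this.dst = dst;
      this.weight = weight;
    }
  }

  private SymmetriseSketch() {}

  /** Adds every edge in both directions, summing weights on repeated pairs. */
  static Map<String, Map<String, Double>> addAllEdges(List<Edge> edges) {
    Map<String, Map<String, Double>> adj = new HashMap<>();
    for (Edge e : edges) {
      adj.computeIfAbsent(e.src, k -> new HashMap<>()).merge(e.dst, e.weight, Double::sum);
      adj.computeIfAbsent(e.dst, k -> new HashMap<>()).merge(e.src, e.weight, Double::sum);
    }
    return adj;
  }

  public static void main(String[] args) {
    List<Edge> canonical = List.of(new Edge("A", "B", 1.0));
    List<Edge> bothDirections = List.of(new Edge("A", "B", 1.0), new Edge("B", "A", 1.0));
    System.out.println(addAllEdges(canonical).get("A").get("B"));      // prints "1.0"
    System.out.println(addAllEdges(bothDirections).get("A").get("B")); // prints "2.0"
  }
}
```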

Quality: 14 fully-qualified class names in validateGraph method

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java:203-217
The validateGraph method uses inline fully-qualified names (org.apache.jena.rdf.model.Model, org.apache.jena.riot.RDFDataMgr, java.io.ByteArrayOutputStream, etc.) instead of imports. This violates the project's coding standards ('No fully qualified names') and makes the method hard to read. There are 14 FQN usages in a single 70-line method.

Bug: SparqlFederationGuard race condition on lazy init

📄 openmetadata-service/src/main/java/org/openmetadata/service/resources/rdf/RdfResource.java:113-120
In RdfResource.java lines 113-120, getFederationGuard() has a lazy-init pattern without synchronization. The field is volatile, but the check-then-assign pattern is not atomic — two threads could both see null and both create a new instance. While the consequences here are benign (both would be functionally equivalent), the field is already initialized in initialize() (line 110), so the lazy fallback is only for tests. The inconsistency with the double-checked locking pattern used for getSemanticSearchEngine() is confusing.
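
For reference, the double-checked locking idiom mentioned here can be written generically as below. This is a sketch of the standard pattern, not the PR's RdfResource code; the class `LazyGuardSketch` and its `Supplier`-based constructor are hypothetical.

```java
import java.util.function.Supplier;

/** Hypothetical generic holder showing double-checked locking on a volatile field. */
final class LazyGuardSketch<T> {
  private final Supplier<T> factory;
  private volatile T instance;

  LazyGuardSketch(Supplier<T> factory) {
    this.factory = factory;
  }

  T get() {
    T local = instance; // first check, without the lock
    if (local == null) {
      synchronized (this) {
        local = instance; // second check, under the lock
        if (local == null) {
          local = factory.get();
          instance = local;
        }
      }
    }
    return local;
  }

  public static void main(String[] args) {
    LazyGuardSketch<Object> lazy = new LazyGuardSketch<>(Object::new);
    System.out.println(lazy.get() == lazy.get()); // prints "true"
  }
}
```

The volatile field alone makes the published instance visible to other threads; the lock is what stops two threads from both running the factory.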

...and 1 more resolved from earlier reviews
