
[feature] spatial: add distance and distance-sphere functions #6308

Open
joewiz wants to merge 1 commit into eXist-db:develop from joewiz:feature/spatial-distance


@joewiz
Member

@joewiz joewiz commented May 7, 2026

Summary

Adds spatial:distance/2 (cartesian, source-SRS units) and spatial:distance/3 (unit-aware: degree, meter, kilometer, mile, nautical-mile) to the spatial XQuery module. Closes the long-standing gap that there was no way to compute distance between two GML geometries.

This is the first of ~10 small surgical PRs modernizing the spatial module's function surface to add capabilities developers reach for today (distance, DWithin, KNN, GeoJSON I/O, geohash, lat/lon constructors).

The backend stays HSQLDB; no schema change, no migration story.

What changed

All files in extensions/indexes/spatial/src/main/java/org/exist/xquery/modules/

| File | Change |
| --- | --- |
| spatial/FunSpatialDistance.java | NEW (222 lines). Two FunctionSignatures for spatial:distance/2 and /3. Geometry resolution mirrors the existing FunGeometricProperties pattern: getGeometryForNode for persistent nodes, streamNodeToGeometry for in-memory. Cartesian uses JTS Geometry.distance; haversine uses JTS DistanceOp.nearestPoints plus an inline haversine helper on the closest-point pair, then converts to the requested unit via Java 21 switch expression. |
| spatial/SpatialModule.java | EDIT (4 lines added). Registers the two new signatures at the head of functions[] under a // --- Distance & proximity --- labeled block. |
| spatial/FunSpatialDistanceTest.java | NEW (184 lines). 10 JUnit tests using in-memory GML fixtures (NYC, LA, Paris in EPSG:4326). Assertions cover: cartesian default, all four non-degree units, kilometer = meter / 1000, empty operands return empty sequence, unsupported unit raises XPathException, explicit 'degree' unit equals 2-arity cartesian. |

XQuery API

spatial:distance($g1 as node(), $g2 as node()) as xs:double?
spatial:distance($g1 as node(), $g2 as node(), $unit as xs:string) as xs:double?

$unit is one of "degree" (default; cartesian, source SRS), "meter", "kilometer", "mile", "nautical-mile". For non-degree units both geometries are transformed to EPSG:4326 and great-circle distance is computed on the closest-point pair via haversine.
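The great-circle step described above can be sketched as follows. This is a minimal illustration, not the PR's inline helper (which is not shown in this conversation): the class name `HaversineSketch` and method name `haversineMeters` are hypothetical, and the radius is the authalic value quoted in the design notes.

```java
/** Minimal sketch of great-circle distance on a sphere; names are hypothetical. */
final class HaversineSketch {
    // WGS84 authalic radius in meters, as quoted in the PR's design notes.
    static final double EARTH_RADIUS_M = 6_371_007.2;

    /** Inputs are lon/lat in decimal degrees (EPSG:4326); result is in meters. */
    static double haversineMeters(double lon1, double lat1, double lon2, double lat2) {
        double phi1 = Math.toRadians(lat1);
        double phi2 = Math.toRadians(lat2);
        double sinHalfDPhi = Math.sin(Math.toRadians(lat2 - lat1) / 2);
        double sinHalfDLambda = Math.sin(Math.toRadians(lon2 - lon1) / 2);
        double a = sinHalfDPhi * sinHalfDPhi
                + Math.cos(phi1) * Math.cos(phi2) * sinHalfDLambda * sinHalfDLambda;
        // Clamp so rounding cannot push the asin argument above 1.
        double c = 2 * Math.asin(Math.sqrt(Math.min(1.0, a)));
        return EARTH_RADIUS_M * c;
    }

    public static void main(String[] args) {
        // A quarter of the equator: the central angle from (0, 0) to (90, 0) is pi/2.
        System.out.println(haversineMeters(0, 0, 90, 0));
    }
}
```

For Point operands the closest-point pair is just the two points themselves, so the function can be applied to their coordinates directly; for other geometries it would receive whatever pair the nearest-points search returns.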

Examples

import module namespace spatial='http://exist-db.org/xquery/spatial';
declare namespace gml='http://www.opengis.net/gml';

let $nyc := <gml:Point srsName='EPSG:4326'><gml:coordinates>-73.9857,40.7484</gml:coordinates></gml:Point>
let $la  := <gml:Point srsName='EPSG:4326'><gml:coordinates>-118.4081,33.9416</gml:coordinates></gml:Point>
return (
  spatial:distance($nyc, $la),                  (: 44.94... -- degrees in EPSG:4326 :)
  spatial:distance($nyc, $la, 'kilometer'),     (: ~3956 for these coordinates :)
  spatial:distance($nyc, $la, 'mile')           (: ~2458 for these coordinates :)
)

Design notes

  • Why haversine and not GeodeticCalculator? Haversine deviates from the full WGS84 ellipsoidal distance by up to ~0.5%, which is negligible for the precision XQuery developers need at the application layer. It also avoids pulling more of GeoTools' referencing stack into the hot path. Users who need geodetic precision can transform to a projected CRS first via spatial:transform.
  • Why a 2-arity cartesian default? That's what Geometry.distance does — in source SRS units. For EPSG:4326 that's degrees, which is "wrong" in casual use but is the documented OGC SFA semantics for unit-less distance. The 3-arity form with $unit = 'meter' is the right call for lat/lon developers.
  • Earth radius constant: WGS84 authalic radius (6,371,007.2 m) — gives an equal-area sphere matching EPSG:4326 ellipsoid surface area; standard choice for haversine.
  • IndexUseReporter: implemented; reports true when the geometry was pulled from SPATIAL_INDEX_V1, false when streamed from in-memory or unindexed nodes. Mirrors FunSpatialSearch and FunGeometricProperties.
  • No Optimizable integration in this PR — distance doesn't filter, so there's no FLWOR rewrite opportunity. DWithin (next PR) will need it.
  • No new dependencies. Uses existing JTS 1.20 (DistanceOp, Geometry.distance).
  • ErrorCodes.FOER0000: used for the unsupported-unit error and for "unable to resolve geometry"; consistent with how other eXist modules surface user-facing errors that don't have a more specific W3C code. Avoids the deprecated XPathException(Expression, String) constructor.
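The unit handling mentioned in these notes might look roughly like the sketch below. It is an assumption-laden illustration: `UnitSketch` and `fromMeters` are hypothetical names, and `IllegalArgumentException` stands in for the XPathException with ErrorCodes.FOER0000 that the PR describes. The "degree" unit is left out because it takes the cartesian path and never reaches a meters-based conversion.

```java
/** Sketch of converting a haversine result in meters to the requested unit. */
final class UnitSketch {
    static double fromMeters(double meters, String unit) {
        // Switch expression (Java 14+); every branch yields a value or throws.
        return switch (unit) {
            case "meter" -> meters;
            case "kilometer" -> meters / 1000.0;
            case "mile" -> meters / 1609.344;         // international mile
            case "nautical-mile" -> meters / 1852.0;  // international nautical mile
            // Stand-in for the PR's XPathException; the message names the offending unit.
            default -> throw new IllegalArgumentException("Unsupported unit: " + unit);
        };
    }

    public static void main(String[] args) {
        System.out.println(fromMeters(5000.0, "kilometer")); // prints 5.0
    }
}
```

Keeping the conversion as a pure function of meters makes the "kilometer = meter / 1000" assertion in the test plan a direct consequence of the code rather than a separate computation.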

Spec references

Test plan

  • Cartesian distance is Euclidean in source SRS — asserted exactly via Pythagoras on EPSG:4326 lat/lon pair.
  • Haversine NYC → LA ~3935.7 km — asserted within 1% of reference value.
  • Kilometer = Meter / 1000 — asserted.
  • Mile NYC → LA ~2445.6 mi — asserted within 1%.
  • Nautical-mile NYC → LA ~2124.6 nm — asserted within 1%.
  • Haversine NYC → Paris ~5837 km — asserted within 1% (cross-Atlantic case).
  • Empty operand returns empty sequence (both first and second positions).
  • Unsupported unit raises XPathException with the offending unit named in the message.
  • Explicit 'degree' unit equals 2-arity default — asserted.
  • Module test suite (mvn test -pl extensions/indexes/spatial) — 18 passed, 1 skipped (pre-existing XQUF @Ignore), 0 failures.
  • License headers (mvn license:check) — clean.
  • Codacy PMD on changed files — clean.
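For readers who want to reproduce the first two kinds of check by hand, here is a small sketch of the arithmetic: the Pythagorean distance on the NYC/LA coordinate pair from the Examples section, and a relative-tolerance comparison of the sort a "within 1%" assertion implies. The class and method names are hypothetical and are not taken from FunSpatialDistanceTest.

```java
/** Sketch of the cartesian check and a relative-tolerance helper; names are hypothetical. */
final class CartesianCheckSketch {
    /** Euclidean distance in source-SRS units (degrees for EPSG:4326). */
    static double euclidean(double x1, double y1, double x2, double y2) {
        return Math.hypot(x2 - x1, y2 - y1);
    }

    /** True when actual lies within the given fraction of the reference value. */
    static boolean withinRelative(double actual, double reference, double tolerance) {
        return Math.abs(actual - reference) <= tolerance * Math.abs(reference);
    }

    public static void main(String[] args) {
        // lon/lat pairs from the Examples section; the result is roughly 44.94 degrees.
        double d = euclidean(-73.9857, 40.7484, -118.4081, 33.9416);
        System.out.println(d);
        System.out.println(withinRelative(d, 44.94, 0.01));
    }
}
```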

Out of scope (future PRs)

  • spatial:within-distance (DWithin filter) — needs the index-side bbox-prefilter; planned as PR 2.
  • spatial:nearest (KNN) — builds on distance; planned as PR 3.
  • Distance from geometries other than Points (LineString, Polygon) — works automatically via JTS, but isn't explicitly tested in this PR. The 2-arity cartesian + 3-arity unit functions accept any node containing a GML geometry; closest-point haversine handles polygon-polygon, point-polygon etc. correctly.
  • Geodetic-precision distance via gt-referencing.GeodeticCalculator — 0.5% accuracy loss vs full ellipsoidal; users can transform to a projected CRS for sub-meter accuracy. Could land later as spatial:distance($g1, $g2, 'meter', 'ellipsoidal') if requested.
  • Function-surface modernization for the existing 23 getEPSG4326* redundant signatures — separate [refactor] PR.
  • GML parser strictness for hand-constructed geometries — separate [bugfix] PR.

🤖 Generated with Claude Code

@joewiz joewiz requested a review from a team as a code owner May 7, 2026 02:03
Add spatial:distance/2 (cartesian, source SRS) and spatial:distance/3
(unit-aware: degree, meter, kilometer, mile, nautical-mile). Non-degree
units transform to EPSG:4326 and use haversine on the closest-point
pair found by JTS DistanceOp. Cartesian computation is JTS native.

Closes the long-standing gap that the spatial module had no distance
function -- previously a developer asking "how far apart are these two
points?" had no spatial:* answer.

Part of the spatial index modernization tasking; first of ~10 small
surgical PRs adding the function surface developers expect today
(distance, DWithin, KNN, GeoJSON I/O, geohash, lat/lon constructors).

The backend stays HSQLDB; no schema change, no migration. New function
class follows the same pattern as FunSpatialSearch and
FunGeometricProperties: BasicFunction + IndexUseReporter; geometry
resolution via the existing AbstractGMLJDBCIndexWorker public methods
(getGeometryForNode for persistent nodes, streamNodeToGeometry for
in-memory).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@joewiz joewiz force-pushed the feature/spatial-distance branch from 3ce6d79 to 9878912 Compare May 7, 2026 02:15
@duncdrum
Contributor

I'm wondering if we really want to stick with the HSQLDB backend, or if we shouldn't aim to use our own storage combined with Lucene-spatial?

That being said, I never was an active user of the spatial module so I'm curious what those who actually use(-d) it think.

@brihaye
Contributor

brihaye commented May 12, 2026 via email

@joewiz
Member Author

joewiz commented May 12, 2026

[This response was co-authored with Claude Code. -Joe]

@duncdrum @brihaye I mulled over this HSQLDB-vs-Lucene-spatial question when, prompted by our work on the ngram index, I asked Claude to examine the state of the spatial-module. Claude prepped a four-phase analysis (baseline → capability gap → design → implementation). Phase 2 concluded with a two-track recommendation:

  1. Near-term: keep HSQLDB + JTS, modernize the function surface. Add 12-15 high-value functions developers expect today (distance, DWithin, GeoJSON I/O, geohash, lat/lon constructors, bounding-box filter). I haven't seen complaints about the backend, and HSQLDB seems well-maintained, so dusting the extension off and adding a couple of functions that developers could use seemed like the best approach. This PR is the first deliverable of this track (distance + distance-sphere).

  2. Long-term: migrate the storage layer from HSQLDB to Lucene 10 (LatLonShape + spatial-extras), deprecating HSQLDB with a release window. This aligns with the trajectory of eXist's other index modules toward Lucene and removes one whole database engine from the runtime; the BKD tree under LatLonShape should also work well for the point-in-polygon and KNN queries Tier A introduces. Hearing that Pierrick had other backends in mind and likes Lucene-spatial makes this seem like the right choice.

This PR is squarely on track 1. The function additions don't preempt any track-2 decision.

Here's the full analysis: 2026-05-05 spatial-index-modernization-tasking.md.

@brihaye
Contributor

brihaye commented May 12, 2026 via email

Contributor

@duncdrum duncdrum left a comment


Ok sounds like a plan

@duncdrum duncdrum requested a review from a team May 19, 2026 19:20


3 participants