Background
The current design treats vector data as ephemeral input — a GeoJSON FeatureCollection passed as a parameter to aggregate_spatial. This works for the zonal statistics use case but misses broader opportunities: storing and serving standard boundary datasets, producing rich mappable output, and enabling vector-to-vector operations.
openEO's vector datacube model offers a clean pattern to follow.
The vector datacube model
openEO treats vector as a first-class datacube type alongside raster. The output of aggregate_spatial is a vector datacube — an xr.DataArray with a geometry dimension backed by shapely geometries via the xvec library:
    Raster datacube → aggregate_spatial(geometries) → Vector datacube
    (time, y, x)                                      (time, geometry)
                                                      geometry coords = shapely polygons
The geometry (polygon shapes, feature properties) travels through the pipeline and ends up in the output — the result is not a dead-end table but a datacube that can be further processed, filtered, and saved in standard GIS formats.
Proposed changes
1. GeoParquet output instead of JSON
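save_result(format="GeoParquet") writes the result as GeoParquet, an open columnar vector format. Output formats in openEO are backend-defined, so the service has to advertise GeoParquet itself. For zonal statistics results it has clear advantages over plain JSON: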
- Columnar storage — efficient for large zone × timestep results
- Preserves geometry — output is directly mappable in QGIS, Python, R without joining to a separate boundaries file
- Natively supported by GDAL, geopandas, DuckDB, QGIS, ArcGIS
2. Vector datasets as named collections
Standard boundary datasets (administrative regions, watersheds, river basins) would be stored as GeoParquet files and registered as STAC collections, loadable via a load_vector_cube process:
# Client supplies geometry
aggregate_spatial(load_collection("era5land_temp"), geometries=geojson_fc)
# Client references a named collection
aggregate_spatial(load_collection("era5land_temp"), geometries=load_vector_cube("admin_boundaries_lk"))
This removes the need for clients to supply boundaries on every request and allows service deployers to ship standard boundaries alongside dataset plugins.
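The two request styles above can be sketched as openEO process graphs. This is a minimal illustration, not the service's actual request format: the helper name zonal_stats_graph and the node keys are hypothetical, and the "id" argument of load_vector_cube is an assumption, since that process is only proposed here.

```python
import json


def zonal_stats_graph(raster_id, geometries):
    """Build a process graph that averages a raster collection over zones.

    `geometries` is either a GeoJSON dict (client-supplied) or a string
    naming a stored vector collection (hypothetical load_vector_cube).
    """
    nodes = {
        "load_raster": {
            "process_id": "load_collection",
            "arguments": {
                "id": raster_id,
                "spatial_extent": None,
                "temporal_extent": None,
            },
        }
    }
    if isinstance(geometries, str):
        # Assumption: load_vector_cube takes the collection name as "id".
        nodes["load_zones"] = {
            "process_id": "load_vector_cube",
            "arguments": {"id": geometries},
        }
        geometries_arg = {"from_node": "load_zones"}
    else:
        geometries_arg = geometries  # inline GeoJSON FeatureCollection

    nodes["zonal_mean"] = {
        "process_id": "aggregate_spatial",
        "arguments": {
            "data": {"from_node": "load_raster"},
            "geometries": geometries_arg,
            # The reducer is itself a small process graph.
            "reducer": {
                "process_graph": {
                    "mean": {
                        "process_id": "mean",
                        "arguments": {"data": {"from_parameter": "data"}},
                        "result": True,
                    }
                }
            },
        },
    }
    nodes["save"] = {
        "process_id": "save_result",
        "arguments": {"data": {"from_node": "zonal_mean"}, "format": "GeoParquet"},
        "result": True,
    }
    return {"process_graph": nodes}


if __name__ == "__main__":
    graph = zonal_stats_graph("era5land_temp", "admin_boundaries_lk")
    print(json.dumps(graph, indent=2))
```

With a named collection the request body stays the same size regardless of how many zones the boundary dataset contains, which is the practical benefit for clients.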
3. Serving vector data — GeoParquet for analysis, PMTiles for the browser
These are two different access patterns requiring different formats:
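GeoParquet over HTTP: for analysis clients (DuckDB, PyArrow, geopandas, QGIS). Parquet records min/max statistics for each column of each row group in the file footer. When the file carries a bounding-box column (as GeoParquet 1.1 allows) and the rows are spatially sorted, those statistics act as a bounding box per row group. A client reads the footer first (usually on the order of kilobytes, depending on the number of row groups and columns), identifies which row groups overlap the area of interest, then fetches only those via HTTP range requests, the same mechanism the service already uses for Zarr. The file is served statically with no server-side processing.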
Client reads footer → checks row group bboxes → fetches only matching chunks
(HTTP range requests)
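The pruning step in this flow amounts to a bounding-box intersection test per row group. The sketch below shows only that logic, with made-up bounding boxes; in practice a reader such as DuckDB or PyArrow derives the boxes from the Parquet footer statistics and performs the filtering itself. The names BBox and select_row_groups are illustrative.

```python
from typing import List, NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def intersects(a: BBox, b: BBox) -> bool:
    """True when the two boxes overlap or touch."""
    return (
        a.xmin <= b.xmax
        and b.xmin <= a.xmax
        and a.ymin <= b.ymax
        and b.ymin <= a.ymax
    )


def select_row_groups(row_group_bboxes: List[BBox], aoi: BBox) -> List[int]:
    """Return the indices of row groups whose bbox overlaps the area of interest."""
    return [i for i, bbox in enumerate(row_group_bboxes) if intersects(bbox, aoi)]


if __name__ == "__main__":
    # Illustrative values only: three spatially sorted row groups.
    row_groups = [
        BBox(79.5, 5.9, 80.5, 7.0),
        BBox(80.5, 7.0, 81.9, 8.5),
        BBox(79.7, 8.5, 80.9, 9.9),
    ]
    aoi = BBox(80.6, 7.2, 81.0, 7.6)
    print(select_row_groups(row_groups, aoi))  # [1]
```

Only the selected row groups would then be requested by byte range. The pruning is only effective when features are written in a spatially coherent order; otherwise every row group's box tends to cover the whole dataset.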
PMTiles: for browser map rendering. Browsers have no native Parquet support, and WebAssembly Parquet readers are too coarse for viewport-based rendering (a row group typically holds on the order of 100K–1M features and has no zoom awareness). PMTiles organises data by tile (zoom + x + y), so the browser fetches only the tiles for the current viewport at the current zoom level. MapLibre GL JS reads PMTiles through the pmtiles JavaScript library, registered as a custom protocol via addProtocol; support is not built into MapLibre itself.
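To make the tile addressing concrete, the following sketch computes which (zoom, x, y) tiles cover a viewport, using the standard Web Mercator tiling formula. It illustrates the selection logic only: a real PMTiles client additionally maps each tile to a byte range through the archive's internal directory, which is not shown. The function names are illustrative.

```python
import math
from typing import List, Tuple


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Return the (x, y) Web Mercator tile containing a lon/lat point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the eastern or southern edge stay in range.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_viewport(
    west: float, south: float, east: float, north: float, zoom: int
) -> List[Tuple[int, int, int]]:
    """List the (zoom, x, y) tiles that cover a viewport bounding box."""
    x_min, y_min = lonlat_to_tile(west, north, zoom)  # tile y grows southwards
    x_max, y_max = lonlat_to_tile(east, south, zoom)
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]


if __name__ == "__main__":
    # Illustrative viewport roughly over Sri Lanka at zoom 6.
    for tile in tiles_for_viewport(79.5, 5.9, 81.9, 9.9, 6):
        print(tile)
```

The number of tiles per viewport stays roughly constant as the user zooms, which is why this access pattern suits browsers better than fetching whole row groups.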
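PMTiles for a vector collection is generated from the GeoParquet source with tippecanoe and served as a static file alongside it; both are registered as assets in the STAC collection. tippecanoe does not read Parquet directly, so the data first has to be converted to an input format it accepts, such as GeoJSON or FlatGeobuf (for example with ogr2ogr).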
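As a rough sketch of that registration, the assets block of such a STAC collection could look like the following. Everything here is an assumption made for illustration: the asset keys, the URL layout under vector/, the roles, and the helper name vector_collection_assets. The media types are the ones commonly used for Parquet and PMTiles files and should be checked against the STAC tooling in use.

```python
import json


def vector_collection_assets(base_url: str, collection_id: str) -> dict:
    """Build a STAC 'assets' mapping for one stored vector collection.

    Hypothetical layout: both files sit side by side under <base_url>/vector/.
    """
    stem = f"{base_url.rstrip('/')}/vector/{collection_id}"
    return {
        "geoparquet": {
            "href": f"{stem}.parquet",
            "type": "application/vnd.apache.parquet",  # assumed media type
            "roles": ["data"],
            "title": "GeoParquet (analysis clients)",
        },
        "pmtiles": {
            "href": f"{stem}.pmtiles",
            "type": "application/vnd.pmtiles",  # assumed media type
            "roles": ["visual"],
            "title": "PMTiles (browser rendering)",
        },
    }


if __name__ == "__main__":
    assets = vector_collection_assets("https://example.org/data", "admin_boundaries_lk")
    print(json.dumps(assets, indent=2))
```

Keeping both files as assets of one collection lets a client pick the representation that matches its access pattern without any server-side negotiation.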
OGC API - Features — pygeoapi (already in the project) serves vector collections for GIS client access (QGIS, ArcGIS, Leaflet) independently of the openEO jobs flow.
Full data flow
    Vector input:  GeoJSON FeatureCollection (client-supplied)
                   OR load_vector_cube("named_collection") (service-stored GeoParquet)
        ↓
    aggregate_spatial (openeo-processes-dask, xvec internally)
        ↓
    Vector output: save_result(format="GeoParquet")
        ↓
    GET /jobs/{id}/results      → GeoParquet asset (analysis: DuckDB, Python, QGIS)
                                → PMTiles asset (browser: MapLibre GL JS)
    GET /collections/{id}/items → OGC API Features (pygeoapi)
What needs to be built
| Component | Notes |
|---|---|
| GeoParquet save_result | Detect vector datacube output, write GeoParquet via geopandas |
| PMTiles generation | Run tippecanoe on GeoParquet output; register as assets.pmtiles in STAC |
| Vector collection storage | GeoParquet + PMTiles in vector/ directory, registered as STAC collections |
| load_vector_cube process | Processing plugin: loads a named vector collection by ID |
| OGC API Features serving | Wire pygeoapi to serve vector STAC collections |
Relevant openEO processes
The following processes from the openEO spec apply to vector data. Four of the eight listed are implemented in openeo-processes-dask:
| Process | Description | Implemented |
|---|---|---|
| load_geojson | GeoJSON → vector datacube | ✅ |
| aggregate_spatial | Raster + geometries → vector datacube | ✅ |
| vector_buffer | Buffer geometries by distance | ✅ |
| vector_reproject | Reproject geometry dimension | ✅ |
| filter_vector | Filter vector datacube by properties | ❌ |
| load_vector_cube | Load stored vector dataset | ❌ (backend-provided) |
| vector_to_random_points | Sample random points from polygons | ❌ |
| vector_to_regular_points | Sample regular point grid from polygons | ❌ |