Add the Multi-Tenant Catalogs Endpoint Extension, for nested catalog support #366
jonhealy1 wants to merge 68 commits into
It would be best to fix the extension so that it supports Python 3.11.
…jonhealy1/stac-fastapi-pgstac into stac-fastapi-catalogs-extension
This is really close to being reviewable; I just need some time to do some QA and documentation.
@bkanuka Nice catch - I think this should be fixed now.
I don't think we should make any changes to … We should make it clear to users of the Catalogs Extension that the catalog hierarchical relationships are inherently fragile because they are not managed by pgstac at all: parent/child relationships are not stored in proper columns or tracked in a junction table or anything like that. The transactions routes in the extension provide a way to manage these relationships more safely, but it would still be easy for a user to POST a collection … There might be a way to do this more safely using pgstac's … I had an LLM spec out what the changes would look like, and it is not a simple swap. We could probably do it without pgstac changes, but there are some small pgstac-side tweaks that could make it easier. Expand below for that response: parent_ids -> private spec
What changes immediately if
This is interesting, but it sounds like something we would want to open an issue for and discuss after this first version is merged. A private, protected column for parent_ids is potentially a very good idea. For one thing, it would take some coordination with the pgstac devs.
FWIW I think the |
I think that the change would be backwards compatible from a user's perspective: you can still post a collection with the parent_ids list, but instead of the list being stored in jsonb, it would be stored in our new private database column. Someone using stac-fastapi-pgstac wouldn't know that anything had changed internally.
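A minimal sketch of that idea, not the pgstac or stac-fastapi-pgstac implementation: the incoming collection body is split so that parent_ids goes into a separate private mapping while the rest is stored as content, and the two are merged again on read. The function names are hypothetical, and the sketch assumes parent_ids is a top-level key of the collection body.

```python
from copy import deepcopy
from typing import Any


def split_parent_ids(
    collection: dict[str, Any],
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Separate parent_ids from the collection content (hypothetical helper).

    Returns (content, private). The input dict is not modified.
    """
    content = deepcopy(collection)
    # Assumption: parent_ids is a top-level list on the collection body.
    parent_ids = content.pop("parent_ids", [])
    return content, {"parent_ids": list(parent_ids)}


def merge_parent_ids(
    content: dict[str, Any], private: dict[str, Any]
) -> dict[str, Any]:
    """Rebuild the user-facing collection from content plus private data."""
    merged = deepcopy(content)
    merged["parent_ids"] = list(private.get("parent_ids", []))
    return merged


body = {"id": "c1", "parent_ids": ["cat-a"]}
content, private = split_parent_ids(body)
round_tripped = merge_parent_ids(content, private)
```

From the caller's side the round trip returns the same body that was posted; only the storage location of parent_ids differs.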
The idea for the private column in collections and items (which I don't think anyone has ever actually done anything with) was to be a place where a user could store things that were never to be directly user accessible. A couple of use cases behind the thought were 1) to store metadata that could be used by Row Based Authorization tooling within Postgres, and 2) to store metadata that could be used for things like data synchronization, such as a timestamp of when the data was last updated in that instance of pgstac (which is different from when the row was actually updated). At least in my head, if you can search on something, it is inherently "leaky" and no longer private to a user. If you want to search by parent_ids but don't want them to show up in the results, the "staccy" way to do that would be to use the fields extension to exclude that column. I still lean towards the stance that this should be a PATCH operation and not a PUT/POST, at least for now. If/when this extension gets marked as stable in the STAC API spec, I do think that we could/should do some things in pgstac to instantiate this as a real column and really optimize it, but I am leery of adding churn to the pgstac schema right now: changing the schema (less so for collections than for items, but still) can be a big deal, as it often requires an entire table rewrite. I still think that if parent_ids is important to the collection (whether in the content json, the private field, or instantiated as an actual column), it should be managed wherever you are defining your collection, and it should not be the storage engine's responsibility to selectively merge things while ignoring that field. If you have out-of-band updates to your collection content, that seems like an open door for plenty of other issues.
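To illustrate the fields-extension point, here is a simplified sketch of what excluding parent_ids from a result looks like. The request body uses the exclude form of the STAC API Fields extension; the apply_exclude helper is hypothetical and only approximates exclude semantics for dotted paths. It is not how stac-fastapi or pgstac implement the extension, and it assumes parent_ids is a top-level key.

```python
from copy import deepcopy
from typing import Any

# Search body asking the API to leave parent_ids out of the results.
search_body = {"fields": {"exclude": ["parent_ids"]}}


def apply_exclude(record: dict[str, Any], exclude: list[str]) -> dict[str, Any]:
    """Return a copy of record with the dotted paths in exclude removed."""
    result = deepcopy(record)
    for path in exclude:
        *parents, leaf = path.split(".")
        node: Any = result
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # path does not exist in this record; nothing to remove
        else:
            node.pop(leaf, None)
    return result


record = {
    "id": "c1",
    "parent_ids": ["cat-a"],
    "extent": {"spatial": 1, "temporal": 2},
}
visible = apply_exclude(record, search_body["fields"]["exclude"])
```

The stored record still carries parent_ids, so it remains searchable; the field is only dropped from what is returned.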
@bitner I completely agree. The app layer should carry the responsibility of maintaining that state rather than adding complexity or schema churn to pgstac. To make this safe for users, the right path forward is ensuring they update via scoped transaction routes - such as PUT … The scoped collection update route isn't currently in the extension spec, so I'll add that endpoint to the extension to close this loophole before we wrap this up.
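As a rough sketch of what "the app layer maintains that state" could mean for a scoped update, the function below replaces a collection's content while carrying the stored parent_ids forward. The name scoped_replace, its signature, and the rule that the catalog must already be a parent are assumptions made for illustration; none of this is taken from the extension spec or its implementation.

```python
from copy import deepcopy
from typing import Any


def scoped_replace(
    stored: dict[str, Any], incoming: dict[str, Any], catalog_id: str
) -> dict[str, Any]:
    """Replace a collection's content without losing its parent links.

    Hypothetical helper: parent_ids always comes from the stored record,
    so a replacement body that omits (or alters) it cannot silently
    detach the collection from its catalogs.
    """
    parent_ids = list(stored.get("parent_ids", []))
    if catalog_id not in parent_ids:
        # Assumed rule: the route's catalog must already be a parent.
        raise LookupError(f"collection is not a child of catalog {catalog_id!r}")
    updated = deepcopy(incoming)
    updated["parent_ids"] = parent_ids
    return updated


stored = {"id": "c1", "title": "old", "parent_ids": ["cat-a", "cat-b"]}
incoming = {"id": "c1", "title": "new"}
updated = scoped_replace(stored, incoming, "cat-a")
```

A plain replace of the same body would drop both parent links, which is the loophole the scoped route is meant to close.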
Related Issue(s):
Description:
Extension spec: https://github.com/StacLabs/multi-tenant-catalogs
STAC-FastAPI catalogs extension: https://github.com/StacLabs/stac-fastapi-catalogs-extension
PR Checklist:
- pre-commit hooks pass locally
- (make test)
- (make docs)