Replies: 1 comment 2 replies
-
|
What about adding a new and (initially) optional property
This doesn't smell at all to me ☝️ . If you want to have more than one binding, you need to name them. At the same time, the current behavior of Edit to add: what this is buying, is that users can both use auto-discovery and also customize the path of their single discovered binding if they like. |
Beta Was this translation helpful? Give feedback.
2 replies
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Uh oh!
There was an error while loading. Please reload this page.
-
This started out as just me collecting my thoughts related to some
source-http-ingestchanges to bring it in line with the newresource_pathmerging. But it was tricky to see how to do this without breaking things, and it ended up leading to what I think is a pretty reasonable conclusion aboutresource_paths and discovery merging. So I apologize if this is difficult to follow. Please bear with me.The main goal here was to avoid introducing breaking changes in how connectors handle
resource_paths. Overall, it was very easy to avoid any breaking changes, butsource-http-ingestproved to be kind of an exeption.The current resource config for
source-http-ingestis defined like so:The
pathfield corresponds to the URL path that the server will accept for requests to add data. Thepathis certainly the most reasonable thing to use for theresource_path, but the first-order problem is that it's optional, and not set at all by default. And if it's not specified, then it defaults to the full name of the flow collection that's thetargetof the binding. So for example, a typical binding would look like:and the resulting
resource_pathwould beacmeCo/foo/webhook-data.This means that discover cannot emit an equivalent
resource_path, and thus the next discover operation will remove any existing bindings. Luckily, of the 19 livesource-http-ingestcaptures, all 19 of them have auto-discover disabled, and also have completely default bindings. So, the next discover (which will have been triggered manually via the UI) will remove their existing bindigs. But it seems like any newly discovered binding would be identical to the ones that are already there.Obviously the way we model the
resourceconfig is not ideal here. Minimally, we must be able to return an identicalresource_pathfrom both the discover and validate RPCs, which is impossible given the current behavior ofpath. We do at least have some ability to change how we determine theresource_path, since there's no live captures withautoDiscoverenabled. But we also need to take care not to change the URL paths that are used for existing captures.Ideally, we would allow the 19 existing captures to keep using their same URLs, and also allow them to enable
autoDiscover.Using the
pathas theresource_pathseems like the most reasonable mapping. In other words, the "resource" that the source-http-ingest connector can be said to be the namespace of URL paths.Here's what I think we'd need to do in order to make that work and retain seamless compatibility for all existing tasks:
pathas part of theresourcereturned bydiscoverpaththat's used in returned from discovers would just bewebhook-data, not a full collection namepathto benullin theresourceand retain the currentvalidatebehavior that falls back to using thecollectionas theresource_path.validateto require a non-nullpathonce all existing captures have been updatedpathor the collection nameThat last point is a little gross, but necessary to prevent breaking existing captures if they re-discover and publish.
Left turn?
This is where we intersect with another recent discussion about
resource_paths and the merge behavior related to "custom" bindings. Custom bindings here are defined as any bindings that don't appear in the discover output, as determined by their resource path. Forsource-http-ingest, this effectively means that changing the URL path would make this a "custom" binding, which as things are currently will result in the binding being removed by subsequent discover operations.So what if we were less fussy about the
resource_pathmodeling the namespace of URL paths, and instead just said is any old string identifier? The update tosource-http-ingestcould then be to simply hard-code theresource_pathto some constant, and let thepathcontinue to work as it does tooday and all existing captures are fully compatible.But this raises the question of what happens if you have two capture bindings that both have the same
resource_path. This could technically happen today, but it would be guaranteed to occur if we hard-coded everyresource_pathto the same constant insource-http-ingest.For materializations, this would cause a validation error because we explicitly try to prevent users from materializing two collections into the same database table.
But I would propose that we take a slightly different view of
resource_path, at least for captures. Why not allow multiple capture bindings to have the sameresource_path? To allow it only requires a tiny, but subtle change to the way we implement the discover merge. It would provide a simple solution for how to handle resource paths insource-http-ingest, and would also provide a workable solution for supporting "custom" bindings with discovery in other connectors.The discover merge logic would just need to pull in every live binding that matches each discovered binding, instead of just the first matching binding as it does today. In other words, today we expect and assume that live bindings match 1-1 with discovered bindings, and we choose one arbitrarily if that turns out not to be the case. But if we just allow the merge logic to pull in every matching live binding, then we can gain some useful flexibility. I'm inclined at this point to move forward with allowing multiple capture bindings to have the same
resource_path, but would like to pause and see if anyone else has thoughts on the matter.Beta Was this translation helpful? Give feedback.
All reactions