Billing GraphQL#2883

Open
jshearer wants to merge 8 commits into master from jshearer/billing_graphql

Conversation

@jshearer
Contributor

@jshearer jshearer commented Apr 23, 2026

This PR moves the billing endpoints from edge functions into GraphQL.

Tenant.billing

Billing is a field on Tenant rather than a standalone top-level query, because tenant is the natural aggregation root and other fields will follow.

type Query {
  tenant(name: String!): Tenant
}

type Tenant {
  name: String!
  billing: TenantBilling!
}

type TenantBilling {
  paymentMethods: [PaymentMethod!]!
  primaryPaymentMethod: PaymentMethod
  invoices(
    filter: InvoiceFilter, 
    before: String, 
    after: String, 
    first: Int, 
    last: Int
  ): InvoiceConnection!
}

This is a design choice I went with as it felt like we were starting to pollute the global namespace with queries that ought to be scoped under their natural owner. It's possible that we should move some other queries as well, but that's out of scope.

Regarding permissions: Tenant.billing is non-nullable, so a request that selects it without Admin on the tenant fails with an error, rather than returning a partial response that contains null at that field plus a path-scoped entry in errors.

Invoices

Invoice merges two data sources. Database fields (dateStart, dateEnd, invoiceType, subtotal, lineItems, extra) come from the invoices_ext Postgres view. Stripe fields (amountDue, status, invoicePdf, hostedInvoiceUrl) are resolved directly from Stripe.

Mutations

type MutationRoot {
  createBillingSetupIntent(tenant: String!): CreateBillingSetupIntentPayload
  setBillingPaymentMethod(tenant: String!, paymentMethodId: String!): SetBillingPaymentMethodPayload
  deleteBillingPaymentMethod(tenant: String!, paymentMethodId: String!): DeleteBillingPaymentMethodPayload
}
  • createBillingSetupIntent finds or creates a Stripe customer for the tenant, then returns the clientSecret the browser needs for the Stripe Elements flow.
  • setBillingPaymentMethod updates invoice_settings.default_payment_method on the customer.
  • deleteBillingPaymentMethod detaches the method; if the deleted method was the primary, the first remaining method is auto-promoted as default.

All three require Admin on the tenant and validate that the payment method belongs to the tenant's Stripe customer before acting.
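The ownership check and auto-promotion described above could be sketched roughly as follows. This is only an illustration over an in-memory stand-in: `CustomerState` and `delete_payment_method` are hypothetical names rather than the PR's actual types, and the real mutation talks to Stripe instead of mutating a local struct.

```rust
/// Hypothetical, simplified view of a tenant's Stripe customer (not the PR's type).
#[derive(Debug, Clone, PartialEq)]
pub struct CustomerState {
    /// Attached payment method IDs, in the order the provider lists them.
    pub payment_methods: Vec<String>,
    /// ID of the default (primary) payment method, if any.
    pub default_payment_method: Option<String>,
}

/// Detach `payment_method_id` from the customer.
/// Fails if the method does not belong to this customer. If the detached
/// method was the primary, the first remaining method becomes the default.
pub fn delete_payment_method(
    customer: &mut CustomerState,
    payment_method_id: &str,
) -> Result<(), String> {
    // Validate ownership before acting.
    let Some(index) = customer
        .payment_methods
        .iter()
        .position(|id| id.as_str() == payment_method_id)
    else {
        return Err(format!(
            "payment method {payment_method_id} does not belong to this customer"
        ));
    };
    customer.payment_methods.remove(index);

    // Auto-promote only when the deleted method was the primary.
    if customer.default_payment_method.as_deref() == Some(payment_method_id) {
        customer.default_payment_method = customer.payment_methods.first().cloned();
    }
    Ok(())
}
```

Deleting a non-primary method leaves the default untouched, and deleting the last remaining method leaves the customer with no default.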

billing-types

Shared Stripe types and search-query builders (InvoiceType, InvoiceSearch, InvoiceMetadata, customer_search_query, stripe_search) are extracted into a billing-types crate. billing-integrations now imports these instead of defining its own copies.

CI

deploy-agent-api.yaml injects STRIPE_API_KEY from Cloud Run secret manager. platform-test.yaml conditionally runs the graphql_billing_live_stripe integration test when STRIPE_TESTMODE_API_KEY is available in the environment.

Closes #2879
Part of #2877

@jshearer jshearer self-assigned this Apr 23, 2026
@jshearer jshearer added this to the GraphQL Migration milestone Apr 23, 2026
@jshearer jshearer force-pushed the jshearer/billing_graphql branch 20 times, most recently from c65a777 to 8d1970c Compare April 27, 2026 16:30

/// Fetch invoices older than `cursor` (or the newest invoices when `cursor`
/// is `None`). Returned rows are ordered newest-first.
pub async fn fetch_invoice_rows_forward(
Contributor Author

@jshearer jshearer Apr 27, 2026

I really tried to not have fetch_invoice_rows_forward and fetch_invoice_rows_backward since they're almost identical, but the alternative (dynamic query generation) was worse, so I kept it. If anyone has any better ideas I'm all ears

Contributor

i struggled with this a while ago when working on invite links and i think this duplication is just the price of admission for the query cache. At least it's easy to reason about...

type Value = stripe::Invoice;
type Error = async_graphql::Error;

async fn load(
Contributor Author

This could be parallelized.. wanted to KISS where possible tho. Happy to be convinced otherwise

Comment thread crates/agent/src/main.rs
/// The port to listen on for API requests.
#[clap(long, default_value = "8080", env = "API_PORT")]
api_port: u16,
/// Stripe secret API key. When provided, the billing GraphQL queries and
Contributor Author

This is Option so that a Stripe API key is not required to run locally.

/// App is the wired application state of the control-plane API.
pub struct App {
pub _id_generator: std::sync::Mutex<models::IdGenerator>,
pub billing_provider: Option<Arc<dyn crate::billing::BillingProvider>>,
Contributor Author

I really wanted this to be generic, but it would have exploded the complexity of this PR even further and it wasn't worth it

// ensures that things still work correctly without it.
sqlx::query!(r#"delete from role_grants where subject_role = 'estuary_support/';"#)
.execute(&mut *txn)
control_plane_api::test_support::provision_test_tenant(&self.pool, tenant, &email, meta)
Contributor Author

Moved this to control_plane_api so it could be shared, rather than duplicating it.

@jshearer jshearer force-pushed the jshearer/billing_graphql branch from c00bc9f to cc73e15 Compare April 28, 2026 19:38
@jshearer jshearer marked this pull request as ready for review April 28, 2026 19:42
@jshearer jshearer changed the title from "WIP Billing GraphQL" to "Billing GraphQL" Apr 29, 2026
@jshearer jshearer requested a review from GregorShear April 29, 2026 14:32
@jshearer jshearer force-pushed the jshearer/billing_graphql branch from cc73e15 to 2184f2e Compare April 29, 2026 22:42
@jshearer jshearer requested a review from a team April 30, 2026 17:17
@travjenkins
Member

travjenkins commented May 1, 2026

Not sure where to put these thoughts - so putting them on the PR.

Should the concept of fetching the payment_provider stay on the tenant or should we expose this through billing? I feel billing is interesting because it overlaps with tenant level settings and even some billing specific alerts/notifications. I have no strong opinion here - so mainly just mentioning it to be mentioned.

The UI will only ever fetch payment methods for up to 5 tenants at once. We do this so support doesn't spam calls. If it is easy (and, more importantly, makes sense) to let us query an array of tenants in one request, that could reduce the number of calls.

To be clear - none of these things are blockers... just thoughts.

@jshearer jshearer force-pushed the jshearer/billing_graphql branch 2 times, most recently from 183cb12 to e3ba37e Compare May 4, 2026 17:35
Comment thread crates/control-plane-api/src/server/public/graphql/billing/tenant.rs Outdated
Comment thread crates/agent/src/main.rs Outdated

Comment thread mise/tasks/local/control-plane Outdated
Comment thread crates/control-plane-api/src/server/public/graphql/billing/invoices.rs Outdated
Comment thread crates/control-plane-api/src/server/public/graphql/billing/mutations.rs Outdated
Comment thread crates/billing-integrations/src/publish.rs
Comment thread crates/billing-types/src/lib.rs
Comment thread crates/billing-integrations/src/publish.rs Outdated
Comment thread .github/workflows/platform-test.yaml Outdated
@jshearer jshearer force-pushed the jshearer/billing_graphql branch from e3ba37e to a8dea5d Compare May 6, 2026 16:17
jshearer added 2 commits May 6, 2026 14:54
Introduce a `BillingProvider` abstraction over the Stripe operations we need for customer, payment-method, invoice, and setup-intent handling, together with a Stripe-backed implementation and an in-memory test mock. The trait is deliberately scoped to outbound Stripe API calls so integration tests can stub them.

* `BillingProvider` trait: Stripe primitives plus composed default methods (`find_customer`, `require_customer`, `find_or_create_customer`, `fetch_invoice`).
* `StripeBillingProvider`: production implementation over `stripe::Client`.
* `InMemoryBillingProvider`: stateful test mock used by integration tests and local agent startup.
* `billing::db::fetch_invoice_rows`: typed read over the `invoices_ext` view with date-range and invoice-type filters.
@jshearer jshearer force-pushed the jshearer/billing_graphql branch 3 times, most recently from dbfe8d5 to ca57448 Compare May 6, 2026 19:34
jshearer added 5 commits May 6, 2026 16:04
…L surface

Add `BillingProvider` (optional) to the `App` struct and wire `--stripe-api-key` from CLI args. When absent, billing operations return "Billing is not configured". Add billing GraphQL mutations (`createBillingSetupIntent`, `setBillingPaymentMethod`, `deleteBillingPaymentMethod`) and `Tenant.billing` query with invoices, payment methods, and customer data. DataLoaders (`StripeInvoiceLoader`, `ChargeDataLoader`, `CustomerDataLoader`) are injected per-request when a billing provider is configured.
The agent refuses to start without either `STRIPE_API_KEY` or `BILLING_IN_MEMORY=true`, so every environment that starts the agent now has to supply one.

* `deploy-agent-api.yaml`: inject `STRIPE_API_KEY` from Cloud Run secret manager so the deployed agent starts.
* `platform-test.yaml`: conditionally run `graphql_billing_live_stripe` against `STRIPE_TESTMODE_API_KEY` when the secret is present.
* `mise/tasks/local/control-plane`: forward a shell-provided `STRIPE_API_KEY` into `agent.env`, or fall back to `BILLING_IN_MEMORY=true` so local dev works without additional setup.
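The startup rule in this commit message (either a Stripe key or the in-memory flag must be present) might look roughly like the sketch below. `BillingMode` and `resolve_billing_mode` are hypothetical names, and letting the Stripe key take precedence when both are set is an assumption, not something the PR states.

```rust
/// Hypothetical outcome of reading the billing-related environment.
#[derive(Debug, PartialEq, Eq)]
pub enum BillingMode {
    /// Talk to Stripe using the supplied secret key.
    Stripe { api_key: String },
    /// Use the in-memory provider (local development, tests).
    InMemory,
}

/// Decide which provider to start with, given the values of
/// `STRIPE_API_KEY` and `BILLING_IN_MEMORY` (None when unset).
/// Assumption: a non-empty Stripe key wins if both are provided.
pub fn resolve_billing_mode(
    stripe_api_key: Option<&str>,
    billing_in_memory: Option<&str>,
) -> Result<BillingMode, String> {
    match (stripe_api_key, billing_in_memory) {
        (Some(key), _) if !key.is_empty() => Ok(BillingMode::Stripe {
            api_key: key.to_string(),
        }),
        (_, Some("true")) => Ok(BillingMode::InMemory),
        // Neither option was supplied: refuse to start.
        _ => Err("set STRIPE_API_KEY or BILLING_IN_MEMORY=true".to_string()),
    }
}
```

A caller would pass in something like `std::env::var("STRIPE_API_KEY").ok().as_deref()` and abort startup on `Err`.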
Instead of building queries ad-hoc, let's centralize it a bit
* `CustomerDataLoader` resolves tenant names to Stripe customers. All customer lookups across `payment_methods`, `primary_payment_method`, and invoice resolution share this single DataLoader per request.
* `StripeInvoiceLoader` resolves invoices by customer ID, period, and type. Its keys now carry a `CustomerId` rather than a tenant name, so it has no customer-lookup logic of its own. Searches are parallelized via `join_all`.
* `ChargeDataLoader` resolves charges by `PaymentIntentId`, only hit when `paymentDetails` is selected.

Also moves `receipt_url` into `InvoicePaymentDetails` (both fields come from the charge) and adds a `ChargeStatus` enum so consumers can distinguish payment outcomes. Removes the unused `BillingProvider::fetch_invoice` default method.
@jshearer jshearer force-pushed the jshearer/billing_graphql branch from ca57448 to 98a8373 Compare May 6, 2026 20:06
@jshearer jshearer requested a review from GregorShear May 6, 2026 22:56
…cy-Key

`find_or_create_customer` and `get_or_create_customer_for_tenant` search Stripe by tenant metadata then create if the search misses, but `customers.search` is eventually consistent so two near-simultaneous calls can both miss and both create a duplicate customer row for the same tenant.
Use a deterministic `Idempotency-Key` per tenant on `Customer::create` so concurrent or retried creations collapse inside Stripe's 24h window
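To illustrate why a deterministic key helps, here is a toy model of the race. The key format (`create-customer:` prefix) and `ToyProvider` are assumptions made up for this sketch. The toy provider only imitates the behaviour of returning the first result for a repeated key and does not model the expiry window.

```rust
use std::collections::HashMap;

/// Build the same idempotency key every time for a given tenant.
/// The "create-customer:" prefix is illustrative, not the PR's actual format.
pub fn customer_idempotency_key(tenant: &str) -> String {
    format!("create-customer:{tenant}")
}

/// Toy stand-in for provider-side idempotency handling: a repeated key
/// returns the ID created by the first call instead of creating again.
#[derive(Default)]
pub struct ToyProvider {
    by_key: HashMap<String, u64>,
    next_id: u64,
}

impl ToyProvider {
    /// "Create" a customer, deduplicating on the idempotency key.
    pub fn create_customer(&mut self, idempotency_key: &str) -> u64 {
        if let Some(id) = self.by_key.get(idempotency_key) {
            return *id;
        }
        self.next_id += 1;
        self.by_key
            .insert(idempotency_key.to_string(), self.next_id);
        self.next_id
    }
}
```

Two callers that both miss the search and both attempt a create for the same tenant present the same key, so only one customer results.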
Comment on lines +45 to +55
for tenant in keys {
if let Some(customer) = self
.0
.find_customer(tenant)
.await
.map_err(|err| async_graphql::Error::new(err.to_string()))?
{
out.insert(tenant.clone(), customer);
}
}
Ok(out)
Contributor

same parallelization treatment instead of sequential? maybe for the sake of consistency with the other loaders more than a real performance concern

Comment on lines +154 to +157
inv.status
.as_ref()
.and_then(|s| serde_json::to_value(s).ok())
.and_then(|v| v.as_str().map(str::to_string))
Contributor

@GregorShear GregorShear May 11, 2026

InvoiceStatus has an as_str() func

Suggested change
- inv.status
-     .as_ref()
-     .and_then(|s| serde_json::to_value(s).ok())
-     .and_then(|v| v.as_str().map(str::to_string))
+ inv.status.as_ref().map(|s| s.as_str().to_string())

Contributor

@GregorShear GregorShear left a comment

sorry to leave a couple of raw agent comments - i'm rushing out the door and wanted to get this up before i leave

Comment on lines +36 to +43
pub fn from_str(s: &str) -> Option<Self> {
match s {
"final" => Some(InvoiceType::Final),
"preview" => Some(InvoiceType::Preview),
"manual" => Some(InvoiceType::Manual),
_ => None,
}
}
Contributor

agent says:

this shadows std::str::FromStr::from_str, so "final".parse::<InvoiceType>() won't work and the method signature is confusingly close to the trait's. Could either impl FromStr for InvoiceType (which gives you .parse() for free) or rename the inherent method to something like parse_str to avoid the collision.
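For reference, the first option (implementing the trait) could look something like this. The enum is redeclared here so the snippet stands alone, and `ParseInvoiceTypeError` is a hypothetical error type chosen for the sketch. The real crate may prefer a different one.

```rust
use std::str::FromStr;

/// Redeclared for the sketch; mirrors the three variants in the diff above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InvoiceType {
    Final,
    Preview,
    Manual,
}

/// Hypothetical error type carrying the unrecognized input.
#[derive(Debug, PartialEq, Eq)]
pub struct ParseInvoiceTypeError(pub String);

impl FromStr for InvoiceType {
    type Err = ParseInvoiceTypeError;

    /// Same string mapping as the inherent method, but via the std trait,
    /// so `str::parse` works.
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "final" => Ok(InvoiceType::Final),
            "preview" => Ok(InvoiceType::Preview),
            "manual" => Ok(InvoiceType::Manual),
            other => Err(ParseInvoiceTypeError(other.to_string())),
        }
    }
}
```

Callers that currently expect an `Option` can keep that shape with `"final".parse::<InvoiceType>().ok()`.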


/// Fetch invoices older than `cursor` (or the newest invoices when `cursor`
/// is `None`). Returned rows are ordered newest-first.
pub async fn fetch_invoice_rows_forward(
Contributor

also, do you mind mentioning in your comment that this is forward pagination? there's another gql api where pagination and ordering of the result set is conflated (and wrong) and i want to make sure we give the agent a chance to get it right if it follows this example

}

/// Fetch invoices newer than `cursor`. Returned rows are ordered newest-first.
pub async fn fetch_invoice_rows_backward(
Contributor

same here - add comment this is backward pagination
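To make the direction-versus-ordering distinction concrete, here is a small in-memory sketch. It is not the SQL in the PR: rows are reduced to a sortable `u32` key, and `page_forward` / `page_backward` are hypothetical names. Both directions return rows newest-first, and only the side of the cursor they read from differs.

```rust
/// Forward pagination (`first` / `after`): rows strictly older than the
/// cursor, or the newest rows when there is no cursor. Result is newest-first.
pub fn page_forward(rows_newest_first: &[u32], after: Option<u32>, first: usize) -> Vec<u32> {
    rows_newest_first
        .iter()
        .copied()
        .filter(|key| after.map_or(true, |cursor| *key < cursor))
        .take(first)
        .collect()
}

/// Backward pagination (`last` / `before`): rows strictly newer than the
/// cursor, limited to the `last` rows nearest the cursor. Result is still
/// newest-first, so ordering does not flip with direction.
pub fn page_backward(rows_newest_first: &[u32], before: u32, last: usize) -> Vec<u32> {
    let newer: Vec<u32> = rows_newest_first
        .iter()
        .copied()
        .filter(|key| *key > before)
        .collect();
    // In a newest-first list, the rows nearest the cursor sit at the tail.
    let skip = newer.len().saturating_sub(last);
    newer[skip..].to_vec()
}
```

With rows `[50, 40, 30, 20, 10]`, paging forward after `40` with `first = 2` yields `[30, 20]`, and paging backward before `20` with `last = 2` yields `[40, 30]`.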

let Some(primary_id) = billing::default_payment_method_id(&customer) else {
return Ok(None);
};
let pm = self
Contributor

agent says:

If the client selects both paymentMethods and primaryPaymentMethod in the same query (which the UI mocks suggest is the typical shape), this fetches primary_id via get_payment_method even though that PM is already in the list returned by list_payment_methods for the same customer. Worth either:

  • Looking the primary up in the cached list, or
  • Adding a PaymentMethodLoader keyed on (customer_id, payment_method_id) with a default method that batches via list_payment_methods.

The simpler "find in the list" approach would mirror what set_billing_payment_method already does in the mutations file. Not a hot path so it's fine to defer, but flagging.
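The "find in the list" option might be as small as the sketch below. `PaymentMethod` is a stand-in struct for this example (the real code uses the Stripe type), and `primary_from_list` is a hypothetical helper name.

```rust
/// Stand-in for the provider's payment method type; only the ID matters here.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PaymentMethod {
    pub id: String,
}

/// Resolve the primary payment method from an already-fetched list,
/// avoiding a second provider call. Returns None when the customer has no
/// default or the default is not in the list.
pub fn primary_from_list<'a>(
    methods: &'a [PaymentMethod],
    default_id: Option<&str>,
) -> Option<&'a PaymentMethod> {
    let default_id = default_id?;
    methods.iter().find(|pm| pm.id.as_str() == default_id)
}
```

A `None` for a default that is missing from the list could also be treated as a signal to fall back to the direct lookup. That choice is left open here.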


async fn stripe_invoice(&self, ctx: &Context<'_>) -> Result<Option<stripe::Invoice>> {
let customer_loader = ctx.data::<DataLoader<super::tenant::CustomerDataLoader>>()?;
let Some(customer) = customer_loader.load_one(self.tenant.clone()).await? else {
Contributor

agent says:

Worth confirming the cache hits when both paymentDetails and TenantBilling.payment_methods are selected in the same query — the latter passes self.tenant from validate_tenant_name (a Prefix-validated string), and this one passes row.billed_prefix from the DB. If the two ever disagree on trailing-slash or case normalization, the per-request DataLoader silently issues two find_customer calls instead of one.
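One way to rule out the trailing-slash half of this concern would be to normalize the key at every `load_one` call site. The helper below is a hypothetical sketch: it assumes tenant keys are meant to end in exactly one `/`, and it deliberately leaves case untouched since the source does not say names are case-insensitive.

```rust
/// Normalize a tenant name so that it ends with exactly one trailing slash.
/// Hypothetical helper; assumes the canonical form is `name/`.
pub fn normalize_tenant_key(raw: &str) -> String {
    let trimmed = raw.trim_end_matches('/');
    format!("{trimmed}/")
}
```

If both call sites pass their tenant string through the same function, differently formatted inputs map to a single DataLoader cache entry.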

Development

Successfully merging this pull request may close these issues.

Migrate billing edge functions to GraphQL

3 participants