Llm usage records #4

Open
ajaycj wants to merge 1 commit into salesforce-misc:main from ajaycj:llm-usage-records

Conversation

@ajaycj

@ajaycj ajaycj commented Apr 27, 2026

What's in the PR?
A switchplane.usage module normalizes token counts and cost estimates, AgentContext emits llm.usage, and the devops graph records its one LLM node plus estimated deterministic savings. The PR also adds tests for the new usage helper and for event emission.

@salesforce-cla

Thanks for the contribution! Unfortunately we can't verify the commit author(s): Ajay Chinthalapalli Jayakumar <a***@s***.com>. One possible solution is to add that email to your GitHub account. Alternatively you can change your commits to another email and force push the change. After getting your commits associated with your GitHub account, refresh the status of this Pull Request.

@ajaycj ajaycj force-pushed the llm-usage-records branch from 818d0ba to 0c6bd87 Compare April 27, 2026 23:04
@ajaycj ajaycj closed this Apr 27, 2026
@ajaycj ajaycj reopened this Apr 27, 2026
@ajaycj ajaycj force-pushed the llm-usage-records branch from 5a0da28 to 92738e1 Compare April 27, 2026 23:09
Contributor

@demianbrecht demianbrecht left a comment

See inline comments. The main concern is that LangChain already surfaces token usage on every AIMessage via usage_metadata — this PR re-implements that extraction manually and adds significant ceremony to task code. A LangGraph callback approach would achieve the same result transparently.

Comment thread src/switchplane/usage.py
Comment on lines +89 to +113
    """Extract provider-reported token counts from common LangChain responses."""

    usage_metadata = getattr(response, "usage_metadata", None)
    if isinstance(usage_metadata, dict):
        prompt = _coerce_int(usage_metadata.get("input_tokens") or usage_metadata.get("prompt_tokens"))
        completion = _coerce_int(
            usage_metadata.get("output_tokens")
            or usage_metadata.get("completion_tokens")
            or usage_metadata.get("generated_tokens")
        )
        total = _coerce_int(usage_metadata.get("total_tokens"))
        if prompt is not None or completion is not None or total is not None:
            return prompt, completion, total

    response_metadata = getattr(response, "response_metadata", None)
    if isinstance(response_metadata, dict):
        token_usage = response_metadata.get("token_usage") or response_metadata.get("usage")
        if isinstance(token_usage, dict):
            prompt = _coerce_int(token_usage.get("prompt_tokens") or token_usage.get("input_tokens"))
            completion = _coerce_int(token_usage.get("completion_tokens") or token_usage.get("output_tokens"))
            total = _coerce_int(token_usage.get("total_tokens"))
            if prompt is not None or completion is not None or total is not None:
                return prompt, completion, total

    return None, None, None

LangChain already surfaces token usage on every AIMessage via usage_metadata (input_tokens, output_tokens, total_tokens). This function is manually re-extracting data that the framework already provides — it doesn't add new information.

A LangGraph callback (on_llm_end) could capture this automatically without any custom extraction logic.
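
For illustration, a minimal sketch of the callback approach suggested above. It assumes the standard LangChain handler hook on_llm_end, which receives a result whose generations carry the AIMessage. UsageCallback and the emit callable are hypothetical names, not part of this PR, and the fake response at the end only mimics the shape of a real result so the sketch runs without a provider.

```python
from types import SimpleNamespace

try:
    from langchain_core.callbacks import BaseCallbackHandler
except ImportError:
    # Fallback so the sketch still runs where LangChain is not installed.
    BaseCallbackHandler = object


class UsageCallback(BaseCallbackHandler):
    """Hypothetical handler: forwards usage_metadata as an llm.usage event."""

    def __init__(self, emit):
        super().__init__()
        # emit(name, payload) is an assumed event sink, e.g. a context method.
        self.emit = emit

    def on_llm_end(self, response, **kwargs):
        # response.generations is a list of lists of generations; chat
        # generations expose the AIMessage on .message.
        for generations in response.generations:
            for generation in generations:
                message = getattr(generation, "message", None)
                usage = getattr(message, "usage_metadata", None)
                if usage:
                    self.emit("llm.usage", dict(usage))


# Stand-in for a real result, only to exercise the handler locally.
events = []
handler = UsageCallback(lambda name, payload: events.append((name, payload)))
fake_message = SimpleNamespace(
    usage_metadata={"input_tokens": 120, "output_tokens": 30, "total_tokens": 150}
)
fake_response = SimpleNamespace(generations=[[SimpleNamespace(message=fake_message)]])
handler.on_llm_end(fake_response)
```

With LangChain installed, a handler like this would typically be passed through the run config, for example config={"callbacks": [handler]}, so task nodes need no usage-specific code.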

Comment thread src/switchplane/usage.py
Comment on lines +22 to +33
    "claude-sonnet-4-20250514": ModelPricing(3.0, 15.0),
    "claude-sonnet-4-5-20250929": ModelPricing(3.0, 15.0),
    "claude-sonnet-4-6": ModelPricing(3.0, 15.0),
    "claude-opus-4-20250514": ModelPricing(15.0, 75.0),
    "claude-opus-4-6-v1": ModelPricing(15.0, 75.0),
    "claude-haiku-4-5-20251001": ModelPricing(1.0, 5.0),
    "gpt-4o": ModelPricing(2.5, 10.0),
    "gpt-4o-mini": ModelPricing(0.15, 0.60),
    "gemini-2.0-flash": ModelPricing(0.10, 0.40),
    "gemini-2.5-flash": ModelPricing(0.30, 2.50),
    "gemini-2.5-pro": ModelPricing(1.25, 10.0),
}

This pricing table will go stale immediately. Prices change frequently, model IDs are versioned, and this doesn't account for caching discounts, batch API pricing, prompt caching writebacks, etc.

If cost tracking is needed, it belongs in config or an external source — not a compiled-in dict that requires code changes to update.
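
As a rough sketch of the config-driven alternative, pricing could be parsed from a JSON document at startup rather than compiled in. The JSON layout and the load_pricing helper are assumptions made for illustration. ModelPricing mirrors the two fields the PR's cost function reads (input_per_million, output_per_million).

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPricing:
    input_per_million: float
    output_per_million: float


def load_pricing(raw_json: str) -> dict[str, ModelPricing]:
    """Parse a hypothetical pricing config: {model: {input_per_million, output_per_million}}."""
    data = json.loads(raw_json)
    return {
        model: ModelPricing(
            float(entry["input_per_million"]),
            float(entry["output_per_million"]),
        )
        for model, entry in data.items()
    }


# In practice the text would come from a config file or an external source.
CONFIG_TEXT = '{"gpt-4o-mini": {"input_per_million": 0.15, "output_per_million": 0.60}}'
pricing = load_pricing(CONFIG_TEXT)
```

Updating a price then means editing the config, not shipping a code change. Models missing from the config simply have no entry, which matches the existing behavior of returning None for unknown pricing.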

Comment thread src/switchplane/usage.py
Comment on lines +68 to +76
    """Estimate USD cost for a model if pricing is known."""

    pricing = MODEL_PRICING.get(model)
    if pricing is None:
        return None
    cost = (prompt_tokens / 1_000_000 * pricing.input_per_million) + (
        completion_tokens / 1_000_000 * pricing.output_per_million
    )
    return round(cost, 6)

This len(text) / 4 heuristic is very rough, and the results get stored in the same LLMUsageRecord alongside provider-reported actuals. Downstream consumers of these records have no reliable way to distinguish precision levels.

The estimated_tokens_saved metric (raw prompt tokens from this estimate minus actual prompt tokens from the provider) is comparing a pre-call guess against a post-call actual — not a meaningful comparison.
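
One possible way to make the precision level visible to consumers, sketched under assumptions: tag every count with its origin. TokenSource, TokenCount, and from_provider are hypothetical names, and estimate_text_tokens here only reproduces the len(text) / 4 heuristic described above using integer division.

```python
from dataclasses import dataclass
from enum import Enum


class TokenSource(Enum):
    PROVIDER = "provider"    # reported by the model provider after the call
    ESTIMATED = "estimated"  # derived locally from text length


@dataclass(frozen=True)
class TokenCount:
    value: int
    source: TokenSource


def estimate_text_tokens(text: str) -> TokenCount:
    # Rough heuristic: about four characters per token.
    return TokenCount(len(text) // 4, TokenSource.ESTIMATED)


def from_provider(value: int) -> TokenCount:
    return TokenCount(value, TokenSource.PROVIDER)


estimated = estimate_text_tokens("a" * 40)
actual = from_provider(12)
```

A consumer can then filter on source before aggregating, instead of treating a pre-call guess and a post-call actual as the same kind of number.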

Comment on lines +380 to 413
    usage = llm_usage_from_response(
        response,
        task_id=ctx.task_id,
        model=model,
        node_name="summarize",
        fallback_prompt_text=f"{_SYSTEM_PROMPT}\n\n{prompt}",
        fallback_completion_text=str(response.content),
        estimated_raw_prompt_tokens=state["estimated_raw_prompt_tokens"],
        metadata={
            "deterministic_nodes": 3,
            "llm_nodes": 1,
            "rows_processed": state["rows_processed"],
            "formatted_prompt_tokens_estimate": estimate_text_tokens(prompt),
        },
    )
    ctx.record_llm_usage(
        model=usage.model,
        node_name=usage.node_name,
        prompt_tokens=usage.prompt_tokens,
        completion_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
        estimated_cost_usd=usage.estimated_cost_usd,
        estimated_raw_prompt_tokens=usage.estimated_raw_prompt_tokens,
        estimated_tokens_saved=usage.estimated_tokens_saved,
        metadata=usage.metadata,
    )
    ctx.progress(
        "LLM usage recorded",
        prompt_tokens=usage.prompt_tokens,
        completion_tokens=usage.completion_tokens,
        total_tokens=usage.total_tokens,
        estimated_cost_usd=usage.estimated_cost_usd,
        estimated_tokens_saved=usage.estimated_tokens_saved,
    )

This is the core problem with the approach — the summarize node went from ~5 lines to 30+ lines of usage-tracking ceremony. Every LLM node in every task would need this same boilerplate.

If usage tracking is a framework concern, it should be transparent. A LangGraph callback hooked into on_llm_end could emit the llm.usage event automatically, with zero changes to task code.


2 participants