Observation
Proposition.confidence reflects the LLM's self-assessed certainty during extraction. But confidence alone is insufficient for trust decisions:
A high-confidence extraction from an untrusted source may need skepticism
A high-confidence proposition that contradicts well-established knowledge may need review
A proposition corroborated by multiple independent sources (grounding, sourceIds) is more trustworthy than one from a single source
DICE extracts and stores propositions with confidence, importance, grounding, sourceIds, and metadata — but nothing between extraction and persistence evaluates whether a proposition should be trusted. Any post-extraction quality gating is entirely on the consumer.
What DICE already has
confidence: ZeroToOne on Proposition — LLM self-assessment, set at extraction time
grounding: List<String> — chunk IDs supporting the proposition (corroboration signal)
sourceIds: List<String> — for abstracted propositions, the IDs of source propositions
metadata: Map<String, Any> — could carry source authority signals
PropositionPipeline — extract → revise → persist, but no trust gate between revise and persist
The question
Should DICE have a trust evaluation layer between extraction and persistence?
Some possibilities:
Metadata convention — extraction prompts populate metadata["trustSignals"] with source/corroboration hints. Consumers interpret them. No code changes.
TrustSignal SPI — a fun interface TrustSignal { fun evaluate(proposition: Proposition): Double } that scores propositions on a 0-1 scale. Built-in signals could consume fields DICE already produces:
Extraction confidence passthrough
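Corroboration count (how many distinct sources in grounding)
Source authority (from metadata, if provenance tracking (Multi-agent proposition governance #17) is adopted)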
TrustEvaluator in the pipeline — PropositionPipeline accepts an optional evaluator that scores propositions before persistence. Propositions below a threshold are dropped or routed to a review status.
Trust as a Proposition field — add a trustScore: Double? field alongside confidence. Confidence = "how certain is the extraction," trust = "how much should we believe it given external signals." Orthogonal dimensions.
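To make the TrustSignal SPI option more concrete, here is a minimal sketch of the interface with two of the built-in signals mentioned above. The Proposition class is a simplified stand-in rather than DICE's actual type (confidence is a plain Double instead of ZeroToOne), and the names confidencePassthrough, CorroborationCount, and saturation are hypothetical.

```kotlin
// Simplified stand-in for DICE's Proposition (assumption: the real class has
// more fields and uses ZeroToOne for confidence; a plain Double is used here).
data class Proposition(
    val text: String,
    val confidence: Double,
    val grounding: List<String> = emptyList(),
    val sourceIds: List<String> = emptyList(),
    val metadata: Map<String, Any> = emptyMap(),
)

// The SPI as proposed in this issue: one score in [0, 1] per proposition.
fun interface TrustSignal {
    fun evaluate(proposition: Proposition): Double
}

// Hypothetical built-in: passes the extraction confidence through, clamped to [0, 1].
val confidencePassthrough = TrustSignal { it.confidence.coerceIn(0.0, 1.0) }

// Hypothetical built-in: counts distinct grounding chunk IDs and saturates at
// `saturation`, so 0 chunks scores 0.0 and `saturation` or more chunks scores 1.0.
class CorroborationCount(private val saturation: Int = 3) : TrustSignal {
    init {
        require(saturation > 0) { "saturation must be positive" }
    }

    override fun evaluate(proposition: Proposition): Double {
        val distinct = proposition.grounding.distinct().size
        return minOf(distinct, saturation).toDouble() / saturation
    }
}
```

Because each signal only reads fields that extraction already produces, a sketch like this would not require changes to the extraction prompts. How signals are combined is left open here.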
Where trust scoring would matter
Extraction confidence: proposition.confidence
Corroboration: grounding.size, sourceIds.size
Source authority: metadata or provenance (#17)
The cheap signals (confidence, corroboration, source authority) could filter a meaningful percentage of low-quality extractions before any expensive checks run.
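As one possible shape for a gate built on the cheap signals, the sketch below averages signal scores and maps the result to persist, review, or drop. Everything here is an assumption for illustration: the Proposition stand-in, the metadata["sourceAuthority"] key, the neutral 0.5 fallback, the unweighted average, and the threshold values are not part of DICE.

```kotlin
// Simplified stand-in for DICE's Proposition (assumption, not the real class).
data class Proposition(
    val text: String,
    val confidence: Double,
    val grounding: List<String> = emptyList(),
    val metadata: Map<String, Any> = emptyMap(),
)

fun interface TrustSignal {
    fun evaluate(proposition: Proposition): Double
}

// Cheap signal: reuse the extraction confidence as-is.
val extractionConfidence = TrustSignal { it.confidence }

// Hypothetical convention: metadata["sourceAuthority"] holds a number in [0, 1].
// A missing or non-numeric value falls back to a neutral 0.5 (assumption).
val sourceAuthority = TrustSignal {
    (it.metadata["sourceAuthority"] as? Number)?.toDouble() ?: 0.5
}

enum class TrustDecision { PERSIST, REVIEW, DROP }

// Hypothetical evaluator: unweighted mean of clamped signal scores, then two thresholds.
class TrustEvaluator(
    private val signals: List<TrustSignal>,
    private val persistAt: Double = 0.7,
    private val reviewAt: Double = 0.4,
) {
    init {
        require(signals.isNotEmpty()) { "at least one signal is required" }
        require(reviewAt <= persistAt) { "reviewAt must not exceed persistAt" }
    }

    fun score(proposition: Proposition): Double =
        signals.map { it.evaluate(proposition).coerceIn(0.0, 1.0) }.average()

    fun decide(proposition: Proposition): TrustDecision {
        val s = score(proposition)
        return when {
            s >= persistAt -> TrustDecision.PERSIST
            s >= reviewAt -> TrustDecision.REVIEW
            else -> TrustDecision.DROP
        }
    }
}
```

With these defaults, a high-confidence proposition from a low-authority source lands in REVIEW rather than being persisted, which matches the first concern in the observation. Expensive checks, if any, would only need to run on whatever survives this step.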