05 — Grounding and provenance
The claim: any value typed Grounded<T> traces back to at least one retrieval source, and the chain of transformations is inspectable at runtime.
Grounding is the property that makes Corvid’s LLM outputs auditable. Other systems let retrieval-augmented pipelines produce strings that may or may not reference the documents they cite; Corvid’s compiler proves at build time that a Grounded<T> return was fed by retrieval, and the runtime carries the provenance chain alongside the value.
1. The Grounded<T> type
Grounded<T> is a wrapper type in the type system. At compile time the checker enforces the provenance obligation; at runtime, the value carries a ProvenanceChain exposed via .sources().
# expect: compileeffect retrieval: data: grounded
tool fetch_doc(id: String) -> Grounded<String> uses retrieval
prompt summarize(doc: String) -> Grounded<String>: "Summarize {doc}"
agent research(id: String) -> Grounded<String>: doc = fetch_doc(id) return summarize(doc)fetch_docdeclaresuses retrievalwhereretrievalhasdata: grounded. This is the source declaration.summarizeconsumes aGrounded<String>and returns aGrounded<String>. The wrapper propagates.research’s body ends in a value whose provenance chain includesfetch_doc. The check passes.
2. The provenance obligation
Formal rule (restated from 03 § 5):
Γ ⊢ e : Grounded<τ> chain(e) = [s₀, s₁, …, sₙ] ∃ i. sᵢ has effect with data = grounded ───────────────────────────────────────── e's provenance is validThe checker builds chain(e) during effect analysis. A chain is a sequence of the operations that contributed to e’s value — the retrieval tool, the prompt transformations, the agent handoffs.
Sources of a chain:
| Construct | Contributes |
|---|---|
Tool call with data: grounded | Introduces a source. Chain extends by this retrieval. |
| Prompt call | Inherits sources from its arguments. The LLM call is a transform, not a source. |
| Agent call | Propagates sources from any Grounded<T> argument. |
| Literal / computation | No sources. Chain is empty. |
If an agent declares -> Grounded<T> but the returned value’s chain is empty, the checker emits UngroundedReturn with a hint pointing at the missing retrieval.
3. Runtime provenance chain
At runtime, every Grounded<T> value carries a ProvenanceChain that records every operation applied to it. The chain is typed:
ProvenanceChain = [Step]Step = Retrieval { source: String, timestamp: Instant, metadata: JSON } | PromptTransform { prompt_name: String, model: String, tokens: usize } | AgentHandoff { agent_name: String } | Severed { reason: String }Grounded<T>::sources() returns the list. Consumers can inspect the chain at any point to answer questions like:
- “Which document was this summary derived from?”
- “Which LLM generated this classification?”
- “Did any transformation sever the chain?”
The Severed step documents where a chain was deliberately cut — e.g., when a synthesis step mixed grounded and ungrounded inputs, the result cannot claim grounding. That step is the runtime’s equivalent of the compile-time UngroundedReturn error — if you sever a chain and then claim Grounded<T>, the runtime raises an error.
4. cites ctx strictly — runtime citation verification
Some prompts need to do more than be grounded; they need to cite the specific grounded context in the output. The cites ctx strictly clause on a prompt opts in to runtime citation verification:
# expect: skipprompt answer(doc: String, question: String) -> Grounded<String>: cites doc strictly "Given this document: {doc}\n\nAnswer: {question}"When the LLM responds, the runtime verifier walks the response against doc and confirms every quoted span appears verbatim in the source. If the response cites text that doesn’t appear in doc, the verifier raises CitationViolation with the unverifiable span.
This is stricter than most retrieval-augmented frameworks. Others check “did the model reference the document by ID?” — Corvid checks “does every verbatim quoted span exist in the document?”
See crates/corvid-vm/src/interp.rs and the cites_strictly_param field on IrPrompt for the runtime plumbing.
5. When grounding matters
Use cases that require Grounded<T>:
- RAG pipelines. The return value must be derivable from retrieved context, not fabricated.
- Legal/compliance answers. Claims must cite source material.
- Medical/financial advice. Outputs must reference underlying records.
- Factual summarization. No hallucinated quotes.
Programs that don’t need grounding use plain types — String, Result<T, E>, Option<T>. The type wrapper is opt-in; the constraint is mandatory when the wrapper is present.
6. Composition with other dimensions
Grounding interacts with other dimensions through the data-union composition rule:
data: groundedcomposes with other data categories (financial,medical,pii) via Union — a chain that retrieves financial records then runs a grounded transformation hasdata: grounded, financial.- The Grounded obligation is independent:
@data(grounded)on an agent requiresgroundedis in the composed set;@data(financial)requiresfinancial. Both can be declared.
7. Interaction with @min_confidence
A grounded chain’s confidence is still bounded by its weakest step (04 § 4.6). A retrieval at 0.99 feeding a summarization at 0.70 yields a Grounded<String> with composed confidence 0.70. If the agent declares @min_confidence(0.80), compilation fails even though the chain is grounded.
Grounding and confidence answer different questions:
- Grounding: does the output trace back to real context?
- Confidence: how certain are the statistical judgments along the way?
A grounded, low-confidence answer is still grounded; it’s just not confident. Consumers can check both properties independently.
8. Implementation references
- Compile-time check:
check_grounded_returnsin ../../crates/corvid-types/src/effects.rs. - Runtime chain:
GroundedValuein ../../crates/corvid-vm/src/value.rs. - Citation verifier: search for
cites_strictly_paramin ../../crates/corvid-vm/src/interp.rs.
9. Provenance propagation and @grounded_pure
The grounding from sections 1-8 is constructive — a value either has a Grounded<T> type or it does not. The Provenance Propagation phase adds two things on top of that base: the contagion law (grounded-ness flows through ordinary code so users do not have to thread Grounded<T> annotations everywhere) and the @grounded_pure moat (a compile-time guarantee that an agent body launders no grounded value into a non-grounded slot).
9.1. Contagion law
A binary or unary operator over a Grounded<T> operand returns Grounded<T>:
Grounded<String> + StringreturnsGrounded<String>. The wrapper is propagated; the inner type is the operator’s normal result type. The runtime lifts both operands, computes on the inner values, and re-wraps with the minimum of the two confidences and aDerivedprovenance node naming the operator and pointing at both source chains.- The rule covers arithmetic, comparison, string concat, and unary operators. Short-circuit
&&/||are out of scope because they would change evaluation order; control-flow conditions instead get the dedicated D2 rule below. - A call to a prompt / tool / agent whose declared
effect_rowcarriesdata: groundedreturnsGrounded<T>at the call site even when the declared return is plainT. The typechecker, the IR, and the runtime all agree the result is grounded.
9.2. Control-flow conditions (D2)
if <expr>: accepts Grounded<Bool> directly — branching consumes the bool to pick a path, it does not emit a laundered value. The implicit unwrap is recorded as an IR-visible discard so @grounded_pure can still see it.
9.3. Legacy Grounded<T> -> T coercion + IR-visible discard
The historical is_assignable_to rule that lets a Grounded<T> value flow into a non-grounded slot remains in place for backwards compatibility. Every such site is now recorded in Checked.grounded_coercion_sites and IR lowering wraps the produced expression in IrExprKind::UnwrapGrounded. The recorded sites are:
- return value, yield value, let binding with type annotation;
- call argument, struct field initialiser,
Some(...)against anOption<T>slot; - list element flowing into an inferred bare-
Telement type; - replay arm body,
elsearm body; ifcondition (D2).
A laundering site that the slot-check enumeration misses is a silent moat hole — the proof obligation rests on completeness here, not on the rule’s wording.
9.4. @grounded_pure proof obligation (D6)
An agent annotated @grounded_pure fails compile if its body contains any of:
- A recorded coercion site (case 1, implicit coercion).
- A
.unwrap_discarding_sources()method-style call (case 2, explicit unwrap). - A call into another agent that is not itself
@grounded_pure(case 3, transitive composition).
Tools and prompts are external boundaries that cannot launder by construction — their declared return type either is Grounded<T> (preserved) or non-grounded (which the slot-check at the call site catches as case 1). The composition through the call graph mirrors @deterministic: a @grounded_pure agent can only call other @grounded_pure agents.
Guarantee row: grounded.no_laundering in ../../../crates/corvid-guarantees/src/registry.rs. Proof tests: grounded_pure_* in ../../../crates/corvid-types/src/tests.rs. Tour: corvid tour --topic provenance-propagation.
9.5. Design doc
The full design plus slice-by-slice rationale (Decisions D1-D8, risks R1-R6, shipped sub-splits per slice) is in ../../meta/grounded-propagation-design.md.
Next
06 — Confidence-gated trust — how autonomous_if_confident(T) threads a runtime trust boundary through the statically-composed trust dimension.