Skip to content

05 — Grounding and provenance

The claim: any value typed Grounded<T> traces back to at least one retrieval source, and the chain of transformations is inspectable at runtime.

Ground­ing is the property that makes Corvid’s LLM outputs auditable. Other systems let retrieval-augmented pipelines produce strings that may or may not reference the documents they cite; Corvid’s compiler proves at build time that a Grounded<T> return was fed by retrieval, and the runtime carries the provenance chain alongside the value.

1. The Grounded<T> type

Grounded<T> is a wrapper type in the type system. At compile time the checker enforces the provenance obligation; at runtime, the value carries a ProvenanceChain exposed via .sources().

# expect: compile
effect retrieval:
data: grounded
tool fetch_doc(id: String) -> Grounded<String> uses retrieval
prompt summarize(doc: String) -> Grounded<String>:
"Summarize {doc}"
agent research(id: String) -> Grounded<String>:
doc = fetch_doc(id)
return summarize(doc)
  • fetch_doc declares uses retrieval where retrieval has data: grounded. This is the source declaration.
  • summarize consumes a Grounded<String> and returns a Grounded<String>. The wrapper propagates.
  • research’s body ends in a value whose provenance chain includes fetch_doc. The check passes.

2. The provenance obligation

Formal rule (restated from 03 § 5):

Γ ⊢ e : Grounded<τ>
chain(e) = [s₀, s₁, …, sₙ]
∃ i. sᵢ has effect with data = grounded
─────────────────────────────────────────
e's provenance is valid

The checker builds chain(e) during effect analysis. A chain is a sequence of the operations that contributed to e’s value — the retrieval tool, the prompt transformations, the agent handoffs.

Sources of a chain:

ConstructContributes
Tool call with data: groundedIntroduces a source. Chain extends by this retrieval.
Prompt callInherits sources from its arguments. The LLM call is a transform, not a source.
Agent callPropagates sources from any Grounded<T> argument.
Literal / computationNo sources. Chain is empty.

If an agent declares -> Grounded<T> but the returned value’s chain is empty, the checker emits UngroundedReturn with a hint pointing at the missing retrieval.

3. Runtime provenance chain

At runtime, every Grounded<T> value carries a ProvenanceChain that records every operation applied to it. The chain is typed:

ProvenanceChain = [Step]
Step = Retrieval { source: String, timestamp: Instant, metadata: JSON }
| PromptTransform { prompt_name: String, model: String, tokens: usize }
| AgentHandoff { agent_name: String }
| Severed { reason: String }

Grounded<T>::sources() returns the list. Consumers can inspect the chain at any point to answer questions like:

  • “Which document was this summary derived from?”
  • “Which LLM generated this classification?”
  • “Did any transformation sever the chain?”

The Severed step documents where a chain was deliberately cut — e.g., when a synthesis step mixed grounded and ungrounded inputs, the result cannot claim grounding. That step is the runtime’s equivalent of the compile-time UngroundedReturn error — if you sever a chain and then claim Grounded<T>, the runtime raises an error.

4. cites ctx strictly — runtime citation verification

Some prompts need to do more than be grounded; they need to cite the specific grounded context in the output. The cites ctx strictly clause on a prompt opts in to runtime citation verification:

# expect: skip
prompt answer(doc: String, question: String) -> Grounded<String>:
cites doc strictly
"Given this document: {doc}\n\nAnswer: {question}"

When the LLM responds, the runtime verifier walks the response against doc and confirms every quoted span appears verbatim in the source. If the response cites text that doesn’t appear in doc, the verifier raises CitationViolation with the unverifiable span.

This is stricter than most retrieval-augmented frameworks. Others check “did the model reference the document by ID?” — Corvid checks “does every verbatim quoted span exist in the document?”

See crates/corvid-vm/src/interp.rs and the cites_strictly_param field on IrPrompt for the runtime plumbing.

5. When grounding matters

Use cases that require Grounded<T>:

  • RAG pipelines. The return value must be derivable from retrieved context, not fabricated.
  • Legal/compliance answers. Claims must cite source material.
  • Medical/financial advice. Outputs must reference underlying records.
  • Factual summarization. No hallucinated quotes.

Programs that don’t need grounding use plain types — String, Result<T, E>, Option<T>. The type wrapper is opt-in; the constraint is mandatory when the wrapper is present.

6. Composition with other dimensions

Grounding interacts with other dimensions through the data-union composition rule:

  • data: grounded composes with other data categories (financial, medical, pii) via Union — a chain that retrieves financial records then runs a grounded transformation has data: grounded, financial.
  • The Grounded obligation is independent: @data(grounded) on an agent requires grounded is in the composed set; @data(financial) requires financial. Both can be declared.

7. Interaction with @min_confidence

A grounded chain’s confidence is still bounded by its weakest step (04 § 4.6). A retrieval at 0.99 feeding a summarization at 0.70 yields a Grounded<String> with composed confidence 0.70. If the agent declares @min_confidence(0.80), compilation fails even though the chain is grounded.

Grounding and confidence answer different questions:

  • Grounding: does the output trace back to real context?
  • Confidence: how certain are the statistical judgments along the way?

A grounded, low-confidence answer is still grounded; it’s just not confident. Consumers can check both properties independently.

8. Implementation references

9. Provenance propagation and @grounded_pure

The grounding from sections 1-8 is constructive — a value either has a Grounded<T> type or it does not. The Provenance Propagation phase adds two things on top of that base: the contagion law (grounded-ness flows through ordinary code so users do not have to thread Grounded<T> annotations everywhere) and the @grounded_pure moat (a compile-time guarantee that an agent body launders no grounded value into a non-grounded slot).

9.1. Contagion law

A binary or unary operator over a Grounded<T> operand returns Grounded<T>:

  • Grounded<String> + String returns Grounded<String>. The wrapper is propagated; the inner type is the operator’s normal result type. The runtime lifts both operands, computes on the inner values, and re-wraps with the minimum of the two confidences and a Derived provenance node naming the operator and pointing at both source chains.
  • The rule covers arithmetic, comparison, string concat, and unary operators. Short-circuit && / || are out of scope because they would change evaluation order; control-flow conditions instead get the dedicated D2 rule below.
  • A call to a prompt / tool / agent whose declared effect_row carries data: grounded returns Grounded<T> at the call site even when the declared return is plain T. The typechecker, the IR, and the runtime all agree the result is grounded.

9.2. Control-flow conditions (D2)

if <expr>: accepts Grounded<Bool> directly — branching consumes the bool to pick a path, it does not emit a laundered value. The implicit unwrap is recorded as an IR-visible discard so @grounded_pure can still see it.

9.3. Legacy Grounded<T> -> T coercion + IR-visible discard

The historical is_assignable_to rule that lets a Grounded<T> value flow into a non-grounded slot remains in place for backwards compatibility. Every such site is now recorded in Checked.grounded_coercion_sites and IR lowering wraps the produced expression in IrExprKind::UnwrapGrounded. The recorded sites are:

  • return value, yield value, let binding with type annotation;
  • call argument, struct field initialiser, Some(...) against an Option<T> slot;
  • list element flowing into an inferred bare-T element type;
  • replay arm body, else arm body;
  • if condition (D2).

A laundering site that the slot-check enumeration misses is a silent moat hole — the proof obligation rests on completeness here, not on the rule’s wording.

9.4. @grounded_pure proof obligation (D6)

An agent annotated @grounded_pure fails compile if its body contains any of:

  1. A recorded coercion site (case 1, implicit coercion).
  2. A .unwrap_discarding_sources() method-style call (case 2, explicit unwrap).
  3. A call into another agent that is not itself @grounded_pure (case 3, transitive composition).

Tools and prompts are external boundaries that cannot launder by construction — their declared return type either is Grounded<T> (preserved) or non-grounded (which the slot-check at the call site catches as case 1). The composition through the call graph mirrors @deterministic: a @grounded_pure agent can only call other @grounded_pure agents.

Guarantee row: grounded.no_laundering in ../../../crates/corvid-guarantees/src/registry.rs. Proof tests: grounded_pure_* in ../../../crates/corvid-types/src/tests.rs. Tour: corvid tour --topic provenance-propagation.

9.5. Design doc

The full design plus slice-by-slice rationale (Decisions D1-D8, risks R1-R6, shipped sub-splits per slice) is in ../../meta/grounded-propagation-design.md.

Next

06 — Confidence-gated trust — how autonomous_if_confident(T) threads a runtime trust boundary through the statically-composed trust dimension.