Corvid v1.0 launch demo script
A five-minute live demo that grounds every Corvid v1.0 claim in a command the audience can re-type. Every block below is copy-pasteable against a corvid binary built from main at the time of the v1.0 cut.
Target audience: a skeptical engineer who has already written production LLM code in Python or TypeScript and doesn’t believe a new language is worth learning. The demo’s goal is to make one claim impossible to dismiss: Corvid’s AI-safety guarantees are compile-time and survive into production operations. Every other feature is secondary to that.
The demo is five acts. Each act ends at a command whose output proves the previous claim.
Act I — the compile-time claim
Claim. A dangerous tool call with no approve block fails to compile. Python / TypeScript cannot make that guarantee.
```
cat > refund_bot.cor <<'EOF'
type Ticket:
  order_id: String
  user_id: String
  message: String

type Order:
  id: String
  amount: Float
  user_id: String

type Decision:
  should_refund: Bool
  reason: String

type Receipt:
  refund_id: String
  amount: Float

tool get_order(id: String) -> Order
tool issue_refund(id: String, amount: Float) -> Receipt dangerous

prompt decide_refund(ticket: Ticket, order: Order) -> Decision:
  """Should we refund? Consider amount + complaint."""

agent refund_bot(ticket: Ticket) -> Decision:
  order = get_order(ticket.order_id)
  decision = decide_refund(ticket, order)
  if decision.should_refund:
    issue_refund(order.id, order.amount)  # <-- no `approve` above it
  return decision
EOF
```
```
corvid check refund_bot.cor
```

Expected. refund_bot.cor fails with a typed error pointing at the bare issue_refund call, naming the @dangerous annotation, and suggesting the approve IssueRefund(...) block the compiler requires.
Fix it (add approve IssueRefund(order.id, order.amount) before the dangerous call). corvid check now passes.
Punchline. The proof isn’t “we ran a linter.” It’s “this code cannot be compiled until the approve block is there.” Linters rot; types don’t.
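The shape of that compile-time rule is small enough to sketch. The Python below is a hedged illustration of the invariant the compiler enforces, not Corvid's implementation; the flat statement-tuple encoding and the `DANGEROUS` set are assumptions made for the sketch:

```python
# Illustrative sketch (NOT Corvid's implementation) of the rule: a call to
# a tool marked dangerous must be preceded by an `approve` for that same
# tool, or the program is rejected before it can run.
DANGEROUS = {"issue_refund"}  # tools carrying the dangerous marker

def check(statements):
    """statements: list of ("approve", tool) or ("call", tool) tuples.
    Returns a list of error strings; empty means the program passes."""
    errors, approved = [], set()
    for kind, tool in statements:
        if kind == "approve":
            approved.add(tool)
        elif kind == "call" and tool in DANGEROUS and tool not in approved:
            errors.append(f"dangerous call `{tool}` has no approve block")
    return errors

# The demo program: bare dangerous call -> rejected.
assert check([("call", "get_order"), ("call", "issue_refund")]) == [
    "dangerous call `issue_refund` has no approve block"
]
# The fixed program: approve precedes the call -> accepted.
assert check([("approve", "issue_refund"), ("call", "issue_refund")]) == []
```

The real checker works on a typed AST and reports a source span, but the invariant is roughly this: no execution path may reach a dangerous call without first passing an approve block for that tool.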
Act II — the effect algebra claim
Claim. Trust / cost / reversibility / grounding compose algebraically at compile time, and the compiler proves the composition.
```
corvid verify --corpus tests/corpus
```

Expected. A grid showing every program in the corpus, each checked against four independent tiers (type checker, interpreter, native codegen, replay). All four tiers agree on every effect dimension for every program — except the two deliberate should_fail fixtures, which are the adversarial programs the compiler must reject.
Punchline. When four independent execution paths agree on the algebra of a program and disagree only on the programs they’re meant to disagree on, the algebra is real. See docs/internals/effect-spec/12-verification.md for the full methodology.
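What "four tiers agree" means mechanically can be sketched in a few lines. The toy tiers below stand in for the type checker, interpreter, codegen, and replay; everything here (the `verify` helper, the toy property, the corpus strings) is illustrative, not the real harness:

```python
# Sketch of the cross-tier agreement check: every tier judges every
# program independently; a cell is green only when all verdicts match.
def verify(corpus, tiers):
    return {name: len({tier(prog) for tier in tiers}) == 1
            for name, prog in corpus.items()}

# Two toy "tiers" deciding the same property by different means.
tier_a = lambda p: "dangerous" in p          # substring check
tier_b = lambda p: "dangerous" in p.split()  # stricter token-level check
corpus = {
    "ok.cor": "tool f() -> Int",
    "risky.cor": "tool g() -> Receipt dangerous",
    "should_fail.cor": "# dangerous-looking comment only",  # fixture where tiers disagree
}
grid = verify(corpus, [tier_a, tier_b])
assert grid == {"ok.cor": True, "risky.cor": True, "should_fail.cor": False}
```

The point the grid makes is statistical honesty: independent implementations disagreeing only on the fixtures built to expose disagreement is evidence the shared algebra, not any single implementation, is doing the work.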
Act III — the runtime-survives-deployment claim (Phase 21)
Claim. Every run writes a deterministic trace; production traffic becomes the regression suite; a bad PR can’t merge without the operator seeing what changed.
```
# Run the refund_bot demo a few times. Traces land under target/trace/.
cd examples/refund_bot_demo
corvid run src/refund_bot.cor --target=interp
corvid run src/refund_bot.cor --target=interp
corvid run src/refund_bot.cor --target=interp

# Now treat those traces as a regression suite against the current code.
corvid test --from-traces target/trace/ --from-traces-source src/refund_bot.cor
```

Expected. A per-trace verdict table. All three traces pass — the code hasn’t changed since they were recorded.
Now break the code. Edit src/refund_bot.cor: change the prompt text or tweak a decision-path branch. Re-run the same command.
Expected. The same command now reports DIVERG for the affected traces, exits 1, and names the specific divergence per trace.
Punchline. In Python, turning production traffic into a regression suite requires a bespoke harness someone has to build. In Corvid it’s a CLI flag. The language ships with the harness because the language ships with the trace format.
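The mechanism is worth seeing in miniature. This is a hedged Python sketch of trace-based regression, using a made-up trace shape rather than Corvid's actual trace format: each recorded step's input is fed to the current code, and the first mismatch is reported as a divergence:

```python
# Conceptual sketch: a trace records each step's inputs and outputs;
# replaying feeds the recorded inputs to the current code and flags any
# step whose output no longer matches the recording.
def replay_trace(trace, current_fn):
    """trace: list of {"input": ..., "output": ...} step records."""
    for i, step in enumerate(trace):
        got = current_fn(step["input"])
        if got != step["output"]:
            return f"DIVERG at step {i}: recorded {step['output']!r}, got {got!r}"
    return "PASS"

trace = [{"input": 10, "output": 20}, {"input": 3, "output": 6}]
assert replay_trace(trace, lambda x: x * 2) == "PASS"  # code unchanged
assert replay_trace(trace, lambda x: x * 3).startswith("DIVERG at step 0")
```

The thing the language contributes is the middle column: because every run already writes this record, "production traffic as regression suite" is a read of existing data, not an instrumentation project.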
Act IV — the PR-review claim (corvid trace-diff)
Claim. Every PR produces a signed behaviour receipt describing what changed along Corvid’s effect algebra. The reviewer tool is itself a Corvid program.
```
# In any git repo with a .cor file:
corvid trace-diff HEAD~1 HEAD src/refund_bot.cor
```

Expected. A markdown receipt enumerating every algebraic change: trust-tier moves, @dangerous additions, @replayable gains / losses, agents added / removed. If nothing algebraic changed, the receipt says so explicitly.
```
# The reviewer source is in-repo and human-readable:
bat crates/corvid-cli/src/trace_diff/reviewer.cor
```

Expected. A short .cor file declaring review_pr(base, head, impact) -> String as a @deterministic Corvid agent. Three hundred lines or so. Not a Rust function — a Corvid program compiled at runtime.
Punchline. Corvid hosts its own governance. Other languages ship PR-review tools in some other language because they can’t host the algebra themselves. Corvid’s flagship PR-review tool is a .cor file the user can read, fork, and extend — and it is the only such tool whose output is provably deterministic because the type checker says so.
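The core of such a receipt is a set diff over effect annotations. This Python sketch is illustrative only; the symbol-to-annotations encoding is an assumption, and the real receipt also covers trust-tier moves and ordering:

```python
# Sketch of the algebraic diff behind a behaviour receipt: compare each
# symbol's effect annotations between base and head and list every move.
def algebraic_diff(base, head):
    """base/head: dict of symbol name -> set of effect annotations."""
    changes = []
    for sym in sorted(set(base) | set(head)):
        if sym not in base:
            changes.append(f"{sym}: added")
        elif sym not in head:
            changes.append(f"{sym}: removed")
        else:
            for ann in sorted(head[sym] - base[sym]):
                changes.append(f"{sym}: gained {ann}")
            for ann in sorted(base[sym] - head[sym]):
                changes.append(f"{sym}: lost {ann}")
    return changes or ["no algebraic changes"]

base = {"refund_bot": {"@replayable"}}
head = {"refund_bot": {"@replayable", "@dangerous"}, "audit_bot": set()}
assert algebraic_diff(base, head) == [
    "audit_bot: added",
    "refund_bot: gained @dangerous",
]
assert algebraic_diff(head, head) == ["no algebraic changes"]
```

Because the annotations are part of the type system, this diff is computed from compiler output, not from heuristics over the source text.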
Act V — the replay claim
Claim. Every recorded trace replays to byte-identical state. Differential replay swaps one axis (e.g., LLM model); counterfactual replay overrides one step. Both surface divergences.
```
# Plain replay: should match exactly.
corvid replay target/trace/run-abc.jsonl

# Differential: swap the recorded model for a different one, same args.
corvid replay target/trace/run-abc.jsonl --model claude-opus-5-0

# Counterfactual: what if the decision-prompt had said `false` instead?
corvid replay target/trace/run-abc.jsonl --mutate 2 '{"should_refund": false}'
```

Expected.

- Plain replay finishes silently; exit 0.
- Differential replay prints a per-prompt divergence list (or “no divergences” if the live model agrees with the recording).
- Counterfactual replay prints a step-by-step log of how the run diverges from the recording after the mutation point.
Punchline. These three commands, end-to-end, are the reason Corvid exists. Any LLM-orchestration language that cannot do this cannot be honestly said to support “testing” in production. Python and TypeScript cannot do this; they ship the problem as a library integration exercise you have to re-solve in every codebase.
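Counterfactual replay is the least familiar of the three, so here is the idea in miniature. This is a hedged sketch, not the replay engine: steps before the mutation come straight from the recording, the mutated step is overridden, and every later step is re-executed and compared against the recording:

```python
# Conceptual sketch of --mutate: override one recorded step's output,
# then re-run all later steps and log where the run diverges.
def counterfactual(trace, step_fn, mutate_at, new_output):
    """trace: recorded per-step outputs; step_fn maps the previous step's
    output to the next one (stands in for re-running the agent's code)."""
    state, log = None, []
    for i, recorded in enumerate(trace):
        if i < mutate_at:
            state = recorded              # before the mutation: as recorded
        elif i == mutate_at:
            state = new_output            # inject the counterfactual value
        else:
            state = step_fn(state)        # after it: re-execute live
        if state != recorded:
            log.append(f"step {i}: recorded {recorded!r}, replayed {state!r}")
    return log

trace = [1, 2, 4, 8]  # a recorded run of a step that doubles its input
log = counterfactual(trace, lambda x: x * 2, mutate_at=1, new_output=3)
assert log == ["step 1: recorded 2, replayed 3",
               "step 2: recorded 4, replayed 6",
               "step 3: recorded 8, replayed 12"]
```

A single mutation propagates forward, which is exactly what the step-by-step divergence log in Act V is showing the audience.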
Off-ramp one-liners
After the five acts, have one command per likely audience question ready:
| They ask | You run |
|---|---|
| “What does the provenance DAG look like?” | `corvid trace dag target/trace/run-abc.jsonl \| dot -Tsvg > /tmp/p.svg` |
| “Can I embed this in Rust?” | `corvid build src/agent.cor --target=cdylib --all-artifacts`, then show the .so + .h + .corvid-abi.json drop |
| “Can I query the library at runtime?” | `corvid abi dump target/release/agent.so` (shows the embedded capability catalog from 22-C) |
| “Can approvers be a .cor file too?” | `corvid approver check examples/cdylib_catalog_demo/src/approver.cor` (22-E) |
| “Is this actually fast?” | Show docs/phases/memory-foundation-results.md; be honest about the 25-36× Python / 1.7-2.6× TypeScript orchestration numbers from Phase 17 and explain that v1.0 is correctness-first, with performance parity planned post-v1.0 |
| “Show me the counterexamples corpus” | `ls docs/internals/effect-spec/counterexamples/` + `cat docs/internals/effect-spec/counterexamples/composition/and_with_or.cor` |
Do-not-demo
Tempting but skip:
- The REPL. It’s for interactive exploration, not a launch talk.
- The full spec walkthrough. Point at the table of contents and move on.
- Benchmark numbers unless the audience insists. The Phase 17 numbers are honest-slower; they require context that doesn’t fit in a demo.
- The Phase 22 cdylib + 22-C catalog unless an audience question naturally lands you there. They are v0.7 material, not v1.0, and the demo should land on the v1.0 story cleanly first.
- Anything about future phases. The audience cares about what works today, not what might.
Setup + rehearsal notes
- Pre-compile `corvid` before the demo. No “hold on, it’s building” moments.
- Pre-record a handful of traces so Act III has something to test against without waiting for live LLM calls.
- Configure `CORVID_MODEL` + the relevant API key (`ANTHROPIC_API_KEY` or `OPENAI_API_KEY`) in `.env` beside the demo source. The CLI walks up from the source’s parent looking for it.
- Have `git` and `dot` on `PATH`. Act IV and the first off-ramp both need them.
- Never run Act III and then edit the file live while the audience watches. Precompute the broken variant in a branch and `git checkout` into it after Act III’s first run. Live editing eats minutes you don’t have.
- Rehearse the five acts end-to-end four times. Each rehearsal should take under seven minutes without audience questions. If you’re over seven, cut Act V’s counterfactual sub-bullet — plain + differential is enough to land the claim.
Post-demo next steps for interested engineers
Hand them this list on a slide:
- `git clone https://github.com/corvid-lang/corvid && cd corvid && cargo build --release`
- Work through docs/internals/effect-spec/00-overview.md
- Try to write a program that bypasses the effect checker. Open an issue if you succeed; your counter-example becomes a permanent regression test.
- Read docs/internals/effect-spec/14-replay.md if the Phase 21 claims caught their attention.
- Read docs/internals/effect-spec/11-related-work.md if they want to compare against Koka, Eff, Frank, or effect handlers in Haskell.