# Corvid Effect-System Bounty
Corvid’s core safety promise is compile-time: programs that violate approval, budget, trust, reversibility, confidence, or groundedness constraints should not compile. This page defines the public process for reporting bypasses and turning accepted reports into permanent regression fixtures.
## What Counts
Submit a minimal `.cor` program if it does one of these:
- Performs or permits a `dangerous` operation without the compiler requiring the correct `approve`.
- Lets an agent violate `@budget`, `@trust`, `@reversible`, or `@min_confidence`.
- Returns `Grounded<T>` without a real `data: grounded` provenance path.
- Changes effect behavior across equivalent rewrites or execution tiers.
- Makes `corvid test spec --meta`, `corvid test adversarial`, or `corvid verify --corpus` miss a violation they should catch.
False positives are also useful: if a safe program is rejected, file it, but label it as a false positive rather than a bypass.
## Submission Format
Use the GitHub issue template *Effect-system bypass report*. Include:
- The complete `.cor` program.
- The command you ran, usually `cargo run -q -p corvid-cli -- check file.cor`.
- The actual result and the expected result.
- Why the program should be legal or illegal under the spec.
- Environment details if the behavior depends on OS, target tier, or config.
Do not paste secrets, provider keys, private prompts, customer data, or real production traces. Reduce the report to a minimal synthetic reproduction first.
## Triage
Reports are triaged into one of four outcomes:
| Outcome | Meaning | Action |
|---|---|---|
| Accepted bypass | The compiler accepted a program the spec says must reject. | Fix the checker, add the program to docs/effects-spec/counterexamples/, and credit the reporter. |
| Accepted false positive | The compiler rejected a valid program. | Fix the checker or clarify the spec, then add a positive regression test. |
| Spec ambiguity | The program exposes an unclear rule. | Clarify the spec before changing the compiler. |
| Not a bug | The program behaves as specified. | Close with the relevant spec link. |
## Disclosure
Corvid is pre-v1.0, but safety bypasses still deserve careful handling. If a report includes a production exploit path or sensitive deployment details, open a private security advisory instead of a public issue. Otherwise, public issues are preferred because the counterexample corpus is meant to be auditable.
## Credit
Accepted bypasses keep reporter credit in the fixture header. The seed corpus uses Corvid core-team credit only because those examples predate the public process. Future accepted community reports should use this shape:
```
# bypass: short_name
# reporter: @github-handle
# fixed_by: commit <sha>
# invariant: cost must compose by Sum
```

## Permanent Regression
Every accepted bypass must land with:
- A `.cor` counterexample fixture under `docs/effects-spec/counterexamples/`.
- A spec or dev-log note explaining the invariant.
- A test path that fails before the fix and passes after it.
- If relevant, an entry in `corvid test adversarial`'s deterministic taxonomy or seed corpus.
The bounty process is not marketing. It is how the safety claim gets stronger over time.
## Open bounty window — moat benchmark corpora (opened 2026-04-29)
The published moat-benchmark corpora below are under explicit bounty review. Submissions that reduce Corvid's published advantage on any of them will be merged and credited.
### `benches/moat/compile_time_rejection/` — 50 named bug classes
Current published numbers:
- Corvid (`cargo run -q -p corvid-cli -- check`): 50/50 rejected.
- Python (`mypy --strict` + pydantic): 0/50 rejected.
- TypeScript (`tsc --strict` + zod): 0/50 rejected.
A submission counts if it does any of:
- Provides an idiomatic Python rewrite that makes `mypy --strict` + pydantic reject any of the 50 cases (including with additional pydantic validators a senior dev would actually use); a sketch of the pattern appears after this list. The runner re-classifies that case from `accepted` to `rejected` for the Python column.
- Provides an idiomatic TypeScript rewrite that makes `tsc --strict` + zod reject any case. Same effect on the TypeScript column.
- Provides a Corvid program of the same shape that the typechecker incorrectly accepts. That promotes the case to a real Corvid bypass under the existing rules above and triggers a fix.
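To make the first bullet concrete, here is a minimal sketch of the kind of rewrite that could flip a case from `accepted` to `rejected` under `mypy --strict`. The bug class ("refund issued without an approval token"), the function names, and the token type are hypothetical, not taken from the corpus:

```python
from typing import NewType

# NewType gives the token a distinct static identity, so mypy --strict
# rejects call sites that pass a bare string where a token is required.
ApprovalToken = NewType("ApprovalToken", str)

def request_approval(amount_usd: float) -> ApprovalToken:
    # Stand-in for the real approval flow in the rewritten case.
    return ApprovalToken(f"tok-{amount_usd}")

def issue_refund(amount_usd: float, token: ApprovalToken) -> None:
    print(f"refunded {amount_usd} under {token}")

issue_refund(25.0, request_approval(25.0))   # accepted
# issue_refund(25.0, "not-a-token")          # mypy error: incompatible type "str"
```

Whether a rewrite in this style counts still depends on the honesty requirements below: it has to be a pattern a senior dev would actually ship, not a reviewer-only trick.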
Honesty requirements for accepted submissions:
- Implementations must use libraries / patterns a senior dev would actually reach for. Custom AST-walking semgrep rules, hand-rolled clang-tidy-style linters, or one-off mypy plugins are accepted only if they ship as part of a documented project template — not as one-shot reviewer-only tools.
- The submission lands in `cases/<NN>-<slug>/` with the rewrite plus a note in `case.toml` explaining what the alternative baseline is.
### `benches/moat/provenance_preservation/` — multi-hop AI workflow chains
Current published numbers (10 chains — RAG aggregate, extract+translate, RAG-through-tool, classify+route, multi-source dedupe, RAG-to-JSON, conversational RAG, safety filter, numeric aggregation, rerank top-k):
- Corvid: 10/10 preserved.
- Python (LangChain + pydantic): 0/10 preserved.
- TypeScript (Vercel AI SDK + zod): 0/10 preserved.
A submission counts if it provides an idiomatic Python or TypeScript implementation whose final return type, at the language-type level, exposes a typed `sources` / `source_documents` / `provenance` field that traces back to the original retrieval. The runner re-classifies the chain accordingly and the headline number drops.
The static-only classification means: dict-shaped return values, manually threaded metadata, or convention-only `# sources kept here` markers do not count. The point is whether a downstream consumer can extract provenance without reading the implementation.
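For concreteness, a Python return shape that would count might look like the following sketch. The class and field names are illustrative and the retrieval step is stubbed; the point is only that the provenance is visible in the static type:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SourceDoc:
    doc_id: str
    uri: str

@dataclass(frozen=True)
class GroundedAnswer:
    text: str
    # Typed provenance: part of the return type itself, so a downstream
    # consumer can read .sources without inspecting the implementation.
    sources: tuple[SourceDoc, ...]

def retrieve(question: str) -> list[SourceDoc]:
    # Stand-in for the chain's real retrieval step.
    return [SourceDoc(doc_id="d1", uri="kb://faq/refunds")]

def answer(question: str) -> GroundedAnswer:
    docs = retrieve(question)
    return GroundedAnswer(text=f"answer to {question!r}", sources=tuple(docs))
```

Returning a plain dict with the same keys would not count: nothing in the type tells the consumer the provenance is there.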
### `benches/moat/governance_lines/` — governance LOC across reference apps
Current published numbers (3 reference apps — refund_bot, rag_qa_bot, support_escalation_bot):
| App | Corvid | Python | TypeScript |
|---|---|---|---|
| refund_bot | 9 | 31 | 43 |
| rag_qa_bot | 15 | 35 | 52 |
| support_escalation_bot | 16 | 39 | 55 |
| total | 40 | 105 | 150 |
A submission counts if it provides a more idiomatic Python or TypeScript implementation of any of the apps that honestly reduces the marked-governance line count — by using a library that genuinely shortens the surface, not by stripping governance checks. Naive implementations that drop the audit log, skip the budget cap, or remove the approval token do not count (they ship the bug class the line was preventing).
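As a hedged illustration of "a library that genuinely shortens the surface": a reusable decorator like the hypothetical one below could collapse the audit log, budget cap, and approval check into a single marked line per call site while keeping all three behaviors. Every name, the event shape, and the marking convention here are assumptions, not the benchmark's actual scheme:

```python
import functools
import json
import sys
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])

def governed(max_usd: float) -> Callable[[F], F]:
    """Bundle the budget cap, approval check, and audit log into one decorator."""
    def wrap(fn: F) -> F:
        @functools.wraps(fn)
        def inner(*args: Any, **kwargs: Any) -> Any:
            if float(kwargs["amount_usd"]) > max_usd:      # budget cap
                raise PermissionError("budget cap exceeded")
            if not kwargs.get("approval_token"):           # approval check
                raise PermissionError("missing approval token")
            json.dump({"event": fn.__name__,               # audit log
                       "amount_usd": kwargs["amount_usd"]}, sys.stderr)
            sys.stderr.write("\n")
            return fn(*args, **kwargs)
        return inner  # type: ignore[return-value]
    return wrap

@governed(max_usd=100.0)  # one governance line at the call site
def issue_refund(*, amount_usd: float, approval_token: str) -> str:
    return f"refunded ${amount_usd:.2f}"
```

A submission built this way would need the decorator to come from a real library or documented template, since the point is what a team would actually ship; note that it keeps exactly the checks the naive-implementation clause guards against.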
A submission can also add a 4th, 5th, … reference app under `apps/<name>/` with all three stacks implementing the same intended behavior. The runner picks it up automatically.
### `benches/moat/replay_determinism/` — deterministic re-execution of `refund_bot_demo`
Current published numbers:
- Corvid (`cargo run -p refund_bot_demo`, N=20, normalized): all pairs byte-identical.
- Python (LangChain + LangSmith): bounty-open — no `_summary.json` yet.
- TypeScript (Vercel AI SDK + OTEL): bounty-open — no `_summary.json` yet.
The harness invokes the same agent N times against mocked tools and a mocked LLM, normalizes wall-clock and run-id fields per the documented rules, and asks whether all N traces are byte-identical pairwise.
A submission counts if it provides an idiomatic Python or TypeScript implementation of the same agent (contract in `benches/moat/replay_determinism/runs/corvid/agent_spec.md`) plus a runner that loops N times, captures the stack-native trace artifact, applies its documented normalization, and emits `_summary.json` in the standard shape. The orchestrator re-renders `RESULTS.md` and the headline updates accordingly.
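A minimal sketch of such a runner loop, assuming only the shape described above. `run_agent_and_capture_trace` stands in for the stack-native capture, the normalization rules shown are illustrative rather than the documented ones, and the summary keys are placeholders for the standard shape:

```python
import hashlib
import json
import re
from pathlib import Path

N = 20

def run_agent_and_capture_trace() -> str:
    # Stand-in for executing the agent against mocked tools and a mocked
    # LLM, then capturing the stack-native trace artifact.
    return '{"run_id": "r-123", "timestamp": "2026-04-29T00:00:00Z", "steps": []}'

def normalize(trace: str) -> str:
    # Illustrative normalization: strip only wall-clock and run-id fields.
    trace = re.sub(r'"timestamp": "[^"]*"', '"timestamp": "<t>"', trace)
    trace = re.sub(r'"run_id": "[^"]*"', '"run_id": "<id>"', trace)
    return trace

digests = [
    hashlib.sha256(normalize(run_agent_and_capture_trace()).encode()).hexdigest()
    for _ in range(N)
]

summary = {"n": N, "all_pairs_byte_identical": len(set(digests)) == 1}
Path("_summary.json").write_text(json.dumps(summary, indent=2))
```

Comparing SHA-256 digests of the normalized traces is equivalent to the pairwise byte-identity check: all N traces are byte-identical exactly when all N digests are equal.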
Submissions are rejected if the normalization rules paper over real divergence (e.g. stripping retry counters or attempt IDs that genuinely vary across runs). The point is what the stack ships out of the box — not how aggressively a reviewer can normalize.
### `benches/moat/time_to_audit/` — audit-logic LOC across 5 representative questions
Current published numbers (5 audit questions: list refunds, refunds over $50, refunds per user, denied refunds with LLM rationale, approval tokens issued):
- Corvid (JSONL trace under `target/trace/`): 65 LOC to answer all five correctly.
- Python (LangChain + LangSmith): bounty-open (5/5 queries unimplemented).
- TypeScript (Vercel AI SDK + OTEL): bounty-open (5/5 queries unimplemented).
A submission counts if it lands a stack sub-tree (`runs/<stack>/queries/q01.<ext>` … `q05.<ext>`) plus an equivalent corpus produced by running the actual agent in that stack — not hand-rolled JSON. Each query reads `--corpus <dir>` and writes its answer to `--out <path>` as JSON; the runner validates each answer byte-for-byte against the question's `expected_answer.json` and counts audit-logic LOC (excluding blank lines, pure-comment lines, and a bounded `# region: cli-plumbing` block).
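For shape only, a hypothetical `q01` ("list refunds") in Python might look like this. The JSONL corpus layout, the `kind` field, and the region delimiters are guesses at the trace schema and marking convention, not the actual ones:

```python
import argparse
import json
from pathlib import Path

# region: cli-plumbing  (bounded block excluded from the LOC count)
parser = argparse.ArgumentParser()
parser.add_argument("--corpus", required=True)
parser.add_argument("--out", required=True)
args = parser.parse_args()
# endregion

# Audit logic: scan every JSONL trace file and collect refund events.
refunds = []
for path in sorted(Path(args.corpus).glob("*.jsonl")):
    for line in path.read_text().splitlines():
        event = json.loads(line)
        if event.get("kind") == "refund":  # assumed event shape
            refunds.append(event)

Path(args.out).write_text(json.dumps(refunds, indent=2))
```

The runner would then diff the emitted JSON byte-for-byte against `expected_answer.json` for that question.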
Submissions are rejected if the corpus is hand-rolled, if the answer differs from `expected_answer.json`, or if the LOC count hides logic in unaccounted helper modules.
### Window
The bounty window is rolling. The published numbers are “as of the most recent CI run that ran the runner against the committed corpus + bounty submissions.” There is no fixed close date — the corpora stay open and the published numbers stay current.
A submission that lands and reduces Corvid's advantage gets a credit row in the relevant `RESULTS.md`, and the launch wording adapts.