Adversarial Bypass Taxonomy
`corvid test adversarial` turns the effect-system spec into attack prompts and
then runs every generated program through the compiler. The deterministic seed
path ships first so CI has a stable, no-network safety gate; provider-backed LLM
sampling can feed more programs into the same classifier later.
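The two-source design can be pictured with a minimal Python sketch. It assumes a hypothetical `CandidateSource` interface and a hypothetical `SeedSource` that walks a category-to-angles table in a fixed order. Repeated runs therefore produce identical candidates without network access. An LLM-backed source would implement the same interface and feed the same downstream classifier. The seed `source` strings are placeholders, not real Corvid programs.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Mapping, Protocol, Sequence


@dataclass(frozen=True)
class Candidate:
    category: str
    title: str
    source: str


class CandidateSource(Protocol):
    """Hypothetical interface: anything that yields candidate bypass programs."""

    def candidates(self) -> Iterator[Candidate]: ...


class SeedSource:
    """Deterministic, offline source: same table in, same candidates out, same order."""

    def __init__(self, angles: Mapping[str, Sequence[str]]):
        self.angles = angles

    def candidates(self) -> Iterator[Candidate]:
        for category in sorted(self.angles):        # fixed category order
            for title in self.angles[category]:     # table order within a category
                # Placeholder only: a real seed would embed a complete .cor program.
                yield Candidate(category, title, f"<seed program: {category}/{title}>")


def collect(sources: Iterable[CandidateSource]) -> list[Candidate]:
    # Every source, seeded or sampled, feeds one list for the same classifier.
    return [c for source in sources for c in source.candidates()]
```

Sorting the categories, rather than trusting mapping order, is one simple way to keep the CI gate reproducible.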
Categories
| Category | Invariant | Bypass Angles |
|---|---|---|
| `approval` | Dangerous tools require an in-scope `approve` with the right label and arity. | Direct dangerous calls, wrong approval scope, wrong approval label or arity. |
| `trust` | `@trust(autonomous)` cannot call `human_required` or `supervisor_required` effects. | Hide high-trust effects behind helper agents or renamed effects. |
| `budget` | `@budget` checks the worst-case composed cost before runtime. | Split work across tools or helpers so the total cost appears smaller. |
| `provenance` | `Grounded<T>` returns require a retrieval provenance chain. | Fabricate `Grounded<T>` from non-retrieval tools or aliases. |
| `reversibility` | `@reversible` excludes any call chain containing irreversible effects. | Hide `reversible: false` under neutral tool names or wrapper agents. |
| `confidence` | `@min_confidence` composes by minimum, not mean. | Mix strong and weak sources and rely on averaging intuition. |
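A toy Python model of a call chain makes these composition rules concrete. It is a sketch under stated assumptions, not Corvid's checker. `Effect`, `Branch`, the integer cost units, and every function name are hypothetical. The rules it encodes are as follows:

- Sequences add cost, and a branch contributes its most expensive arm, which gives the worst case.
- Confidence is the minimum over every contributing effect.
- A single irreversible or high-trust effect anywhere, including behind a wrapper, fails the corresponding check.

```python
from dataclasses import dataclass
from typing import Iterator, Sequence, Union

HIGH_TRUST = {"human_required", "supervisor_required"}


@dataclass(frozen=True)
class Effect:
    name: str
    cost: int                  # hypothetical integer cost units
    confidence: float          # 0.0 .. 1.0
    reversible: bool = True
    trust: str = "autonomous"


@dataclass(frozen=True)
class Branch:
    arms: tuple                # alternative step sequences; only one arm runs


Step = Union[Effect, Branch]


def all_effects(steps: Sequence[Step]) -> Iterator[Effect]:
    """Every effect reachable on any path (conservative: all arms count)."""
    for step in steps:
        if isinstance(step, Branch):
            for arm in step.arms:
                yield from all_effects(arm)
        else:
            yield step


def worst_case_cost(steps: Sequence[Step]) -> int:
    """Sequences add up; a branch contributes its most expensive arm."""
    total = 0
    for step in steps:
        if isinstance(step, Branch):
            total += max((worst_case_cost(arm) for arm in step.arms), default=0)
        else:
            total += step.cost
    return total


def composed_confidence(steps: Sequence[Step]) -> float:
    """Minimum over every contributing effect, never the mean."""
    return min((e.confidence for e in all_effects(steps)), default=1.0)


def is_reversible(steps: Sequence[Step]) -> bool:
    """One irreversible effect anywhere, even behind a wrapper, taints the chain."""
    return all(e.reversible for e in all_effects(steps))


def autonomous_may_call(steps: Sequence[Step]) -> bool:
    """An autonomous caller must not reach a human- or supervisor-gated effect."""
    return not any(e.trust in HIGH_TRUST for e in all_effects(steps))


fetch = Effect("fetch_docs", cost=3, confidence=0.95)
guess = Effect("guess_answer", cost=4, confidence=0.40)
purge = Effect("purge_rows", cost=1, confidence=0.99, reversible=False, trust="human_required")
plan = [fetch, Branch(((guess,), (purge,)))]

print(worst_case_cost(plan))      # 7, i.e. 3 + max(4, 1)
print(composed_confidence(plan))  # 0.4
print(is_reversible(plan))        # False: purge_rows sits in one arm
print(autonomous_may_call(plan))  # False: purge_rows is human_required
```

The mean confidence of the three effects is about 0.78, but the composed value is 0.4. That gap is what the `confidence` bypass angle hopes the checker ignores. Likewise, splitting one 10-unit call into two 5-unit calls leaves the worst-case total unchanged.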
Generator Contract
The prompt asks a model to return JSONL objects with:
```json
{"category":"approval","title":"direct dangerous call","source":"...complete .cor program..."}
```

Each `source` must be a complete program and must attempt exactly one bypass.
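Here is a minimal Python sketch of the structural side of this contract. It assumes the generator's reply arrives as one string. Each non-blank line must be a JSON object whose `category`, `title`, and `source` are non-empty strings, and whose category is one of the six listed in the Categories table. The function name, the tolerance for blank lines, and the choice to collect errors rather than abort are all assumptions. This check cannot tell whether a `source` is a complete program that attempts exactly one bypass. Only the compiler run can answer that.

```python
import json
from dataclasses import dataclass

CATEGORIES = {"approval", "trust", "budget", "provenance", "reversibility", "confidence"}
REQUIRED = ("category", "title", "source")


@dataclass(frozen=True)
class Candidate:
    category: str
    title: str
    source: str


def parse_jsonl(text: str) -> tuple[list[Candidate], list[str]]:
    """Split generator output into well-formed candidates and per-line errors."""
    candidates: list[Candidate] = []
    errors: list[str] = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # assumption: blank lines are tolerated, not errors
        try:
            obj = json.loads(line)
        except json.JSONDecodeError as exc:
            errors.append(f"line {lineno}: invalid JSON ({exc.msg})")
            continue
        if not isinstance(obj, dict):
            errors.append(f"line {lineno}: expected a JSON object")
            continue
        missing = [k for k in REQUIRED if not isinstance(obj.get(k), str) or not obj[k].strip()]
        if missing:
            errors.append(f"line {lineno}: missing or empty {', '.join(missing)}")
            continue
        if obj["category"] not in CATEGORIES:
            errors.append(f"line {lineno}: unknown category {obj['category']!r}")
            continue
        candidates.append(Candidate(obj["category"], obj["title"], obj["source"]))
    return candidates, errors
```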
The classifier treats every generated program as an expected rejection. If the
compiler accepts one, the row is marked `ESCAPED`. Each escape is either a
compiler safety bug or an invalid generator prompt that must be reclassified
before release.
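The expected-rejection rule fits in a few lines of Python. In this hypothetical sketch, `compiles` stands in for invoking the Corvid compiler on a source string; its real interface is not specified here. Any candidate the compiler accepts is recorded as `ESCAPED`, and escapes are tallied per category to help triage bugs versus bad prompts.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Iterable, Tuple


class Outcome(Enum):
    REJECTED = "REJECTED"  # expected: the compiler refused the bypass attempt
    ESCAPED = "ESCAPED"    # compiler accepted it: safety bug or invalid prompt


@dataclass(frozen=True)
class Row:
    category: str
    title: str
    outcome: Outcome


def classify(
    candidates: Iterable[Tuple[str, str, str]],
    compiles: Callable[[str], bool],  # hypothetical stand-in for the compiler
) -> list[Row]:
    """Every candidate is an expected rejection; acceptance is an escape."""
    rows = []
    for category, title, source in candidates:
        outcome = Outcome.ESCAPED if compiles(source) else Outcome.REJECTED
        rows.append(Row(category, title, outcome))
    return rows


def escapes_by_category(rows: Iterable[Row]) -> Counter:
    """Count escaped rows per category for triage."""
    return Counter(r.category for r in rows if r.outcome is Outcome.ESCAPED)
```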
Issue Filing
Set both environment variables to file escaped bypasses directly:
```sh
CORVID_ADVERSARIAL_FILE_ISSUES=1 \
GITHUB_TOKEN=... \
corvid test adversarial --count 100 --model opus
```

Without those variables, the command remains offline and CI-safe. Escapes still exit non-zero; they are never silently ignored.
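The gating can be sketched in Python as follows. The sketch assumes the opt-in variable must equal `1` and the token must be non-empty; the exact check is an assumption. `file_issue` is a hypothetical callback standing in for whatever GitHub client does the filing. The exit code depends only on whether escapes occurred, so an offline run still fails when something escapes.

```python
import os
from typing import Callable, Mapping, Sequence


def issue_filing_enabled(env: Mapping[str, str]) -> bool:
    # Assumption: the flag must be exactly "1" and the token must be non-empty.
    return env.get("CORVID_ADVERSARIAL_FILE_ISSUES") == "1" and bool(env.get("GITHUB_TOKEN"))


def finish(
    escaped_titles: Sequence[str],
    file_issue: Callable[[str], None],       # hypothetical filing callback
    env: Mapping[str, str] = os.environ,
) -> int:
    """File issues only when opted in; escapes fail the run either way."""
    if escaped_titles and issue_filing_enabled(env):
        for title in escaped_titles:
            file_issue(title)  # the only step that would need network access
    return 1 if escaped_titles else 0
```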