Pilot bundles · pipeline scoreboard

Each card runs the pilot's reference implementation through the structural-invariant pipeline and lights up its score breakdown. Click a card for the per-layer invariants, NL assertions, and reference screenshots.

Scoring weights are pre-calibration placeholders; β can be 0 when reference screenshots are missing, and the natural-language verifier (γ) runs in stub-PASS mode here while the OpenRouter free tier is rate-limited. Cards marked DRAFT are pending team review of pilot annotator assignment / open design questions; see REPORT-12H §3.

Loading task scores…

Pilot bundles · pipeline scoreboard

Coverage matrix

Recent CI runs