WcodeW failure modes

Failure modes & case studies

For each LLM agent, this page lists the 5 worst-scoring and 5 best-scoring bundles. Use these for the paper's qualitative section; each card links to /compare with the slider already positioned.
