Failure modes & case studies
For each LLM agent, the 5 worst-scoring and 5 best-scoring bundles.
Use these for the paper's qualitative section — each card links to
/compare with the slider already positioned.