DueLLM evaluates models under competitive conditions rather than in isolation. By placing them on a continuous cellular automaton substrate, it exposes strategic failure modes that static benchmarks are poorly suited to measure.
DueLLM
Status: early
Domain: games
Themes: autonomous-agents · human-machine-interface