DueLLM

Status: early · Domain: games · Themes: autonomous-agents, human-machine-interface

Static benchmarks measure what a model can do in isolation; DueLLM measures what happens when two models push back against each other in real time on a Flow-Lenia substrate. The continuous cellular automaton supplies the interaction dynamics and competitive pressure needed to expose strategic brittleness: failure modes that no amount of single-model evaluation will find.