A non-saturating, open-ended environment for evaluating LLMs in Factorio - JackHopkins/factorio-learning-environment | GitHub
Large Language Models (LLMs) are rapidly saturating existing benchmarks, necessitating new open-ended evaluations. We introduce the Factorio Learning Environment (FLE), based on the game Factorio, which tests agents on long-term planning, program synthesis, and resource optimization. FLE provides exponentially scaling challenges, from basic automation to complex factories processing millions of resource units per second. We provide two settings: (1) lab-play consisting of eight structured... | arXiv.org