[r/MachineLearning] score: 0.04
OpenSimula — open implementation of Simula-style mechanism design for synthetic data (in AfterImage) [P]
April 23, 2026
**OpenSimula** is an open-source Python implementation of the Simula mechanism-design framework (Davidson et al., TMLR), added to the AfterImage dataset toolkit. It targets controlled synthetic data generation for SFT and evaluation pipelines. Instead of producing simple prompt-response pairs, it structures generation around LLM-built factor taxonomies with weighted sampling, meta-prompt diversification, and a requirement-critic refinement loop, with an optional double-critic gate for verifiable MCQ outputs. It is aimed at practitioners who need reproducible, axis-controlled diversity in synthetic datasets rather than ad-hoc prompt engineering. That is particularly relevant for eval set construction, where coverage of a reasoning space matters more than raw volume.
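To make the pipeline shape concrete, here is a minimal sketch of the four stages named above: weighted factor sampling, meta-prompt diversification, a critic refinement loop, and a double-critic gate. This is not the OpenSimula or AfterImage API. Every name in it (`TAXONOMY`, `META_PROMPTS`, `sample_factors`, `build_prompt`, `refine`, `double_critic_gate`) is hypothetical, the taxonomy is hard-coded rather than LLM-built, and the gate is modeled simply as two independent checks that must both pass, which may differ from how the real gate works.

```python
import random

# Hypothetical factor taxonomy: factor name -> {value: sampling weight}.
# In the real tool the taxonomy is LLM-built; here it is hard-coded.
TAXONOMY = {
    "topic": {"algebra": 3.0, "geometry": 2.0, "probability": 1.0},
    "difficulty": {"easy": 1.0, "medium": 2.0, "hard": 1.0},
    "item_type": {"MCQ": 1.0},
}

# Hypothetical meta-prompt templates that vary how the same factors are phrased.
META_PROMPTS = [
    "Write a {difficulty} {item_type} question about {topic}.",
    "Compose a {topic} problem at {difficulty} level as an {item_type} item.",
]


def sample_factors(taxonomy, rng):
    """Draw one value per factor, with probability proportional to its weight."""
    return {
        factor: rng.choices(list(values), weights=list(values.values()), k=1)[0]
        for factor, values in taxonomy.items()
    }


def build_prompt(factors, rng):
    """Pick a meta-prompt template at random and fill in the sampled factors."""
    return rng.choice(META_PROMPTS).format(**factors)


def refine(prompt, generate, critic, max_rounds=3):
    """Regenerate with critic feedback until accepted or the budget runs out.

    `generate(prompt, feedback)` returns a draft; `critic(prompt, draft)`
    returns an (accepted, feedback) pair. Both would wrap LLM calls in practice.
    """
    feedback = None
    for _ in range(max_rounds):
        draft = generate(prompt, feedback)
        accepted, feedback = critic(prompt, draft)
        if accepted:
            return draft
    return None  # never accepted: the caller can drop the sample or resample


def double_critic_gate(item, critic_a, critic_b):
    """Keep an item only if two independent critics both accept it."""
    return critic_a(item) and critic_b(item)
```

Passing a seeded `random.Random` instance as `rng` makes a run reproducible, and the weights in `TAXONOMY` are the knob for per-axis coverage: raising a value's weight increases its share of the generated set without touching the prompts.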