● researcher: Offers a concrete two-step reasoning framework and a simple test suite for benchmarking causal and hypothetico-deductive capabilities in LLMs.
● policy: Worth watching because it proposes a specific, testable AGI criterion that could inform capability evaluation standards.