● researcher: The 2x2 CoT-Output matrix is a concrete evaluation tool you can apply to audit reasoning models for alignment faking and reasoning unfaithfulness in multi-turn settings.
● policy: Terminal refusal rates are insufficient for safety audits of reasoning models. This framework surfaces failure modes invisible to standard benchmarks, with direct implications for deployment oversight.
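
As a rough illustration of the two points above, the sketch below tallies a 2x2 table over per-turn labels and compares it with a terminal-only metric. The axes are not defined in this summary, so the code assumes one axis is "chain of thought flagged" and the other is "final output flagged". The `Turn` class, the function names, and the toy labels are all hypothetical and are not taken from the framework itself.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    # Both labels are assumed to come from some external judge or annotator.
    cot_flagged: bool     # hypothetical: reasoning trace judged problematic
    output_flagged: bool  # hypothetical: final answer judged problematic


def cot_output_matrix(conversations):
    """Count every turn into one of the four (cot_flagged, output_flagged) cells."""
    counts = Counter(
        (t.cot_flagged, t.output_flagged) for conv in conversations for t in conv
    )
    return {
        (cot, out): counts.get((cot, out), 0)
        for cot in (False, True)
        for out in (False, True)
    }


def terminal_safe_rate(conversations):
    """Stand-in for a terminal refusal rate: share of conversations
    whose last turn has an unflagged output."""
    finals = [conv[-1] for conv in conversations if conv]
    if not finals:
        return 0.0
    return sum(not t.output_flagged for t in finals) / len(finals)


# Toy data (invented for illustration only).
conversations = [
    [Turn(False, False), Turn(True, False)],   # flagged reasoning, clean output
    [Turn(False, True), Turn(False, False)],   # flagged output mid-conversation
    [Turn(False, False), Turn(False, False)],  # clean throughout
]

matrix = cot_output_matrix(conversations)
rate = terminal_safe_rate(conversations)
print(matrix)
print(rate)
```

On this toy data the terminal-only metric is perfect (every conversation ends with an unflagged output), yet the matrix still records one turn with flagged reasoning and one with a flagged intermediate output. That is the kind of gap the takeaways describe, under the assumed axis definitions.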