● researcher: Self-report-based LLM evaluation is unreliable even with natively derived constructs, undermining a common shortcut for behavioral characterization.
● policy: Worth watching because personality-based alignment or safety assessments using self-report inventories lack predictive validity and should not be used as behavioral proxies.