[HUGGINGFACE] score: 0.42
DAR Agentic Harness Improves Deontic Reasoning but Degrades Weaker Models
June 2, 2026
DAR wraps LLMs in an agentic harness that retrieves statute sections on demand for deontic reasoning tasks like tax computation and immigration appeals, evaluated on hard subsets of DeonticBench. Stronger models benefit from the harness, but weaker models often degrade, suggesting capability thresholds matter for agentic retrieval gains.
HOW THIS AFFECTS YOU
● builder: The finding that agentic retrieval harnesses hurt weaker models is a practical warning for legal or compliance agent pipelines: model capability must exceed a threshold before retrieval augmentation helps.
● researcher: DAR provides a structured evaluation setup for deontic reasoning that separates retrieval quality from reasoning quality across model capability tiers.