[arXiv] score: 0.15
DiPS: Dialogue Policy Selection for High-Stakes Persuasion Agents
July 3, 2026
DiPS uses a Q-learning framework to select persuasion strategies dynamically in high-stakes settings such as fire-rescue scenarios. A critic is trained to maximize evacuation success given the real-time utterance context, and the method outperforms zero-shot LLMs and RAG-augmented baselines in both simulated and human-interaction evaluations.
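To make the idea of Q-learning over persuasion strategies concrete, here is a minimal tabular sketch. It is not the paper's implementation: the summary only says a critic is trained on utterance context, so the discrete context labels, the strategy names, the reward convention (1.0 for a successful evacuation), and the class `StrategySelector` are all hypothetical stand-ins chosen for illustration.

```python
import random
from collections import defaultdict

# Hypothetical strategy set; the paper's actual strategy inventory is not given here.
STRATEGIES = ["emotional_appeal", "authority", "logical_argument"]


class StrategySelector:
    """Tabular epsilon-greedy Q-learning over (context, strategy) pairs."""

    def __init__(self, strategies, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.strategies = list(strategies)
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount factor for future reward
        self.epsilon = epsilon  # exploration probability
        self.q = defaultdict(float)  # Q-values default to 0.0
        self.rng = random.Random(seed)

    def select(self, context, greedy=False):
        """Pick a strategy for the current context (assumed to be a hashable label)."""
        if not greedy and self.rng.random() < self.epsilon:
            return self.rng.choice(self.strategies)
        return max(self.strategies, key=lambda s: self.q[(context, s)])

    def update(self, context, strategy, reward, next_context=None):
        """One Q-learning step; next_context=None marks the end of the dialogue."""
        future = 0.0
        if next_context is not None:
            future = max(self.q[(next_context, s)] for s in self.strategies)
        target = reward + self.gamma * future
        key = (context, strategy)
        self.q[key] += self.alpha * (target - self.q[key])


if __name__ == "__main__":
    selector = StrategySelector(STRATEGIES)
    # Toy feedback: assume "authority" led to a successful evacuation (reward 1.0)
    # when the occupant was labelled "hesitant".
    selector.update("hesitant", "authority", reward=1.0)
    print(selector.select("hesitant", greedy=True))
```

A learned critic over utterance embeddings would replace the lookup table in a realistic system; the table is used here only because it keeps the update rule visible in a few lines.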