arXiv

DiPS: Dialogue Policy Selection for High-Stakes Persuasion Agents

July 3, 2026

DiPS uses a Q-learning framework to dynamically select persuasion strategies in high-stakes settings such as fire-rescue scenarios. It trains a critic to maximize evacuation success given the real-time utterance context, and the method outperforms zero-shot LLMs and RAG-augmented baselines in both simulated and human-interaction evaluations.
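To make the idea of Q-learning-based strategy selection concrete, here is a minimal sketch under loudly stated assumptions: a small discrete set of persuasion strategies, a coarse discrete dialogue state, and a terminal reward of 1 when the simulated resident evacuates. The strategy names, state labels, toy simulator, and hyperparameters are all hypothetical illustrations, not details of DiPS, and a tabular Q-table stands in for the paper's learned critic, which conditions on utterance context.

```python
import random
from collections import defaultdict

# Hypothetical strategy inventory (not taken from the paper).
STRATEGIES = ["inform_risk", "empathize", "give_instruction", "appeal_to_authority"]

class StrategyCritic:
    """Tabular Q-learning stand-in for a learned critic over (state, strategy)."""

    def __init__(self, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
        self.q = defaultdict(float)  # (state, strategy) -> estimated return
        self.alpha = alpha           # learning rate
        self.gamma = gamma           # discount factor
        self.epsilon = epsilon       # exploration rate
        self.rng = random.Random(seed)

    def select(self, state):
        # Epsilon-greedy: explore occasionally, otherwise pick the highest-Q strategy.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(STRATEGIES)
        return max(STRATEGIES, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state, done):
        # One-step Q-learning target: r + gamma * max_a' Q(s', a') for non-terminal steps.
        target = reward
        if not done:
            target += self.gamma * max(self.q[(next_state, a)] for a in STRATEGIES)
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])

# Toy simulator (entirely hypothetical): the resident evacuates only if the
# chosen strategy matches their current state.
BEST = {"panicked": "empathize", "skeptical": "inform_risk"}

def run_episode(critic, env_rng):
    state = env_rng.choice(list(BEST))
    action = critic.select(state)
    success = action == BEST[state]
    # Reward 1.0 for evacuation success, 0.0 otherwise; single-turn episodes end here.
    critic.update(state, action, 1.0 if success else 0.0, None, done=True)
    return success

def train(episodes=2000, seed=0):
    critic = StrategyCritic(seed=seed)
    env_rng = random.Random(seed + 1)
    for _ in range(episodes):
        run_episode(critic, env_rng)
    critic.epsilon = 0.0  # act greedily once training is done
    return critic
```

After `critic = train()`, calling `critic.select("panicked")` returns the strategy with the highest learned value for that state. The episodes here are single-turn, so the bootstrap term in `update` never fires; it is kept so the same critic could, in principle, score strategies turn by turn in a multi-turn dialogue.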
