[arXiv]

REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations

May 14, 2026

REALISTA frames hallucination elicitation as a constrained optimization problem in latent space: it generates semantically coherent adversarial prompts that stay semantically equivalent to benign inputs yet reliably trigger LLM hallucinations. Because the optimization runs in a continuous space, it avoids the limitations of discrete prompt search while preserving semantic equivalence. For red-teamers and safety researchers, it offers a principled benchmarking tool for stress-testing production LLMs.
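To make the phrase "constrained optimization in latent space" concrete, here is a minimal toy sketch. It is not the paper's algorithm: the linear `toy_score` objective, the L2-ball constraint of radius `eps` around a benign latent `z0`, and all function names are illustrative assumptions standing in for a real hallucination objective and a real semantic-equivalence constraint. It only shows the general shape of projected gradient ascent, where each step increases the objective and is then projected back into the allowed region.

```python
import math

def project_l2_ball(z, center, eps):
    """Project z onto the L2 ball of radius eps around center (assumed constraint)."""
    delta = [a - b for a, b in zip(z, center)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(z)
    scale = eps / norm
    return [c + d * scale for c, d in zip(center, delta)]

def toy_score(z, w):
    """Hypothetical surrogate objective: a linear score w . z."""
    return sum(a * b for a, b in zip(z, w))

def toy_score_grad(z, w):
    """Gradient of the linear surrogate with respect to z, which is just w."""
    return list(w)

def projected_gradient_ascent(z0, w, eps, step=0.1, iters=50):
    """Maximize toy_score while staying within eps of the benign latent z0."""
    z = list(z0)
    for _ in range(iters):
        g = toy_score_grad(z, w)
        z = [a + step * b for a, b in zip(z, g)]   # ascent step
        z = project_l2_ball(z, z0, eps)            # enforce the constraint
    return z

# Toy values, chosen only for illustration.
z0 = [0.2, -0.1, 0.5]
w = [1.0, -2.0, 0.5]
eps = 0.3
z_adv = projected_gradient_ascent(z0, w, eps)
```

In a real setting the latent would be decoded back into a prompt and the objective would come from the target model, neither of which this sketch attempts.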

cs.CL, cs.AI, cs.CR, cs.LG

SOURCE

https://arxiv.org/abs/2605.12813
