[arXiv] score: 0.39
REALISTA: Realistic Latent Adversarial Attacks that Elicit LLM Hallucinations
May 14, 2026
REALISTA frames hallucination elicitation as a constrained optimization problem in latent space. It generates semantically coherent adversarial prompts that remain semantically equivalent to benign inputs yet reliably trigger LLM hallucinations. Because it optimizes over a continuous latent space rather than searching over discrete prompts, it avoids the limitations of discrete prompt search while preserving semantic equivalence. Red-teamers and safety researchers gain a principled benchmarking tool for stress-testing production LLMs.
cs.CL, cs.AI, cs.CR, cs.LG
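To make the phrase "constrained optimization in latent space" concrete, here is a minimal toy sketch of projected gradient ascent. It is not the REALISTA method: the summary gives no algorithmic detail, so everything here is an assumption made for illustration. The linear `toy_score` stands in for a hypothetical differentiable hallucination objective, the L2 ball of radius `eps` around the benign latent `z0` stands in for the semantic-equivalence constraint, and all names (`latent_attack`, `project_to_ball`, `w`) are hypothetical.

```python
import math

def project_to_ball(z, center, eps):
    """Project z onto the L2 ball of radius eps around center."""
    delta = [a - b for a, b in zip(z, center)]
    norm = math.sqrt(sum(d * d for d in delta))
    if norm <= eps:
        return list(z)
    scale = eps / norm
    return [c + d * scale for c, d in zip(center, delta)]

def toy_score(z, w):
    """Hypothetical surrogate objective (assumption): a linear score w . z."""
    return sum(a * b for a, b in zip(z, w))

def toy_grad(z, w):
    """Gradient of the linear toy score with respect to z, which is just w."""
    return list(w)

def latent_attack(z0, w, eps=0.5, lr=0.1, steps=50):
    """Projected gradient ascent: raise the score while staying near z0."""
    z = list(z0)
    for _ in range(steps):
        g = toy_grad(z, w)
        # Ascent step on the surrogate objective.
        z = [a + lr * b for a, b in zip(z, g)]
        # Enforce the proxy constraint ||z - z0||_2 <= eps.
        z = project_to_ball(z, z0, eps)
    return z

# Toy benign latent and objective direction (illustrative values only).
z0 = [0.2, -0.1, 0.4]
w = [1.0, 2.0, -2.0]
z_adv = latent_attack(z0, w)
dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_adv, z0)))
gain = toy_score(z_adv, w) - toy_score(z0, w)
```

In this toy setting the iterate ends on the boundary of the ball in the direction of `w`, so `dist` equals `eps` and `gain` equals `eps` times the norm of `w`. A real system would replace the linear score with a model-based objective and the L2 ball with whatever equivalence constraint the method actually defines.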