[arXiv] score: 0.15

Eight LLMs Tested as Social Engineering Attackers Across English and Chinese

May 29, 2026

An LLM-to-LLM red-teaming framework evaluates eight models on multi-turn social engineering scenarios in English and Chinese. It finds that adversarial dialogues follow recurrent escalation patterns that single-turn safety evaluations do not capture. Statistically significant cross-model and cross-lingual differences in attack success rates suggest that current safety benchmarks underestimate real-world conversational manipulation risk.

cs.CL

HOW THIS AFFECTS YOU

● researcher: The annotated attacker/defender strategy taxonomy and transition analysis provide a structured methodology for evaluating multi-turn adversarial robustness beyond standard benchmarks.

● policy: Cross-lingual gaps in defensive performance across eight production-class models indicate that safety evaluations limited to English systematically underestimate manipulation vulnerability in other languages.

SOURCE

https://arxiv.org/abs/2601.03134
