[arXiv] score: 0.12
LLMs Remain Robust to Near-Total Character Shuffling and Invisible Character Injection
May 29, 2026
Many LLMs maintain notable task performance even when nearly all words are character-shuffled into human-unreadable text, or when invisible characters outnumber visible ones several times over. The robustness stems from resilience to chaotic tokenization and fragmented segmentation, and both implicit and explicit denoising mechanisms are identified. This has direct implications for adversarial prompt design and input sanitization in production systems.
cs.CL
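To make the two perturbations concrete, here is a minimal Python sketch of what they might look like. It is an illustration, not the paper's procedure: the function names, the choice of zero-width characters in INVISIBLE, and the per_char parameter are all assumptions introduced here.

```python
import random

# Illustrative set of zero-width characters (an assumption; the paper's
# actual character set is not given in this summary).
INVISIBLE = ["\u200b", "\u200c", "\u200d", "\u2060"]


def shuffle_word(word, rng):
    """Return the word with its characters in a random order."""
    chars = list(word)
    rng.shuffle(chars)
    return "".join(chars)


def shuffle_text(text, rng):
    """Shuffle the characters of every space-separated word."""
    return " ".join(shuffle_word(w, rng) for w in text.split(" "))


def inject_invisible(text, per_char, rng):
    """Insert `per_char` invisible characters after each original character."""
    out = []
    for ch in text:
        out.append(ch)
        out.extend(rng.choice(INVISIBLE) for _ in range(per_char))
    return "".join(out)


if __name__ == "__main__":
    rng = random.Random(0)
    print(shuffle_text("models remain robust", rng))
    noisy = inject_invisible("robust", 3, rng)
    print(len("robust"), len(noisy))  # invisible characters outnumber visible ones 3 to 1
```

With per_char set to 3, the string renders identically on screen while three quarters of its code points are invisible, which is the regime the summary describes.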
HOW THIS AFFECTS YOU
● builder: Account for LLM robustness to invisible character injection when designing input validation or adversarial abuse detection. These perturbations may not degrade model outputs as expected.
● researcher: The identified tokenization-level denoising mechanisms provide a concrete target for studying LLM robustness and potential adversarial exploits.
● policy: That invisible character injection remains ineffective as a jailbreak vector is useful context, but the same robustness could complicate content filtering and detection pipelines.
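For the input validation point above, one possible starting place is to measure and strip invisible characters before text reaches the model or a content filter. The sketch below uses Python's standard unicodedata module and treats Unicode category Cf (format characters, which includes zero-width spaces and joiners) as "invisible". That definition, and the helper names, are assumptions made for illustration rather than anything specified by the paper.

```python
import unicodedata


def is_invisible(ch):
    """Treat Unicode format characters (category Cf) as invisible.

    Assumption: Cf covers zero-width space/joiners, word joiner, BOM and
    soft hyphen, but it is not an exhaustive definition of "invisible".
    """
    return unicodedata.category(ch) == "Cf"


def invisible_ratio(text):
    """Ratio of invisible characters to visible, non-whitespace characters."""
    invisible = sum(1 for ch in text if is_invisible(ch))
    visible = sum(
        1 for ch in text if not is_invisible(ch) and not ch.isspace()
    )
    return invisible / max(visible, 1)


def strip_invisible(text):
    """Remove every character classified as invisible."""
    return "".join(ch for ch in text if not is_invisible(ch))


if __name__ == "__main__":
    sample = "a\u200b\u200b\u200bb\u200b"
    print(invisible_ratio(sample))  # 2.0
    print(strip_invisible(sample))  # ab
```

Stripping all Cf characters is a blunt choice: zero-width joiners are legitimate in emoji sequences and in some scripts, so a production pipeline might prefer to flag inputs whose ratio exceeds a threshold instead of rewriting them.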