[arXiv] score: 0.17

VLM Detection Fails Above Specific Resolution Thresholds for ASCII Art Jailbreaks

June 30, 2026

Large Vision-Language Models exhibit a sharp decline in detecting harmful text encoded as ASCII art once image resolution exceeds specific thresholds. Testing across eight construction modes and English/Chinese corpora shows that word-embedded designs are the most resistant to detection regardless of scale.

HOW THIS AFFECTS YOU

● builder: You need to implement multi-scale or text-specific preprocessing to mitigate ASCII-based jailbreak attacks.

● policy: You should account for visual encoding vulnerabilities when setting safety standards for multimodal moderation.
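
For the builder note, one way to read "multi-scale preprocessing" is to run the same safety check on several downscaled copies of an input image and flag the image if any copy is flagged. The sketch below is illustrative only and is not the paper's method: the `detector` callable is a hypothetical stand-in for whatever moderation model is in use, the factors `(1, 2, 4)` are arbitrary placeholders (the summary does not state the actual resolution thresholds), and images are represented as plain nested lists of grayscale values to keep the example dependency-free.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale pixels in row-major order (assumed format)


def downscale(img: Image, factor: int) -> Image:
    """Box-filter downscale: average each factor x factor block of pixels."""
    if factor < 1:
        raise ValueError("factor must be >= 1")
    out_h, out_w = len(img) // factor, len(img[0]) // factor
    if out_h == 0 or out_w == 0:
        raise ValueError("factor is too large for this image")
    out: Image = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            block = [
                img[by * factor + dy][bx * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def flag_any_scale(
    img: Image,
    detector: Callable[[Image], bool],  # hypothetical moderation check
    factors: Sequence[int] = (1, 2, 4),  # placeholder scales, not from the paper
) -> bool:
    """Return True if the detector flags the image at any of the given scales."""
    for factor in factors:
        if factor > min(len(img), len(img[0])):
            continue  # skip scales that would shrink the image to nothing
        if detector(downscale(img, factor)):
            return True
    return False
```

Flagging on any scale trades more false positives for fewer misses; a production pipeline would likely use a proper resampling filter and scales chosen from measured detection thresholds.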

