VLM Detection Fails Above Specific Resolution Thresholds for ASCII Art Jailbreaks
June 30, 2026
Large vision-language models (VLMs) show a sharp drop in their ability to detect harmful text rendered as ASCII art once image resolution exceeds certain thresholds. Tests across eight construction modes and both English and Chinese corpora show that word-embedded designs are the most resistant to detection at every scale.
HOW THIS AFFECTS YOU
● Builder: You need to implement multi-scale or text-specific preprocessing to mitigate ASCII-based jailbreak attacks.
● Policy: You should account for visual encoding vulnerabilities when setting safety standards for multimodal moderation.
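The brief does not describe a specific preprocessing method or report the threshold values. The following is a minimal sketch of the multi-scale idea, under stated assumptions. Images are grayscale and held as nested lists of intensities. `max_side` is a hypothetical cap that stands in for the unreported resolution threshold. The moderation classifier that would consume the views is hypothetical and not shown. The sketch repeatedly halves the image with box averaging until its longest side is at or below the cap. The intent is that a downstream safety check can inspect at least one view near a resolution where detection may still hold.

```python
from typing import List

# Assumption: grayscale image as rows of pixel intensities in [0, 1].
Image = List[List[float]]


def box_downsample(img: Image, factor: int) -> Image:
    """Shrink img by an integer factor, averaging each factor x factor block.

    Trailing rows/columns that do not fill a whole block are dropped.
    """
    out_h, out_w = len(img) // factor, len(img[0]) // factor
    out: Image = []
    for r in range(out_h):
        row = []
        for c in range(out_w):
            block = [
                img[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def multiscale_views(img: Image, max_side: int) -> List[Image]:
    """Return the original image plus successively halved copies.

    Halving stops once the longest side is <= max_side (a hypothetical
    threshold) or the image is too small to halve again.
    """
    views = [img]
    current = img
    while max(len(current), len(current[0])) > max_side:
        if len(current) < 2 or len(current[0]) < 2:
            break  # cannot halve a dimension smaller than 2 pixels
        current = box_downsample(current, 2)
        views.append(current)
    return views
```

In a pipeline, each view would be sent to the (hypothetical) moderation check, and the input would be flagged if any view is flagged. Box averaging is only one simple downsampling choice. Text-specific preprocessing, such as running OCR before classification, would be a separate path that this sketch does not cover.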