[HUGGINGFACE]

Risk Under Pressure: Compute-Aware Evaluation of Adversarial Robustness in Language Models

June 8, 2026

Comparing attack success rate (ASR) at a fixed query budget ignores that jailbreak strategies can differ by orders of magnitude in compute cost. This framework replaces query-count budgets with cumulative FLOPs as the measure of adversarial effort, producing risk-compute curves that show how attack success probability scales with actual computational expenditure. Two summary metrics derived from these curves let practitioners assess whether a defense meaningfully raises the compute cost of a successful attack, rather than merely deflecting low-effort attacks.
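
To make the idea of a risk-compute curve concrete, here is a minimal sketch of one way such a curve could be estimated empirically. It is not the paper's implementation: the function name `risk_compute_curve`, the input layout (one FLOPs-to-first-success value per attacked prompt, `None` if the attack never succeeded), and the numbers in the usage note are all assumptions made for illustration.

```python
from bisect import bisect_right


def risk_compute_curve(flops_to_success, budgets):
    """Empirical attack success rate at each cumulative-FLOP budget.

    flops_to_success: one entry per attacked prompt, giving the cumulative
        FLOPs spent when the attack first succeeded, or None if it never
        succeeded (hypothetical input layout, not taken from the paper).
    budgets: FLOP budgets at which to evaluate the curve.
    """
    n = len(flops_to_success)
    if n == 0:
        raise ValueError("need at least one attacked prompt")
    # Sort the costs of successful attacks so each budget is a binary search.
    successes = sorted(c for c in flops_to_success if c is not None)
    # ASR at budget b = fraction of prompts broken using at most b FLOPs.
    return [bisect_right(successes, b) / n for b in budgets]
```

With made-up records for four prompts, `risk_compute_curve([1e12, 5e12, None, 2e13], [1e12, 1e13, 1e14])` returns `[0.25, 0.5, 0.75]`. Evaluating two attacks, or one attack with and without a defense, on the same budget grid puts them on a common compute axis instead of a common query count.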
