AA-Briefcase Benchmark Tests Long-Horizon Knowledge Work; Claude Fable 5 Leads at $31/Task | HACKOBAR_