[HN] score: 0.41

SubQ 1.1 Small Uses Sparse Attention for Near-Perfect 12M Token Retrieval

June 16, 2026

SubQ 1.1 Small uses Subquadratic Sparse Attention (SSA) to achieve near-perfect needle-in-a-haystack retrieval at context lengths up to 12M tokens, with up to a 1000x reduction in attention compute versus standard quadratic attention. A broader model lineup spanning 2M- to 12M-token context windows is planned; current access is limited to design partners.
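To put the headline numbers in perspective, here is a back-of-the-envelope sketch of what a 1000x reduction would imply at 12M tokens. It assumes attention compute scales with the number of query-key pairs scored and that the full 1000x applies at the 12M-token limit; neither is confirmed by the summary above. The function name `attended_keys_per_query` is hypothetical, and the sketch says nothing about how SSA actually selects keys.

```python
def attended_keys_per_query(context_tokens: int, reduction_factor: float) -> float:
    """Average number of keys each query could score under the assumed budget.

    Assumption: attention cost is proportional to the number of query-key
    pairs scored, so dense attention costs context_tokens ** 2 pairs.
    """
    dense_pairs = context_tokens * context_tokens   # standard quadratic attention
    sparse_pairs = dense_pairs / reduction_factor   # budget after the claimed reduction
    return sparse_pairs / context_tokens            # average keys scored per query


if __name__ == "__main__":
    # 12M-token context with the claimed "up to 1000x" reduction
    print(attended_keys_per_query(12_000_000, 1000))
```

Under these assumptions each query would score roughly 12,000 of the 12 million available keys on average, which is why the recall-versus-compute trade-off is the figure to check in the technical report.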

HOW THIS AFFECTS YOU

● builder: If SSA's claims hold at production scale, this could replace chunking and RAG pipelines for long-document tasks. Watch for the broader API rollout.

● researcher: The claimed 1000x reduction in attention compute on 12M-token contexts warrants scrutiny of the technical report, particularly how SSA trades off recall against compute.

● founder: If the model generalizes, it would eliminate the core architectural constraint behind RAG-as-a-workaround and threaten the retrieval-pipeline tooling category.

read original ↗ subq.ai
