
Deep Research Pipeline Raises Literature Search Recall from 20% to 80%

May 29, 2026

On the 250-paper RollingEval-Jun25 benchmark, a bibliography-expanding Deep Research pipeline raises literature search recall from under 20% with vanilla API search to over 80%. An LLM-as-judge evaluation rates only 51% of human citations as moderately relevant or higher, versus 86–88% for the top AI re-rankers, and finds humans 2.5x more likely to cite direct collaborators.

cs.AI, cs.IR

HOW THIS AFFECTS YOU

● Builder: The breadth-first bibliography expansion technique is directly implementable to improve recall in any RAG pipeline over academic or document corpora.

● Researcher: Human citation lists are a biased evaluation target because of collaboration-network effects. RollingEval-Jun25 and LLM-as-judge scoring offer a more neutral benchmark for literature retrieval.
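
As a rough illustration of the breadth-first bibliography expansion mentioned above, here is a minimal Python sketch. It is not the paper's pipeline: the summary names the technique but gives no implementation details. The `get_references` callback, the `max_depth` and `max_papers` limits, and the `TOY_GRAPH` data are all hypothetical stand-ins; a real system would presumably back `get_references` with a citation API or a parsed reference list.

```python
from collections import deque
from typing import Callable, Iterable, List


def expand_bibliography(
    seeds: Iterable[str],
    get_references: Callable[[str], Iterable[str]],  # hypothetical lookup: paper id -> cited paper ids
    max_depth: int = 2,      # assumed limit on how many citation hops to follow
    max_papers: int = 1000,  # assumed cap on the candidate pool size
) -> List[str]:
    """Breadth-first walk from seed papers through their reference lists."""
    order = list(dict.fromkeys(seeds))  # de-duplicate seeds, keep their order
    seen = set(order)
    queue = deque((paper_id, 0) for paper_id in order)

    while queue and len(order) < max_papers:
        paper_id, depth = queue.popleft()
        if depth >= max_depth:
            continue  # do not expand papers at the depth limit
        for ref in get_references(paper_id):
            if ref in seen:
                continue
            seen.add(ref)
            order.append(ref)
            queue.append((ref, depth + 1))
            if len(order) >= max_papers:
                break
    return order


def recall(retrieved: Iterable[str], relevant: Iterable[str]) -> float:
    """Fraction of relevant papers that appear among the retrieved ones."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(relevant_set & set(retrieved)) / len(relevant_set)


# Made-up citation graph, for illustration only.
TOY_GRAPH = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": ["F"]}


def toy_references(paper_id: str) -> List[str]:
    return TOY_GRAPH.get(paper_id, [])


shallow = expand_bibliography(["A"], toy_references, max_depth=1)
deeper = expand_bibliography(["A"], toy_references, max_depth=2)
target = ["B", "D", "E", "F"]  # pretend ground-truth relevant set
print(shallow, recall(shallow, target))
print(deeper, recall(deeper, target))
```

On this toy graph, one extra hop enlarges the candidate pool and lifts recall against the made-up target set. In practice the expanded pool would still need re-ranking, since a wider pool also admits more irrelevant papers.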

SOURCE

https://arxiv.org/abs/2605.29234
