● builder: If you use LLM rewriters in a RAG pipeline and measure F1, your scores may be inflated by answer leakage rather than by real retrieval quality, so the pipeline is worth auditing.
● researcher: This causal audit undermines the standard interpretation of RAG rewriting gains and suggests that many published results conflate answer leakage with genuine retrieval improvement.
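
One rough way to start auditing a pipeline for answer leakage is to count how often a gold answer string appears in the rewritten query but not in the original question. The sketch below is a simple string-matching heuristic, not the causal audit referred to above. The field names `original_query`, `rewritten_query`, and `gold_answers` are hypothetical and would need to be mapped onto your own evaluation records.

```python
import string


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, and collapse whitespace."""
    text = text.lower()
    text = text.translate(str.maketrans("", "", string.punctuation))
    return " ".join(text.split())


def contains_answer(text: str, gold_answers: list[str]) -> bool:
    """True if any normalized gold answer occurs as a whole-token span in text."""
    padded = f" {normalize(text)} "
    for answer in gold_answers:
        norm_answer = normalize(answer)
        if norm_answer and f" {norm_answer} " in padded:
            return True
    return False


def is_leak(example: dict) -> bool:
    """A leak: the answer is in the rewritten query but was not in the original.

    The dictionary keys are assumptions made for this sketch.
    """
    answers = example["gold_answers"]
    in_rewrite = contains_answer(example["rewritten_query"], answers)
    in_original = contains_answer(example["original_query"], answers)
    return in_rewrite and not in_original


def leakage_rate(examples: list[dict]) -> float:
    """Fraction of examples whose rewritten query leaks a gold answer."""
    if not examples:
        return 0.0
    return sum(is_leak(ex) for ex in examples) / len(examples)


examples = [
    {
        "original_query": "Who wrote Hamlet?",
        "rewritten_query": "Who wrote Hamlet? William Shakespeare wrote Hamlet.",
        "gold_answers": ["William Shakespeare"],
    },
    {
        "original_query": "Who wrote Hamlet?",
        "rewritten_query": "Which playwright is the author of Hamlet?",
        "gold_answers": ["William Shakespeare"],
    },
]
print(leakage_rate(examples))
```

A nonzero rate does not by itself show that F1 is inflated. It does flag the examples to score separately, for instance by comparing F1 on leaked and non-leaked subsets. Exact string matching also misses paraphrased answers, so the rate is best read as a lower bound.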