● builder: DailyReport provides a more realistic evaluation surface for search agents targeting consumer use cases than specialized academic benchmarks.
● researcher: The cascade rubric design enables fine-grained performance attribution across subtask dimensions, improving interpretability over coarse task-level scoring.