HACKOBAR_item
[arXiv]score: 0.22

BioGraphletQA: Knowledge-Anchored Generation of Complex QA Datasets

April 30, 2026
BioGraphletQA drops a 119,856-pair biomedical QA dataset built on a graphlet-anchored generation pipeline, where subgraphs of up to 5 nodes from the OREGANO knowledge graph drive structured LLM prompts to enforce factual grounding and compositional complexity. PubMed snippets enrich most pairs, and domain-expert evaluation on 106 samples confirms scientific validity. Biomedical NLP researchers and KGQA practitioners building RAG or fine-tuning pipelines on clinical or life-sciences data should prioritize this over synthetic QA sets lacking explicit KG grounding, as the graphlet constraint directly controls reasoning depth in ways flat-text generation cannot.
cs.CL