NarraBERT Maps Narrative Structure Across 3M Passages in Dolma
June 16, 2026
A RoBERTa-based model called NarraBERT, fine-tuned on 400 annotated passages, classifies 11 narrative dimensions (including agency, setting, and events) across 3M passages of the 3-trillion-token Dolma corpus, producing the NarraDolma dataset. The work establishes that narrative structure is measurable at scale across heterogeneous web data.
HOW THIS AFFECTS YOU
● Researcher: NarraDolma gives you a large-scale narrative-annotated pretraining corpus for studying how narrative composition in training data affects model behavior.