[arXiv] score: 0.08

Model Size vs. Topic Coherence: Seven Transformers Benchmarked in BERTopic

May 29, 2026

A systematic evaluation of seven transformer models, ranging from MiniLM to LLaMA-2, as embedding backends in BERTopic pipelines. Topic quality is measured with coherence and divergence metrics across multiple corpora. The study aims to quantify how parameter count affects topic modeling quality relative to LDA baselines.

cs.CL, cs.AI

HOW THIS AFFECTS YOU

● builder: If you use BERTopic in production, results may inform whether upgrading to a larger embedding model meaningfully improves topic quality for your corpus size.

● researcher: Coherence and divergence metrics across model sizes give you a reference point for embedding selection in topic modeling pipelines.
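
To make the two metric families mentioned above concrete, here is a minimal sketch in plain Python. The summary does not say which coherence or divergence variants the paper uses, so both choices are assumptions made for illustration: NPMI computed from document-level co-occurrence for coherence, and Jensen-Shannon divergence between topic-word distributions for divergence. The function names `npmi_coherence` and `js_divergence` are hypothetical, not part of BERTopic.

```python
import math
from itertools import combinations


def npmi_coherence(topic_words, documents):
    """Mean NPMI over all word pairs of one topic.

    Probabilities are estimated from document-level co-occurrence
    (an assumed variant; the paper's exact metric is not stated).
    `documents` is a list of token lists.
    """
    doc_sets = [set(doc) for doc in documents]
    n_docs = len(doc_sets)

    def prob(*words):
        # Fraction of documents containing all the given words.
        return sum(all(w in d for w in words) for d in doc_sets) / n_docs

    scores = []
    for w1, w2 in combinations(topic_words, 2):
        p1, p2, p12 = prob(w1), prob(w2), prob(w1, w2)
        if p12 == 0:
            scores.append(-1.0)  # words never co-occur: NPMI lower bound
        elif p12 == 1.0:
            scores.append(1.0)   # degenerate case, treated as 1.0 here
        else:
            pmi = math.log(p12 / (p1 * p2))
            scores.append(pmi / -math.log(p12))
    return sum(scores) / len(scores)


def js_divergence(p, q):
    """Jensen-Shannon divergence (base 2, range [0, 1]) between two
    probability distributions over the same vocabulary."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


if __name__ == "__main__":
    # Toy corpus, invented purely for illustration.
    docs = [
        ["model", "embedding", "topic"],
        ["model", "embedding"],
        ["topic", "corpus"],
        ["corpus", "coherence"],
    ]
    print(npmi_coherence(["model", "embedding"], docs))
    print(js_divergence([0.7, 0.2, 0.1], [0.1, 0.2, 0.7]))
```

Higher NPMI suggests that a topic's top words tend to appear together, while higher divergence between topics suggests they are more distinct from one another. Comparing embedding backends would then amount to computing both scores for the topics each backend produces on the same corpus.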

SOURCE

https://arxiv.org/abs/2605.28832
