Word2Vec Tested on 130-Word Toki Pona Vocabulary at 1.4M Sentences
June 17, 2026
This experiment trains Word2Vec on 1.4 million Toki Pona sentences to test whether semantic embeddings hold up under extreme vocabulary reduction. Separate models are trained on a clean corpus and on a noisy corpus containing loanwords and neologisms, and comparing them tests whether linguistic noise in a minimal-vocabulary constructed language helps or hurts embedding quality.
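The summary does not say how the clean and noisy corpora were separated. One plausible approach, sketched here in Python under stated assumptions, is to label a sentence clean only when every token belongs to the core word list, so that any loanword, proper name, or neologism moves it to the noisy set. `CORE_VOCAB` is a hypothetical, deliberately small subset of real Toki Pona words (not the ~130-word list used in the experiment), and `tokenize` and `split_corpus` are illustrative helpers, not the author's code.

```python
from typing import Iterable

# HYPOTHETICAL core word list: a small illustrative subset of real Toki Pona
# words, not the ~130-word vocabulary used in the experiment.
CORE_VOCAB = frozenset({
    "mi", "sina", "ona", "jan", "li", "e", "pona", "toki", "moku",
    "tawa", "ma", "sona", "wile", "suli", "lili", "ni", "la", "lon",
})

# Assumed punctuation set; a real pipeline would likely need more care.
PUNCTUATION = ".,!?:;\"'"


def tokenize(sentence: str) -> list[str]:
    """Lowercase, replace basic punctuation with spaces, split on whitespace."""
    table = str.maketrans({ch: " " for ch in PUNCTUATION})
    return sentence.lower().translate(table).split()


def split_corpus(
    sentences: Iterable[str],
) -> tuple[list[list[str]], list[list[str]]]:
    """Return (clean, noisy) lists of tokenized sentences.

    A sentence is 'clean' only if every token is in CORE_VOCAB; any
    out-of-vocabulary token (loanword, proper name, or neologism) makes
    it 'noisy'. Sentences with no tokens are skipped.
    """
    clean: list[list[str]] = []
    noisy: list[list[str]] = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        if not tokens:
            continue
        if all(tok in CORE_VOCAB for tok in tokens):
            clean.append(tokens)
        else:
            noisy.append(tokens)
    return clean, noisy


if __name__ == "__main__":
    sample = [
        "mi moku.",
        "jan Sonja li pona.",        # proper-name loanword -> noisy
        "mi wile tawa ma Kanata.",   # place-name loanword -> noisy
        "sina sona e ni",
    ]
    clean, noisy = split_corpus(sample)
    print("clean:", clean)
    print("noisy:", noisy)
```

Lowercasing discards the capitalization Toki Pona writers use to mark proper-name loanwords, but the membership check still flags those tokens as out of vocabulary. Each output is a list of token lists, the shape that gensim's `Word2Vec` accepts through its `sentences` argument, so the two subsets could be used to train the separate models.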
HOW THIS AFFECTS YOU
● Researcher: Niche but controlled experiment on embedding behavior at vocabulary extremes; relevant if you study low-resource or constructed-language representations.