[r/artificial] score: 0.16

eTPS — Effective Tokens Per Second: A Better Way to Measure Local LLM Performance

May 6, 2026

eTPS (Effective Tokens Per Second) is a proposed community metric, from a post on Reddit's r/artificial, that reweights raw token throughput by answer quality, correction loops, and context retention across multi-turn sessions. Unlike raw TPS, eTPS penalizes hallucinations and retry cycles: it divides accepted output tokens by total wall-clock time, including the time spent on corrections. This matters to anyone benchmarking quantized local models on consumer hardware, because a Q4 model that beats a Q8 model on raw TPS may score lower on eTPS once correction overhead is counted. There is no formal paper yet, but the framing directly challenges the benchmarking conventions that dominate llama.cpp and ollama.
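A minimal Python sketch of the ratio described above may make the Q4 versus Q8 point concrete. It covers only the "accepted output tokens divided by total wall-clock time" part; the quality and context-retention weighting is not specified in this summary and is left out. The `Attempt` type, the function names, and all token counts and timings are hypothetical illustrations, not measurements and not the original poster's code.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    """One generation attempt within a session (hypothetical structure)."""
    tokens: int      # output tokens produced by this attempt
    seconds: float   # wall-clock time spent on this attempt
    accepted: bool   # True if the output was kept, False if it was retried


def raw_tps(attempts: list[Attempt]) -> float:
    """Raw throughput: every generated token counts, accepted or not."""
    total_seconds = sum(a.seconds for a in attempts)
    if total_seconds <= 0:
        return 0.0
    return sum(a.tokens for a in attempts) / total_seconds


def etps(attempts: list[Attempt]) -> float:
    """Effective throughput: only accepted tokens count, but the time
    spent on rejected attempts (corrections) stays in the denominator."""
    total_seconds = sum(a.seconds for a in attempts)
    if total_seconds <= 0:
        return 0.0
    return sum(a.tokens for a in attempts if a.accepted) / total_seconds


# Made-up numbers: the Q4 run generates faster but needs two retries.
q4 = [
    Attempt(tokens=200, seconds=2.5, accepted=False),
    Attempt(tokens=200, seconds=2.5, accepted=False),
    Attempt(tokens=200, seconds=2.5, accepted=True),
]
# The Q8 run is slower per token but is accepted on the first try.
q8 = [Attempt(tokens=200, seconds=5.0, accepted=True)]

for name, run in (("Q4", q4), ("Q8", q8)):
    print(f"{name}: raw TPS = {raw_tps(run):.1f}, eTPS = {etps(run):.1f}")
```

With these invented numbers the Q4 run wins on raw TPS and loses on eTPS, which is the reversal the post warns about. How "accepted" is decided (human review, an automated check, or something else) is the hard part and is left open here.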

project

SOURCE

https://www.reddit.com/r/artificial/comments/1t5tije/etps_effective_tokens_per_second_a_better_way_to/
