[r/MachineLearning] score: 0.17

Production AI very different from the demos [D]

May 5, 2026
A Reddit practitioner report highlights a common production AI cost trap: integrating a RAG pipeline doubled per-call input token length versus prototype baselines, and real user queries proved significantly longer and noisier than the curated test sets. GPT-4o spend became unattributable at the feature level, since OpenAI's dashboard does not break down costs by feature or endpoint. Engineers shipping LLM features should implement token-logging middleware and per-feature metadata tagging before launch, not after finance escalates. This mirrors a known gap in MLOps observability tooling: inference cost attribution remains unsolved compared to traditional cloud resource tagging.
discussion
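The token-logging middleware the post recommends could be sketched as follows. This is a minimal illustration, not anything from the original thread: the pricing constants, function names (`tracked_completion`, `cost_by_feature`), and the 4-chars-per-token heuristic are all assumptions; a real deployment would use the provider's reported usage fields and a proper tokenizer, and ship records to a metrics pipeline instead of an in-memory list.

```python
import time
from collections import defaultdict

# ASSUMED, illustrative per-1K-token prices; check your provider's actual pricing.
PRICE_PER_1K_INPUT = 0.0025
PRICE_PER_1K_OUTPUT = 0.01

usage_log = []  # in production, emit to your metrics/observability pipeline


def rough_token_count(text: str) -> int:
    # Crude ~4-chars-per-token heuristic; swap in a real tokenizer in production.
    return max(1, len(text) // 4)


def tracked_completion(prompt: str, *, feature: str, call_fn):
    """Wrap an LLM call, tagging token usage and estimated cost by feature."""
    input_tokens = rough_token_count(prompt)
    response = call_fn(prompt)
    output_tokens = rough_token_count(response)
    cost = (input_tokens / 1000) * PRICE_PER_1K_INPUT \
         + (output_tokens / 1000) * PRICE_PER_1K_OUTPUT
    usage_log.append({
        "feature": feature,            # per-feature tag for cost allocation
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "est_cost_usd": cost,
        "ts": time.time(),
    })
    return response


def cost_by_feature():
    """Aggregate estimated spend per feature tag."""
    totals = defaultdict(float)
    for rec in usage_log:
        totals[rec["feature"]] += rec["est_cost_usd"]
    return dict(totals)


# Usage with a stubbed model call (hypothetical feature names)
fake_llm = lambda prompt: "stub answer"
tracked_completion("Summarize these retrieved documents ...", feature="rag_search", call_fn=fake_llm)
tracked_completion("Short query", feature="autocomplete", call_fn=fake_llm)
print(cost_by_feature())
```

Tagging every call at the middleware layer means cost roll-ups per feature or endpoint are a simple aggregation, rather than a forensic exercise against an opaque provider bill.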