[HUGGINGFACE] score: 0.55

RayDer Unifies Camera Estimation and NVS in One Transformer for Video Scaling

May 28, 2026

RayDer consolidates camera estimation, scene reconstruction, and rendering into a single feed-forward transformer. It treats dynamic content as a nuisance factor, absorbing it into a minimal dynamic state rather than reconstructing it. This turns self-supervised novel view synthesis (NVS) into a single-model scaling problem that can be trained on unconstrained real-world video.


HOW THIS AFFECTS YOU

● builder: Worth tracking if you're building 3D reconstruction or NVS pipelines that need to train on in-the-wild video without camera metadata.

● researcher: The unified backbone design removes multi-network brittleness from self-supervised NVS training, offering a cleaner scaling target.

SOURCE

https://huggingface.co/papers/2605.31535
