[HUGGINGFACE] score: 0.55
RayDer Unifies Camera Estimation and NVS in One Transformer for Video Scaling
May 28, 2026
RayDer consolidates camera estimation, scene reconstruction, and rendering into a single feed-forward transformer. Rather than reconstructing dynamic content, it treats that content as a nuisance factor and absorbs it into a minimal dynamic state. This turns self-supervised novel view synthesis (NVS) into a single-model scaling problem that can be trained on unconstrained real-world video.
HOW THIS AFFECTS YOU
● builder: Worth tracking if you're building 3D reconstruction or NVS pipelines that need to train on in-the-wild video without camera metadata.
● researcher: The unified backbone design removes multi-network brittleness from self-supervised NVS training, offering a cleaner scaling target.