● builder: You can use this architecture to add predictive foresight to VLA-based robot policies without the inference cost of video generation.
● researcher: The latent action model, trained in the embedding space of a foundation model, is a practical alternative to expensive video prediction for policy foresight.