VLMs excel in 2D settings, but the visual world is 3D. MindJourney gives them better viewpoints of real-world scenes and ultimately aims to forecast how those scenes change over time, according to the Microsoft researchers.
MindJourney “sketches a concise camera trajectory, while the world model synthesizes the corresponding view at each step. The VLM then reasons over this multi-view evidence gathered during the interactive exploration,” the researchers wrote in a paper.
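The loop the researchers describe can be illustrated with a minimal Python sketch. It is not the researchers' implementation: `propose_trajectory`, `synthesize_view`, and `answer` are hypothetical stand-ins for the VLM planner, the world model, and the VLM's final reasoning step, and the toy stubs below only return strings.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Pose:
    """Hypothetical 2D camera pose: position plus heading in degrees."""
    x: float = 0.0
    y: float = 0.0
    yaw_deg: float = 0.0


def apply_move(pose, move):
    """Apply one camera move, either ("forward", metres) or ("turn", degrees)."""
    kind, amount = move
    if kind == "turn":
        return Pose(pose.x, pose.y, (pose.yaw_deg + amount) % 360)
    if kind == "forward":
        rad = math.radians(pose.yaw_deg)
        return Pose(pose.x + amount * math.cos(rad),
                    pose.y + amount * math.sin(rad),
                    pose.yaw_deg)
    raise ValueError(f"unknown move: {kind}")


def explore(propose_trajectory, synthesize_view, answer, question, start=Pose()):
    """Sketch a trajectory, synthesize a view per step, then reason over all views."""
    pose = start
    views = [synthesize_view(pose)]           # the original viewpoint
    for move in propose_trajectory(question):
        pose = apply_move(pose, move)
        views.append(synthesize_view(pose))   # imagined view at the new pose
    return answer(question, views)


# Toy stand-ins (assumptions for illustration only, not real models).
def toy_planner(question):
    return [("forward", 1.0), ("turn", 90.0), ("forward", 1.0)]


def toy_world_model(pose):
    return f"view@({pose.x:.1f},{pose.y:.1f},{pose.yaw_deg:.0f})"


def toy_reasoner(question, views):
    return f"{len(views)} views considered"


result = explore(toy_planner, toy_world_model, toy_reasoner,
                 "Which object is left of the chair?")
print(result)
```

In a real system the three stubs would be replaced by calls to an actual VLM and a video world model; the sketch only shows how the steps fit together.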
MindJourney’s technologies could improve assistive robots and remote inspection, and enrich virtual and augmented reality experiences, the researchers added.
![Microsoft researchers develop new tech for video AI agents](https://zozoti.com/wp-content/uploads/2025/09/Microsoft-researchers-develop-new-tech-for-video-AI-agents-–.jpg)