🎙️ Mile High 2A (Talks) & 🪧 ExHall A 229-239 (Posters, 10:00 am – 12:00 pm) • Colorado Convention Center • June 3rd, 2026

World Models Meet Active Sensing and Closed-Loop Planning

From passive generation to interactive agents that strategically decide what to sense, when to sense, and how to act.

the vision

Generative models have mastered passive generation. But real intelligence is active. It observes, plans, acts, and learns from feedback.

Active Sensing

Models that strategically choose what to observe—optimal viewpoints, sensor placement, information seeking.

Closed-Loop Planning

Continuous replanning based on new observations. Perception and action form a tight feedback loop.

Embodied Intelligence

Agents that learn through interaction. Active decision-making transforms passive models into interactive systems.

speakers

Leading voices in vision, robotics, and embodied AI

Nicholas Roy

MIT CSAIL

08:10

"World Models and Why We Should Care about Their Structure"

Alan Yuille

Johns Hopkins University

09:35

"World Models: Bayes or Bust?"

Yiannis Aloimonos

University of Maryland

10:10

"Generative Action Systems"

Chelsea Finn

Stanford University & Physical Intelligence

10:45

"Evaluating and Improving Robotic Foundation Models with World Models"

half-day workshop • 8am–12pm

Location: 🎙️ Mile High 2A (Talks) & 🪧 ExHall A 229-239 (Posters) at the Colorado Convention Center

Time: June 3rd, 8:00 am – 11:50 am

08:00

Opening Welcome & Introductions

10 min

08:10

Invited Talk: Nicholas Roy

"World Models and Why We Should Care about Their Structure"

30 min

08:45

Oral Sessions 1 and 2: SAW-Bench and GEM-4D

30 min

09:20

☕ Coffee Break

15 min

09:35

Invited Talk: Alan Yuille

"World Models: Bayes or Bust?"

30 min

10:10

Invited Talk: Yiannis Aloimonos

"Generative Action Systems"

30 min

10:45

Invited Talk: Chelsea Finn

"Evaluating and Improving Robotic Foundation Models with World Models"

30 min

11:25

Oral Session 3: RoboWM-Bench

11:40

Closing Remarks

organizers

listed alphabetically by last name

Rama Chellappa

JHU

Jieneng Chen

JHU

Contact Person

Yilun Du

Harvard

Sanjeev Khudanpur

JHU

Cheng Peng

University of Virginia

Tianmin Shu

JHU

Chen Wei

Rice University

Jianwen Xie

Lambda

Alan Yuille

JHU

Organizing & Onsite Committee

Jieneng Chen

JHU

Jiahan Zhang

JHU

Qi Chen

JHU

accepted papers

Poster session at ExHall A 229–239 (each board has two faces, a and b) • June 3rd, 10:00 am – 12:00 pm • OpenReview portal

Poster ID	Paper Title
229a	When Predicted Depth Can Beat the Sensor: Depth-Free Deployment of RGB-D Self-Supervised Encoders
229b	Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models (PDF)
230a	Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models (PDF)
230b	GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation
231a	Streaming3D: Sequential 3D Generation via Evidential Memory
231b	Purposive Sensing: Task-Aligned Observation Selection via Closed-Loop World Model Imagination
232a	Towards World Scene Graph Generation from Monocular Videos: A Structured World Representation for Embodied Agents (PDF)
232b	ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation (PDF)
233a	RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation
233b	Epistemic Horizons: Uncertainty-Gated Active Sensing for Closed-Loop World Model Planning
234a	Imitation learning through imagination in latent space (PDF)
234b	When to Look: A Theory of Observation Timing for World-Model-Guided Active Agents
235a	The Information Gap Process: A Unified Theory of Closed-Loop Active Sensing in World Models
235b	Latent Observability in World Models: A Unified Framework for Active Sensing, Belief Convergence, and Closed-Loop Planning Efficiency
236a	EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses (PDF)
236b	SAW-Bench: Learning Situated Awareness in the Real World
237a	WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling
237b	Turning Video Models into Generalist Robot Policies
238a	Same Meaning, Different Pictures: Finding Missing Generated Pictures (PDF)
238b	Addressable Memory for Closed-Loop Video World Models (PDF)

get in touch

Questions about the workshop, submissions, or anything else? Reach out to our contact person.

Jieneng Chen • jchen293@jh.edu