🎙️ Mile High 2A (Talk) & 🪧 ExHall A 229-239 (Poster, 10 am-12 pm) • Colorado Convention Center • June 3rd, 2026

World Models Meet Active Sensing and Closed-Loop Planning

From passive generation to interactive agents that strategically decide what to sense, when to sense, and how to act.

the vision

Generative models have mastered passive generation. But real intelligence is active. It observes, plans, acts, and learns from feedback.

Active Sensing

Models that strategically choose what to observe: optimal viewpoints, sensor placement, and information seeking.

Closed-Loop Planning

Continuous replanning based on new observations. Perception and action form a tight feedback loop.

Embodied Intelligence

Agents that learn through interaction. Active decision-making transforms passive models into interactive systems.

speakers

Leading voices in vision, robotics, and embodied AI

Nicholas Roy

MIT CSAIL

08:10

"World Models and Why We Should Care about Their Structure"

Alan Yuille

Johns Hopkins University

09:35

"World Models: Bayes or Bust?"

Yiannis Aloimonos

University of Maryland

10:10

"Generative Action Systems"

Chelsea Finn

Stanford University & Physical Intelligence

10:45

"Evaluating and Improving Robotic Foundation Models with World Models"

half-day workshop • 8am–12pm

Location: 🎙️ Mile High 2A (Talks) & 🪧 ExHall A 229-239 (Posters) at the Colorado Convention Center

Time: June 3rd, 8 am - 11:50 am

08:00

Opening Welcome & Introductions

10 min

08:10

Invited Talk: Nicholas Roy

"World Models and Why We Should Care about Their Structure"

30 min

08:45

Oral Sessions 1 and 2: SAW-Bench and GEM-4D

30 min

09:20

☕ Coffee Break

15 min

09:35

Invited Talk: Alan Yuille

"World Models: Bayes or Bust?"

30 min

10:10

Invited Talk: Yiannis Aloimonos

"Generative Action Systems"

30 min

10:45

Invited Talk: Chelsea Finn

"Evaluating and Improving Robotic Foundation Models with World Models"

30 min

11:25

Oral Session 3: RoboWM-Bench

11:40

Closing Remarks

organizers

listed alphabetically by last name

Jieneng Chen

JHU

Contact Person

Yilun Du

Harvard

Cheng Peng

University of Virginia

Chen Wei

Rice University

Jianwen Xie

Lambda

Organizing & Onsite Committee

accepted papers

Poster session at ExHall A 229–239 (each board has two faces, a and b) • June 3rd, 10:00 am – 12:00 pm • OpenReview portal

Poster ID  Paper Title
229a       When Predicted Depth Can Beat the Sensor: Depth-Free Deployment of RGB-D Self-Supervised Encoders
229b       Reconstruction or Semantics? What Makes a Latent Space Useful for Robotic World Models (PDF)
230a       Adding Thermal Awareness to Visual Systems in Real-Time via Distilled Diffusion Models (PDF)
230b       GEM-4D: Geometry-Enhanced Video World Models for Robot Manipulation
231a       Streaming3D: Sequential 3D Generation via Evidential Memory
231b       Purposive Sensing: Task-Aligned Observation Selection via Closed-Loop World Model Imagination
232a       Towards World Scene Graph Generation from Monocular Videos: A Structured World Representation for Embodied Agents (PDF)
232b       ULTRA: Unified Multimodal Control for Autonomous Humanoid Whole-Body Loco-Manipulation (PDF)
233a       RoboWM-Bench: A Benchmark for Evaluating World Models in Robotic Manipulation
233b       Epistemic Horizons: Uncertainty-Gated Active Sensing for Closed-Loop World Model Planning
234a       Imitation learning through imagination in latent space (PDF)
234b       When to Look: A Theory of Observation Timing for World-Model-Guided Active Agents
235a       The Information Gap Process: A Unified Theory of Closed-Loop Active Sensing in World Models
235b       Latent Observability in World Models: A Unified Framework for Active Sensing, Belief Convergence, and Closed-Loop Planning Efficiency
236a       EgoControl: Controllable Egocentric Video Generation via 3D Full-Body Poses (PDF)
236b       SAW-Bench: Learning Situated Awareness in the Real World
237a       WorldReel: 4D Video Generation with Consistent Geometry and Motion Modeling
237b       Turning Video Models into Generalist Robot Policies
238a       Same Meaning, Different Pictures: Finding Missing Generated Pictures (PDF)
238b       Addressable Memory for Closed-Loop Video World Models (PDF)

get in touch

Questions about the workshop, submissions, or anything else? Reach out to our contact person.

Jieneng Chen • jchen293@jh.edu