arXiv 2026
Mohamad H. Danesh  ·  Chenhao Li  ·  Amin Abyaneh  ·  Anas Houssaini
Kirsty Ellis  ·  Glen Berseth  ·  Marco Hutter  ·  Hsiu-Chin Lin
McGill University  ·  Mila – Quebec AI Institute  ·  ETH Zürich  ·  Université de Montréal

Abstract

World models promise a paradigm shift in robotics, where an agent learns the underlying physics of its environment once to enable efficient planning and behavior learning. However, current world models are often hardware-locked specialists: a model trained on a Boston Dynamics Spot robot fails catastrophically on a Unitree Go1 due to the mismatch in kinematic and dynamic properties, as the model overfits to specific embodiment constraints rather than capturing the universal locomotion dynamics. Consequently, a slight change in actuator dynamics or limb length necessitates training a new model from scratch. In this work, we take a step towards a framework for training a generalizable Quadrupedal World Model (QWM) that disentangles environmental dynamics from robot morphology. We address the limitations of implicit system identification, where treating static physical properties (like mass or limb length) as latent variables to be inferred from motion history creates an adaptation lag that can compromise zero-shot safety and efficiency. Instead, we explicitly condition the generative dynamics on the robot's engineering specifications. By integrating a physical morphology encoder and a reward normalizer, we enable the model to serve as a neural simulator capable of generalizing across morphologies. This capability unlocks zero-shot control across a range of embodiments. Since the policy is conditioned on generalizable latent dynamics provided by the world model, we can deploy the agent on entirely unseen quadrupeds without fine-tuning, adaptation, or warm-up periods. We introduce, for the first time, a world model that enables zero-shot generalization to new morphologies for locomotion. 
We carefully study the limitations of our method: QWM operates as a distribution-bounded interpolator within the quadrupedal morphology family rather than as a universal physics engine. Even so, this work represents a significant step toward morphology-conditioned world models for legged locomotion.

Overview

QWM framework overview
Overview of the QWM Framework. Left (WM Learning): We train a single generalizable WM across diverse morphologies. The Physical Morphology Encoder (PME) derives a static embedding $\mu$ from each robot's USD description, which explicitly conditions both the encoder and the recurrent state $h_t$ (dashed lines). The model uses previous actions $a_t$ and discrete stochastic states $z_t$ to predict future states, rewards $\hat{r}_t$, and continuation probabilities $\hat{c}_t$. To handle heterogeneous reward scales, we employ an Adaptive Reward Normalizer (ARN) alongside the standard DreamerV3 backbone components. Middle (Behavior Learning): Policies are learned entirely in imagination. By freezing the generalized WM's components and injecting the $\mu$ of any robot, we can train an actor and critic for a new morphology without any physical interaction. Right (Unified Deployment): By freezing the generalized WM and policy and injecting the $\mu$ of a target robot (e.g., ANYmal-B), the WM creates a morphology-aligned latent space that lets the policy adapt its control strategy immediately, without further training.
Heterogeneous robot cohort
The heterogeneous morphology cohort used in experiments, illustrating the variance in physical scale and configuration. QWM is trained on seven robots while holding out one for zero-shot evaluation.

Method

QWM extends DreamerV3 with three targeted architectural changes to handle cross-morphology generalization:

Physical Morphology Encoder (PME): Extracts normalized features across four categories: kinematics & topology (hip offset, limb lengths, knee configuration), geometry (stance dimensions), dynamics (log-scaled mass), and actuation (torque density). Processed by a dedicated 2-layer MLP that runs parallel to the proprioceptive encoder, preventing static context from being overwhelmed by dynamic signals.
Morphology-Conditioned Recurrent Dynamics: The morphology embedding $\mu$ is injected at every recurrent step: $h_t = f(h_{t-1}, z_{t-1}, a_{t-1}, \mu)$. This allows the recurrent state to focus on dynamic execution while explicit conditioning handles static embodiment properties.
Adaptive Reward Normalizer (ARN): Quantile-based scaling using exponential moving averages tracks per-robot reward distributions, dynamically normalizing heterogeneous reward signals so no single morphology dominates training.
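The per-step conditioning above can be sketched in a few lines. This is a minimal illustration, not the actual QWM implementation: the simple tanh cell stands in for DreamerV3's GRU-based sequence model, all dimensions are made up, and `encode_morphology` is a hypothetical 2-layer MLP playing the role of the PME.

```python
# Minimal sketch of the morphology-conditioned recurrent step
# h_t = f(h_{t-1}, z_{t-1}, a_{t-1}, mu). A plain tanh cell is an
# illustrative stand-in for the GRU in the DreamerV3 backbone.
import numpy as np

rng = np.random.default_rng(0)

# Hidden, stochastic, action, morphology-feature, and embedding sizes
# (all illustrative).
H, Z, A, M, F = 8, 4, 3, 5, 6

# Hypothetical 2-layer PME: static morphology features -> embedding mu.
W1, b1 = rng.normal(size=(F, M)) * 0.1, np.zeros(F)
W2, b2 = rng.normal(size=(F, F)) * 0.1, np.zeros(F)

def encode_morphology(features):
    """Two-layer MLP over normalized static features (log mass, limb lengths, ...)."""
    hidden = np.tanh(W1 @ features + b1)
    return np.tanh(W2 @ hidden + b2)

# Recurrent weights over the concatenated input [h, z, a, mu].
Wr = rng.normal(size=(H, H + Z + A + F)) * 0.1

def recurrent_step(h_prev, z_prev, a_prev, mu):
    """mu is re-injected at every step, so static embodiment context
    never has to be inferred from the motion history."""
    x = np.concatenate([h_prev, z_prev, a_prev, mu])
    return np.tanh(Wr @ x)

mu = encode_morphology(rng.normal(size=M))  # fixed per robot
h = np.zeros(H)
for _ in range(3):  # unroll a few imagined steps
    h = recurrent_step(h, rng.normal(size=Z), rng.normal(size=A), mu)
```

Because $\mu$ enters the transition at every step rather than only at initialization, swapping in a new robot's embedding immediately changes the predicted dynamics, which is what removes the adaptation lag of implicit system identification.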
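The ARN idea can likewise be sketched concretely. The percentile choice (5th/95th, mirroring DreamerV3's return normalization), the decay constant, and the class name below are assumptions for illustration, not the paper's exact implementation.

```python
# Illustrative Adaptive Reward Normalizer: per-robot exponential moving
# averages of the 5th/95th reward percentiles rescale heterogeneous
# rewards so no single morphology dominates training. Percentiles and
# decay are assumed values, not the paper's.
import numpy as np

class AdaptiveRewardNormalizer:
    def __init__(self, num_robots, decay=0.99, eps=1e-8):
        self.lo = np.zeros(num_robots)   # EMA of 5th percentile, per robot
        self.hi = np.zeros(num_robots)   # EMA of 95th percentile, per robot
        self.decay = decay
        self.eps = eps

    def update(self, robot_id, rewards):
        q05, q95 = np.percentile(rewards, [5, 95])
        self.lo[robot_id] = self.decay * self.lo[robot_id] + (1 - self.decay) * q05
        self.hi[robot_id] = self.decay * self.hi[robot_id] + (1 - self.decay) * q95

    def normalize(self, robot_id, rewards):
        # Divide by the tracked range, but never amplify already-small rewards.
        scale = max(self.hi[robot_id] - self.lo[robot_id], 1.0)
        return np.asarray(rewards) / (scale + self.eps)
```

A heavy robot whose rewards sit near 100 and a light one near 1 then contribute gradients of comparable magnitude, since each is divided by its own tracked reward range.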

Training QWM required running eight different robot morphologies in parallel within a single simulator, something Isaac Lab does not support out of the box. To enable this, we built Hetero-Isaac, an extension to NVIDIA Isaac Lab that assigns distinct robot morphologies, collision geometries, and kinematic trees to different environment subsets while keeping all physics fidelity intact. The full technical details of this infrastructure, including joint-order unification, index mapping, and padded reward functions, are described in the accompanying blog post: Heterogeneous Environments in Isaac Lab.
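The joint-order unification mentioned above can be illustrated with a small sketch. Everything here is hypothetical: the canonical joint naming, helper names, and 12-slot layout are stand-ins, not Hetero-Isaac's actual API; robots with fewer joints would simply leave their unused canonical slots at the pad value.

```python
# Hypothetical joint-order unification for batching heterogeneous
# morphologies: each robot's native joint vector is scattered into one
# canonical, padded layout so tensors share a shape across env subsets.
import numpy as np

CANONICAL_JOINTS = [f"{leg}_{j}" for leg in ("FL", "FR", "RL", "RR")
                    for j in ("hip", "thigh", "calf")]  # 12 padded slots

def build_index_map(robot_joint_order):
    """Indices that scatter a robot's native joint vector into canonical order."""
    return np.array([CANONICAL_JOINTS.index(name) for name in robot_joint_order])

def to_canonical(values, index_map, pad_value=0.0):
    out = np.full(len(CANONICAL_JOINTS), pad_value)
    out[index_map] = values
    return out

# Example: a robot that enumerates its joints in a different native order.
native_order = ["FR_hip", "FL_hip", "FR_thigh", "FL_thigh",
                "FR_calf", "FL_calf", "RR_hip", "RL_hip",
                "RR_thigh", "RL_thigh", "RR_calf", "RL_calf"]
idx = build_index_map(native_order)
q = to_canonical(np.arange(12.0), idx)  # native joint positions -> canonical slots
```

The same index map, applied in reverse, routes canonical-order actions back to each robot's native actuator ordering at simulation time.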

Real-World Experiments

Both Unitree Go1 and ANYmal-D were held out during training. By injecting the correct morphology embedding, the frozen policy achieves stable locomotion on both platforms with zero falls across 20 trials (10 per platform, 60 seconds each).

ANYmal-D zero-shot deployment
ANYmal-D: zero-shot, held out during training
Unitree Go1 zero-shot deployment
Unitree Go1: zero-shot, held out during training
Multi-Robot Training
Hetero-Isaac: 8 robots training in parallel
Open-loop imagination rollouts
Open-loop imagination rollouts vs. ground truth physics

Multi-Morphology Mastery

A single QWM is trained simultaneously on the full heterogeneous cohort of eight quadrupeds and compared against world model baselines (DreamerV3, PWM, TWISTER) as well as a model-free oracle (PME-PPO).

Learning curves comparing QWM against baselines on heterogeneous robot cohort
Learning curves comparing QWM against baselines trained simultaneously on the full heterogeneous cohort. Left: mean reward. Right: mean episode length. Shaded regions are standard deviation across 5 seeds.

BibTeX

@misc{danesh2026qwm,
  title         = {Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning},
  author        = {Danesh, Mohamad H. and Li, Chenhao and Abyaneh, Amin and Houssaini, Anas and Ellis, Kirsty and Berseth, Glen and Hutter, Marco and Lin, Hsiu-Chin},
  year          = {2026},
  eprint        = {2604.08780},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2604.08780}
}