Toward Hardware-Agnostic Quadrupedal World Models
via Morphology Conditioning
arXiv 2026
Mohamad H. Danesh  ·  Chenhao Li  ·  Amin Abyaneh  ·  Anas Houssaini
Kirsty Ellis  ·  Glen Berseth  ·  Marco Hutter  ·  Hsiu-Chin Lin
McGill University  ·  ETH Zürich  ·  Université de Montréal / Mila

Abstract

World models trained on one quadrupedal platform typically fail on different hardware due to morphological differences in mass, link dimensions, and kinematic configuration. We present QWM, a framework that enables a single neural dynamics model to generalize across diverse quadrupedal robots without retraining. The key innovation is to explicitly condition the generative dynamics on the robot's engineering specifications — extracted directly from URDF/USD files — rather than inferring physical properties implicitly from interaction history. A Physical Morphology Encoder (PME) derives a compact embedding from kinematic, geometric, dynamic, and actuation features, which is injected into every recurrent step of a DreamerV3-based world model. An Adaptive Reward Normalizer (ARN) handles heterogeneous reward scales across platforms. We further introduce Hetero-Isaac, an extension to NVIDIA Isaac Lab that simulates heterogeneous morphologies in parallel within a single training run. QWM achieves zero-shot locomotion on unseen robots — including real-world deployment on Unitree Go1 and ANYmal-D — with performance approaching per-robot specialists, while eliminating the dangerous adaptation lag of implicit system identification approaches.

Overview

QWM framework overview
Figure 1. Overview of the QWM framework. Left (WM Learning): A single world model is trained across diverse morphologies. The Physical Morphology Encoder (PME) derives a static embedding μ from each robot's USD file, which explicitly conditions both the encoder and the recurrent dynamics. Right (Policy Learning): A policy is trained purely in imagination and deployed zero-shot on real hardware.
Heterogeneous robot cohort
Figure 2. The heterogeneous morphology cohort used in experiments, illustrating the variance in physical scale and configuration. QWM is trained on seven robots at a time, holding out one for zero-shot evaluation.

Method

QWM extends DreamerV3 with three targeted architectural changes to handle cross-morphology generalization:

Physical Morphology Encoder (PME) — Extracts normalized features across four categories: kinematics & topology (hip offset, limb lengths, knee configuration), geometry (stance dimensions), dynamics (log-scaled mass), and actuation (torque density). These are processed by a dedicated 2-layer MLP that runs in parallel with the proprioceptive encoder, preventing the static context from being overwhelmed by dynamic signals.
Morphology-Conditioned Recurrent Dynamics — The morphology embedding μ is injected at every recurrent step: h_t = f(h_{t-1}, z_{t-1}, a_{t-1}, μ). This allows the recurrent state to focus on dynamic execution while explicit conditioning handles static embodiment properties.
Adaptive Reward Normalizer (ARN) — Quantile-based scaling using exponential moving averages tracks per-robot reward distributions, dynamically normalizing heterogeneous reward signals so no single morphology dominates training.
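The conditioning pattern above can be sketched in a few lines of NumPy. Everything here is illustrative — the dimensions, random initialization, and tanh MLPs stand in for the paper's actual RSSM and PME — but it shows the core idea: a static embedding μ is computed once per robot from its morphology features, then re-injected at every recurrent step.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """Random small-MLP weights (illustrative only, no training)."""
    return [(rng.standard_normal((i, o)) * 0.1, np.zeros(o))
            for i, o in zip(dims[:-1], dims[1:])]

def forward(params, x):
    for W, b in params[:-1]:
        x = np.tanh(x @ W + b)
    W, b = params[-1]
    return x @ W + b

# Hypothetical sizes: 12 morphology features -> 16-dim embedding mu.
pme = mlp([12, 32, 16])
H, Z, A = 64, 32, 12                      # recurrent state, latent, action dims
dyn = mlp([H + Z + A + 16, 128, H])       # f: (h, z, a, mu) -> next h

def recurrent_step(h, z, a, mu):
    """h_t = f(h_{t-1}, z_{t-1}, a_{t-1}, mu): mu re-injected every step."""
    return np.tanh(forward(dyn, np.concatenate([h, z, a, mu])))

morph_features = rng.standard_normal(12)  # normalized URDF/USD-derived features
mu = forward(pme, morph_features)         # static per-robot embedding
h = np.zeros(H)
for _ in range(5):
    z, a = rng.standard_normal(Z), rng.standard_normal(A)
    h = recurrent_step(h, z, a, mu)
```

Because μ is constant across a rollout, swapping the robot only means swapping μ — the recurrent weights are shared, which is what makes zero-shot transfer by embedding injection possible.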
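ARN's exact formulation is not spelled out above, but a per-robot quantile-range normalizer with an exponential moving average is one plausible reading. The decay, quantile levels, and dictionary bookkeeping below are assumptions, not the paper's implementation.

```python
import numpy as np

class AdaptiveRewardNormalizer:
    """Track a per-robot inter-quantile reward range with an EMA and
    divide rewards by it, so no morphology dominates training (sketch)."""

    def __init__(self, decay=0.99, low=0.05, high=0.95, eps=1e-8):
        self.decay, self.low, self.high, self.eps = decay, low, high, eps
        self.scale = {}  # robot id -> EMA of the inter-quantile range

    def __call__(self, robot_id, rewards):
        rewards = np.asarray(rewards, dtype=np.float64)
        span = np.quantile(rewards, self.high) - np.quantile(rewards, self.low)
        prev = self.scale.get(robot_id, span)  # initialize EMA on first batch
        self.scale[robot_id] = self.decay * prev + (1 - self.decay) * span
        return rewards / max(self.scale[robot_id], self.eps)

# Two robots with reward scales two orders of magnitude apart end up
# with comparable normalized magnitudes:
norm = AdaptiveRewardNormalizer()
big = norm("anymal", np.linspace(0.0, 100.0, 50))
small = norm("go1", np.linspace(0.0, 1.0, 50))
```

Quantile-based scaling (rather than dividing by the standard deviation) keeps the normalizer robust to the sparse reward spikes typical of locomotion tasks.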

Real-World Experiments

Both Unitree Go1 and ANYmal-D were held out during training. By injecting the correct morphology embedding, the frozen policy achieves stable locomotion on both platforms with zero falls across 20 trials (10 per platform, 60 seconds each).

Real-world deployment on Unitree Go1 and ANYmal-D
Figure 5. Real-world deployment on Unitree Go1 and ANYmal-D. Both robots were held out during training. The frozen policy achieves stable zero-shot locomotion by simply injecting the correct morphology embedding μ.

Videos from real-world and simulation experiments:

ANYmal-D Deployment
Zero-shot locomotion on held-out platform

(video coming soon)
Unitree Go1 Deployment
Zero-shot locomotion on held-out platform

(video coming soon)
Multi-Robot Training
Hetero-Isaac: 8 robots training in parallel

(video coming soon)
Open-Loop Imagination
Long-horizon dynamics prediction rollouts

(video coming soon)

Results

Learning curves on heterogeneous robot cohort
Figure 3. Learning curves comparing QWM against world model baselines (DreamerV3, PWM, TWISTER) trained simultaneously on the full heterogeneous cohort. QWM achieves significantly faster convergence and higher stability. Shaded regions are standard deviation across 5 seeds.
Long-horizon dynamics prediction
Figure 4. Long-horizon dynamics prediction. Left: Open-loop imagination rollouts vs. ground truth physics. QWM maintains tight synchronization across diverse scales. Right: Normalized Mean Squared Error (NMSE) over a 45-step horizon — QWM consistently outperforms baselines with minimal error accumulation.
Ablation study
Figure 7. Ablation study isolating contributions of PME, ARN, and conditioning locations.
PCA of QWM latent states
Figure 8. PCA of QWM latent states — morphology clusters (a) vs. dynamic state gradients (b–e), showing the model cleanly separates embodiment identity from locomotion dynamics.
Morphological feature distance matrix
Figure 6. Morphological Feature Distance Matrix. Euclidean distances between z-score standardized PME features, showing the encoder correctly identifies physical families (e.g., ANYmal variants cluster together).
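The matrix in Figure 6 can be reproduced in outline: z-score each PME feature column across the cohort, then take pairwise Euclidean distances between the standardized rows. A minimal sketch (the `feature_distance_matrix` helper is ours, not from the paper):

```python
import numpy as np

def feature_distance_matrix(F):
    """Pairwise Euclidean distances between z-score standardized rows of
    F (robots x features), as in the Figure 6 computation (sketch)."""
    Z = (F - F.mean(axis=0)) / (F.std(axis=0) + 1e-8)
    diff = Z[:, None, :] - Z[None, :, :]       # broadcast all row pairs
    return np.sqrt((diff ** 2).sum(axis=-1))

# Toy example: robots 0 and 1 share identical features, robot 2 differs.
F = np.array([[1.0, 2.0, 3.0],
              [1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])
D = feature_distance_matrix(F)
```

Standardizing first matters: without it, large-magnitude features such as mass would dominate the distances and mask kinematic similarity.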

Zero-Shot Generalization

QWM is evaluated in three regimes:

  • Morphological Interpolation (within training distribution): Unitree Go1 achieves 974.4 ± 6.2 episode length vs. 996.1 ± 1.1 for the specialist. ANYmal-D achieves 948.6 ± 12.1 vs. 981.3 ± 4.2.
  • Real-World Transfer: ANYmal-D attains 0.30 m/s linear velocity tracking error vs. 0.28 m/s for its specialist; Go1 attains 0.34 m/s vs. 0.31 m/s. Zero falls across all 20 trials.
  • Morphological Extrapolation (out of distribution): Performance degrades for geometric outliers (e.g., Unitree B2), confirming QWM acts as a distribution-bounded interpolator — a universal physics engine requires cohorts that span the full parameter space.

BibTeX

@misc{danesh2026qwm,
  title         = {Toward Hardware-Agnostic Quadrupedal World Models via Morphology Conditioning},
  author        = {Danesh, Mohamad H. and Li, Chenhao and Abyaneh, Amin and Houssaini, Anas and Ellis, Kirsty and Berseth, Glen and Hutter, Marco and Lin, Hsiu-Chin},
  year          = {2026},
  eprint        = {2604.08780},
  archivePrefix = {arXiv},
  primaryClass  = {cs.RO},
  url           = {https://arxiv.org/abs/2604.08780}
}