Object Library environment showing different object configurations
Object replacement operations and their effects on the environment
HOWM architecture with Action Attention and Aligned Loss
Experimental results showing compositional generalization
Additional experimental results and analysis
Compositional generalization is a critical ability in learning and decision-making. We focus on the setting of reinforcement learning in object-oriented environments to study compositional generalization in world modeling. We (1) formalize the compositional generalization problem with an algebraic approach and (2) study how a world model can achieve that. We introduce a conceptual environment, Object Library, and two instances, and deploy a principled pipeline to measure the generalization ability. Motivated by the formulation, we analyze several methods with exact or no compositional generalization ability using our framework, and design a differentiable approach, Homomorphic Object-oriented World Model (HOWM), that achieves approximate but more efficient compositional generalization.
World modeling is fundamental to reinforcement learning, where we aim to learn a transition model \(\mathcal{T}: \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{S}\) that can predict future states based on current states and actions. This forms the foundation for planning and decision-making in complex environments.
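As a minimal illustration of this interface (a sketch, not the paper's implementation; the class and method names below are hypothetical, and a single linear map stands in for a learned network), a world model is just a function from a state-action pair to a predicted next state:

```python
import numpy as np

class WorldModel:
    """Toy transition model T: S x A -> S as a linear map (stand-in for a learned network)."""

    def __init__(self, state_dim: int, action_dim: int, rng=None):
        rng = rng or np.random.default_rng(0)
        # Parameters of a linear transition s' = Ws @ s + Wa @ a
        self.Ws = rng.normal(scale=0.1, size=(state_dim, state_dim))
        self.Wa = rng.normal(scale=0.1, size=(state_dim, action_dim))

    def predict(self, state: np.ndarray, action: np.ndarray) -> np.ndarray:
        """Predict the next state from the current state and action."""
        return self.Ws @ state + self.Wa @ action


model = WorldModel(state_dim=4, action_dim=2)
s_next = model.predict(np.zeros(4), np.array([1.0, 0.0]))
```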
World models learn to predict state transitions in environments. Here we focus on environments with multiple interacting objects.
We introduce Object Library, a conceptual environment for studying compositional generalization in world modeling. In each episode, \(K\) objects are drawn from a library of \(N\) objects (\(\mathbb{L} = \{o_1, \ldots, o_N\}\)), where \(K < N\). This allows us to generate different scenes with different object combinations.
In Object Library, \(K=2\) objects are selected from a library of \(N=4\) objects, creating \(\binom{4}{2} = 6\) possible scenes. Each object has its own state and action space.
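For concreteness, the scenes of an Object Library instance are the \(K\)-element subsets of the library; with \(N=4\) and \(K=2\) this gives the 6 scenes mentioned above. A minimal sketch (the object names are made up for illustration):

```python
from itertools import combinations

# Hypothetical library of N = 4 objects; each scene draws K = 2 of them.
library = ["red_box", "blue_ball", "green_key", "yellow_door"]
K = 2

scenes = list(combinations(library, K))
print(len(scenes))   # C(4, 2) = 6 scenes
for scene in scenes:
    print(scene)
```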
A key concept is object replacement operations, which allow us to study how models generalize to new combinations of known objects. We formalize this using permutation symmetry \(\Sigma_N\) acting on the object library.
We measure compositional generalization through equivariance error \(\lambda\), which quantifies how well the model's predictions respect object replacement operations.
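The intuition can be sketched numerically: an object replacement permutes which objects appear where, and a compositionally generalizing model should commute with that permutation. The snippet below is a toy version, where object replacement is simplified to swapping the two objects in a scene and the per-object dynamics are a made-up linear map; the exact definition of \(\lambda\) in the paper is taken over scenes and trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-object dynamics: each object's next state is a linear function
# of its own state and action (no interactions, for simplicity).
W_s = rng.normal(size=(3, 3))
W_a = rng.normal(size=(3, 2))

def predict(states, actions):
    """Object-factored transition applied independently to each object."""
    return states @ W_s.T + actions @ W_a.T

states = rng.normal(size=(2, 3))    # K = 2 objects, 3-dim states
actions = rng.normal(size=(2, 2))   # matching per-object actions
perm = np.array([1, 0])             # object replacement, simplified to a swap

# Equivariance check: predicting then permuting should equal
# permuting then predicting.  The gap is a (toy) equivariance error.
lhs = predict(states, actions)[perm]
rhs = predict(states[perm], actions[perm])
equivariance_error = np.abs(lhs - rhs).max()
print(equivariance_error)  # ~0: this toy model is permutation-equivariant
```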
A fundamental challenge in object-oriented world modeling is the binding problem: how to correctly associate actions with objects in a consistent way across different scenes, especially when object positions and roles can vary.
The binding problem arises when trying to map object-relative actions (like "move right") to absolute actions in the world frame.
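A toy version of the issue: the same object can sit at different positions in different scenes, so indexing actions by scene position binds them to the wrong object unless identity is tracked. The object names and helper functions below are hypothetical, purely to illustrate the failure mode.

```python
# Two scenes drawn from the same library; "blue_ball" sits at index 0 in
# one scene and index 1 in the other.
scene_a = ["blue_ball", "green_key"]
scene_b = ["red_box", "blue_ball"]

# An action intended for "blue_ball" in its own (object-relative) frame.
action = {"object": "blue_ball", "move": "right"}

def bind_by_position(scene, position=0):
    """Naive binding: always apply the action to the object at a fixed position."""
    return scene[position]

def bind_by_identity(scene, action):
    """Correct binding: look up the object the action refers to."""
    return scene[scene.index(action["object"])]

print(bind_by_position(scene_a))           # blue_ball  (happens to be right)
print(bind_by_position(scene_b))           # red_box    (wrong object!)
print(bind_by_identity(scene_b, action))   # blue_ball
```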
We introduce the Slot MDP framework \(\mathcal{M}_{[K]}\) as a structured representation where \(K\) slots can represent \(K\) objects. This provides a canonical way to organize object-centric information.
The Slot MDP \(\mathcal{M}_{[K]}\) provides \(K\) slots with consistent meaning, allowing us to map scene MDPs \(\mathcal{M}_\Theta\) to a canonical representation.
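A minimal sketch of the mapping (the explicit binding dictionary below is a stand-in; in HOWM the object-to-slot assignment is inferred rather than given): every scene is re-expressed in a fixed \(K\)-slot layout, so a single transition model can be reused across scenes.

```python
import numpy as np

K = 2  # number of slots

# A scene: a subset of library objects with their current state vectors.
scene = {
    "blue_ball": np.array([0.0, 1.0]),
    "green_key": np.array([2.0, 0.5]),
}

def bind_to_slots(scene, binding, state_dim=2):
    """Arrange a scene's object states into the canonical K-slot layout.

    `binding` maps object name -> slot index.  In HOWM this assignment is
    learned (via attention) rather than provided.
    """
    slots = np.zeros((K, state_dim))
    for name, state in scene.items():
        slots[binding[name]] = state
    return slots

binding = {"blue_ball": 0, "green_key": 1}
slot_states = bind_to_slots(scene, binding)
print(slot_states.shape)  # (K, state_dim) -- the same layout for every scene
```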
We explore different approaches to achieve compositional generalization:
A naive approach using a fully connected GNN over the whole library requires \(\mathcal{O}(N^2)\) connections, becoming impractical for large numbers of objects.
Our slot-based GNN solution achieves \(\mathcal{O}(K^2)\) complexity by operating on a fixed number of slots, making it scalable to large object libraries since \(K \ll N\).
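The complexity argument can be sketched with toy message passing (plain numpy, not the paper's architecture): a fully connected GNN over the whole library touches all \(N^2\) ordered object pairs, while a slot-based model only passes messages among the \(K\) slots present in a scene.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_message_passing(node_states, W):
    """One round of fully connected message passing over the given nodes.

    Cost is quadratic in the number of nodes: every ordered pair (i, j)
    with i != j contributes one message.
    """
    n, _ = node_states.shape
    messages = np.zeros_like(node_states)
    for i in range(n):
        for j in range(n):
            if i != j:
                messages[i] += W @ node_states[j]
    return node_states + messages

d = 4
W = rng.normal(scale=0.1, size=(d, d))

N, K = 100, 5
library_states = rng.normal(size=(N, d))   # all N library objects
scene_slots = rng.normal(size=(K, d))      # only the K slots in one scene

_ = pairwise_message_passing(library_states, W)  # 100 * 99 messages: O(N^2)
_ = pairwise_message_passing(scene_slots, W)     #   5 * 4  messages: O(K^2)
```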
Our key insight is to learn the binding between objects and slots through attention mechanisms, while ensuring consistency through aligned loss.
HOWM learns to bind objects to slots consistently across time steps using attention mechanisms and aligned loss, enabling compositional generalization.
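A toy numpy sketch of these two ingredients (simplified; the actual HOWM modules are learned networks, and the weight matrices and dimensions below are made up): action attention soft-binds each action to the slot whose state it most resembles, and an aligned loss penalizes that binding for drifting between consecutive time steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def action_attention(slot_states, action_embs, W_q, W_k):
    """Soft-bind each action to a slot via scaled dot-product attention.

    Returns a (num_actions, K) matrix of binding weights.
    """
    queries = action_embs @ W_q          # (num_actions, d)
    keys = slot_states @ W_k             # (K, d)
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    return softmax(scores, axis=-1)

def aligned_loss(attn_t, attn_tp1):
    """Penalize the action-to-slot binding for changing across time steps."""
    return np.mean((attn_t - attn_tp1) ** 2)

d, K, num_actions = 4, 3, 3
W_q = rng.normal(scale=0.5, size=(d, d))
W_k = rng.normal(scale=0.5, size=(d, d))

slots_t = rng.normal(size=(K, d))
slots_tp1 = slots_t + 0.05 * rng.normal(size=(K, d))  # slightly evolved states
actions = rng.normal(size=(num_actions, d))

attn_t = action_attention(slots_t, actions, W_q, W_k)
attn_tp1 = action_attention(slots_tp1, actions, W_q, W_k)
print(aligned_loss(attn_t, attn_tp1))  # small when the binding stays consistent
```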
We prove that if a model achieves zero equivariance error in the slot MDP and satisfies the projection property, then it is guaranteed to generalize compositionally.
Our theorem relates the equivariance error in \(\mathcal{M}_{[K]}\) to that of the original MDP up to a factor of \(\binom{N}{K}\), providing a theoretical bound on generalization.
Our HOWM model achieves efficient compositional generalization through a slot-based transition model, Action Attention that binds actions to slots, and an aligned loss that keeps this binding consistent across time steps.
Experimental results showing improved compositional generalization compared to baselines.
Analysis of model behavior and computational efficiency.
Our work builds on and relates to several research areas, including world modeling, object-centric (object-oriented) representation learning, and compositional generalization.
@inproceedings{zhao2022compositional,
title={Toward Compositional Generalization in Object-Oriented World Modeling},
author={Zhao, Linfeng and Kong, Lingzhi and Walters, Robin and Wong, Lawson L.S.},
booktitle={International Conference on Machine Learning (ICML)},
year={2022}
}