Compositional generalization is a critical ability in learning and decision-making. We focus on the setting of reinforcement learning in object-oriented environments to study compositional generalization in world modeling. We (1) formalize the compositional generalization problem with an algebraic approach and (2) study how a world model can achieve that. We introduce a conceptual environment, Object Library, and two instances, and deploy a principled pipeline to measure the generalization ability. Motivated by the formulation, we analyze several methods with exact or no compositional generalization ability using our framework, and design a differentiable approach, Homomorphic Object-oriented World Model (HOWM), that achieves approximate but more efficient compositional generalization.
World modeling is fundamental to reinforcement learning: we aim to learn a transition model that predicts how the environment's state evolves under the agent's actions. Here we focus on environments composed of multiple interacting objects.
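As a minimal illustration of what such a model computes, consider an object-factored transition function that maps per-object states and actions to predicted next states; the shapes and the toy additive dynamics below are assumptions made for this sketch, not the paper's model.

```python
import numpy as np

def object_factored_transition(states: np.ndarray, actions: np.ndarray) -> np.ndarray:
    """Toy object-factored transition model.

    states:  (K, d_s) array, one row per object in the scene.
    actions: (K, d_a) array, one row per object (here d_a == d_s).
    Returns predicted next states of shape (K, d_s).

    A learned world model would replace this hand-written rule with a
    trainable function that also captures interactions between objects.
    """
    return states + actions  # placeholder dynamics for illustration

# K = 2 objects with 2-D positions, each taking a displacement action.
states = np.array([[0.0, 0.0], [1.0, 1.0]])
actions = np.array([[1.0, 0.0], [0.0, -1.0]])
print(object_factored_transition(states, actions))
```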
We introduce Object Library, a conceptual environment for studying compositional generalization in world modeling. In each episode, a scene is formed by drawing a subset of objects from a fixed library: K=2 objects are selected from N=4 possible objects, creating (4 choose 2) = 6 possible scenes. Each object has its own state and action space.
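A quick way to see the combinatorics; the object names below are placeholders chosen for this sketch.

```python
from itertools import combinations

library = ["obj_A", "obj_B", "obj_C", "obj_D"]  # N = 4 library objects (placeholder names)
scenes = list(combinations(library, 2))          # K = 2 objects per scene

print(len(scenes))  # 6 possible scenes
for scene in scenes:
    print(scene)
```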
A key concept is object replacement operations, which allow us to study how models generalize to new combinations of known objects. We formalize this using permutation symmetry: replacing objects in a scene with other library objects corresponds to the action of a permutation, and a compositionally generalizing world model should commute with these replacements. We measure compositional generalization through the equivariance error: the discrepancy between predicting and then applying an object replacement versus applying the replacement and then predicting.
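A sketch of how such an equivariance error could be estimated empirically; the representation of replacements as row permutations, the norm, and the toy model are illustrative assumptions.

```python
import numpy as np

def permute_objects(x: np.ndarray, perm: list) -> np.ndarray:
    """Apply an object replacement/permutation by reordering object rows."""
    return x[perm]

def equivariance_error(model, states, actions, perm) -> float:
    """|| perm . model(s, a) - model(perm . s, perm . a) ||.

    A compositionally generalizing world model should drive this toward
    zero for every object replacement `perm`.
    """
    predict_then_permute = permute_objects(model(states, actions), perm)
    permute_then_predict = model(permute_objects(states, perm),
                                 permute_objects(actions, perm))
    return float(np.linalg.norm(predict_then_permute - permute_then_predict))

# Toy check with a permutation-equivariant model (independent per-object dynamics).
model = lambda s, a: s + a
s, a = np.random.rand(2, 3), np.random.rand(2, 3)
print(equivariance_error(model, s, a, [1, 0]))  # ~0.0 for this toy model
```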
A fundamental challenge in object-oriented world modeling is the binding problem: how to correctly associate actions with objects in a consistent way across different scenes, especially when object positions and roles can vary.
It arises, for example, when mapping object-relative actions (like "move right") to absolute actions in the world frame: which object an action refers to depends on how the scene is composed.
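A toy illustration of why binding matters; the slot-indexed action encoding and the object/slot names are assumptions made for this example.

```python
import numpy as np

# Scene-level action: one chunk per slot (an assumed encoding for this sketch).
slot_actions = {"slot_0": np.array([1.0, 0.0]),   # "move right"
                "slot_1": np.array([0.0, 1.0])}   # "move up"

# Binding: which library object occupies which slot in a particular scene.
binding_scene_1 = {"slot_0": "obj_A", "slot_1": "obj_C"}
binding_scene_2 = {"slot_0": "obj_C", "slot_1": "obj_A"}  # same objects, different binding

def actions_per_object(binding):
    """Resolve slot-level actions into object-level actions under a binding."""
    return {obj: slot_actions[slot] for slot, obj in binding.items()}

print(actions_per_object(binding_scene_1))  # obj_A moves right, obj_C moves up
print(actions_per_object(binding_scene_2))  # obj_C moves right, obj_A moves up
```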
We introduce the Slot MDP framework to make the binding explicit. In a Slot MDP, the world model operates on a fixed set of K abstract slots rather than on the N library objects directly: the objects present in a scene are bound to slots, and states, actions, and dynamics are represented at the slot level.
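A minimal sketch of the projection step behind this slot-level view, assuming slots are index positions and the object-to-slot assignment is given (in the learned model this assignment is not hand-specified); the names and shapes here are illustrative.

```python
import numpy as np

def project_to_slots(object_states: dict, assignment: dict) -> np.ndarray:
    """Project a scene's object states into a fixed array of K slots.

    object_states: {object_name: state vector} for the K objects in the scene.
    assignment:    {object_name: slot index in [0, K)}.
    """
    K = len(assignment)
    d = len(next(iter(object_states.values())))
    slots = np.zeros((K, d))
    for name, idx in assignment.items():
        slots[idx] = object_states[name]
    return slots

def lift_to_objects(slot_states: np.ndarray, assignment: dict) -> dict:
    """Map slot-level predictions back to the named objects of the scene."""
    return {name: slot_states[idx] for name, idx in assignment.items()}

scene = {"obj_A": np.array([0.0, 0.0]), "obj_C": np.array([2.0, 1.0])}
assignment = {"obj_A": 0, "obj_C": 1}
slots = project_to_slots(scene, assignment)
print(lift_to_objects(slots + 1.0, assignment))  # toy slot-level "dynamics"
```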
We analyze several approaches to achieving compositional generalization. A naive approach, a fully connected GNN defined over the entire library, can generalize exactly but requires modeling all N objects even though only K of them appear in any given scene, which is inefficient for large libraries. Our slot-based GNN solution instead operates on only the K slots of a scene and achieves approximate but more efficient compositional generalization.
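A sketch of a slot-level message-passing transition model over only the K slots in a scene; layer sizes, the concatenation scheme, and the residual update are assumptions for this example and may differ from the paper's architecture.

```python
import torch
import torch.nn as nn

class SlotGNNTransition(nn.Module):
    """Message passing over K slots: each slot aggregates messages from the
    other slots, then predicts its next state from (state, action, message)."""

    def __init__(self, d_state: int, d_action: int, d_hidden: int = 64):
        super().__init__()
        self.edge_mlp = nn.Sequential(
            nn.Linear(2 * d_state, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_hidden))
        self.node_mlp = nn.Sequential(
            nn.Linear(d_state + d_action + d_hidden, d_hidden), nn.ReLU(),
            nn.Linear(d_hidden, d_state))

    def forward(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # states: (K, d_state), actions: (K, d_action)
        K = states.shape[0]
        senders = states.unsqueeze(1).expand(K, K, -1)    # [i, j] = state of slot i
        receivers = states.unsqueeze(0).expand(K, K, -1)  # [i, j] = state of slot j
        messages = self.edge_mlp(torch.cat([senders, receivers], dim=-1))
        mask = 1.0 - torch.eye(K).unsqueeze(-1)           # drop self-messages
        aggregated = (messages * mask).sum(dim=0)         # per-receiver message sum
        inputs = torch.cat([states, actions, aggregated], dim=-1)
        return states + self.node_mlp(inputs)             # residual slot update

model = SlotGNNTransition(d_state=4, d_action=2)
print(model(torch.randn(2, 4), torch.randn(2, 2)).shape)  # torch.Size([2, 4])
```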
Our key insight is to learn the binding between objects and slots with an attention mechanism, while ensuring consistency with an aligned loss: HOWM binds objects to slots consistently across time steps, and this consistent binding is what enables compositional generalization.
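A sketch of soft object-to-slot binding via attention, together with an alignment penalty that discourages the binding from drifting between consecutive time steps; the query/key parameterization, the temperature, and the squared-error consistency term are assumptions for this illustration rather than the paper's exact losses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SlotBinder(nn.Module):
    """Soft binding: each of K slots attends over the scene's object features."""

    def __init__(self, d_obj: int, d_slot: int, n_slots: int):
        super().__init__()
        self.slot_queries = nn.Parameter(torch.randn(n_slots, d_slot))
        self.to_key = nn.Linear(d_obj, d_slot)
        self.to_val = nn.Linear(d_obj, d_slot)

    def forward(self, obj_feats: torch.Tensor):
        # obj_feats: (n_objects, d_obj) features of the objects in the scene
        keys, vals = self.to_key(obj_feats), self.to_val(obj_feats)
        logits = self.slot_queries @ keys.t() / keys.shape[-1] ** 0.5
        attn = logits.softmax(dim=-1)          # (n_slots, n_objects) binding matrix
        return attn @ vals, attn               # slot states and the binding itself

def alignment_loss(attn_t: torch.Tensor, attn_t1: torch.Tensor) -> torch.Tensor:
    """Penalize binding drift so each slot keeps tracking the same object
    across a transition."""
    return F.mse_loss(attn_t1, attn_t.detach())

binder = SlotBinder(d_obj=8, d_slot=16, n_slots=2)
slots_t, attn_t = binder(torch.randn(2, 8))
slots_t1, attn_t1 = binder(torch.randn(2, 8))
print(alignment_loss(attn_t, attn_t1).item())
```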
We prove that if a model achieves zero equivariance error in the slot MDP and satisfies the projection property, it guarantees compositional generalization.
Our theorem establishes that the equivariance error in the slot MDP, together with the projection property, is the quantity that controls compositional generalization: a model that is exactly equivariant in slot space generalizes to unseen combinations of known objects in the original environment.
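A schematic restatement of this claim; the symbols (the learned slot transition $\hat T$, the replacement $\sigma$, and the error $\varepsilon$) are notation chosen here for illustration, not taken from the paper.

```latex
\varepsilon(\sigma) = \mathbb{E}_{s,a}
  \big\lVert \hat T(\sigma \cdot s,\, \sigma \cdot a) - \sigma \cdot \hat T(s, a) \big\rVert,
\qquad
\big(\forall \sigma:\ \varepsilon(\sigma) = 0\big) \;\wedge\; \text{projection property}
\;\Longrightarrow\; \text{compositional generalization}.
```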
Our HOWM model achieves efficient compositional generalization through attention-based binding of scene objects to slots, an aligned loss that keeps the binding consistent across time steps, and a slot-level transition model that operates on only the K slots of a scene.
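As a hedged sketch of how these pieces could combine into a training objective (the loss forms and the weighting are assumptions, not the paper's exact objective):

```python
import torch
import torch.nn.functional as F

def howm_style_loss(pred_next_slots, target_next_slots, attn_t, attn_t1,
                    align_weight: float = 1.0):
    """Illustrative objective: slot-level prediction error plus a
    binding-consistency (aligned) term between consecutive time steps."""
    prediction_loss = F.mse_loss(pred_next_slots, target_next_slots)
    align_loss = F.mse_loss(attn_t1, attn_t.detach())
    return prediction_loss + align_weight * align_loss

# Toy tensors standing in for model outputs at consecutive time steps.
loss = howm_style_loss(torch.randn(2, 4), torch.randn(2, 4),
                       torch.rand(2, 2), torch.rand(2, 2))
print(loss.item())
```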
Experiments show improved compositional generalization compared to baselines, and we analyze the model's behavior and computational efficiency.
Our work builds on and relates to several research areas, including object-oriented and object-centric world models, compositional generalization, and symmetry and equivariance in reinforcement learning.
@inproceedings{zhao2022compositional,
  title={Toward Compositional Generalization in Object-Oriented World Modeling},
  author={Zhao, Linfeng and Kong, Lingzhi and Walters, Robin and Wong, Lawson LS},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2022}
}