Object Library environment showing different object configurations
Object replacement operations and their effects on the environment
HOWM architecture with Action Attention and Aligned Loss
Experimental results showing compositional generalization
Additional experimental results and analysis
Compositional generalization is a critical ability in learning and decision-making. We focus on the setting of reinforcement learning in object-oriented environments to study compositional generalization in world modeling. We (1) formalize the compositional generalization problem with an algebraic approach and (2) study how a world model can achieve that. We introduce a conceptual environment, Object Library, and two instances, and deploy a principled pipeline to measure the generalization ability. Motivated by the formulation, we analyze several methods with exact or no compositional generalization ability using our framework, and design a differentiable approach, Homomorphic Object-oriented World Model (HOWM), that achieves approximate but more efficient compositional generalization.
World modeling is fundamental to reinforcement learning, where we aim to learn a transition model \(\mathcal{T}: \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{S}\) that can predict future states based on current states and actions. This forms the foundation for planning and decision-making in complex environments.
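As a minimal illustration of this interface (a sketch, not the paper's implementation; the class and method names below are hypothetical, and a single linear map stands in for a learned network), a world model is just a function from a state-action pair to a predicted next state:

```python
import numpy as np

class WorldModel:
    """Toy transition model T: S x A -> S as a linear map (stand-in for a learned network)."""

    def __init__(self, state_dim: int, action_dim: int, rng=None):
        rng = rng or np.random.default_rng(0)
        # Parameters of a linear transition s' = Ws @ s + Wa @ a
        self.Ws = rng.normal(scale=0.1, size=(state_dim, state_dim))
        self.Wa = rng.normal(scale=0.1, size=(state_dim, action_dim))

    def predict(self, state: np.ndarray, action: np.ndarray) -> np.ndarray:
        """Predict the next state from the current state and action."""
        return self.Ws @ state + self.Wa @ action


model = WorldModel(state_dim=4, action_dim=2)
s_next = model.predict(np.zeros(4), np.array([1.0, 0.0]))
```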
World models learn to predict state transitions in environments. Here we focus on environments with multiple interacting objects.
We introduce Object Library, a conceptual environment for studying compositional generalization in world modeling. In each episode, \(K\) objects are drawn from a library of \(N\) objects (\(\mathbb{L} = \{o_1, \ldots, o_N\}\)), where \(K < N\). This allows us to generate different scenes with different object combinations.
In Object Library, \(K=2\) objects are selected from a library of \(N=4\) objects, creating \(\binom{4}{2} = 6\) possible scenes. Each object has its own state and action space.
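For concreteness, the scenes of an Object Library instance are the \(K\)-element subsets of the library; with \(N=4\) and \(K=2\) this gives the 6 scenes mentioned above. A minimal sketch (the object names are made up for illustration):

```python
from itertools import combinations

# Hypothetical library of N = 4 objects; each scene draws K = 2 of them.
library = ["red_box", "blue_ball", "green_key", "yellow_door"]
K = 2

scenes = list(combinations(library, K))
print(len(scenes))   # C(4, 2) = 6 scenes
for scene in scenes:
    print(scene)
```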
A key concept is object replacement operations, which allow us to study how models generalize to new combinations of known objects. We formalize this using permutation symmetry \(\Sigma_N\) acting on the object library.
We measure compositional generalization through equivariance error \(\lambda\), which quantifies how well the model's predictions respect object replacement operations.
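The intuition can be sketched numerically: an object replacement permutes which objects appear where, and a compositionally generalizing model should commute with that permutation. The snippet below is a toy version, where object replacement is simplified to swapping the two objects in a scene and the per-object dynamics are a made-up linear map; the exact definition of \(\lambda\) in the paper is taken over scenes and trajectories.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy per-object dynamics: each object's next state is a linear function
# of its own state and action (no interactions, for simplicity).
W_s = rng.normal(size=(3, 3))
W_a = rng.normal(size=(3, 2))

def predict(states, actions):
    """Object-factored transition applied independently to each object."""
    return states @ W_s.T + actions @ W_a.T

states = rng.normal(size=(2, 3))    # K = 2 objects, 3-dim states
actions = rng.normal(size=(2, 2))   # matching per-object actions
perm = np.array([1, 0])             # object replacement, simplified to a swap

# Equivariance check: predicting then permuting should equal
# permuting then predicting.  The gap is a (toy) equivariance error.
lhs = predict(states, actions)[perm]
rhs = predict(states[perm], actions[perm])
equivariance_error = np.abs(lhs - rhs).max()
print(equivariance_error)  # ~0: this toy model is permutation-equivariant
```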
A fundamental challenge in object-oriented world modeling is the binding problem: how to correctly associate actions with objects in a consistent way across different scenes, especially when object positions and roles can vary.
The binding problem arises when trying to map object-relative actions (like "move right") to absolute actions in the world frame.
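A toy version of the issue: the same object can sit at different positions in different scenes, so indexing actions by scene position binds them to the wrong object unless identity is tracked. The object names and helper functions below are hypothetical, purely to illustrate the failure mode.

```python
# Two scenes drawn from the same library; "blue_ball" sits at index 0 in
# one scene and index 1 in the other.
scene_a = ["blue_ball", "green_key"]
scene_b = ["red_box", "blue_ball"]

# An action intended for "blue_ball" in its own (object-relative) frame.
action = {"object": "blue_ball", "move": "right"}

def bind_by_position(scene, position=0):
    """Naive binding: always apply the action to the object at a fixed position."""
    return scene[position]

def bind_by_identity(scene, action):
    """Correct binding: look up the object the action refers to."""
    return scene[scene.index(action["object"])]

print(bind_by_position(scene_a))           # blue_ball  (happens to be right)
print(bind_by_position(scene_b))           # red_box    (wrong object!)
print(bind_by_identity(scene_b, action))   # blue_ball
```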
We introduce the Slot MDP framework \(\mathcal{M}_{[K]}\) as a structured representation where \(K\) slots can represent \(K\) objects. This provides a canonical way to organize object-centric information.
The Slot MDP \(\mathcal{M}_{[K]}\) provides \(K\) slots with consistent meaning, allowing us to map scene MDPs \(\mathcal{M}_\Theta\) to a canonical representation.
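A minimal sketch of the mapping (the explicit binding dictionary below is a stand-in; in HOWM the object-to-slot assignment is inferred rather than given): every scene is re-expressed in a fixed \(K\)-slot layout, so a single transition model can be reused across scenes.

```python
import numpy as np

K = 2  # number of slots

# A scene: a subset of library objects with their current state vectors.
scene = {
    "blue_ball": np.array([0.0, 1.0]),
    "green_key": np.array([2.0, 0.5]),
}

def bind_to_slots(scene, binding, state_dim=2):
    """Arrange a scene's object states into the canonical K-slot layout.

    `binding` maps object name -> slot index.  In HOWM this assignment is
    learned (via attention) rather than provided.
    """
    slots = np.zeros((K, state_dim))
    for name, state in scene.items():
        slots[binding[name]] = state
    return slots

binding = {"blue_ball": 0, "green_key": 1}
slot_states = bind_to_slots(scene, binding)
print(slot_states.shape)  # (K, state_dim) -- the same layout for every scene
```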
We explore different approaches to achieve compositional generalization:
A naive approach using a fully connected GNN over the whole library requires \(\mathcal{O}(N^2)\) connections, becoming impractical for large numbers of objects.
Our slot-based GNN solution achieves \(\mathcal{O}(K^2)\) complexity by operating on a fixed number of slots, making it scalable to large object libraries since \(K \ll N\).
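The complexity argument can be sketched with toy message passing (plain numpy, not the paper's architecture): a fully connected GNN over the whole library touches all \(N^2\) ordered object pairs, while a slot-based model only passes messages among the \(K\) slots present in a scene.

```python
import numpy as np

rng = np.random.default_rng(0)

def pairwise_message_passing(node_states, W):
    """One round of fully connected message passing over the given nodes.

    Cost is quadratic in the number of nodes: every ordered pair (i, j)
    with i != j contributes one message.
    """
    n, _ = node_states.shape
    messages = np.zeros_like(node_states)
    for i in range(n):
        for j in range(n):
            if i != j:
                messages[i] += W @ node_states[j]
    return node_states + messages

d = 4
W = rng.normal(scale=0.1, size=(d, d))

N, K = 100, 5
library_states = rng.normal(size=(N, d))   # all N library objects
scene_slots = rng.normal(size=(K, d))      # only the K slots in one scene

_ = pairwise_message_passing(library_states, W)  # 100 * 99 messages: O(N^2)
_ = pairwise_message_passing(scene_slots, W)     #   5 * 4  messages: O(K^2)
```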
Our key insight is to learn the binding between objects and slots through attention mechanisms, while ensuring consistency through aligned loss.
HOWM learns to bind objects to slots consistently across time steps using attention mechanisms and aligned loss, enabling compositional generalization.
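A toy numpy sketch of these two ingredients (simplified; the actual HOWM modules are learned networks, and the weight matrices and dimensions below are made up): action attention soft-binds each action to the slot whose state it most resembles, and an aligned loss penalizes that binding for drifting between consecutive time steps.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def action_attention(slot_states, action_embs, W_q, W_k):
    """Soft-bind each action to a slot via scaled dot-product attention.

    Returns a (num_actions, K) matrix of binding weights.
    """
    queries = action_embs @ W_q          # (num_actions, d)
    keys = slot_states @ W_k             # (K, d)
    scores = queries @ keys.T / np.sqrt(keys.shape[1])
    return softmax(scores, axis=-1)

def aligned_loss(attn_t, attn_tp1):
    """Penalize the action-to-slot binding for changing across time steps."""
    return np.mean((attn_t - attn_tp1) ** 2)

d, K, num_actions = 4, 3, 3
W_q = rng.normal(scale=0.5, size=(d, d))
W_k = rng.normal(scale=0.5, size=(d, d))

slots_t = rng.normal(size=(K, d))
slots_tp1 = slots_t + 0.05 * rng.normal(size=(K, d))  # slightly evolved states
actions = rng.normal(size=(num_actions, d))

attn_t = action_attention(slots_t, actions, W_q, W_k)
attn_tp1 = action_attention(slots_tp1, actions, W_q, W_k)
print(aligned_loss(attn_t, attn_tp1))  # small when the binding stays consistent
```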
We prove that if a model achieves zero equivariance error in the slot MDP and satisfies the projection property, then it is guaranteed to generalize compositionally.
Our theorem relates the equivariance error in \(\mathcal{M}_{[K]}\) to that of the original MDP up to a factor of \(\binom{N}{K}\), providing a theoretical bound on generalization.
Our HOWM model achieves efficient compositional generalization through a slot-based transition model, Action Attention that binds actions to slots, and an aligned loss that keeps this binding consistent across time steps.
Experimental results showing improved compositional generalization compared to baselines.
Analysis of model behavior and computational efficiency.
Our work builds on and relates to several research areas, including world modeling, object-centric (object-oriented) representation learning, and compositional generalization.
@inproceedings{zhao2022compositional,
title={Toward Compositional Generalization in Object-Oriented World Modeling},
author={Zhao, Linfeng and Kong, Lingzhi and Walters, Robin and Wong, Lawson L.S.},
booktitle={International Conference on Machine Learning (ICML)},
year={2022}
}