Toward Compositional Generalization in Object‑Oriented World Modeling

ICML 2022 Long Presentation

Khoury College of Computer Sciences, Northeastern University

Abstract

Compositional generalization is a critical ability in learning and decision-making. We focus on the setting of reinforcement learning in object-oriented environments to study compositional generalization in world modeling. We (1) formalize the compositional generalization problem with an algebraic approach and (2) study how a world model can achieve it. We introduce a conceptual environment, Object Library, and two instances of it, and deploy a principled pipeline to measure generalization ability. Motivated by the formulation, we use our framework to analyze several methods that have either exact or no compositional generalization ability, and design a differentiable approach, Homomorphic Object-oriented World Model (HOWM), that achieves approximate but more efficient compositional generalization.

Background: World Modeling

World modeling is fundamental to reinforcement learning, where we aim to learn a transition model \(\mathcal{T}: \mathcal{S} \times \mathcal{A} \rightarrow \mathcal{S}\) that can predict future states based on current states and actions. This forms the foundation for planning and decision-making in complex environments.
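As a minimal sketch of what a transition model \(\mathcal{T}\) does, the toy example below moves objects along a line and rolls the model forward over a sequence of actions. The state and action encodings (`State`, `Action`) and the function names are hypothetical choices for illustration, not the paper's environments or implementation.

```python
from typing import Callable, Iterable, Tuple

State = Tuple[int, ...]   # hypothetical: one integer position per object
Action = Tuple[int, int]  # hypothetical: (object index, displacement)

def true_transition(state: State, action: Action) -> State:
    """Toy ground-truth dynamics: shift one object, leave the others fixed."""
    idx, delta = action
    return tuple(p + delta if i == idx else p for i, p in enumerate(state))

def rollout(model: Callable[[State, Action], State],
            state: State,
            actions: Iterable[Action]) -> State:
    """Predict a future state by applying the model once per action."""
    for a in actions:
        state = model(state, a)
    return state

# Two objects starting at positions 0 and 5, three actions in sequence.
final = rollout(true_transition, (0, 5), [(0, 1), (1, -2), (0, 1)])
print(final)  # (2, 3)
```

A learned world model would replace `true_transition` with a trained predictor exposing the same interface.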

World Model Animation

World models learn to predict state transitions in environments. Here we focus on environments with multiple interacting objects.

Object Library Environment

We introduce Object Library, a conceptual environment for studying compositional generalization in world modeling. In each episode, \(K\) objects are drawn from a library of \(N\) objects (\(\mathbb{L} = \{o_1, \ldots, o_N\}\)), where \(K < N\). This allows us to generate different scenes with different object combinations.

Object Library Animation

In this Object Library example, \(K=2\) objects are selected from \(N=4\) possible objects, giving \(\binom{4}{2} = 6\) possible scenes. Each object has its own state and action space.
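The scene count can be checked by enumerating all size-\(K\) subsets of the library. The object names below are placeholders, and treating a scene as an unordered subset is an assumption consistent with the count of 6.

```python
from itertools import combinations
from math import comb

library = ["o1", "o2", "o3", "o4"]  # placeholder names for the N = 4 objects
K = 2                               # objects drawn per episode

# Each scene is an unordered choice of K distinct library objects.
scenes = list(combinations(library, K))
num_scenes = comb(len(library), K)

for scene in scenes:
    print(scene)
print(len(scenes), num_scenes)  # 6 6
```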

Object Replacement and Generalization

A key concept is object replacement operations, which allow us to study how models generalize to new combinations of known objects. We formalize this using permutation symmetry \(\Sigma_N\) acting on the object library.
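One way to picture an object replacement is as a permutation of library indices applied to the set of objects present in a scene. The sketch below assumes scenes are represented as sets of indices into the library; the helper `replace_objects` and the particular permutation are hypothetical.

```python
from itertools import combinations

N, K = 4, 2  # library size and objects per scene, matching the example above

def replace_objects(scene, sigma):
    """Apply a permutation of library indices to a scene (a set of indices)."""
    return frozenset(sigma[i] for i in scene)

# A permutation in Sigma_N that swaps objects 0 and 2 and fixes the rest.
sigma = {0: 2, 1: 1, 2: 0, 3: 3}

scene = frozenset({0, 1})
new_scene = replace_objects(scene, sigma)  # object 0 is replaced by object 2

# A permutation of the library only rearranges the set of possible scenes.
all_scenes = {frozenset(c) for c in combinations(range(N), K)}
permuted_scenes = {replace_objects(s, sigma) for s in all_scenes}
print(sorted(new_scene), permuted_scenes == all_scenes)
```

Because every permutation maps the set of scenes onto itself, a model that respects these replacements on training scenes has its behavior determined on held-out scenes as well.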

Measuring Compositional Generalization

We measure compositional generalization through equivariance error \(\lambda\), which quantifies how well the model's predictions respect object replacement operations.
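The exact definition of \(\lambda\) is not reproduced here. As a hedged illustration, the sketch below uses one plausible form: the mean absolute gap between "replace objects, then predict" and "predict, then replace objects". The dictionary state encoding, both toy models, and the sample data are assumptions made for this example.

```python
def act_state(sigma, state):
    """Relabel the objects in a state {object id: position}."""
    return {sigma[o]: p for o, p in state.items()}

def act_action(sigma, action):
    """Relabel the object an action (object id, displacement) refers to."""
    obj, delta = action
    return (sigma[obj], delta)

def equivariant_model(state, action):
    """Treats every object identically, so it commutes with relabeling."""
    obj, delta = action
    return {o: (p + delta if o == obj else p) for o, p in state.items()}

def biased_model(state, action):
    """Breaks the symmetry: object 0 moves twice as far as the others."""
    obj, delta = action
    step = 2 * delta if obj == 0 else delta
    return {o: (p + step if o == obj else p) for o, p in state.items()}

def equivariance_error(model, samples, sigma):
    """Mean absolute gap between model(sigma.s, sigma.a) and sigma.model(s, a)."""
    total = 0.0
    for state, action in samples:
        lhs = model(act_state(sigma, state), act_action(sigma, action))
        rhs = act_state(sigma, model(state, action))
        total += sum(abs(lhs[o] - rhs[o]) for o in lhs)
    return total / len(samples)

sigma = {0: 2, 1: 1, 2: 0, 3: 3}  # replace object 0 with object 2
samples = [({0: 0, 1: 5}, (0, 1)), ({0: 3, 1: 1}, (1, -1))]

err_equivariant = equivariance_error(equivariant_model, samples, sigma)
err_biased = equivariance_error(biased_model, samples, sigma)
print(err_equivariant, err_biased)  # 0.0 0.5
```

The symmetric model has zero error, while the model that singles out object 0 is penalized on exactly the sample where that object is replaced.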

Technical Approach

The Binding Problem

A fundamental challenge in object-oriented world modeling is the binding problem: how to correctly associate actions with objects in a consistent way across different scenes, especially when object positions and roles can vary.

Binding Problem

The binding problem arises when trying to map object-relative actions (like "move right") to absolute actions in the world frame.

Slot MDP Framework

We introduce the Slot MDP framework \(\mathcal{M}_{[K]}\) as a structured representation where \(K\) slots can represent \(K\) objects. This provides a canonical way to organize object-centric information.

Slot MDP

The Slot MDP \(\mathcal{M}_{[K]}\) provides \(K\) slots with consistent meaning, allowing us to map scene MDPs \(\mathcal{M}_\Theta\) to a canonical representation.
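To make the idea of a canonical \(K\)-slot representation concrete, the sketch below maps a scene state to slots using a fixed, hand-written rule (sorting by object id). This rule is purely an illustrative assumption; it is not how the paper constructs or learns the mapping.

```python
def to_slots(scene_state):
    """Map a scene state {object id: value} to a K-slot list plus its binding.

    The binding here is a fixed rule (ascending object id), chosen only so
    that each slot index has a consistent meaning within a scene.
    """
    ordered = sorted(scene_state)  # object ids in ascending order
    binding = {obj: slot for slot, obj in enumerate(ordered)}
    slot_state = [scene_state[obj] for obj in ordered]
    return slot_state, binding

# Two different scenes drawn from the library map to the same slot layout.
slots_a, binding_a = to_slots({3: 7, 1: 4})
slots_b, binding_b = to_slots({0: 2, 2: 9})
print(slots_a, binding_a)  # [4, 7] {1: 0, 3: 1}
print(slots_b, binding_b)  # [2, 9] {0: 0, 2: 1}
```

Whatever objects a scene contains, a downstream model only ever sees \(K\) slots, which is what lets a single slot-level model serve every scene.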

Solution Approaches

We explore different approaches to achieve compositional generalization:

Naive Solution

A naive approach using a fully connected GNN over all \(N\) library objects requires \(\mathcal{O}(N^2)\) connections, which becomes impractical for large object libraries.

Efficient Solution

Our slot-based GNN solution achieves \(\mathcal{O}(K^2)\) complexity by operating on a fixed number of slots, making it scalable to large object libraries since \(K \ll N\).
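The gap between the two complexities is easy to see by counting directed edges in a fully connected graph. The values of `N` and `K` below are arbitrary illustrative numbers, not settings from the experiments.

```python
def num_directed_edges(num_nodes: int) -> int:
    """Directed edges in a fully connected graph without self-loops."""
    return num_nodes * (num_nodes - 1)

N, K = 100, 5  # hypothetical library size and number of slots

library_edges = num_directed_edges(N)  # one node per library object
slot_edges = num_directed_edges(K)     # one node per slot
print(library_edges, slot_edges)       # 9900 20
```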

Solving the Binding Problem

Our key insight is to learn the binding between objects and slots through attention mechanisms, while ensuring consistency through an aligned loss.

Solving Binding

HOWM learns to bind objects to slots consistently across time steps using attention mechanisms and an aligned loss, enabling compositional generalization.
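The details of HOWM's attention module are not given here. As a rough sketch of soft binding in general, the example below uses plain dot-product attention so that each slot gathers a weighted mix of object-level actions. The function names, the key/query/value vectors, and their dimensions are all hypothetical, and no training or aligned loss is shown.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]

def bind_actions(slot_queries, action_keys, action_values):
    """For each slot, attend over object actions and return the weighted mix."""
    dim = len(action_values[0])
    bound = []
    for q in slot_queries:
        # Dot-product score between this slot's query and every action key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) for k in action_keys]
        weights = softmax(scores)
        bound.append([sum(w * v[d] for w, v in zip(weights, action_values))
                      for d in range(dim)])
    return bound

# Hand-picked vectors: slot 0 matches action 0, slot 1 matches action 1.
slot_queries = [[1.0, 0.0], [0.0, 1.0]]
action_keys = [[10.0, 0.0], [0.0, 10.0]]
action_values = [[1.0, 0.0], [0.0, 1.0]]  # one-hot so the mix is easy to read

bound = bind_actions(slot_queries, action_keys, action_values)
print(bound)
```

In a learned model the queries and keys would be produced by networks and trained end to end; with the hand-picked vectors above, each slot puts nearly all of its weight on a single action.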

Theoretical Guarantees

We prove that a model which achieves zero equivariance error in the slot MDP and satisfies the projection property is guaranteed to generalize compositionally.

Key Theorem

Our theorem establishes that equivariance error in \(\mathcal{M}_{[K]}\) is proportional to the original MDP's error by a factor of \(\binom{N}{K}\), providing theoretical bounds on generalization.

Results

HOWM achieves efficient compositional generalization through:

  • Learned binding between actions and object slots
  • Efficient slot-based architecture
  • Theoretically grounded training objectives

Results Part 1

Experimental results showing improved compositional generalization compared to baselines.

Results Part 2

Analysis of model behavior and computational efficiency.

Related Work

Our work builds on and relates to several research areas:

BibTeX

@inproceedings{zhao2022compositional,
  title={Toward Compositional Generalization in Object-Oriented World Modeling},
  author={Zhao, Linfeng and Kong, Lingzhi and Walters, Robin and Wong, Lawson L. S.},
  booktitle={International Conference on Machine Learning (ICML)},
  year={2022}
}