Vision-Language Models

ThinkGrasp: A Vision-Language System for Strategic Part Grasping in Clutter

We have developed ThinkGrasp, a plug-and-play vision-language grasping system that plans grasping strategies in heavily cluttered environments.

Learning Efficient and Robust Language‑conditioned Manipulation using Textual‑Visual Relevancy and Equivariant Language Mapping

We develop an approach for efficiently learning open-vocabulary, language-conditioned manipulation policies.