We propose a method for manipulating diverse objects across a wide range of initial and goal configurations and camera placements. We disentangle the standard image-to-action mapping into two separate modules: (1) a behavior selector that conditions on intrinsic, semantically-rich object appearance features to select the behaviors that can successfully perform the desired task on the object at hand, and (2) a library of behaviors, each of which conditions on extrinsic, abstract object properties to predict actions to execute over time. Our framework outperforms various learning and non-learning based baselines in both simulated and real robot tasks.
Given an input RGB-D image $I_v$ of the scene and an object 3D bounding box $\mathrm{o}$, the selector $\mathrm{G}$ predicts the probability of successfully manipulating the object when applying each behavior to it. This is done by using geometry-aware recurrent neural networks (GRNNs) to convert the image to a 3D scene representation $\mathbf{M}'$, cropping the representation to the object using the provided 3D bounding box, and computing the cosine similarity between this object representation $\mathbf{F}(I_v, \mathrm{o}; \phi)$ and a learned behavioral key $\kappa_i$ for each behavior $\pi_i$ in the library. The behavior with the highest predicted success probability is then executed in the environment. We train the selector using interaction labels collected by running behaviors on random training objects and recording the binary success or failure outcomes of the executions.
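The selection step described above can be sketched in a few lines. The following is a minimal NumPy illustration, not the released implementation: the function name and array shapes are assumptions, and the GRNN feature extraction is abstracted away into a precomputed object feature vector.

```python
import numpy as np

def select_behavior(object_feature, behavior_keys):
    """Score each behavior by the cosine similarity between the cropped
    3D object feature F(I_v, o; phi) and its learned key kappa_i, and
    return the index of the highest-scoring behavior pi_i.

    object_feature: (D,) flattened object-centric feature (assumed shape)
    behavior_keys:  (K, D) one learned key per behavior in the library
    """
    f = object_feature / np.linalg.norm(object_feature)
    keys = behavior_keys / np.linalg.norm(behavior_keys, axis=1, keepdims=True)
    scores = keys @ f  # cosine similarity with each behavioral key
    return int(np.argmax(scores)), scores
```

In practice the similarity scores would be calibrated into success probabilities (e.g. via a sigmoid) and trained against the binary interaction labels; the sketch only shows the argmax selection.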
Our method outperforms various baselines and ablations in simulated pushing and grasping tasks.
We also perform real robot experiments in a setup where the robot needs to execute various skills to transport diverse rigid, granular, and liquid objects to a plate.
@inproceedings{yang2021visually,
title={Visually-Grounded Library of Behaviors for Manipulating Diverse Objects across Diverse Configurations and Views},
author={Yang, Jingyun and Tung, Hsiao-Yu and Zhang, Yunchu and Pathak, Gaurav and Pokle, Ashwini and Atkeson, Christopher G and Fragkiadaki, Katerina},
booktitle={5th Annual Conference on Robot Learning},
year={2021}
}