Pick-and-place is one of the fundamental tasks in robotics research. However, attention has focused mostly on the "pick" task, leaving the "place" task relatively unexplored. In this paper, we address the problem of placing objects within a teleoperation framework. In particular, we focus on two aspects of the place task: the stability robustness and the contextual reasonableness of object placements. Our proposed method combines simulation-driven physical stability verification via real-to-sim transfer with the semantic reasoning capability of large language models. Specifically, given place context information (e.g., user preferences, the object to place, and the current scene), our method outputs a probability distribution over possible placement candidates that accounts for both the robustness and the reasonableness of the placement. We evaluate the method extensively in two simulated environments and one real-world environment, and show that it greatly increases both the physical plausibility and the contextual soundness of placements while respecting user preferences.
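The abstract describes an output distribution that reflects both physical robustness and contextual reasonableness. As a minimal sketch, assuming each candidate carries a hypothetical LLM reasonableness score and a boolean stability result from simulation, one way to form such a distribution is a softmax over the scores of the stable candidates only. This is an illustrative construction, not the paper's exact formulation; `Candidate`, `llm_score`, and `placement_distribution` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str         # hypothetical placement candidate, e.g. a receptacle or region
    llm_score: float  # hypothetical LLM log-score for how reasonable the placement is
    stable: bool      # hypothetical outcome of a simulation-based stability check


def placement_distribution(candidates, temperature=1.0):
    """Softmax over LLM scores, with physically unstable candidates given zero mass."""
    probs = {c.name: 0.0 for c in candidates}
    feasible = [c for c in candidates if c.stable]
    if not feasible:
        return probs  # nothing can be placed stably
    scaled = [c.llm_score / temperature for c in feasible]
    top = max(scaled)  # subtract the max before exponentiating for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    for c, w in zip(feasible, weights):
        probs[c.name] = w / total
    return probs


cands = [
    Candidate("tray", llm_score=-0.2, stable=True),
    Candidate("dish_rack", llm_score=-1.5, stable=True),
    Candidate("book_shelf", llm_score=-0.1, stable=False),
]
dist = placement_distribution(cands)
print(dist)
```

Masking unstable candidates, rather than merely down-weighting them, guarantees that a placement rejected by the simulator is never sampled, even if the language model rates it highly (as with `book_shelf` here). The `temperature` parameter controls how sharply the distribution favors the top-scored option.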
Our real-to-sim transfer module, illustrated in the Framework figure, uses OWL-ViT for open-vocabulary object detection and AprilTags for pose estimation, based on input from an RGB-D camera. The detection label set covers nine categories (DishRack, Bowl, BookShelf, Fruit, Beverage, Snack, Tray, Glass, and Book), spanning a total of 21 object assets. For each detected object, we assume that the corresponding 3D asset is available. These assets are placed in a simulation environment that mirrors the real scene as closely as possible, and this reconstructed environment serves as the basis for all subsequent evaluations. The framework is built on the MuJoCo simulator and uses assets from the YCB and Google Scanned Objects datasets. The experiments use a tabletop manipulation setup with a 6-DoF robot arm, with gpt-3.5-turbo as the language model.
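The real-to-sim step can be sketched as turning each detection (a category label plus an AprilTag-derived pose) into a free-floating body in a MuJoCo MJCF scene. The label-to-mesh table, the file paths, and `build_mjcf` below are hypothetical; the sketch only emits standard MJCF elements (`asset`/`mesh`, `worldbody`/`body`, `freejoint`, `geom`) and does not reproduce the authors' pipeline.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from detector labels to mesh files (e.g. exported YCB / Google Scanned Objects assets).
ASSET_FILES = {
    "Bowl": "assets/bowl.obj",
    "Tray": "assets/tray.obj",
    "Glass": "assets/glass.obj",
}


def build_mjcf(detections):
    """detections: list of (label, (x, y, z), (w, x, y, z)) tuples; MuJoCo quaternions are w-first."""
    root = ET.Element("mujoco", model="real2sim_scene")
    asset = ET.SubElement(root, "asset")
    world = ET.SubElement(root, "worldbody")
    ET.SubElement(world, "geom", name="table", type="plane", size="1 1 0.1")  # table surface at z = 0
    added_meshes = set()
    for i, (label, pos, quat) in enumerate(detections):
        if label not in ASSET_FILES:
            continue  # this toy table is partial, so detections without an asset are skipped
        if label not in added_meshes:
            ET.SubElement(asset, "mesh", name=label, file=ASSET_FILES[label])
            added_meshes.add(label)
        body = ET.SubElement(world, "body", name=f"{label}_{i}",
                             pos=" ".join(map(str, pos)), quat=" ".join(map(str, quat)))
        ET.SubElement(body, "freejoint")  # lets the object fall and settle under gravity
        ET.SubElement(body, "geom", type="mesh", mesh=label)
    return ET.tostring(root, encoding="unicode")


detections = [
    ("Bowl", (0.1, 0.0, 0.05), (1, 0, 0, 0)),
    ("Tray", (-0.2, 0.1, 0.02), (1, 0, 0, 0)),
    ("Bowl", (0.3, 0.2, 0.05), (1, 0, 0, 0)),
    ("Snack", (0.0, -0.2, 0.03), (1, 0, 0, 0)),  # no asset in the toy table above
]
xml = build_mjcf(detections)
print(xml)
```

Once the referenced mesh files exist, the returned string could be loaded with `mujoco.MjModel.from_xml_string` from the official MuJoCo Python bindings. Giving every object a free joint is what allows a later stability check to observe whether it stays put.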
We compare SPOTS with three prior methods: LLM-GROP [1], Code-as-Policies (CaP) [2], and Language-to-Reward (L2R) [3]. LLM-GROP uses two template-based prompts: one extracts semantic relationships using examples, and the other predicts geometric spatial relationships for varying scene geometry. CaP generates policy code for the robot motion using pre-defined low-level primitive functions. L2R defines reward parameters that can be optimized, with the reward function designed to move the manipulator to a parameterized placement position.
We evaluate the suggested placements on two metrics: stability and reasonableness. The stability success rate (Sta. S/R) depends purely on physics: a placement succeeds if the object remains stable when placed in simulation. The reasonableness success rate (Rea. S/R), on the other hand, measures whether the placement agrees with a ground truth that we define; these reasonableness criteria are designed manually and serve as the ground truth for appropriate locations in our experimental validation. Together, the two metrics assess whether placements are both stable and reasonable. To measure how efficiently each method uses the LLM, we also record the inference time and the number of input and output tokens.
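Both success rates reduce to simple counts over trials. As a hedged sketch, the stability test below is a hypothetical displacement threshold (an object counts as stable if it moves less than 2 cm while the simulation settles), and the `Trial` fields are illustrative names; the paper's exact stability criterion and its manually designed reasonableness checks are not reproduced here.

```python
import math
from dataclasses import dataclass
from statistics import mean


@dataclass
class Trial:
    pos_before: tuple   # object position right after placement in simulation (m)
    pos_after: tuple    # object position after the simulation has settled (m)
    reasonable: bool    # manual ground-truth check: is this where the object should go?
    inference_s: float  # LLM inference time in seconds
    input_tokens: int
    output_tokens: int


def is_stable(trial, max_shift=0.02):
    # Hypothetical criterion: stable if the object moved less than max_shift metres.
    return math.dist(trial.pos_before, trial.pos_after) < max_shift


def summarize(trials):
    n = len(trials)
    return {
        "Sta. S/R": sum(is_stable(t) for t in trials) / n,
        "Rea. S/R": sum(t.reasonable for t in trials) / n,
        "mean time (s)": mean(t.inference_s for t in trials),
        "mean input tokens": mean(t.input_tokens for t in trials),
        "mean output tokens": mean(t.output_tokens for t in trials),
    }


trials = [
    Trial((0.0, 0.0, 0.05), (0.001, 0.0, 0.05), True, 1.0, 100, 20),  # settles in place
    Trial((0.0, 0.0, 0.05), (0.1, 0.0, 0.0), False, 3.0, 300, 40),    # slides off and falls
]
s = summarize(trials)
print(s)
```

Tracking position alone misses an object that tips over without translating much; a fuller check would also compare orientations before and after settling.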
By separating receptacle prediction and physical robustness into two distinct modules, SPOTS achieves a higher success rate while using fewer tokens than methods that require the LLM to both understand the context and predict the robot plan. These results suggest that SPOTS is well suited to promptable placement tasks that must account for regions that are both physically stable and reasonable, and that it yields a distribution from which reasonable positions can be sampled.
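The two-module separation can be illustrated with a small sketch: a language module returns receptacle preferences, a physics module filters sampled positions on each receptacle for stability, and a placement is drawn from what remains. All three callables are hypothetical stand-ins (no real LLM or simulator is called), and spreading a receptacle's weight evenly over its stable points is an assumed design choice, not necessarily what SPOTS does.

```python
import random


def plan_placement(context, predict_receptacles, sample_positions, is_stable, rng, n=10):
    """Two-stage sketch: the language module says where an object belongs,
    the physics module keeps only positions that pass a stability check."""
    prefs = predict_receptacles(context)  # {receptacle: weight} from the (stubbed) language module
    options, weights = [], []
    for receptacle, weight in prefs.items():
        candidates = sample_positions(receptacle, n, rng)
        stable = [p for p in candidates if is_stable(receptacle, p)]
        for p in stable:
            # Spread the receptacle's weight evenly over its stable positions.
            options.append((receptacle, p))
            weights.append(weight / len(stable))
    if not options:
        return None  # no physically stable placement was found
    return rng.choices(options, weights=weights, k=1)[0]


rng = random.Random(0)
choice = plan_placement(
    context={"object": "apple", "preference": "fruit goes on the tray"},
    predict_receptacles=lambda ctx: {"tray": 0.8, "dish_rack": 0.2},
    sample_positions=lambda r, n, g: [(g.uniform(-0.1, 0.1), g.uniform(-0.1, 0.1)) for _ in range(n)],
    is_stable=lambda r, p: r == "tray",  # toy rule: the dish rack is never stable for this object
    rng=rng,
)
print(choice)
```

In this arrangement the language model only ranks receptacles and never reasons about coordinates, which keeps its prompt and output short; that is one intuition for the lower token counts, though the sketch does not measure them.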
@article{lee2023spots,
title={SPOTS: Stable Placement of Objects with Reasoning in Semi-Autonomous Teleoperation Systems},
author={Lee, Joonhyung and Park, Sangbeom and Park, Jeongeun and Lee, Kyungjae and Choi, Sungjoon},
journal={arXiv preprint arXiv:2309.13937},
year={2023}
}