Robust Detection for Autonomous Elevator Boarding using a Mobile Manipulator

Korea University1

Ewha Womans University2


Abstract

Indoor robots are becoming increasingly prevalent across a range of sectors, but the challenge of navigating multi-level structures through elevators remains largely uncharted. For a robot to operate successfully, it is pivotal to have an accurate perception of elevator states. This paper presents a robust robotic system, tailored to interact adeptly with elevators by discerning their status, actuating buttons, and boarding seamlessly. Given the inherent issues of class imbalance and limited data, we utilize the YOLOv7 model and adopt specific strategies to counteract the potential decline in object detection performance. Our method effectively confronts the class imbalance and label dependency observed in real-world datasets, offering a promising approach to improve indoor robotic navigation systems.

Framework


Overall pipeline of the proposed teleoperation framework. Given the scene as input, the system first checks where the object can be placed with physical stability. The receptacle reasoning step then identifies which of those positions are contextually reasonable for the scene, and the system recommends the coordinates that pass both checks to the user.

\( \textbf{SPOTS} \)

\( \textbf{Stability Verification} \)

We aim to identify regions where objects can be stably placed over a given interaction time \(T\) in simulation. More specifically, to determine the robustness of the placement stability, small perturbations are injected after the object has been placed. We define the set of points \( \mathcal{P}_{\text{s}} \) to represent coordinates where objects can be placed stably.
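The perturbation test described above can be sketched as follows. This is a minimal toy illustration, not the authors' MuJoCo implementation: `settle` is a hypothetical stand-in that collapses a physics rollout over the interaction time \(T\) into a single step on a 1-D ledge, and the number of trials and the perturbation magnitude are assumed values.

```python
import random

def settle(x, push, ledge=(0.0, 1.0)):
    """Hypothetical stand-in for a physics rollout of length T.

    The object is placed at x and displaced by `push`. It either stays on
    the ledge (its final position is returned) or falls off (None).
    """
    x_final = x + push
    lo, hi = ledge
    return x_final if lo <= x_final <= hi else None

def is_stable(x, n_trials=20, max_push=0.05, seed=0):
    """A placement counts as stable only if it survives every small perturbation."""
    rng = random.Random(seed)
    for _ in range(n_trials):
        push = rng.uniform(-max_push, max_push)
        if settle(x, push) is None:
            return False
    return True

def stable_set(candidates, **kwargs):
    """Collect P_s: the candidate coordinates that pass the perturbation test."""
    return [x for x in candidates if is_stable(x, **kwargs)]

candidates = [i / 10 for i in range(-2, 13)]  # -0.2, -0.1, ..., 1.2
P_s = stable_set(candidates)
```

In a full simulator, `settle` would be replaced by stepping the physics engine for the interaction time and comparing the object's final pose against its placement pose.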

\( \textbf{Receptacle Reasoning} \)

Although the points in \( \mathcal{P}_{\text{s}} \) verified in the \( \textbf{Stability Verification} \) step are physically feasible, the set may contain options that ignore the context of the scene. Therefore, we analyze reasonableness only within \( \mathcal{P}_{\text{s}} \), selecting the points that fit the current scene's situation and context.
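One way to picture this filtering step is the sketch below. It assumes that each stable point carries the name of the receptacle it lies on, and it replaces the language-model query with a hypothetical keyword-matching stub, `choose_receptacle`. Neither the data layout nor the stub comes from the SPOTS implementation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    xyz: tuple        # stable coordinate taken from P_s
    receptacle: str   # surface the point lies on (assumed annotation)

def choose_receptacle(task_prompt, receptacles):
    """Hypothetical stand-in for the LLM query.

    Returns the first receptacle whose name appears in the task prompt,
    or None if no name matches.
    """
    prompt = task_prompt.lower()
    for name in receptacles:
        if name.lower() in prompt:
            return name
    return None

def reason_over_stable_set(P_s, task_prompt):
    """Keep only the stable points on the receptacle that fits the task."""
    receptacles = sorted({c.receptacle for c in P_s})
    target = choose_receptacle(task_prompt, receptacles)
    return [c for c in P_s if c.receptacle == target]

P_s = [
    Candidate((0.10, 0.20, 0.75), "DishRack"),
    Candidate((0.40, 0.10, 0.75), "Tray"),
    Candidate((0.12, 0.25, 0.75), "DishRack"),
]
recommended = reason_over_stable_set(P_s, "Put the plate in the dishrack to dry")
```

The point of the design is that the reasoning step never proposes new coordinates; it only narrows the set that has already passed the stability check.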

User Interaction

User interaction with the proposed system, SPOTS. SPOTS recommends placement candidates based on the task prompt, taking both stability and reasonableness into account, and the user selects among them in an interactive viewer.

Environments

Our real-to-sim transfer module, illustrated in the Framework section, uses OWL-ViT for open-vocabulary object detection and AprilTags for pose estimation, based on input from an RGBD vision sensor. The detected objects belong to a label superset of nine categories [1], covering 21 object assets in total. For each detected object, we assume the corresponding 3D asset is available. These assets are transferred into a simulation environment that mimics the real world as closely as possible. This reconstructed environment is the basis for all subsequent evaluations. The framework is built on the MuJoCo simulator, using assets from the YCB and Google Scanned Objects datasets. We use a tabletop manipulation setup with a 6-DoF robot arm and gpt-3.5-turbo as the language model.

[1] 'DishRack', 'Bowl', 'BookShelf', 'Fruit', 'Beverage', 'Snack', 'Tray', 'Glass', 'Book'

(a) Small Gap: White Dish Rack
(b) Medium Gap: Black Dish Rack
(c) Large Gap: Wood Dish Rack
(d) Two-Tiered Bookshelf
(e) Three-Tiered Bookshelf
(f) Three-Tiered Shelf

Results


Result of the stability verification module, run on all environments in our experiments without the reasoning module. The ratio of stable coordinates to the total number of coordinates is very low, which indicates that stable placement is physically difficult in the tasks we consider.


We compare SPOTS to three prior methods: LLM-GROP [1], Code-as-Policies (CaP) [2], and Language-to-Reward (L2R) [3]. LLM-GROP uses two different template-based prompts; one extracts semantic relationships with examples, and the other one predicts geometric spatial relationships for varying scene geometry. CaP generates policy code for the robot motion using a pre-defined low-level primitive function. L2R defines reward parameters that can be optimized, and the reward function is designed for moving a manipulator to a parameterized placement position.

Our evaluation metrics are the stability and reasonableness of the suggested object placements. The stability success rate (Sta. S/R) is based purely on whether the object remains physically stable after placement in simulation. The reasonableness success rate (Rea. S/R), on the other hand, is based on whether the placement agrees with a ground truth that we define; the criteria for this ground truth are designed manually and specify the appropriate locations in our experimental validation. Together, these metrics assess how well a placement satisfies both stability and reasonableness. Furthermore, we measure the inference time and the number of input and output tokens to gauge the efficiency of LLM usage.
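As a small worked example of how these rates could be tallied, the sketch below computes both metrics from per-trial outcomes. The trial data are made up for illustration, and the combined "Overall" rate (a trial counts only if it is both stable and reasonable) is an assumed reading of how the two metrics are joined, not a definition taken from the paper.

```python
def success_rates(trials):
    """Compute success rates from (stable, reasonable) boolean pairs.

    Each pair describes one suggested placement.
    """
    n = len(trials)
    if n == 0:
        raise ValueError("need at least one trial")
    sta = sum(s for s, _ in trials) / n          # fraction physically stable
    rea = sum(r for _, r in trials) / n          # fraction matching ground truth
    both = sum(s and r for s, r in trials) / n   # fraction satisfying both
    return {"Sta. S/R": sta, "Rea. S/R": rea, "Overall": both}

# Illustrative outcomes only; these are not results from the experiments.
trials = [(True, True), (True, False), (False, False), (True, True)]
rates = success_rates(trials)
```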

By separating receptacle prediction and physical robustness into two distinct modules, SPOTS achieves a higher success rate while using fewer tokens than methods that force the LLM to produce robot plans and understand the context at the same time. This experiment suggests that SPOTS is well suited to promptable placement tasks that require both physically stable and reasonable regions, and that it yields a distribution from which reasonable positions can be sampled.

Real World Demonstration

In this experiment, we consider a scene with different objects placed on a desk. We designed a task that categorizes objects based on similarity. The reasoning criterion, termed similarity, varies across experiments and serves as the ground truth for evaluating reasoning ability. Each type of similarity was evaluated five times, and performance was measured using the overall success rate (i.e., both Sta. S/R and Rea. S/R). This experiment shows that the reasonable placement varies with the task description given as input. Furthermore, by reconstructing the robot's ego-centric view with the real-to-sim method, we can accurately determine stable positions for placing the objects.

BibTeX


        @article{lee2023spots,
          title={SPOTS: Stable Placement of Objects with Reasoning in Semi-Autonomous Teleoperation Systems},
          author={Lee, Joonhyung and Park, Sangbeom and Park, Jeongeun and Lee, Kyungjae and Choi, Sungjoon},
          journal={arXiv preprint arXiv:2309.13937},
          year={2023}
        }