[seminar] Making Robots See and Manipulate

김범준 교수님의 세미나 Making Robots See and Manipulate 내용을 기록했습니다.

  • Continuous motion level reasoning: Feasibility check가 필수적이며, 이는 computation expensive.
    • Idea: Learn to guide Planning \(\rightarrow \red{\text{MCTS+RL}}\)
      • Tree search + Value function, policy to guide the search.
      • 어떠한 물체를 어떻게, 어디로 옮겨야 하는지 geometric reasoning에 기반한 planning을 수행함.

그러나 real world에서 로봇을 연구해보니, perceive, manipulate 하는 기본적인 능력이 전혀 없다.

  • General Purpose Robot 연구를 위한 필수 요소
    1. \[\red{\text{Perceive and Manipulate Object}}\]
    2. Solve Long-horizon sequential
    3. Add Semantic, Common sense

Today’s topic is the first thing.

  • 교수님의 보통 아이디어 building: 큰 문제 \(\rightarrow\) 작은 문제로 나눔.
    1. Limited action repertorie
    2. Representation and perception - How do I represent obejct states?
    3. Big data for robotics - How do we efficiently generate one for robots?

그러나 많은 manipulation 연구가 Pick-n-Place라는 skill에만 국한됨; Prehensile manipulation에 치중되어 있음.

  • Intuition: Not all objects are graspable.
  • Previous approaches: Physics modeling + Planning으로 해결함.
    • Limitation:
      • Estimating the properties from RGB images is extremely difficult.
      • Modeling contact is still an active area of research. They make simplifying assumptions.
      • Planning trajecories take significant amount of time.

1. Limited action repertorie: Non-Prehensile Tasks

Manipulation System

Pre and Post-Contact Policy Decomposition for Non-Prehensile Manipulation with Zero-Shot Sim-To-Real Transfer: IROS 2023 paper;

  • 너무 크거나 너무 납작한 물체를 밀어서 Pose를 조정함.
  • 단차가 있는 벽 위로 물체를 옮겨야 할 때에.

Limitation:

  • requires a lot of data: isaac sim
  • Exploration is extremly hard for non-prehensile manipulation;
  • 기존의 Task definition; The manipulatee is always in close proximity to the manipulator
  • Contact inducing reward를 추가할 수 있음. 다만, 잘 설계해야 함. It may make an ineffective contact.

Approach:

  • Divided into two stages: 1. Pre-contact phase / 2. Post-contact phase; Tow distint policies.
  • Pre-contact policy Action space; 물체 위의 어느 point에 놓을 것인가, contact point에 대한 RL
  • Post-contact policy Action space; Target end-effector pose (Time-varying Impedance control)
Whole-Body Manipulation

How to learn Simultaneout balancing and manipulation

  • Hierarchical policy decomposition + curriculum leraning (이전에는 Series로 수행되었음.)

Lesson learned:

  • Modularity가 중요하다. 이것이 more efficient learning을 가능하게 함.
  • Manipulator에서는 Action space를 따로 정의하는 것이 Exploration에서 더 효율적이었으며, Debug 과정에서 수월함.

2. Representation and perception - How do I represent obejct states?

움직이는 motion 자체가 너무 느리다. Hardware 자체적인 성능도 아직은 너무 뒤떨어진다. 훨신 빠르고 Dynamic하게 + Learning purposed에 맞춰서 제작하고자 함.

Intuition; How do I represent object states?

  • Setup: Three cameras.
  • Estimating the 3D Spatial occupancy is important; Encoder of a Shape completion algorithm

  • 어떠한 Signal이 [high/Low]-value representation에 영향을 끼치는가?
    • Contact presence와 Loaction이 매우 중요함.

CORN: Contact-based Object Representation

  • Patch Transformer
    • estimated shape \(\rightarrow\) RRT + Grasping (Contact-based)

3. Big data for robotics - How do we efficiently generate one for robots?

  • Big data in simulator:
    • But, Collision Detection이 Non-convex object에 대해서 too slow
      • Contact Detection in the simulator is too slow for non-convex object.
    • GJK cannot leverage the parallel compuation.
    • Shape encoder \(\rightarrow\) Collision Predictor; A lot of 3D assets to train this.
  • Contribution: Local similarity.
    • Contact이 Local geometric에서는 매우 비슷한 양상을 보임.

질문:

Q. 흡착형은 어때요? A. 오염이 자주 됨.




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • [paper-review] 6-DOF GraspNet: Variational Grasp Generation for Object Manipulation
  • [paper-review] Dream2Real: Zero-Shot 3D Object Rearrangement with Vision-Language Models
  • [paper-review] Reactive Base Control for On-The-Move Mobile Manipulation in Dynamic Environments
  • [study] Vector Quantization
  • [paper-review] MOKA: Open-Vocabulary Robotic Manipulation through Mark-Based Visual Prompting