[paper-review] Statler: State-Maintaining Language Models for Embodied Reasoning

Arxiv. [Paper] [Project Page]

Takuma Yoneda^*1, Jiading Fang^*1, Peng Li^*2, Huanyu Zhang^*3, Tianchong Jiang³, Shengjie Lin¹, Ben Picker³, David Yunis¹ Hongyuan Mei¹, Matthew R. Walter¹ > ¹TTI-Chicago, ²Fudan University, ³University of Chicago, *Equal Contribution

Dec. 4

Fig. 1: Overview of Statler.

한 문장 요약

World State를 갱신해가며 LLM Reasoning을 수행하자.

Contribution

기존 LLM 방식은 자신이 뱉어준 Action과 Observation에 기인해 reasoning을 수행해왔음.
- Traditional LLMs in robotics generate actions based only on prior actions and observations, lacking an explicit world state.
2개의 prompted LLM을 통해 world-state를 explicit하게 maintaining 하겠다는 의도.

Statler utilizes a pair of prompted LLMs: instructed by a few demonstrations, the world-state reader takes as input the user query, reads the estimated world state, and generates an executable action (e.g, a code snippet); instructed by another set of demonstrations, the world-state writer updates the world state estimate based on the action.
이를 통해 Long-Horizon LLM Interaction에서 기억을 소실하거나, misleading reasoning을 방지해준다고 함.

Motivation

Fig. 2: Motivation of Statler.

야바위 게임처럼, 컵 안의 물체는 계속해서 움직이게 되는데, 우리가 명시적으로 공의 움직임을 관찰하지는 못하지만, internal-representation에 의해 공이 어떠한 컵에 들어있는지 알 수 있다.
이것에서 영감을 얻어, LLM이 관측하지 못하는 world-state에 대해 maintaining하는 방향으로 연구를 수행함.

Methodology

Fig. 3: World State Reader.

Fig. 4: World State Writer.

World-State-Reader
- reader는 현재 state를 고려한 action을 취해주는 LLM 역할.
- 여기서 state가 update 되어야 하는 부분도 고려해서 답변을 만듦.
World-State-Writer
- 앞선 reader가 추론해낸 state에 기반해, state를 upadte하고, external memory에 저장함. (이것이 곧 current-state가 됨.)

Experiments

Fig. 5: Example scenario about Statler.

Thoughts:

저자가 언급한 야바위 문제로 motivation을 얘기하느데, 이 점이 꽤나 재밌게 느껴졌습니다.
지금 GPT-4V로 연구를 수행하며 느낀 점이, long-horizon interaction을 수행할 때에 기억을 소실한다는 것이었습니다.
저자는 GPT-4 모델로 state-reader, state-writer 두 개로 역할을 나누어 이러한 기억 소실을 방지해내고자 했습니다.
- appendix도 읽어보았으나, prompt에 대해 novel한 부분을 찾지는 못했습니다. world-state를 template에 맞게 뱉어주고, 이에 기반해 world-state를 업데이트 시켰다고 하는데, 단순히 이것 만으로 기억 소실을 개선시켰다는 점이 신기합니다.

한 문장 요약

Contribution

Motivation

Methodology

Experiments

Thoughts:

Enjoy Reading This Article?