[paper-review] OFFLINE REINFORCEMENT LEARNING WITH IMPLICIT Q-LEARNING

ICLR 2022. [Paper] [Github1] [Github2]

Ilya Kostrikov, Ashvin Nair & Sergey Levine, Department of Electrical Engineering and Computer Science, University of California, Berkeley

12 Oct 2021

One-sentence summary

Summary: Treat the state value function as a random variable so that the policy improvement step can be approximated implicitly. Concretely, estimate expectiles of the state value function.

An offline RL method that never needs to evaluate actions outside of the dataset, yet still enables the learned policy to improve substantially over the best behavior in the data through generalization.
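
To make the expectile idea concrete, the value update can be read as an asymmetric (expectile) regression of V(s) toward Q(s, a) over actions that appear in the dataset. Below is a minimal PyTorch sketch under that assumption; the names `expectile_loss`, `value_loss`, `q_target`, and `v` are illustrative and not taken from the authors' code.

```python
import torch

def expectile_loss(diff: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    # Asymmetric L2 loss: L2_tau(u) = |tau - 1(u < 0)| * u^2.
    # For tau > 0.5, positive residuals (Q above V) are weighted more heavily,
    # so V(s) is pushed toward an upper expectile of Q(s, a) over dataset actions.
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff.pow(2)).mean()

def value_loss(q_target: torch.Tensor, v: torch.Tensor, tau: float = 0.7) -> torch.Tensor:
    # q_target: Q(s, a) from a target critic on (s, a) pairs drawn from the dataset.
    # v: V(s) from the value network. Only dataset actions are ever evaluated.
    return expectile_loss(q_target.detach() - v, tau)
```

The critic is then trained against the usual TD target r + γV(s'), and the policy is extracted with advantage-weighted regression, so the whole pipeline only queries Q at actions present in the dataset.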

Keywords: Reinforcement Learning, Offline RL, Quantile Regression

Introduction



