[paper-review] OFFLINE REINFORCEMENT LEARNING WITH IMPLICIT Q-LEARNING
ICLR 2022. [Paper] [Github1 Github2] Ilya Kostrikov, Ashvin Nair & Sergey Levine, Department of Electrical Engineering and Computer Science, University of California, Berkeley
12 Oct 2021
One-sentence summary
Summary: Treat the state value function as a random variable and approximate policy improvement implicitly. Concretely, estimate expectiles of the state value function.
An offline RL method that never needs to evaluate actions outside of the dataset, yet still enables the learned policy to improve substantially over the best behavior in the data through generalization.
Keywords: Reinforcement Learning, Offline RL, Expectile Regression
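To make "estimating an expectile" concrete, here is a minimal sketch in NumPy. It assumes the asymmetric squared loss |tau - 1(u < 0)| * u^2 used for expectile regression; the names `expectile_loss` and `fit_expectile`, the toy Q-values, and the learning rate are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def expectile_loss(diff, tau):
    """Asymmetric squared loss: |tau - 1(diff < 0)| * diff**2, averaged."""
    weight = np.where(diff < 0, 1.0 - tau, tau)
    return float(np.mean(weight * diff ** 2))


def fit_expectile(samples, tau, lr=0.1, steps=2000):
    """Find the scalar m minimizing expectile_loss(samples - m, tau)
    with plain gradient descent (illustrative, not tuned)."""
    m = float(np.mean(samples))  # tau = 0.5 recovers the mean
    for _ in range(steps):
        diff = samples - m
        weight = np.where(diff < 0, 1.0 - tau, tau)
        grad = -2.0 * np.mean(weight * diff)  # d(loss)/dm
        m -= lr * grad
    return m


# Hypothetical Q(s, a) values for the actions seen in the dataset at one state.
q_values = np.array([0.0, 1.0, 2.0, 10.0])

for tau in (0.5, 0.9, 0.99):
    # Larger tau pushes the estimate toward the largest in-sample Q-value.
    print(tau, fit_expectile(q_values, tau))
```

With tau = 0.5 the estimate is the ordinary mean; as tau approaches 1 it moves toward the maximum over actions that actually appear in the data, which is how a value estimate can approximate policy improvement without querying out-of-dataset actions.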
Introduction