AsyncQVI: Asynchronous-Parallel Q-Value Iteration for Reinforcement Learning with Near-Optimal Sample Complexity

Yibo Zeng, Fei Feng, and Wotao Yin

Under preparation

Overview

In this paper, we propose AsyncQVI: Asynchronous-Parallel Q-value Iteration for solving Reinforcement Learning (RL) problems. Given an RL problem with |S| states, |A| actions, and a discount factor $\gamma \in (0,1)$, AsyncQVI returns an $\varepsilon$-optimal policy with probability at least $1-\delta$ with sample complexity

$$\tilde{\mathcal{O}}\left(\frac{|S||A|}{(1-\gamma)^5\varepsilon^2}\log\left(\frac{1}{\delta}\right)\right).$$

AsyncQVI is the first asynchronous-parallel RL algorithm with a convergence rate analysis and an explicit sample complexity. The above sample complexity nearly matches the theoretical lower bound. Furthermore, AsyncQVI is scalable: it has a low memory footprint of $O(|S|)$ and admits an efficient asynchronous-parallel implementation.
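To make the idea concrete, below is a minimal single-machine sketch of asynchronous Q-value iteration with a sampled Bellman backup. It is not the authors' implementation: the generative sampler `sample(s, a)`, the worker/thread structure, and all parameter names are illustrative assumptions. It only shows the two properties highlighted above: the shared state is a value table of size |S|, and workers update it asynchronously from possibly stale reads.

```python
import random
import threading

def sample(s, a, num_states):
    # Hypothetical generative model: returns (reward, next_state) for (s, a).
    # A real problem would query its own simulator here.
    return random.random(), random.randrange(num_states)

def async_qvi(num_states, num_actions, gamma=0.9, num_workers=4,
              samples_per_backup=50, updates_per_worker=2000):
    # Shared value table of size |S| (the O(|S|) memory footprint).
    V = [0.0] * num_states
    policy = [0] * num_states

    def worker():
        for _ in range(updates_per_worker):
            s = random.randrange(num_states)          # pick a state to update
            best_q, best_a = float("-inf"), 0
            for a in range(num_actions):
                # Empirical Bellman backup for (s, a) from fresh samples;
                # reads of V may be stale under asynchronous execution.
                total = 0.0
                for _ in range(samples_per_backup):
                    r, s_next = sample(s, a, num_states)
                    total += r + gamma * V[s_next]
                q = total / samples_per_backup
                if q > best_q:
                    best_q, best_a = q, a
            V[s] = best_q                              # asynchronous write-back
            policy[s] = best_a

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return V, policy

if __name__ == "__main__":
    V, pi = async_qvi(num_states=10, num_actions=3)
    print(pi)
```

In this sketch, the only shared data structure is the length-|S| vector V, which is what keeps the memory footprint independent of |A|; the workers never synchronize, so updates may use slightly outdated values, which is exactly the asynchrony the paper's analysis accounts for.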

Citation

Y. Zeng, F. Feng, and W. Yin, AsyncQVI: Asynchronous-parallel Q-Value iteration for reinforcement learning with near-optimal sample complexity, UCLA CAM Report 18-71, 2018.
