User Tools

Site Tools

Writing /home/users/ashutosn/public_html/CommNetS2016/dokuwiki/data/cache/6/695c10671bd892cebfa6563de793eb75.i failed
Unable to save cache file. Hint: disk full; file permissions; safe_mode setting.
Writing /home/users/ashutosn/public_html/CommNetS2016/dokuwiki/data/cache/6/695c10671bd892cebfa6563de793eb75.metadata failed
Writing /home/users/ashutosn/public_html/CommNetS2016/dokuwiki/data/cache/6/695c10671bd892cebfa6563de793eb75.i failed
Unable to save cache file. Hint: disk full; file permissions; safe_mode setting.
Writing /home/users/ashutosn/public_html/CommNetS2016/dokuwiki/data/cache/6/695c10671bd892cebfa6563de793eb75.xhtml failed

Decentralized Learning in Multi-player Multi-Armed Bandits

Abstract: Multi-Armed bandits are an elegant model of learning in an unknown and uncertain environment. Such models are relevant in many scenarios, and of late have received increased attention recently due to various problems of distributed control that have arisen in wireless networks, pricing models on the internet, etc. We consider a non-Bayesian multi-armed bandit setting proposed by Lai & Robbins in mid-80s. There are multiple arms each of which generates an i.i.d. reward from an unknown distribution. There are multiple players, each of whom is choosing which arm to play. If two or more players choose the same arm, they all get zero rewards. The problem is to design a learning algorithm to be used by the players that results in an orthogonal matching of players to arms (e.g., users to channels in wireless networks), and moreover minimizes the expected regret.

With multiple players, this can be considered as a bipartite matching problem, and modeled as a combinatorial version of classical multi-armed bandit problem but with dependent arms. An index-based learning algorithm ca be proposed for it that achieves logarithmic regret [Gai, Krishnamachari & Jain, 2011]. From prior results, it is known that this is order-optimal. We consider the distributed problem where players do not have a dedicated channel to communicate or coordinate. We propose an index-based algorithm that uses Bertsekas' auction mechanism to determine the bipartite matching. We show that the algorithm has expected regret at most near-log-squared. This is the first distributed multi-armed bandit learning algorithm in a general setting, as far as we know.

This is joint work with Rahul Jain and Naumaan Nayyar.


Dileep Kalathil received a B. Tech in ECE from the National Institute of Technology, Calicut, India in 2006 and an M.Tech from IIT Madras in 2008. He worked as a wireless research engineer in the Center of Excellence in Wireless Technology (CEWiT) in India for an year. Since 2009, he is a PhD student at USC where he has been an Annenberg Fellow from 2009- 11. His research interests include game theory, decentralized control, online optimization and applied probability.

decentralized_learning_for_multi-player_multi-armed_bandits.txt · Last modified: 2016/09/01 19:15 (external edit)