User Tools

Site Tools

Linearly Parameterized Bandits

Paat Rusmevichientong (USC)

Wed April 4, 2:00pm. EEB 248


Motivated by applications in e-commerce and revenue management, we consider multiarmed bandit problems involving a large (possibly infinite) collection of arms, in which the expected reward of each arm is a linear function of an unknown univariate or multivariate random variable. The objective is to choose a sequence of arms to minimize the cumulative regret and Bayes risk. For the univariate case, we find that a greedy policy suffices. For the multivariate case, when the set of arms is strongly convex, we provide a lower bound and a policy that results in a matching upper bound. In the absence of strong convexity, we describe a policy whose performance is within a polylogarithmic factor from the optimal.

This is a joint work with John Tsitsiklis and Adam Mersereau.

Back to CommNetS Seminar Page

linearly_parameterized_bandits.txt · Last modified: 2016/09/01 19:15 (external edit)