Efficient Creative Selection in Online Advertising using Top-Two Thompson Sampling
On Universally Optimal Algorithms for A/B Testing
Safe Collaborative Filtering
Policy Gradient with Kernel Quadrature
Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation
Policy Gradient Algorithms with Monte-Carlo Tree Learning for Non-Markov Decision Processes
On the True Distribution Approximation of Minimum Bayes-Risk Decoding
Model-Based Minimum Bayes Risk Decoding
Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding
Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding
Adaptively Perturbed Mirror Descent for Learning in Games
Matroid Semi-Bandits in Sublinear Time
Last-Iterate Convergence of Mutation-Driven Follow the Regularized Leader in Two-Player Zero-Sum Games
Scalable and Provably Fair Exposure Control for Large-Scale Recommender Systems
Learning Fair Division from Bandit Feedback
Optimal Clustering from Noisy Binary Feedback
Memory Asymmetry Creates Heteroclinic Orbits to Nash Equilibrium in Learning in Zero-Sum Games
On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget
Exploration of Unranked Items in Safe Online Learning to Re-Rank
Rate-Optimal Bayesian Simple Regret in Best Arm Identification