
Efficient Creative Selection in Online Advertising using Top-Two Thompson Sampling
On Universally Optimal Algorithms for A/B Testing
Frontiers of Game Theory Woven by Machine Learning
Evaluating Distribution Shift in RLHF
Policy Gradient with Kernel Quadrature
Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation
Policy Gradient Algorithms with Monte-Carlo Tree Learning for Non-Markov Decision Processes
On the True Distribution Approximation of Minimum Bayes-Risk Decoding
Model-Based Minimum Bayes Risk Decoding
Generating Diverse and High-Quality Texts by Minimum Bayes Risk Decoding
Hyperparameter-Free Approach for Faster Minimum Bayes Risk Decoding
Adaptively Perturbed Mirror Descent for Learning in Games
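Last-Iterate Convergence of Mutation-Driven Follow the Regularized Leader in Two-Player Zero-Sum Games
Monte Carlo Tree Search for Constraints Adjusting Regional Disparities in Medical Resident Assignment
A Study on State Abstraction Methods in Two-Player Zero-Sum Markov Games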
Learning Fair Division from Bandit Feedback
Optimal Clustering from Noisy Binary Feedback
Memory Asymmetry Creates Heteroclinic Orbits to Nash Equilibrium in Learning in Zero-Sum Games
On Uniformly Optimal Algorithms for Best Arm Identification in Two-Armed Bandits with Fixed Budget
A Study on Algorithms for Achieving Fair Resource Allocation in Online Environments