Pseudo-Data Injections for CLO Bandit Problems
2026
Contextual linear optimization (CLO) with bandit feedback is a class of CLO problems in which only the costs of historical actions are observable. Finding an optimal decision-making policy in this setting faces a fundamental challenge: real-world data often lacks coverage over the action space, so the full cost vector cannot be identified from the available data. A common remedy is to apply regularization to ensure stability of the learning problem. We show that this approach admits an alternative interpretation as a specific form of pseudo-data injection, in which synthetic data is added to induce coverage. This perspective suggests a broader question...
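
As an illustration of the regularization-as-pseudo-data view, the following minimal sketch uses the classical ridge regression identity rather than the specific construction developed in this work. It assumes a plain linear least-squares model, and the function names, the penalty weight `lam`, and the synthetic data are all hypothetical. Appending d synthetic rows sqrt(lam) * I with zero targets to the design matrix and solving ordinary least squares gives the same estimate as the ridge closed form, even when one coordinate is never covered by the observed data.

```python
import numpy as np


def ridge_closed_form(X, y, lam):
    """Ridge estimate: argmin ||X theta - y||^2 + lam * ||theta||^2."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def ols_with_pseudo_data(X, y, lam):
    """Ordinary least squares after appending d synthetic observations.

    The pseudo-rows sqrt(lam) * I with zero targets cover every coordinate
    direction, so the augmented design matrix has full column rank.
    """
    d = X.shape[1]
    X_aug = np.vstack([X, np.sqrt(lam) * np.eye(d)])
    y_aug = np.concatenate([y, np.zeros(d)])
    theta, *_ = np.linalg.lstsq(X_aug, y_aug, rcond=None)
    return theta


# Hypothetical data with poor coverage: the third feature is never active,
# so its coefficient is unidentifiable without regularization.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))
X[:, 2] = 0.0
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=20)

lam = 0.5  # assumed penalty weight, chosen only for illustration
theta_ridge = ridge_closed_form(X, y, lam)
theta_pseudo = ols_with_pseudo_data(X, y, lam)

# The two estimates agree up to numerical tolerance.
print(np.allclose(theta_ridge, theta_pseudo))
```

In this sketch the pseudo-rows are what make the uncovered third coordinate identifiable: both estimators set its coefficient to zero, the value the synthetic targets encode.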
