Estimating and Penalizing Preference Shift in Recommender Systems

Abstract

Recommender systems trained via long-horizon optimization (e.g., reinforcement learning) will have incentives to actively manipulate user preferences through the recommended content. While some work has argued for making systems myopic to avoid this issue, even such systems can induce systematic undesirable preference shifts. Thus, rather than artificially stifling the capabilities of the system, in this work we explore how we can make capable systems that explicitly avoid undesirable shifts. We advocate for (1) estimating the preference shifts that would be induced by recommender system policies, and (2) explicitly characterizing what unwanted shifts are and assessing before deployment whether such policies will produce them – ideally even actively optimizing to avoid them. These steps involve two challenging ingredients: (1) requires the ability to anticipate how hypothetical policies would influence user preferences if deployed; instead, (2) requires metrics to assess whether such influences are manipulative or otherwise unwanted. We study how to do (1) from historical user interaction data by building a user predictive model that implicitly contains their preference dynamics; to address (2), we introduce the notion of a “safe policy”, which defines a trust region within which behavior is believed to be safe. We show that recommender systems that optimize for staying in the trust region avoid manipulative behaviors (e.g., changing preferences in ways that make users more predictable), while still generating engagement.

References

Page 1

	Year	Citations

Page 1