Learning De-biased Representations with Biased Representations

TLDR

Machine learning models are typically trained and evaluated on splits of data from a single source, so standard evaluation cannot reveal whether a model relies on dataset biases as shortcuts; such models generalize poorly when the bias shifts. The authors propose a framework that trains a de-biased representation by encouraging it to differ from a set of representations that are biased by design, a strategy that applies whenever biased representations are easier to define than the bias itself. Experiments on synthetic and real-world biases show that the approach reduces reliance on bias shortcuts and improves generalization. The source code is publicly available.

Abstract

Many machine learning algorithms are trained and evaluated by splitting data from a single source into training and test sets. While this focus on in-distribution learning scenarios has led to interesting advances, it has not been able to tell if models are relying on dataset biases as shortcuts for successful prediction (e.g., using snow cues for recognising snowmobiles), resulting in biased models that fail to generalise when the bias shifts to a different class. The cross-bias generalisation problem has been addressed by de-biasing training data through augmentation or re-sampling, which are often prohibitive due to the data collection cost (e.g., collecting images of a snowmobile in a desert) and the difficulty of quantifying or expressing biases in the first place. In this work, we propose a novel framework to train a de-biased representation by encouraging it to be different from a set of representations that are biased by design. This tactic is feasible in many scenarios where it is much easier to define a set of biased representations than to define and quantify bias. We demonstrate the efficacy of our method across a variety of synthetic and real-world biases; our experiments show that the method discourages models from taking bias shortcuts, resulting in improved generalisation. Source code is publicly available.
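
The abstract does not say how "different" is measured, so the following is only an illustrative sketch under an assumption: it uses the Hilbert-Schmidt Independence Criterion (HSIC) with an RBF kernel as one possible statistical-dependence measure between the target features and each biased model's features. The function names (`rbf_kernel`, `hsic`, `debias_penalty`) and the `sigma` bandwidth are hypothetical and are not taken from the authors' code.

```python
import numpy as np


def rbf_kernel(x, sigma=1.0):
    """Pairwise RBF kernel matrix for rows of x, shape (n, d) -> (n, n)."""
    sq = np.sum(x ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (x @ x.T)
    d2 = np.maximum(d2, 0.0)  # guard against tiny negative round-off
    return np.exp(-d2 / (2.0 * sigma ** 2))


def hsic(f, g, sigma=1.0):
    """Biased empirical HSIC estimate between feature batches f and g.

    Values near zero suggest the two representations are close to
    independent; larger values indicate stronger dependence.
    """
    n = f.shape[0]
    K = rbf_kernel(f, sigma)
    L = rbf_kernel(g, sigma)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2


def debias_penalty(target_feats, biased_feats_list, sigma=1.0):
    """Assumed penalty: summed dependence on every biased representation."""
    return sum(hsic(target_feats, g, sigma) for g in biased_feats_list)
```

Under this assumption, a training objective could take the form `task_loss + lam * debias_penalty(...)`, where `lam` is a hypothetical trade-off weight, so that minimising it pushes the target representation away from the biased ones while still fitting the labels.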