From Red Wine to Red Tomato: Composition with Context

TLDR

Compositionality and contextuality are both needed to compose concepts intelligently, yet existing learning methods often ignore contextuality and require large labeled datasets. For example, “red” in red wine differs from “red” in red tomato. The paper proposes a simple method that respects contextuality when composing classifiers of known visual concepts. It treats classifiers as points in a smooth space in which compositional transforms can be modeled, and it generalizes to unseen concept combinations. Experiments composing attributes and objects, as well as subject–predicate–object triples, show strong generalization over baselines, and a detailed analysis highlights the method’s properties.

Abstract

Compositionality and contextuality are key building blocks of intelligence. They allow us to compose known concepts to generate new and complex ones. However, traditional learning methods do not model both these properties and require copious amounts of labeled data to learn new concepts. A large fraction of existing techniques, e.g., those using late fusion, compose concepts but fail to model contextuality. For example, red in red wine is different from red in red tomatoes. In this paper, we present a simple method that respects contextuality in order to compose classifiers of known visual concepts. Our method builds upon the intuition that classifiers lie in a smooth space where compositional transforms can be modeled. We show how it can generalize to unseen combinations of concepts. Our results on composing attributes and objects, as well as subject, predicate, and object triples, demonstrate its strong generalization performance compared to baselines. Finally, we present a detailed analysis of our method and highlight its properties.
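
The contrast between late fusion and a compositional transform over classifiers can be illustrated with a small sketch. The abstract does not specify the form of the transform, so everything here is an assumption made for illustration: the classifiers are taken to be linear weight vectors, the transform is a hypothetical two-layer map with random, untrained weights, and all function names (`late_fusion_score`, `make_transform`, `composed_score`) are invented for this example. It is not the authors' implementation.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def late_fusion_score(w_attr, w_obj, x):
    """Context-free baseline: score each concept independently, then multiply.

    The attribute classifier is applied identically whatever the object is,
    so "red" cannot adapt to "wine" versus "tomato".
    """
    return sigmoid(dot(w_attr, x)) * sigmoid(dot(w_obj, x))


def make_transform(dim, hidden, seed=0):
    """Build a hypothetical map T(w_attr, w_obj) -> w_composed.

    Assumption: a two-layer network with tanh hidden units. The weights are
    random and untrained; a real system would fit them on seen combinations.
    """
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 0.1) for _ in range(2 * dim)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 0.1) for _ in range(hidden)] for _ in range(dim)]

    def transform(w_attr, w_obj):
        z = list(w_attr) + list(w_obj)           # concatenate both classifiers
        h = [math.tanh(dot(row, z)) for row in w1]
        return [dot(row, h) for row in w2]       # weights of the composed classifier

    return transform


def composed_score(transform, w_attr, w_obj, x):
    """Score a feature vector x with the classifier produced by the transform."""
    return sigmoid(dot(transform(w_attr, w_obj), x))


# Toy 4-dimensional classifiers and feature vector (made-up numbers).
w_red = [0.5, -0.2, 0.1, 0.0]
w_wine = [0.0, 0.3, -0.4, 0.2]
w_tomato = [0.1, -0.3, 0.6, 0.0]
x = [1.0, 0.5, -0.5, 0.2]

T = make_transform(dim=4, hidden=8)
w_red_wine = T(w_red, w_wine)
w_red_tomato = T(w_red, w_tomato)
print(late_fusion_score(w_red, w_wine, x), composed_score(T, w_red, w_wine, x))
```

Because the composed weights depend jointly on both inputs, the contribution of `w_red` to `w_red_wine` can differ from its contribution to `w_red_tomato`, which is the kind of contextuality that a product of independent scores cannot express.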
