Adversarial Modality Alignment Network for Cross-Modal Molecule Retrieval

Abstract

The cross-modal molecule retrieval (Text2Mol) task aims to bridge the semantic gap between molecules and their natural language descriptions. An existing solution to this nontrivial problem combines a graph convolutional network (GCN) with cross-modal attention and contrastive learning, and it obtains reasonable results. However, it has three issues. First, the cross-modal attention mechanism benefits only the text representations and provides no useful information to the molecule representations. Second, the GCN-based molecule encoder ignores edge features and the relative importance of a molecule's substructures. Finally, the retrieval loss function is rather simplistic. This article further investigates the Text2Mol problem and proposes a novel method based on an adversarial modality alignment network (AMAN) that adequately learns both description and molecule information. Our method uses SciBERT as the text encoder and a graph transformer network as the molecule encoder to generate multimodal representations. An adversarial network then aligns these modalities interactively. Meanwhile, a triplet loss function performs retrieval learning and further enhances the modality alignment. Experiments on the ChEBI-20 dataset show that AMAN is effective compared with baselines.
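The abstract does not give the exact form of the triplet loss used for retrieval learning. The sketch below shows one common formulation under several assumptions, none of which come from the paper:

- Similarity is cosine similarity.
- The margin is 0.2.
- Every non-matching item in a mini-batch serves as a negative.
- The loss is applied in both directions (text to molecule and molecule to text).

The function names `cosine` and `bidirectional_triplet_loss` are hypothetical. The embeddings stand in for the SciBERT and graph transformer outputs and are assumed to be nonzero vectors of equal length.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two nonzero vectors (assumed similarity measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def bidirectional_triplet_loss(
    texts: Sequence[Sequence[float]],
    mols: Sequence[Sequence[float]],
    margin: float = 0.2,  # hypothetical margin, not taken from the paper
) -> float:
    """Hinge-based triplet loss over a mini-batch.

    texts[i] and mols[i] are a matching description/molecule pair; every
    other in-batch item is treated as a negative (an assumption).
    """
    n = len(texts)
    # sim[i][j] = similarity between description i and molecule j
    sim = [[cosine(t, m) for m in mols] for t in texts]
    loss = 0.0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            # Description i as anchor, molecule j as negative.
            loss += max(0.0, margin - sim[i][i] + sim[i][j])
            # Molecule i as anchor, description j as negative.
            loss += max(0.0, margin - sim[i][i] + sim[j][i])
    return loss / n


# Aligned pairs incur no loss; swapped pairs are penalized.
aligned = bidirectional_triplet_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
swapped = bidirectional_triplet_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

The loss pushes each matching pair's similarity above every mismatched pair's similarity by at least the margin. This is how a triplet objective can reinforce cross-modal alignment beyond what the adversarial component provides. The paper's actual negative sampling and distance choice may differ.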
