Memory Fusion Network for Multi-view Sequential Learning

TLDR

Multi-view sequential learning deals with sequences that have multiple views and that exhibit both view-specific and cross-view interactions. This work introduces the Memory Fusion Network (MFN), a neural architecture that explicitly models both interaction types through time. MFN comprises a System of LSTMs that learns view-specific interactions in isolation, a Delta-memory Attention Network that identifies cross-view interactions, and a Multi-view Gated Memory that summarizes these interactions through time. On three publicly available multi-view datasets, MFN outperforms the multi-view approaches it is compared against and sets new state-of-the-art results.

Abstract

Multi-view sequential learning is a fundamental problem in machine learning dealing with multi-view sequences. In a multi-view sequence, there exist two forms of interactions between different views: view-specific interactions and cross-view interactions. In this paper, we present a new neural architecture for multi-view sequential learning called the Memory Fusion Network (MFN) that explicitly accounts for both forms of interaction and continuously models them through time. The first component of the MFN is called the System of LSTMs, where view-specific interactions are learned in isolation by assigning an LSTM function to each view. The cross-view interactions are then identified using a special attention mechanism called the Delta-memory Attention Network (DMAN) and summarized through time with a Multi-view Gated Memory. Through extensive experimentation, MFN is compared to various proposed approaches for multi-view sequential learning on multiple publicly available benchmark datasets. MFN outperforms all the multi-view approaches. Furthermore, MFN outperforms all current state-of-the-art models, setting new state-of-the-art results for all three multi-view datasets.
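
The three components named above can be illustrated with a small NumPy sketch. This is one plausible reading of the abstract, not the authors' implementation: the abstract gives no equations, so the single linear layer used for attention, the sigmoid retain/update gates, the random untrained weights, and every name here (`LSTMCell`, `dense`, `mfn_forward`, `mem_dims`, `fused_dim`) are assumptions made for illustration. The "delta" is read as attending over each view's LSTM memory at the previous and current time steps together.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class LSTMCell:
    """One LSTM per view; weights are random and untrained (illustrative only)."""
    def __init__(self, in_dim, mem_dim, rng):
        self.W = rng.normal(0.0, 0.1, (4 * mem_dim, in_dim + mem_dim))
        self.b = np.zeros(4 * mem_dim)

    def step(self, x, h, c):
        z = self.W @ np.concatenate([x, h]) + self.b
        i, f, o, g = np.split(z, 4)          # input, forget, output gates and candidate
        c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h_new = sigmoid(o) * np.tanh(c_new)
        return h_new, c_new

def dense(in_dim, out_dim, rng):
    """Hypothetical helper: a bias-free linear layer with random weights."""
    W = rng.normal(0.0, 0.1, (out_dim, in_dim))
    return lambda v: W @ v

def mfn_forward(views, mem_dims, fused_dim, seed=0):
    """views: list of arrays, one per view, each of shape (T, d_n).

    Returns the final per-view LSTM outputs and the final fused memory u.
    """
    rng = np.random.default_rng(seed)
    # System of LSTMs: each view gets its own cell and never sees the other views.
    cells = [LSTMCell(v.shape[1], m, rng) for v, m in zip(views, mem_dims)]
    total = sum(mem_dims)
    attn = dense(2 * total, 2 * total, rng)      # assumed attention scorer
    propose = dense(2 * total, fused_dim, rng)   # assumed candidate update for u
    retain = dense(2 * total, fused_dim, rng)    # assumed gate: how much of u to keep
    update = dense(2 * total, fused_dim, rng)    # assumed gate: how much to write

    hs = [np.zeros(m) for m in mem_dims]
    cs = [np.zeros(m) for m in mem_dims]
    u = np.zeros(fused_dim)
    for t in range(views[0].shape[0]):
        prev_c = np.concatenate(cs)
        for n, cell in enumerate(cells):
            hs[n], cs[n] = cell.step(views[n][t], hs[n], cs[n])
        # Attention over the memories of all views at steps t-1 and t.
        both = np.concatenate([prev_c, np.concatenate(cs)])
        attended = softmax(attn(both)) * both
        # Gated memory: blend the previous summary with the proposed update.
        u = sigmoid(retain(attended)) * u + sigmoid(update(attended)) * np.tanh(propose(attended))
    return hs, u
```

For example, three views with feature sizes 3, 4, and 2 over five time steps can be passed as `mfn_forward(views, mem_dims=[4, 5, 3], fused_dim=6)`. In a trained model the returned per-view outputs and fused memory would feed a prediction layer, which this sketch omits.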
