Publication | Closed Access
Low-Rank and Locality Constrained Self-Attention for Sequence Modeling
Citations: 37 | References: 34 | Year: 2019
Topics: Structured Prediction, Engineering, Machine Learning, Large Language Model, Language Processing, Natural Language Processing, Locality Constrained Self-attention, Data Science, Computational Linguistics, Self-supervised Learning, Attention Mechanism, Language Studies, Machine Translation, Large AI Model, Sequence Modelling, NLP Task, Computer Science, Deep Learning, Self-attention Mechanism, Linguistics
The self-attention mechanism has become increasingly popular in natural language processing (NLP) applications. Recent studies show that the Transformer architecture, which relies mainly on attention, achieves great success on large datasets. However, its generalization ability is weaker than that of CNNs and RNNs on many moderate-sized datasets. We attribute this to an unsuitable inductive bias in the self-attention structure. In this paper, we regard self-attention as a matrix decomposition problem and propose an improved self-attention module that introduces two linguistic constraints: low rank and locality. We further develop low-rank attention and band attention to parameterize the self-attention mechanism under these two constraints. Experiments on several real NLP tasks show that our model outperforms the vanilla Transformer and other self-attention models on moderate-sized datasets. Additionally, an evaluation on a synthetic task gives a more detailed understanding of how the different architectures work.
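The abstract does not spell out how the two constraints are parameterized, so the following is only a minimal pure-Python sketch under stated assumptions, not the paper's implementation. Band attention is modeled here as ordinary scaled dot-product attention whose scores are masked to -inf outside a window of half-width `width`, so each position attends only to its neighbours (locality). Low-rank attention is modeled by routing attention through `r` slot vectors (`slots`, a hypothetical learned parameter), which makes the n x n attention map a product of an n x r and an r x n row-stochastic matrix and therefore bounds its rank by r. All function and parameter names are illustrative.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Product of an (n x m) and an (m x p) matrix.
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def softmax(row: List[float]) -> List[float]:
    # Stable softmax; entries equal to -inf receive exactly zero weight.
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_scores(q: Matrix, k: Matrix) -> Matrix:
    # Dot-product scores q_i . k_j / sqrt(d).
    d = len(q[0])
    return [[s / math.sqrt(d) for s in row] for row in matmul(q, transpose(k))]


def band_attention(q: Matrix, k: Matrix, v: Matrix, width: int) -> Tuple[Matrix, Matrix]:
    """Locality constraint (assumed form): position i attends only to j with
    |i - j| <= width. Requires width >= 0 so the diagonal is never masked."""
    weights = []
    for i, row in enumerate(scaled_scores(q, k)):
        masked = [s if abs(i - j) <= width else float("-inf") for j, s in enumerate(row)]
        weights.append(softmax(masked))
    return matmul(weights, v), weights


def low_rank_attention(q: Matrix, k: Matrix, v: Matrix, slots: Matrix) -> Tuple[Matrix, Matrix]:
    """Low-rank constraint (one possible parameterization, assumed here):
    A = softmax(Q S^T) @ softmax(S K^T) has rank at most r = len(slots)."""
    to_slots = [softmax(row) for row in scaled_scores(q, slots)]    # n x r
    from_slots = [softmax(row) for row in scaled_scores(slots, k)]  # r x n
    weights = matmul(to_slots, from_slots)  # n x n, rows still sum to 1
    return matmul(weights, v), weights


if __name__ == "__main__":
    x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -0.5]]  # toy sequence: n=4, d=2
    _, band = band_attention(x, x, x, width=1)
    print([round(w, 3) for w in band[0]])  # weights beyond distance 1 are exactly 0
    _, low = low_rank_attention(x, x, x, slots=[[1.0, -1.0]])  # r = 1
    print(all(row == low[0] for row in low))  # rank-1 map: every row is identical
```

The band mask hard-codes locality as an inductive bias, while the slot bottleneck caps how many independent attention patterns the layer can express; both reduce the freedom of vanilla self-attention, which is the intuition the abstract gives for better generalization on moderate-sized data.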