Publication | Open Access
Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting
Citations: 5.3K | References: 29 | Year: 2021
Structured Prediction · Forecasting Methodology · Probabilistic Forecasting · Prediction Capacity · Machine Learning · Data Science · Beyond Efficient Transformer · Engineering · Recurrent Neural Network · Predictive Analytics · Sequence Modelling · Computer Engineering · Electricity Consumption Planning · Computer Science · Forecasting · Deep Learning · Long Sequence Time-series · Signal Processing
Long‑sequence time‑series forecasting demands models that capture long‑range dependencies efficiently, and while Transformers can increase prediction capacity, their quadratic complexity, high memory usage, and encoder‑decoder limitations hinder direct application. The authors propose Informer, an efficient transformer designed to overcome these limitations for long‑sequence forecasting. Informer achieves this through a ProbSparse self‑attention mechanism with O(L log L) complexity, a self‑attention distilling layer that halves cascading inputs to focus on dominant attention, and a generative‑style decoder that outputs the full sequence in a single forward operation. Experiments on four large‑scale datasets show that Informer significantly outperforms existing methods, establishing a new benchmark for long‑sequence forecasting.
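As a concrete illustration of the distilling step mentioned above, the sketch below halves the temporal dimension between encoder blocks with a 1-D convolution, ELU activation, and stride-2 max-pooling. It is a minimal, non-official PyTorch sketch: the class name `DistillingLayer`, the kernel sizes, and the dimensions are illustrative assumptions, not taken from the authors' released code.

```python
import torch
import torch.nn as nn

class DistillingLayer(nn.Module):
    """Illustrative self-attention distilling step: halves the sequence
    length between encoder blocks to keep the dominating features and
    shrink memory usage (names/sizes are assumptions, not the official code)."""
    def __init__(self, d_model: int):
        super().__init__()
        # 1-D convolution over the time axis, then ELU and max-pooling
        # with stride 2, which halves the input length.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size=3, padding=1)
        self.act = nn.ELU()
        self.pool = nn.MaxPool1d(kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> (batch, seq_len // 2, d_model)
        x = x.transpose(1, 2)                  # (batch, d_model, seq_len)
        x = self.pool(self.act(self.conv(x)))
        return x.transpose(1, 2)

# Example: a 96-step encoder input is halved to 48 steps.
x = torch.randn(2, 96, 512)
print(DistillingLayer(512)(x).shape)           # torch.Size([2, 48, 512])
```

Stacking such layers between encoder blocks is what lets the encoder accept much longer inputs at a roughly constant memory budget.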
Many real-world applications require the prediction of long sequence time-series, such as electricity consumption planning. Long sequence time-series forecasting (LSTF) demands a high prediction capacity of the model, which is the ability to capture precise long-range dependency coupling between output and input efficiently. Recent studies have shown the potential of Transformer to increase the prediction capacity. However, there are several severe issues with Transformer that prevent it from being directly applicable to LSTF, including quadratic time complexity, high memory usage, and the inherent limitation of the encoder-decoder architecture. To address these issues, we design an efficient transformer-based model for LSTF, named Informer, with three distinctive characteristics: (i) a ProbSparse self-attention mechanism, which achieves O(L log L) in time complexity and memory usage, and has comparable performance on sequences' dependency alignment; (ii) self-attention distilling, which highlights dominating attention by halving cascading layer input and efficiently handles extremely long input sequences; (iii) a generative-style decoder which, while conceptually simple, predicts the long time-series sequences in one forward operation rather than step by step, drastically improving the inference speed of long-sequence predictions. Extensive experiments on four large-scale datasets demonstrate that Informer significantly outperforms existing methods and provides a new solution to the LSTF problem.
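The ProbSparse self-attention in (i) scores each query by how far its attention distribution deviates from uniform and evaluates full attention only for the top-u queries; the remaining "lazy" queries fall back to a trivial output (the mean of the values). The sketch below is a simplified, non-official illustration of that selection idea: the key-sampling approximation the paper uses to compute the score cheaply is omitted here (so this version still forms the full score matrix), and the function name `prob_sparse_attention` and parameter `u` are placeholders.

```python
import math
import torch

def prob_sparse_attention(Q, K, V, u):
    """Simplified ProbSparse-style attention (illustrative only).

    Q, K, V: (batch, L, d). Only the u queries with the largest sparsity
    score M(q, K) = max_j(q·k_j / sqrt(d)) - mean_j(q·k_j / sqrt(d)) attend
    over all keys; the rest receive the mean of V. Choosing u ~ c * ln(L)
    is what yields the O(L log L) cost in the full method.
    """
    B, L, d = Q.shape
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d)       # (B, L, L)
    M = scores.max(dim=-1).values - scores.mean(dim=-1)   # sparsity measure per query
    top_idx = M.topk(u, dim=-1).indices                   # (B, u) "active" queries

    # Lazy queries: output the mean of V (a near-uniform attention result).
    out = V.mean(dim=1, keepdim=True).expand(B, L, d).clone()

    # Active queries: ordinary scaled dot-product attention over all keys.
    q_top = torch.gather(Q, 1, top_idx.unsqueeze(-1).expand(-1, -1, d))
    attn = torch.softmax(q_top @ K.transpose(-2, -1) / math.sqrt(d), dim=-1)
    out.scatter_(1, top_idx.unsqueeze(-1).expand(-1, -1, d), attn @ V)
    return out

# Example: 128 queries, but only u = 25 ≈ 5 * ln(128) get full attention.
Q, K, V = (torch.randn(2, 128, 64) for _ in range(3))
print(prob_sparse_attention(Q, K, V, u=25).shape)  # torch.Size([2, 128, 64])
```

The paper's measured speed-ups come from additionally approximating the sparsity score with a logarithmic number of sampled keys, so that the score matrix itself is never materialized in full.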