Temporal Fusion Transformers for interpretable multi-horizon time series forecasting

TLDR

Multi-horizon forecasting draws on a complex mix of static covariates, known future inputs, and exogenous series observed only in the past, yet existing deep-learning models are typically black boxes that offer little insight into how these inputs interact. This work introduces the Temporal Fusion Transformer, an attention-based architecture that delivers high-performance multi-horizon forecasts while providing interpretable insights into temporal dynamics. The model combines recurrent layers for local processing with interpretable self-attention for long-term dependencies, and employs feature-selection components and gating layers to highlight relevant inputs and suppress irrelevant ones. Across a variety of real-world datasets, the Temporal Fusion Transformer outperforms existing benchmarks, and the work demonstrates three practical interpretability use cases.

Abstract

Multi-horizon forecasting problems often contain a complex mix of inputs – including static (i.e. time-invariant) covariates, known future inputs, and other exogenous time series that are only observed in the past – without any prior information on how they interact with the target. Several deep learning methods have been proposed, but they are typically 'black-box' models that do not shed light on how they use the full range of inputs present in practical scenarios. In this paper, we introduce the Temporal Fusion Transformer (TFT) – a novel attention-based architecture that combines high-performance multi-horizon forecasting with interpretable insights into temporal dynamics. To learn temporal relationships at different scales, TFT uses recurrent layers for local processing and interpretable self-attention layers for long-term dependencies. TFT also uses specialized components to select relevant features and a series of gating layers to suppress unnecessary components, enabling high performance in a wide range of scenarios. On a variety of real-world datasets, we demonstrate significant performance improvements over existing benchmarks, and highlight three practical interpretability use cases of TFT.
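
The abstract does not say what form the gating layers take. As a rough illustration only, the sketch below assumes a sigmoid gate applied to a transformed candidate and added to a skip connection, so that a closed gate leaves the input unchanged. The names `gated_skip`, `candidate`, and `gate_logits` are hypothetical, and the gate logits are passed in directly here, whereas a trained model would presumably compute them from the input with learned weights.

```python
import math


def sigmoid(z):
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def gated_skip(x, candidate, gate_logits):
    """Illustrative gate (an assumption, not the paper's stated layer).

    Computes output[i] = x[i] + sigmoid(gate_logits[i]) * candidate[i].
    A very negative logit closes the gate and suppresses the candidate,
    leaving only the skip connection x[i].
    """
    if not (len(x) == len(candidate) == len(gate_logits)):
        raise ValueError("all inputs must have the same length")
    return [xi + sigmoid(g) * c for xi, c, g in zip(x, candidate, gate_logits)]


x = [1.0, 2.0, 3.0]
candidate = [0.5, -0.5, 4.0]

# Gate nearly closed: the component is suppressed and the output stays close to x.
closed = gated_skip(x, candidate, [-30.0, -30.0, -30.0])

# Gate half open: half of each candidate value is added to x.
half_open = gated_skip(x, candidate, [0.0, 0.0, 0.0])
```

Under this assumption, suppressing an unnecessary component amounts to driving its gate towards zero, which reduces that part of the network to an identity mapping.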
