Self-Supervised Contrastive Pre-Training For Time Series via Time-Frequency Consistency

TLDR

Pre-training time-series models is difficult because the pre-training and target domains can differ in temporal dynamics, trends, and cyclic effects, which degrades downstream performance. The authors seek a pre-training method that accommodates target domains with different temporal dynamics without seeing any target examples during pre-training. Their key assumption is time-frequency consistency (TF-C): time-based and frequency-based representations of the same example should lie close together in the time-frequency space. They propose a decomposable self-supervised model in which the time and frequency components are each trained by contrastive estimation, and the distance between the two components provides the self-supervised signal. Evaluated on eight datasets against eight state-of-the-art methods, TF-C outperforms the baselines by 15.4% (F1 score) on average in one-to-one settings and by 8.4% (precision) in one-to-many settings. Code and datasets are available at https://github.com/mims-harvard/TFC-pretraining.

Abstract

Pre-training on time series poses a unique challenge due to the potential mismatch between pre-training and target domains, such as shifts in temporal dynamics, fast-evolving trends, and long-range and short-cyclic effects, which can lead to poor downstream performance. While domain adaptation methods can mitigate these shifts, most methods need examples directly from the target domain, making them suboptimal for pre-training. To address this challenge, methods need to accommodate target domains with different temporal dynamics and be capable of doing so without seeing any target examples during pre-training. Relative to other modalities, in time series, we expect that time-based and frequency-based representations of the same example are located close together in the time-frequency space. To this end, we posit that time-frequency consistency (TF-C) -- embedding a time-based neighborhood of an example close to its frequency-based neighborhood -- is desirable for pre-training. Motivated by TF-C, we define a decomposable pre-training model, where the self-supervised signal is provided by the distance between time and frequency components, each individually trained by contrastive estimation. We evaluate the new method on eight datasets, including electrodiagnostic testing, human activity recognition, mechanical fault detection, and physical status monitoring. Experiments against eight state-of-the-art methods show that TF-C outperforms baselines by 15.4% (F1 score) on average in one-to-one settings (e.g., fine-tuning an EEG-pretrained model on EMG data) and by 8.4% (precision) in challenging one-to-many settings (e.g., fine-tuning an EEG-pretrained model for either hand-gesture recognition or mechanical fault prediction), reflecting the breadth of scenarios that arise in real-world applications. Code and datasets: https://github.com/mims-harvard/TFC-pretraining.
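
To make the idea of time-frequency consistency more concrete, the following is a minimal NumPy sketch of a TF-C-style objective. It is not the authors' implementation (see the linked repository for that). Several pieces are assumptions made purely for illustration: the linear stand-in encoders `w_time` and `w_freq`, the jitter and bin-masking augmentations, the NT-Xent form of the contrastive term, the mean Euclidean distance used as the consistency term, and the weighting parameter `lam`. The abstract does not specify any of these details.

```python
import numpy as np

rng = np.random.default_rng(0)

def l2_normalize(z, eps=1e-12):
    """Scale each row to unit length."""
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + eps)

def nt_xent(z_a, z_b, temperature=0.2):
    """NT-Xent contrastive loss; row i of z_a and row i of z_b form a positive pair."""
    n = z_a.shape[0]
    z = l2_normalize(np.concatenate([z_a, z_b], axis=0))
    sim = z @ z.T / temperature
    np.fill_diagonal(sim, -np.inf)               # never contrast a row with itself
    pos = (np.arange(2 * n) + n) % (2 * n)       # index of each row's positive
    row_max = sim.max(axis=1, keepdims=True)     # stabilise the log-sum-exp
    log_denom = row_max[:, 0] + np.log(np.exp(sim - row_max).sum(axis=1))
    return float(-(sim[np.arange(2 * n), pos] - log_denom).mean())

def frequency_view(x):
    """Magnitude spectrum of each series (real FFT along the time axis)."""
    return np.abs(np.fft.rfft(x, axis=-1))

def consistency(z_time, z_freq):
    """Assumed consistency term: mean distance between paired, normalized embeddings."""
    diff = l2_normalize(z_time) - l2_normalize(z_freq)
    return float(np.linalg.norm(diff, axis=1).mean())

def tfc_style_loss(x, w_time, w_freq, lam=0.5, noise=0.05):
    """Contrastive loss in each domain plus a time-frequency consistency term."""
    # Assumed augmentations: additive jitter in time, random bin masking in frequency.
    x_aug = x + noise * rng.standard_normal(x.shape)
    f = frequency_view(x)
    f_aug = f * (rng.random(f.shape) > 0.1)
    # Hypothetical linear encoders standing in for real neural networks.
    z_t, z_t_aug = x @ w_time, x_aug @ w_time
    z_f, z_f_aug = f @ w_freq, f_aug @ w_freq
    loss_time = nt_xent(z_t, z_t_aug)
    loss_freq = nt_xent(z_f, z_f_aug)
    loss_cons = consistency(z_t, z_f)
    return lam * (loss_time + loss_freq) + (1 - lam) * loss_cons

x = rng.standard_normal((8, 64))        # 8 univariate series of length 64
w_time = rng.standard_normal((64, 16))  # time encoder: 64 samples -> 16 dims
w_freq = rng.standard_normal((33, 16))  # rfft of length 64 yields 33 bins
loss = tfc_style_loss(x, w_time, w_freq)
print(f"TF-C-style loss: {loss:.4f}")
```

The sketch keeps the two domains decomposable in the sense described above: each branch has its own contrastive term, and the only coupling between them is the distance-based consistency term. In practice the encoders would be trained networks, and the consistency term could take a different form than the simple distance assumed here.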