Publication | Open Access
PTB-XL, a large publicly available electrocardiography dataset
Citations: 1K | References: 16 | Year: 2020
Engineering, Machine Learning, Machine Learning Tool, Diagnosis, Disease Classification, Electrophysiological Evaluation, Data Science, Data Mining, Pattern Recognition, Multi-label Dataset, Patient Monitoring, Biostatistics, Cardiology, Cardiovascular Imaging, Ecg-waveform Data, Deep Learning, Electrophysiology, Available Electrocardiography Dataset, Medicine, Health Informatics, Emergency Medicine
Electrocardiography is a key non‑invasive diagnostic tool for cardiovascular disease, yet the development of automated interpretation algorithms is hindered by a lack of public datasets and standardized benchmarking procedures. This study introduces PTB‑XL, the largest freely accessible 12‑lead ECG dataset, to address these gaps. PTB‑XL comprises 21,837 ten‑second 12‑lead recordings from 18,885 patients, annotated by up to two cardiologists into multi‑label diagnostic classes with super‑ and subclass hierarchies, and includes demographic data, diagnostic likelihoods, signal‑property annotations, and suggested training/testing splits. The rich annotations, metadata, and predefined splits make PTB‑XL a valuable resource for developing and evaluating automatic ECG interpretation algorithms.
Electrocardiography (ECG) is a key non-invasive diagnostic tool for cardiovascular diseases and is increasingly supported by algorithms based on machine learning. Major obstacles to the development of automatic ECG interpretation algorithms are the lack of both public datasets and well-defined benchmarking procedures that allow comparisons of different algorithms. To address these issues, we put forward PTB-XL, to date the largest freely accessible clinical 12-lead ECG-waveform dataset, comprising 21837 records of 10 seconds length from 18885 patients. The ECG-waveform data was annotated by up to two cardiologists as a multi-label dataset, where diagnostic labels were further aggregated into super and subclasses. The dataset covers a broad range of diagnostic classes including, in particular, a large fraction of healthy records. The combination with additional metadata on demographics, additional diagnostic statements, diagnosis likelihoods, manually annotated signal properties, and suggested folds for splitting training and test sets turns the dataset into a rich resource for the development and the evaluation of automatic ECG interpretation algorithms.
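The multi-label annotation, the aggregation into superclasses, and the suggested folds can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the dataset's documentation: the record layout, the field names (`labels`, `fold`), the code-to-superclass map, the 0 to 100 likelihood scale, and the choice of fold 10 as the test fold.

```python
# Hypothetical subclass-code -> superclass map (illustrative only).
SUPERCLASS_OF = {"NORM": "NORM", "IMI": "MI", "AMI": "MI", "LVH": "HYP"}

# Toy records: each carries several diagnostic codes with a likelihood
# (assumed 0-100 scale) and a pre-assigned fold number.
records = [
    {"id": 1, "labels": {"NORM": 100.0}, "fold": 1},
    {"id": 2, "labels": {"IMI": 80.0, "LVH": 50.0}, "fold": 2},
    {"id": 3, "labels": {"AMI": 15.0}, "fold": 10},
    {"id": 4, "labels": {"IMI": 100.0, "AMI": 100.0}, "fold": 10},
]


def superclasses(labels, min_likelihood=0.0):
    """Aggregate a record's codes into a sorted list of unique superclasses,
    keeping only codes whose likelihood is at least min_likelihood."""
    return sorted(
        {
            SUPERCLASS_OF[code]
            for code, likelihood in labels.items()
            if code in SUPERCLASS_OF and likelihood >= min_likelihood
        }
    )


def split_by_fold(records, test_folds):
    """Split records into (train, test) using their pre-assigned folds."""
    train = [r for r in records if r["fold"] not in test_folds]
    test = [r for r in records if r["fold"] in test_folds]
    return train, test


train, test = split_by_fold(records, test_folds={10})
train_targets = [superclasses(r["labels"]) for r in train]
```

Using pre-assigned folds instead of a fresh random split keeps results comparable across algorithms, which is the benchmarking gap the abstract describes.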