Publication | Closed Access
Towards Reliable Data Collection and Annotation to Extract Pulmonary Digital Biomarkers Using Mobile Sensors
Citations: 24
References: 29
Year: 2019
Unknown Venue
Engineering, Machine Learning, Machine Learning Algorithms, Biometrics, Wearable Technology, Mobile Sensor Data, Pulmonary Biomarkers, Human Monitoring, Speech Recognition, Biomarker (Medicine), Data Science, Digital Health, Patient Monitoring, Biostatistics, Biomarker Discovery, Health Sciences, Health Informatics, Digital Biomarkers, Mobile Sensing, Speech Processing, Health Monitoring, Wearable Sensor
Smartphone and smartwatch sensors enable rich datasets for machine learning to extract digital biomarkers, but reliable algorithms require high-fidelity sensor data and trustworthy ground truth. The study collects mobile sensor data from 131 subjects with varying pulmonary conditions to extract biomarkers and to identify practical challenges and improvements for data collection and annotation. Data were gathered via smartphone and smartwatch sensors (audio, accelerometer, gyroscope), and pulmonary sound events were annotated using crowdsourcing to support the development of signal-processing and machine-learning algorithms. The study found that chestband breathing ground truth may not always be reliable; that breathlessness biomarkers (pause time and pause frequency) from 2.15 minutes of audio can be as reliable as those from 5 minutes of speech data; and that waveform visualization improves annotation quality, leading to a 6.59% increase in cough classification accuracy and a 6% increase in spirometry event classification accuracy. These findings inform future work on explainable machine learning models for pulmonary digital biomarkers.
The proliferation of sensors embedded in smartphones and smartwatches helps capture rich datasets for machine learning algorithms that extract meaningful digital biomarkers on consumer devices for monitoring disease progression and treatment response. However, the development and validation of machine learning algorithms depend on gathering high-fidelity sensor data and reliable ground truth. We conduct a study, called mLungStudy, with 131 subjects with varying pulmonary conditions to collect mobile sensor data, including audio, accelerometer, and gyroscope signals, using a smartphone and a smartwatch, in order to extract pulmonary biomarkers such as breathing, coughs, spirometry, and breathlessness. Our study shows that commonly used breathing ground-truth data from a chestband may not always be reliable as a gold standard. Our analysis shows that breathlessness biomarkers such as pause time and pause frequency from 2.15 minutes of audio can be as reliable as those extracted from 5 minutes' worth of speech data. This finding can help future studies trade off the reliability of breathlessness data against patient comfort in generating continuous speech data. Furthermore, we use crowdsourcing techniques to annotate pulmonary sound events for developing signal processing and machine learning algorithms. In this paper, we highlight several practical challenges in collecting and annotating physiological data and acoustic symptoms from chronic pulmonary patients, along with ways to improve data quality. We show that waveform visualization of the audio signal improves annotation quality, which leads to a 6.59% increase in cough classification accuracy and a 6% increase in spirometry event classification accuracy. Findings from this study inform future studies focusing on developing explainable machine learning models to extract pulmonary digital biomarkers using mobile sensors.
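The abstract names pause time and pause frequency as breathlessness biomarkers but does not say how they are computed. As a rough illustration only, the sketch below assumes a simple frame-energy threshold: frames whose mean squared amplitude falls below a cutoff count as silence, and a run of silent frames lasting at least a minimum duration counts as one pause. The function names, the 20 ms frame length, the energy threshold, and the 0.2 s minimum pause are all hypothetical choices, not values taken from the study.

```python
from typing import List, Sequence, Tuple


def frame_energies(samples: Sequence[float], frame_len: int) -> List[float]:
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def pause_features(
    samples: Sequence[float],
    sample_rate: int,
    frame_sec: float = 0.02,         # assumed frame length
    energy_threshold: float = 1e-4,  # assumed silence cutoff
    min_pause_sec: float = 0.2,      # assumed minimum pause duration
) -> Tuple[float, float]:
    """Return (total pause time in seconds, pauses per minute)."""
    frame_len = max(1, round(sample_rate * frame_sec))
    min_frames = max(1, round(min_pause_sec * sample_rate / frame_len))
    energies = frame_energies(samples, frame_len)

    pauses = []  # length of each qualifying pause, in frames
    run = 0
    # The trailing infinite-energy sentinel flushes a pause that ends the signal.
    for energy in energies + [float("inf")]:
        if energy < energy_threshold:
            run += 1
        else:
            if run >= min_frames:
                pauses.append(run)
            run = 0

    actual_frame_sec = frame_len / sample_rate
    pause_time = sum(pauses) * actual_frame_sec
    duration_min = len(energies) * actual_frame_sec / 60
    pause_freq = len(pauses) / duration_min if duration_min > 0 else 0.0
    return pause_time, pause_freq
```

In practice the threshold would need tuning per device and recording environment, and a real pipeline would likely use a proper voice activity detector rather than raw frame energy.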