Publication | Open Access
Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data
Citations: 1.4K | References: 65 | Year: 2017
Engineering, Semantic Web, Theory-guided Data Science, Ontology-based Data Integration, Data Science, Data Mining, New Paradigm, Management, Scientific Discovery, Research Themes, Data Integration, Information Discovery, Knowledge Discovery Process, Data Management, Data-driven Science, Knowledge Discovery, Research Data Management, Knowledge Data Engineering, Discovery Technique, Data Science Models, Data Engineering, Data Modeling
Data science models have succeeded commercially but struggle with scientific problems involving complex physical phenomena. This has prompted the emergence of theory-guided data science (TGDS), which leverages scientific knowledge to make data science models more effective and has gained traction in disciplines such as turbulence modeling, materials discovery, quantum chemistry, biomedicine, climate science, and hydrology. TGDS seeks to embed scientific consistency into data-driven models to produce generalizable, interpretable solutions that uncover novel domain insights. The paper formalizes the paradigm, presents a taxonomy of research themes, illustrates how domain knowledge can be integrated across these themes with examples from various disciplines, and outlines promising research directions.
Data science models, although successful in a number of commercial domains, have had limited applicability in scientific problems involving complex physical phenomena. Theory-guided data science (TGDS) is an emerging paradigm that aims to leverage the wealth of scientific knowledge for improving the effectiveness of data science models in enabling scientific discovery. The overarching vision of TGDS is to introduce scientific consistency as an essential component for learning generalizable models. Further, by producing scientifically interpretable models, TGDS aims to advance our scientific understanding by discovering novel domain insights. Indeed, the paradigm of TGDS has started to gain prominence in a number of scientific disciplines such as turbulence modeling, material discovery, quantum chemistry, bio-medical science, bio-marker discovery, climate science, and hydrology. In this paper, we formally conceptualize the paradigm of TGDS and present a taxonomy of research themes in TGDS. We describe several approaches for integrating domain knowledge in different research themes using illustrative examples from different disciplines. We also highlight some of the promising avenues of novel research for realizing the full potential of theory-guided data science.