Publication | Closed Access
DataWarrior: An Open-Source Program For Chemistry Aware Data Visualization And Analysis
Citations: 1.7K
References: 13
Year: 2015
Engineering, Hit Identification, Interactive Data Exploration, Data Visualization, Data Exploration, Chemistry, Molecular Graphic, Interactive Visualization, Data Science, Data Mining, Data Integration, Biostatistics, Biological Network Visualization, Activity Cliffs, Biological Database, Knowledge Discovery, Open-source Program, Omics, Bioinformatics, Computational Biology, Rational Drug Design, Medicine, Data Points, Drug Discovery, Drug Discovery Projects
Drug discovery projects generate thousands of chemical structures and tens of thousands of assay data points. Interpreting them requires knowing which molecular families are present, which structural motifs correlate with measured properties, and which small structural changes cause large property changes, yet visualization tools with enough chemical intelligence for this task are scarce. To address this need, the authors released DataWarrior, a free, chemistry-aware data analysis program, and present an overview of its functionality and architecture. DataWarrior implements an unsupervised 2-dimensional scaling algorithm that uses vector-based or nonvector-based descriptors to visualize chemical or pharmacophore space, enabling interactive exploration of chemical space, activity landscapes, and activity cliffs.
Drug discovery projects in the pharmaceutical industry accumulate thousands of chemical structures and tens of thousands of data points from a dozen or more biological and pharmacological assays. Interpreting these data adequately requires understanding which molecular families are present, which structural motifs correlate with measured properties, and which small structural changes cause large property changes. Data visualization and analysis software with sufficient chemical intelligence to support chemists in this task is rare. To help fill this gap, we released our in-house chemistry-aware data analysis program, DataWarrior, for free public use. This paper gives an overview of DataWarrior's functionality and architecture. As an example, a new unsupervised 2-dimensional scaling algorithm is presented, which employs vector-based or nonvector-based descriptors to visualize the chemical or pharmacophore space of even large data sets. DataWarrior uses this method to interactively explore chemical space, activity landscapes, and activity cliffs.
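To make the notion of an activity cliff concrete, the following is a minimal Python sketch that flags pairs of compounds that are structurally similar yet differ strongly in measured activity. It is not DataWarrior's implementation and not the scaling algorithm from the paper. The feature sets, the Tanimoto similarity measure, the compound names, the activity values (read here as pIC50-like numbers), and both thresholds are assumptions chosen purely for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of feature identifiers."""
    if not a and not b:
        return 1.0  # two empty feature sets are treated as identical
    return len(a & b) / len(a | b)


def activity_cliffs(compounds, min_similarity=0.7, min_activity_gap=1.0):
    """Return (name_i, name_j, similarity, gap) for similar pairs with a large activity gap.

    `compounds` maps a compound name to (feature_set, activity).
    Both thresholds are illustrative assumptions, not values from the paper.
    """
    cliffs = []
    for n1, n2 in combinations(sorted(compounds), 2):
        f1, a1 = compounds[n1]
        f2, a2 = compounds[n2]
        sim = tanimoto(f1, f2)
        gap = abs(a1 - a2)
        if sim >= min_similarity and gap >= min_activity_gap:
            cliffs.append((n1, n2, sim, gap))
    # Largest activity gap first
    return sorted(cliffs, key=lambda c: c[3], reverse=True)


# Hypothetical data: integers stand in for structural features (e.g. fingerprint bits).
compounds = {
    "cpd_A": (frozenset(range(1, 9)), 8.2),
    "cpd_B": (frozenset(range(1, 8)) | {9}, 5.1),
    "cpd_C": (frozenset({20, 21, 22}), 6.0),
}

for name_i, name_j, sim, gap in activity_cliffs(compounds):
    print(f"{name_i} / {name_j}: similarity={sim:.2f}, activity gap={gap:.1f}")
```

In this toy data set only `cpd_A` and `cpd_B` share most of their features while differing in activity by about three units, so they form the only flagged pair. A real workflow would derive the feature sets from a chemical descriptor rather than hand-written integers.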