Publication | Closed Access

SoMEF: A Framework for Capturing Scientific Software Metadata from its Documentation

Citations: 10

References: 11

Year: 2019

Abstract

Scientific software has become a key asset for reproducing and understanding the products of scientific research in many disciplines. However, scientific software is becoming increasingly complex and, as a result, researchers need to spend a significant amount of time finding, reading and understanding software documentation to set it up. In this paper we describe SoMEF, a Software Metadata Extraction Framework designed to help highlight the most important parts of scientific software documentation. SoMEF processes the README files in GitHub repositories to automatically extract which parts of their text refer to the description, installation, invocation, or citation of a software component. Despite its simple features, SoMEF successfully categorizes README excerpts with a minimum of 0.92 precision and 0.90 ROC AUC. These results, tested on a corpus of over 70 scientific software repositories, are a promising start towards automatically generating knowledge graphs of scientific software metadata.
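
To make the categorization task concrete, the following is a minimal sketch of assigning README excerpts to the four categories named in the abstract. It is not SoMEF's method: the abstract does not specify the features or classifiers used, so this sketch swaps in a plain keyword count. The keyword lists, the function names `split_sections` and `categorize`, and the sample README are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical keyword lists (an assumption for illustration,
# not taken from the SoMEF paper).
KEYWORDS = {
    "description": ("is a", "overview", "about", "description"),
    "installation": ("install", "pip", "requirements", "setup"),
    "invocation": ("usage", "run", "example", "command"),
    "citation": ("cite", "citation", "bibtex", "doi"),
}

# Matches Markdown headings of the form "# Title" up to "###### Title".
HEADING = re.compile(r"^#{1,6}\s+(.*)$", re.MULTILINE)


def split_sections(readme):
    """Split Markdown text into (heading, body) pairs at each heading."""
    matches = list(HEADING.finditer(readme))
    sections = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(readme)
        sections.append((m.group(1).strip(), readme[m.end():end].strip()))
    return sections


def categorize(heading, body):
    """Return the category with the most keyword hits, or None if no hits."""
    text = f"{heading}\n{body}".lower()
    scores = {
        category: sum(text.count(keyword) for keyword in keywords)
        for category, keywords in KEYWORDS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


# Hypothetical README used only to exercise the sketch.
README = """# MyTool
MyTool is a library for parsing logs.

## Installation
pip install mytool

## Usage
Run the command `mytool parse file.log`.

## Citation
Please cite our paper using the BibTeX entry below.
"""

labels = [categorize(heading, body) for heading, body in split_sections(README)]
for (heading, _), label in zip(split_sections(README), labels):
    print(f"{heading}: {label}")
```

A keyword count like this is only a baseline for seeing what the input and output of the task look like. A trained classifier, evaluated with precision and ROC AUC as in the abstract, would replace `categorize` in any realistic setting.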