Cataloguing and Linking Life Sciences LOD Cloud

Abstract

The Life Sciences Linked Open Data (LSLOD) Cloud currently comprises multiple datasets that add high value to biomedical research. However, navigating these datasets is not easy, as most of them are fragmented across multiple SPARQL endpoints, each containing trillions of triples and represented with insufficient vocabulary reuse. To retrieve and match, from multiple endpoints, the data required to answer meaningful biological questions, it is first necessary to catalogue the data represented in each endpoint. We explore the schema used to represent data from a total of 52 meaningful Life Sciences SPARQL endpoints and present our methodology for linking related concepts and properties from the “pool” of available elements. We found the outcome of this exploratory work to be helpful not only in identifying redundancy and gaps in the data, but also in enabling the assembly of complex federated queries. We present three different approaches used to weave concepts.

Method

We catalogued the LSLOD by harvesting, from 52 SPARQL endpoints, the set of distinct concepts and properties that may be used to query the data. The resulting triples were organized in an RDF document, the LSLOD Catalogue, which yielded a “pool” of 12,396 concepts and 1,255 distinct properties. We then combined several approaches for creating links between these concepts and properties.
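The harvesting step described above, collecting the distinct concepts and properties exposed by each endpoint, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the two queries are generic SPARQL 1.1 patterns chosen here as an assumption, the function names (`extract_uris`, `build_catalogue`, `pool`) are hypothetical, and the sketch works on SPARQL JSON result documents that are assumed to have been fetched already, so it makes no network calls.

```python
import json

# Generic SPARQL 1.1 queries (an assumption; the source does not give
# the exact queries that were sent to the endpoints).
CONCEPTS_QUERY = "SELECT DISTINCT ?concept WHERE { [] a ?concept }"
PROPERTIES_QUERY = "SELECT DISTINCT ?property WHERE { [] ?property [] }"


def extract_uris(results_json, variable):
    """Return the sorted, de-duplicated URIs bound to `variable` in a
    SPARQL JSON results document; non-URI bindings are skipped."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return sorted({
        b[variable]["value"]
        for b in bindings
        if variable in b and b[variable]["type"] == "uri"
    })


def build_catalogue(responses):
    """Build a per-endpoint catalogue.

    `responses` maps an endpoint URL to a pair of JSON strings:
    (result of CONCEPTS_QUERY, result of PROPERTIES_QUERY).
    """
    catalogue = {}
    for endpoint, (concepts_json, properties_json) in responses.items():
        catalogue[endpoint] = {
            "concepts": extract_uris(concepts_json, "concept"),
            "properties": extract_uris(properties_json, "property"),
        }
    return catalogue


def pool(catalogue, kind):
    """Distinct elements of one kind ("concepts" or "properties")
    across every endpoint in the catalogue."""
    return sorted({uri for entry in catalogue.values() for uri in entry[kind]})
```

Keeping the catalogue keyed by endpoint preserves where each concept or property was found, which is what a later linking or federated-query step would need. The pooled view is derived from it on demand rather than stored separately.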
