Concepedia

Publication | Open Access

A survey on semantic data management as intersection of ontology-based data access, semantic modeling and data lakes

Citations: 23
References: 48
Year: 2024

Abstract

In recent years, data lakes have emerged as a way to manage large amounts of heterogeneous data for modern data analytics. One way to prevent data lakes from turning into inoperable data swamps is semantic data management. Such approaches propose linking metadata to knowledge graphs based on the Linked Data principles to give the data in the lake more meaning and semantics. Such a semantic layer may be used not only for data management but also to tackle the problem of integrating data from heterogeneous sources, making data access more expressive and interoperable. In this survey, we review recent approaches with a specific focus on their application within data lake systems and their scalability to Big Data. We classify the approaches into (i) basic semantic data management, (ii) semantic modeling approaches for enriching metadata in data lakes, and (iii) methods for ontology-based data access. In each category, we cover the main techniques and their background, and compare the latest research. Finally, we point out challenges for future work in this research area, which requires a closer integration of Big Data and Semantic Web technologies.

• Definitions and taxonomy related to semantic data management.
• Review and comparison of metadata models for data lakes that address semantics.
• Semantic modeling pipeline: review of algorithms for labeling and modeling.
• Ontology-based data access: exploring scalable query processing using semantics.
• Challenges in semantic data management.
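The abstract describes a semantic layer that links source metadata to ontology concepts so that heterogeneous sources can be queried uniformly. The following is a minimal sketch of that idea, not the survey's own method: two sources name the same attributes differently, hand-written mappings link each column to an ontology property, and an ontology-level query is rewritten per source into column lookups without copying the data. The namespace `http://example.org/onto#`, the source names, the columns, the `MAPPINGS` table, and the `query` function are all hypothetical. Production ontology-based data access systems typically rely on declarative mappings and translate SPARQL into source queries such as SQL rather than scanning Python dictionaries.

```python
# Hypothetical ontology namespace, used only for illustration.
EX = "http://example.org/onto#"

# Two heterogeneous data lake sources describing customers
# with different, source-specific column names (hypothetical data).
SOURCES = {
    "sales_csv": [
        {"cust_name": "Ada", "city": "Berlin"},
        {"cust_name": "Alan", "city": "London"},
    ],
    "crm_json": [
        {"fullName": "Grace", "location": "Berlin"},
    ],
}

# Semantic labels: each ontology property is linked to the column
# that represents it in a given source.
MAPPINGS = {
    "sales_csv": {EX + "name": "cust_name", EX + "livesIn": "city"},
    "crm_json": {EX + "name": "fullName", EX + "livesIn": "location"},
}


def query(select_prop, where_prop, where_value):
    """Answer an ontology-level query by rewriting it for each source.

    The query uses only ontology terms; the mappings translate it into
    column lookups on each source, so the sources stay in their raw form.
    """
    results = []
    for source, rows in SOURCES.items():
        cols = MAPPINGS[source]
        # Skip sources that do not map both properties used by the query.
        if select_prop not in cols or where_prop not in cols:
            continue
        select_col, where_col = cols[select_prop], cols[where_prop]
        for row in rows:
            if row.get(where_col) == where_value:
                results.append(row[select_col])
    return results


if __name__ == "__main__":
    # Names of everyone living in Berlin, across both sources.
    print(query(EX + "name", EX + "livesIn", "Berlin"))  # ['Ada', 'Grace']
```

Because the query is phrased against the ontology rather than against `cust_name` or `fullName`, adding a third source only requires a new mapping entry, not a change to the query.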
