Publication | Closed Access
A Mapping Study about Data Lakes: An Improved Definition and Possible Architectures
Citations: 30
References: 31
Year: 2019
Engineering, Business Intelligence, Data Mapping, Semantic Web, Data Ecosystem, Mapping, Geospatial Mapping, Data Science, Database Support, Management, Spatial Data Management, Data Integration, Data Management, Cartography, Data Modeling, Term Data Lakes, Geography, Mapping Study, Data Architecture, Data Lakes, Cloud Computing, Improved Definition, Big Data, Data Lake
In the past few years, data lakes have emerged as a trending topic in big data technologies. Although the literature presents different points of view on their functionalities, they serve mainly to store a variety of data in a big data context. In this paper, we aim to identify and analyze data lake definitions and possible architectures. Our methodology consisted of a systematic literature mapping based on PRISMA, software engineering best practices for performing reviews, and the Kappa method to assess the quality of the results. We performed the search in eight different electronic databases to cover a wide variety of publishers in Computer Science. We first identified 662 papers matching our search criteria; after filtering, we selected 87 papers for review. We found that the term "data lakes" was first defined by James Dixon in 2010. We also found that the term is often related to raw data repositories. From the identified definitions, we propose a new one as a means to better state what data lakes refer to and to improve how the community uses them. Moreover, we found that Hadoop and its ecosystem compose the most used toolset to create data lakes, revealing that this is the mainstream architecture for data lakes given the technologies available today.