Publication | Closed Access
Towards generating ETL processes for incremental loading
Citations: 55
References: 13
Year: 2008
Unknown Venue
Topics: Engineering, Data Warehouse, Computer Architecture, Change Data Capture, Software Engineering, Parallel Tool, Data Science, Database Support, ETL Processes, Management, Systems Engineering, Data Integration, Parallel Computing, Big Data, Data Warehousing, Data Management, Computer Engineering, Performance Analysis Tool, Runtime System, Change Propagation, Data Engineering, Abstract Schema Mappings, Parallel Programming, Industrial Informatics, Data Transformation (Computing), System Software, Data Modeling
Extract, Transform, and Load (ETL) processes physically integrate data from multiple, heterogeneous sources in a central repository referred to as a data warehouse. Physically integrated data becomes stale when source data changes, so periodic refreshes are required. For efficiency reasons, data warehouses are typically refreshed incrementally, i.e., changes are captured at the sources and propagated to the data warehouse on a regular basis. Dedicated ETL processes, referred to as incremental load processes, are employed to extract changes from the sources, propagate the changes, and refresh the data warehouse incrementally. During change propagation, the changes required in the data warehouse are inferred from the changes captured at the sources. Creating incremental load processes is a complex task reserved for trained ETL programmers. In this paper we review existing Change Data Capture (CDC) techniques and discuss the limitations of different approaches. We further review existing techniques for refreshing data warehouses. We then present an approach for generating incremental load processes from abstract schema mappings.
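To make the idea of incremental loading concrete, the following is a minimal Python sketch, not the approach presented in the paper. It assumes a snapshot-differential form of CDC (comparing two source snapshots keyed by primary key) and a simple per-row transformation. All names (`capture_changes`, `transform`, `incremental_load`, `full_load`) and the row layout (`name`, `qty`, `price`) are hypothetical and chosen for illustration only.

```python
from typing import Any, Dict, Tuple

Row = Dict[str, Any]
Table = Dict[int, Row]  # assumption: rows are keyed by an integer primary key


def capture_changes(old: Table, new: Table) -> Tuple[Table, Table, Table]:
    """Snapshot-differential CDC: compare two source snapshots by key."""
    inserts = {k: new[k] for k in new.keys() - old.keys()}
    deletes = {k: old[k] for k in old.keys() - new.keys()}
    updates = {k: new[k] for k in new.keys() & old.keys() if new[k] != old[k]}
    return inserts, updates, deletes


def transform(row: Row) -> Row:
    """Hypothetical per-row transformation applied on the way to the warehouse."""
    return {"name": row["name"].upper(), "total": row["qty"] * row["price"]}


def incremental_load(warehouse: Table, inserts: Table, updates: Table, deletes: Table) -> None:
    """Propagate captured source changes to the warehouse in place."""
    for key, row in {**inserts, **updates}.items():
        warehouse[key] = transform(row)  # upsert the transformed row
    for key in deletes:
        warehouse.pop(key, None)  # remove rows deleted at the source


def full_load(source: Table) -> Table:
    """Recompute the warehouse from scratch, used here only as a reference."""
    return {key: transform(row) for key, row in source.items()}
```

A useful sanity check for a sketch like this is that an incremental load applied to the old warehouse state yields the same result as a full reload from the new source snapshot. That holds here because the transformation is row-local; transformations involving joins or aggregations need more careful change propagation, which is the kind of complexity the abstract refers to.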