
TLDR

Typical BI infrastructures rely on nightly ETL processes that extract, transform, cleanse, and load data, but fresher data is needed for near‑real‑time insights. The authors propose to import raw, unprocessed records directly into the warehouse and defer transformation and cleaning until reports request the data. On‑demand processing is achieved by using the database’s own mechanisms and event‑processing capabilities, and a prototype roadmap employing conventional database technology with hierarchical materialized views is outlined.

Abstract

In a typical BI infrastructure, data extracted from operational data sources is transformed, cleansed, and loaded into a data warehouse by a periodic ETL process. This process typically runs nightly, i.e., a full day’s worth of data is processed and loaded during off-hours. However, fresher data is desirable for business insights in near real time. To this end, the authors propose to leverage a data warehouse’s capability to directly import raw, unprocessed records and to defer transformation and data cleaning until pending reports need the data. At that time, the database’s own processing mechanisms can be deployed to process the data on demand. Event-processing capabilities are seamlessly woven into the proposed architecture. Besides outlining an overall architecture, the authors also develop a roadmap for implementing a complete prototype using conventional database technology in the form of hierarchical materialized views.
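The deferred-transformation idea can be illustrated with a minimal sketch. It is not the authors' prototype. It assumes SQLite (through Python's standard `sqlite3` module), which has no materialized views, so ordinary layered views stand in for the hierarchical materialized views named in the abstract. The table and view names (`raw_sales`, `clean_sales`, `sales_by_store`) and the cleansing rules are hypothetical.

```python
import sqlite3


def build_warehouse():
    """Create an in-memory database with a raw table and two layered views."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        -- Raw records are stored exactly as they arrive (hypothetical schema).
        CREATE TABLE raw_sales (store TEXT, amount TEXT);

        -- Level 1: cleansing is expressed as a view, so it runs at query time.
        -- Rows whose amount does not start with a digit are dropped.
        CREATE VIEW clean_sales AS
            SELECT UPPER(TRIM(store)) AS store,
                   CAST(amount AS REAL) AS amount
            FROM raw_sales
            WHERE TRIM(amount) GLOB '[0-9]*';

        -- Level 2: the report view is defined on top of the cleansing view.
        CREATE VIEW sales_by_store AS
            SELECT store, SUM(amount) AS total
            FROM clean_sales
            GROUP BY store;
        """
    )
    return conn


def load_raw(conn, records):
    """Import raw records without any transformation or cleaning."""
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", records)


def report(conn):
    """Run the report; cleansing and aggregation happen only now."""
    query = "SELECT store, total FROM sales_by_store ORDER BY store"
    return conn.execute(query).fetchall()
```

Loading is a plain insert, and the cost of cleaning is paid only when `report` is called. A database with real materialized views could additionally cache and incrementally refresh each level, which this sketch does not attempt.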
