Here are my Data Files. Here are my Queries. Where are my Results?

Abstract

Database management systems (DBMS) provide incredible flexibility and performance when it comes to query processing, scalability and accuracy. To fully exploit DBMS features, however, the user must define a schema, load the data, tune the system for the expected workload, and answer several questions. Should the database use a column-store, a row-store or some hybrid format? What indices should be created? All these questions make for a formidable and time-consuming hurdle, often deterring new applications or imposing high costs on existing ones. A characteristic example is that of scientific databases with huge data sets. The prohibitive initialization cost and complexity still force scientists to rely on "ancient" tools for their data management tasks, delaying scientific understanding and progress.

Users and applications collect their data in flat files, which have traditionally been considered to be "outside" a DBMS. A DBMS wants control: always bring all data "inside", replicate it and format it in its own "secret" way. The problem has been recognized, and current efforts extend existing systems with abilities such as reading information from flat files and gracefully incorporating it into the processing engine. This paper proposes a new generation of systems where the only requirement from the user is a link to the raw data files. Queries can then be fired immediately, with no preparation steps in between. Internally and in an abstract way, the system takes care of selectively, adaptively and incrementally providing the proper environment given the queries at hand. Only part of the data is loaded at any given time, and it is stored and accessed in the format suitable for the current workload.
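
The idea of querying a linked raw file directly, and loading only what the current queries touch, can be illustrated with a minimal Python sketch. This is not the system proposed in the paper: the class `RawFileTable`, its methods, the CSV layout, and the per-column caching policy are all hypothetical and chosen only for illustration.

```python
import csv
import os
import tempfile


class RawFileTable:
    """Hypothetical table backed by a raw CSV file; columns load lazily."""

    def __init__(self, path):
        self.path = path    # the link to the raw file is all the user supplies
        self.columns = {}   # column name -> values loaded so far

    def _load(self, name):
        # Parse the raw file for this one column and cache the result.
        with open(self.path, newline="") as f:
            reader = csv.DictReader(f)
            self.columns[name] = [float(row[name]) for row in reader]

    def column(self, name):
        # Load a column the first time a query touches it, never before.
        if name not in self.columns:
            self._load(name)
        return self.columns[name]

    def sum_where(self, target, filter_col, low, high):
        """Sum `target` over rows where low <= filter_col < high."""
        values = self.column(target)
        keys = self.column(filter_col)
        return sum(v for v, k in zip(values, keys) if low <= k < high)


def demo():
    # Write a small raw file, then query it with no schema or load step.
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "data.csv")
        with open(path, "w", newline="") as f:
            f.write("a,b,c\n1,10,100\n2,20,200\n3,30,300\n")
        table = RawFileTable(path)
        result = table.sum_where("b", "a", 2, 4)
        # Report the answer and which columns were actually loaded.
        return result, sorted(table.columns)
```

In this sketch the query over columns `a` and `b` never causes column `c` to be parsed, which mirrors the abstract's point that only part of the data is loaded at any given time. A real system would also have to decide the storage format adaptively, which this sketch does not attempt.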
