Publication | Closed Access
Compression of inverted indexes for fast query evaluation
Citations: 237
References: 15
Year: 2002
Unknown Venue
Retrieval Efficiency, Engineering, Information Retrieval, Data Science, Data Mining, Inverted Indexes, Indexing Technique, Knowledge Discovery, Computer Science, Inverted Lists, Search Engine Indexing, Parallel Computing, Compression Scheme, Data Compression, Data Management, Text Mining, Data Indexing
Compression reduces both the size of indexes and the time needed to evaluate queries. In this paper, we revisit the compression of inverted lists of document postings that store the position and frequency of indexed terms, considering two approaches to improving retrieval efficiency: better implementation and better choice of integer compression schemes. First, we propose several simple optimisations to well-known integer compression schemes, and show experimentally that these lead to significant reductions in time. Second, we explore the impact of choice of compression scheme on retrieval efficiency. In experiments on large collections of data, we show two surprising results: use of simple byte-aligned codes halves the query evaluation time compared to the most compact Golomb-Rice bitwise compression schemes; and, even when an index fits entirely in memory, byte-aligned codes result in faster query evaluation than does an uncompressed index, emphasising that the cost of transferring data from memory to the CPU cache is less for an appropriately compressed index than for an uncompressed index. Moreover, byte-aligned schemes have only a modest space overhead: the most compact schemes result in indexes that are around 10% of the size of the collection, while a byte-aligned scheme is around 13%. We conclude that fast byte-aligned codes should be used to store integers in inverted lists.
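To illustrate what a byte-aligned integer code looks like, the following is a minimal sketch of one common variable-byte convention: seven data bits per byte, with the high bit flagging the final byte of each integer. The abstract does not specify the exact bit layout used in the paper, so the layout, the byte order, and the function names `vbyte_encode` and `vbyte_decode` are assumptions made for illustration only.

```python
from typing import Iterable, List


def vbyte_encode(numbers: Iterable[int]) -> bytes:
    """Encode non-negative integers with a variable-byte code.

    Assumed layout (not taken from the paper): each byte carries 7 data
    bits, least-significant group first; the high bit is set only on the
    last byte of each integer.
    """
    out = bytearray()
    for n in numbers:
        if n < 0:
            raise ValueError("variable-byte codes store non-negative integers")
        while n >= 128:
            out.append(n & 0x7F)  # low 7 bits, terminator flag clear
            n >>= 7
        out.append(n | 0x80)      # final group, terminator flag set
    return bytes(out)


def vbyte_decode(data: bytes) -> List[int]:
    """Decode a byte string produced by vbyte_encode."""
    numbers: List[int] = []
    value = 0
    shift = 0
    for byte in data:
        if byte & 0x80:
            # Flagged byte: it completes the current integer.
            value |= (byte & 0x7F) << shift
            numbers.append(value)
            value = 0
            shift = 0
        else:
            # Unflagged byte: more groups follow.
            value |= byte << shift
            shift += 7
    return numbers


if __name__ == "__main__":
    postings = [3, 4, 123, 19870]
    encoded = vbyte_encode(postings)
    print(len(encoded), vbyte_decode(encoded) == postings)
```

Decoding needs only whole-byte reads, masks, and shifts, with no bit-level cursor to maintain, which is the kind of property the abstract credits for the speed of byte-aligned codes. Small values occupy a single byte, so space usage stays modest even though the code is less compact than a bitwise scheme.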