Publication | Open Access
Self-indexing inverted files for fast text retrieval
Citations: 361
References: 34
Year: 1996
Topics: Engineering, Inverted Lists, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Management, Data Integration, Data Management, Fast Text Retrieval, Knowledge Discovery, Text Indexing, Computer Science, Query Optimization, Data Indexing, Query-processing Costs, Search Engine Indexing, Indexing Technique, Query Term
Query-processing costs on large text databases are dominated by the need to retrieve and scan the inverted list of each query term. Retrieval time for inverted lists can be greatly reduced by the use of compression, but this adds to the CPU time required. Here we show that the CPU component of query response time for conjunctive Boolean queries and for informal ranked queries can be similarly reduced, at little cost in terms of storage, by the inclusion of an internal index in each compressed inverted list. This method has been applied in a retrieval system for a collection of nearly two million short documents. Our experimental results show that the self-indexing strategy adds less than 20% to the size of the compressed inverted file, which itself occupies less than 10% of the indexed text, yet can reduce processing time for Boolean queries of 5-10 terms to under one fifth of the previous cost. Similarly, ranked queries of 40-50 terms can be evaluated in as little as 25% of the previous time, with little or no loss of retrieval effectiveness.
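The idea of an internal index inside each compressed inverted list can be illustrated with a minimal sketch. It is not the implementation described in the paper: the variable-byte gap coding, the fixed `block_size`, and the names `SkippedList`, `contains`, and `intersect` are all assumptions chosen for brevity. Each list stores document-number gaps in compressed form, plus a small uncompressed index of the first document number and byte offset of every block, so a membership test decodes at most one block instead of the whole list.

```python
from bisect import bisect_right


def vbyte_encode(n):
    """Variable-byte code (an assumed codec): 7 bits per byte, high bit marks the last byte."""
    out = bytearray()
    while n >= 128:
        out.append(n & 0x7F)
        n >>= 7
    out.append(n | 0x80)
    return bytes(out)


def vbyte_decode(buf, pos):
    """Decode one integer starting at buf[pos]; return (value, next position)."""
    n, shift = 0, 0
    while True:
        b = buf[pos]
        pos += 1
        if b & 0x80:
            return n | ((b & 0x7F) << shift), pos
        n |= b << shift
        shift += 7


class SkippedList:
    """Hypothetical self-indexing inverted list over sorted, distinct document numbers."""

    def __init__(self, doc_ids, block_size=4):
        self.count = len(doc_ids)
        self.first_ids = []  # internal index: first document number of each block
        self.offsets = []    # internal index: byte offset of each block's gaps
        self.decoded = 0     # gaps decoded by contains(), for illustration only
        data = bytearray()
        prev = 0
        for i, d in enumerate(doc_ids):
            if i % block_size == 0:
                self.first_ids.append(d)
                self.offsets.append(len(data))
            else:
                data += vbyte_encode(d - prev)
            prev = d
        self.data = bytes(data)

    def _block_end(self, b):
        return self.offsets[b + 1] if b + 1 < len(self.offsets) else len(self.data)

    def contains(self, target):
        """Jump to the one block that could hold target and decode only that block."""
        b = bisect_right(self.first_ids, target) - 1
        if b < 0:
            return False
        doc, pos, end = self.first_ids[b], self.offsets[b], self._block_end(b)
        while doc < target and pos < end:
            gap, pos = vbyte_decode(self.data, pos)
            self.decoded += 1
            doc += gap
        return doc == target

    def all_ids(self):
        """Decode the whole list (used only for the shortest list in a query)."""
        ids = []
        for b, doc in enumerate(self.first_ids):
            pos, end = self.offsets[b], self._block_end(b)
            ids.append(doc)
            while pos < end:
                gap, pos = vbyte_decode(self.data, pos)
                doc += gap
                ids.append(doc)
        return ids


def intersect(lists):
    """Conjunctive Boolean query: candidates from the shortest list, probed in the others."""
    lists = sorted(lists, key=lambda lst: lst.count)
    candidates = lists[0].all_ids()
    for other in lists[1:]:
        candidates = [d for d in candidates if other.contains(d)]
    return candidates
```

In this sketch, a query that intersects a 3-entry list with a 200-entry list decodes only a handful of gaps from the long list, which is the kind of CPU saving the abstract describes. The extra storage is the two index arrays, whose size shrinks as `block_size` grows.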