Publication | Closed Access
TOPSIG
Citations: 29
References: 12
Year: 2011
Unknown Venue
Text Retrieval, Engineering, Information Retrieval, Data Science, Data Mining, Pattern Recognition, Inverted Files, Similarity Search, Hash Function, Text Indexing, Computer Science, Search Engine Indexing, Data Management, Perceptual Hashing, Text Mining, File Signatures
Comparisons between file signatures and inverted files for text retrieval have shown the shortcomings of traditional file signatures, and it has been widely accepted that they are inferior alternatives to inverted files. This paper describes TopSig, a new approach to the construction of file signatures that extends recent advances in semantic hashing and dimensionality reduction. These advances have not previously been linked to general-purpose, signature-file-based search engines. We demonstrate significant improvements in the performance of signature-file-based indexing and retrieval. Performance is comparable to that of state-of-the-art inverted-file-based systems, including language models and BM25. These findings suggest that file signatures offer a viable alternative to inverted files in suitable settings, and they position the file signatures model in the class of Vector Space retrieval models.