Publication | Closed Access
Retrieval experiments using pseudo-desktop collections
Citations: 62
References: 23
Year: 2009
Unknown Venue
Engineering, Intelligent Information Retrieval, Query Model, Semantic Web, Real Desktop Collection, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Data Integration, Data Retrieval, Retrieval Experiments, Query Expansion, Knowledge Discovery, Desktop Search, Computer Science, Retrieval Methods, Test Collection, Interactive Information Retrieval
Desktop search is an important part of personal information management (PIM). However, research in this area has been limited by the lack of shareable test collections, making cumulative progress difficult. In this paper, we define desktop search as a semi-structured document retrieval problem and introduce a methodology to automatically build a reusable collection (the pseudo-desktop) that has many of the same properties as a real desktop collection. We then present a comprehensive evaluation of retrieval methods for semi-structured document retrieval on several pseudo-desktop collections and the TREC Enterprise collection. Our results show that a probabilistic retrieval model using the mapping relation between a query term and a document field (PRM-S) has the best performance in collections with more structure, such as email, and that the query-likelihood language model is better for other document types. We further analyze the observed differences using generated queries and suggest ways to improve PRM-S, which makes the performance gains more significant and consistent.
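The abstract describes PRM-S only as a probabilistic model that uses the mapping relation between a query term and a document field. The following is a minimal sketch of one way such a model could be scored, not the authors' implementation. It assumes that the mapping probability P(field | term) is derived from collection-level field statistics with a uniform field prior, that each field uses a Dirichlet-smoothed language model, and that documents are plain dictionaries of field name to text. All function names, the whitespace tokenizer, and the value of `mu` are hypothetical choices made for illustration.

```python
import math
from collections import Counter


def field_collection_models(docs, fields):
    """Aggregate term counts per field over the whole collection."""
    counts = {f: Counter() for f in fields}
    for doc in docs:
        for f in fields:
            counts[f].update(doc.get(f, "").lower().split())
    return counts


def mapping_probs(term, coll_counts):
    """Estimate P(field | term), assuming a uniform prior over fields."""
    likelihoods = {}
    for f, c in coll_counts.items():
        total = sum(c.values())
        likelihoods[f] = c[term] / total if total else 0.0
    norm = sum(likelihoods.values())
    if norm == 0.0:
        # Term never seen in the collection: fall back to a uniform mapping.
        return {f: 1.0 / len(coll_counts) for f in coll_counts}
    return {f: p / norm for f, p in likelihoods.items()}


def prms_score(query, doc, coll_counts, mu=10.0):
    """Log-score of a document: for each query term, mix the smoothed
    per-field term probabilities, weighted by the term-to-field mapping."""
    score = 0.0
    for term in query.lower().split():
        mapping = mapping_probs(term, coll_counts)
        term_prob = 0.0
        for f, c in coll_counts.items():
            tokens = doc.get(f, "").lower().split()
            coll_total = sum(c.values())
            p_coll = c[term] / coll_total if coll_total else 0.0
            # Dirichlet-smoothed field language model (mu is an arbitrary choice).
            p_field = (tokens.count(term) + mu * p_coll) / (len(tokens) + mu)
            term_prob += mapping[f] * p_field
        # Floor the probability so unseen terms do not produce log(0).
        score += math.log(term_prob) if term_prob > 0 else math.log(1e-12)
    return score


# Toy email-like collection (invented data, for illustration only).
FIELDS = ["from", "subject", "body"]
DOCS = [
    {"from": "alice smith", "subject": "budget meeting",
     "body": "please review the budget draft"},
    {"from": "bob jones", "subject": "lunch plans",
     "body": "alice suggested the new cafe for lunch"},
]
COLL = field_collection_models(DOCS, FIELDS)
```

In this toy collection the term `alice` is relatively more frequent in the `from` field than in `body`, so the mapping assigns it mostly to `from`, and the message sent by Alice scores higher for the query `alice` than the message that merely mentions her. This is the kind of behavior that would be expected to help on strongly structured documents such as email.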