Publication | Closed Access
Indexing emails and email threads for retrieval
Citations: 18
References: 0
Year: 2005
Venue: Unknown
Engineering, Collaborative Information Retrieval, Intelligent Information Retrieval, Electronic Mail, Communication, Corpus Linguistics, Text Mining, Natural Language Processing, Information Retrieval, Data Science, Data Mining, Computational Linguistics, Language Studies, Content Analysis, Data Management, Enterprise Search Track, Knowledge Retrieval, Knowledge Discovery, Text Indexing, Information Management, Data Indexing, TREC 2005, Search Engine Indexing, Test Collection, Linguistics, Interactive Information Retrieval
Electronic mail poses a number of unusual challenges for the design of information retrieval systems and test collections, including informal expression, conversational structure, variable document granularity (e.g., messages, threads, or longer-term interactions), a naturally occurring integration between free text and structural metadata, and incompletely characterized user needs. This paper reports on initial experiments with a large collection of public mailing lists from the World Wide Web Consortium that will be used for the TREC 2005 Enterprise Search Track. Automatic subject-line threading and removal of duplicated text were found to have little effect in a small pilot study. Those observations motivated development of a question typology and more detailed analysis of collection characteristics; preliminary results for both are reported.
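
The abstract names subject-line threading and removal of duplicated text without describing how either was done. The following is a minimal illustrative sketch in Python of what such preprocessing might look like, not the authors' method. It assumes that threads can be approximated by grouping messages whose subjects match after stripping reply and forward prefixes, and that duplicated text is the quoted material on lines starting with ">". The function names, the prefix pattern, and the message layout (dicts with `id` and `subject` keys) are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed prefix pattern: leading reply/forward markers such as "Re:", "RE:", "Fw:", "Fwd:".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes, lowercase, and collapse whitespace."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.lower().split())


def thread_by_subject(messages):
    """Group message ids by normalized subject (hypothetical message layout)."""
    threads = defaultdict(list)
    for msg in messages:
        threads[normalize_subject(msg["subject"])].append(msg["id"])
    return dict(threads)


def strip_quoted_text(body):
    """Drop lines that begin with '>' (assumed to be quoted reply text)."""
    kept = [line for line in body.splitlines()
            if not line.lstrip().startswith(">")]
    return "\n".join(kept).strip()


if __name__ == "__main__":
    mails = [
        {"id": 1, "subject": "CSS layout"},
        {"id": 2, "subject": "Re: CSS layout"},
        {"id": 3, "subject": "Other"},
    ]
    print(thread_by_subject(mails))
    print(strip_quoted_text("> old text\nNew reply\n> more\nThanks"))
```

Grouping on the subject alone will merge unrelated conversations that happen to share a subject and will split a thread whose subject is edited mid-conversation, so a real system would likely also consult reply headers where they are available.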