Publication | Closed Access
Sheffield University and the TREC 2004 Genomics Track: Query Expansion Using Synonymous Terms.
Year: 2004 | Citations: 30 | References: 3
Topics: Engineering, Semantic Search, Genetics, Genomics, Semantic Web, Semantics, Bioinformatics Database, Text Mining, Natural Language Processing, Sheffield University, Information Retrieval, Data Science, TREC 2004, Computational Genomics, Biomedical Text Mining, Genomics Track, Biomedical Ontology, Biological Database, Knowledge Discovery, Terminology Extraction, Medical Language Processing, Bioinformatics, UMLS Metathesaurus, Biology, Gene Sequence Annotation, Medicine, Semantic Similarity
In this paper we describe our approach to the Ad Hoc Retrieval task of the TREC 2004 Genomics Track. This is a conventional search task over a 10-year subset of MEDLINE (about 4.5 million documents, 9 gigabytes in size) with 50 topics derived from information needs gathered in interviews with real biomedical researchers. We also discuss the results of our submitted runs. The hypothesis we test is whether performance on this retrieval task can be improved by expanding queries with synonyms of the original query terms. We use the UMLS Metathesaurus, a comprehensive collection of controlled vocabularies in the biomedical domain, to identify query terms in topics and to determine their synonyms. Our approach is simple in that we consider only synonyms of query terms and do not exploit hierarchical relations between terms such as hyponymy and hypernymy. Synonymy-based query expansion generally increases recall but decreases precision because of ambiguous terms: word senses of an ambiguous term that are inappropriate for the topic under consideration give rise to “polluting” synonyms. We hope that using a specifically biomedical term resource such as UMLS will limit the negative effect that synonymy-based query expansion may have on precision.
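The synonym-only expansion idea can be illustrated with a short sketch. This is not the Sheffield system and does not call any UMLS API: a tiny hand-made dictionary (`CONCEPTS`, with hypothetical concept IDs and strings) stands in for the Metathesaurus, query terms are found by greedy longest match over whitespace tokens, and each recognized term is expanded with every synonym of every concept it names, with no hierarchical relations used. The ambiguous string "cold" shows how an inappropriate word sense injects a polluting synonym such as "low temperature". The Boolean output (OR within a term's synonyms, AND across terms) is an assumption made for illustration only; the abstract does not say how expanded queries were submitted.

```python
from typing import Dict, List, Set

# Hypothetical toy thesaurus standing in for UMLS: concept ID -> synonymous strings.
# IDs and strings are illustrative, not real UMLS entries.
CONCEPTS: Dict[str, Set[str]] = {
    "C_P53": {"p53", "tp53", "tumor protein p53"},
    "C_APOPTOSIS": {"apoptosis", "programmed cell death"},
    "C_COLD_TEMPERATURE": {"cold", "low temperature"},  # sense unrelated to most topics
    "C_COMMON_COLD": {"cold", "common cold", "coryza"},
}


def build_index(concepts: Dict[str, Set[str]]) -> Dict[str, List[str]]:
    """Map each lowercased string to the concept IDs it names.
    A string listed under more than one concept is ambiguous."""
    index: Dict[str, List[str]] = {}
    for cid, terms in concepts.items():
        for term in terms:
            index.setdefault(term.lower(), []).append(cid)
    return index


def find_terms(query: str, index: Dict[str, List[str]], max_len: int = 4) -> List[str]:
    """Greedy longest-match lookup of known terms among the query tokens."""
    tokens = query.lower().split()
    found: List[str] = []
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if candidate in index:
                found.append(candidate)
                i += n
                break
        else:  # no known term starts at token i
            i += 1
    return found


def expand_query(query: str, concepts: Dict[str, Set[str]],
                 index: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """For each recognized term, collect the synonyms of ALL concepts it names
    (synonyms only; no hyponyms or hypernyms)."""
    expansion: Dict[str, Set[str]] = {}
    for term in find_terms(query, index):
        synonyms: Set[str] = set()
        for cid in index[term]:
            synonyms |= concepts[cid]
        expansion[term] = synonyms
    return expansion


def to_boolean(expansion: Dict[str, Set[str]]) -> str:
    """Assumed output form: OR over a term's synonyms, AND across terms;
    multiword synonyms are quoted as phrases."""
    clauses = []
    for synonyms in expansion.values():
        quoted = sorted(f'"{s}"' if " " in s else s for s in synonyms)
        clauses.append("(" + " OR ".join(quoted) + ")")
    return " AND ".join(clauses)


if __name__ == "__main__":
    idx = build_index(CONCEPTS)
    exp = expand_query("p53 apoptosis in cold", CONCEPTS, idx)
    print(to_boolean(exp))  # "low temperature" appears as a polluting synonym
```

Expanding every sense of an ambiguous term is what raises recall at the cost of precision; a domain-specific resource helps only to the extent that it lists fewer off-topic senses for the query terms.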