Promoting Diversity in Top Hits for Biomedical Passage Retrieval.

Bibliographic Details
Title: Promoting Diversity in Top Hits for Biomedical Passage Retrieval.
Authors: Andreopoulos, Bill; Huang, Xiangji; An, Aijun; Labudde, Dirk; Hu, Qinmin
Source: Advances in Data Management; 2009, p371-393, 23p
Abstract: With the volume of biomedical literature in collections such as BMC and PubMed exploding, it is of paramount importance to have scalable passage retrieval systems that allow researchers to quickly find desired information. While topical relevance is the most important factor in biomedical text retrieval, an effective retrieval system also needs to cover diverse aspects of the topic. Aspect-level performance means that top-ranked passages for a topic should cover diverse aspects. Aspect-level retrieval methods often involve clustering the retrieved passages on the basis of textual similarity. We propose the HIERDENC text retrieval system that ranks the retrieved passages, achieving scalability and improved aspect-level performance over other clustering methods. HIERDENC runtimes scale to large datasets, such as PubMed and BMC. The HIERDENC aspect-level performance is consistently better than that of cosine similarity and Hamming distance-based clustering methods. HIERDENC is comparable to biclustering separation of relevant passages, and improves on topics where many aspects are involved. Converting textual passages to GO/MeSH ontological terms improves the HIERDENC aspect-level performance. [ABSTRACT FROM AUTHOR]
Copyright of Advances in Data Management is the property of Springer Nature / Books and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use.
DOI: 10.1007/978-3-642-02190-9_18
Database: Complementary Index