Using Query-Relevant Documents Pairs for Cross-Lingual Information Retrieval.

Bibliographic Details
Title: Using Query-Relevant Documents Pairs for Cross-Lingual Information Retrieval.
Authors: Carbonell, Jaime G., Siekmann, Jörg, Matoušek, Václav, Mautner, Pavel, Pinto, David, Juan, Alfons, Rosso, Paolo
Source: Text, Speech & Dialogue (9783540746270); 2007, p630-637, 8p
Abstract: The World Wide Web is a natural setting for cross-lingual information retrieval. The European Union is a typical example of a multilingual scenario, where many users have to deal with information published in at least 20 languages. Given queries in a source language and a target corpus in another language, the typical approach is to translate either the query or the target dataset into the other language. Other approaches use parallel corpora to obtain a statistical dictionary of words across the different languages. In this work, we propose using a training corpus made up of a set of Query-Relevant Document Pairs (QRDP) in a probabilistic cross-lingual information retrieval approach based on IBM alignment model 1 for statistical machine translation. Our approach has two main advantages over those that use direct translation or parallel corpora. First, we do not obtain a translation of the query but a set of associated words that are related in meaning, so the resulting dictionary is, in a broad sense, more semantic than a translation dictionary. Second, since the queries are supervised, we work in a more restricted domain than with a general parallel corpus (it is well known that results in a restricted domain are better than those obtained in a general context). To assess the quality of our approach, we compared its results with those obtained by directly translating the queries with a query translation system, and observed promising results. [ABSTRACT FROM AUTHOR]
Copyright of Text, Speech & Dialogue (9783540746270) is the property of Springer eBooks.
DOI: 10.1007/978-3-540-74628-7_81
Database: Complementary Index
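
Illustrative sketch: The abstract describes learning word associations from query-relevant document pairs with IBM alignment model 1. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes plain Model 1 without a NULL word, pre-tokenized text, and a small floor value for unseen word pairs. The function names `train_model1` and `log_score`, the `floor` parameter, and the toy English/Spanish data are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
import math


def train_model1(pairs, iterations=10):
    """Estimate t(query_word | doc_word) with EM, IBM Model 1 style.

    pairs: list of (query_tokens, relevant_doc_tokens) tuples.
    Assumption: no NULL word and no alignment or length model.
    """
    query_vocab = {q for query, _ in pairs for q in query}
    # Uniform initialisation over the query vocabulary.
    t = defaultdict(lambda: 1.0 / len(query_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected (query word, doc word) counts
        total = defaultdict(float)  # expected counts per doc word
        # E-step: spread each query word over the words of its document.
        for query, doc in pairs:
            for q in query:
                norm = sum(t[(q, d)] for d in doc)
                for d in doc:
                    frac = t[(q, d)] / norm
                    count[(q, d)] += frac
                    total[d] += frac
        # M-step: renormalise so that sum_q t(q | d) = 1 for each doc word d.
        t = defaultdict(
            float, {(q, d): c / total[d] for (q, d), c in count.items()}
        )
    return t


def log_score(query, doc, t, floor=1e-9):
    """Log-probability of the query given the document under Model 1.

    Each query word is explained by the average of t(q | d) over the
    document words; `floor` is an assumed guard against log(0).
    """
    score = 0.0
    for q in query:
        p = sum(t.get((q, d), 0.0) for d in doc) / len(doc)
        score += math.log(max(p, floor))
    return score


# Toy training set: English queries paired with relevant Spanish documents.
pairs = [
    (["house"], ["casa"]),
    (["blue", "house"], ["casa", "azul"]),
    (["blue", "sky"], ["cielo", "azul"]),
]
t = train_model1(pairs)

# Rank two candidate documents for a new query by descending score.
candidates = [["casa", "azul"], ["cielo", "azul"]]
ranked = sorted(
    candidates, key=lambda doc: log_score(["blue", "house"], doc, t), reverse=True
)
```

Because the table is trained on queries and their relevant documents rather than on sentence-aligned translations, its entries link a query word to document words that tend to co-occur with it, which is the sense in which the abstract calls the resulting dictionary more semantic than a translation dictionary.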