1.

A comparison of two methods for boolean query relevancy feedback

G. Salton E. Voorhees 《Information processing & management》1984,20(5-6)

The relevance feedback process uses information derived from an initially retrieved set of documents to improve subsequent search formulations and retrieval output. In a Boolean query environment this implies that new query terms must be identified and Boolean operators must be chosen automatically to connect the various query terms. In this study two recently proposed automatic methods for relevance feedback of Boolean queries are evaluated and conclusions are drawn concerning the use of effective feedback methods in a Boolean query environment.

2.

Enhancing query translation with relevance feedback in translingual information retrieval

Daqing He Dan Wu 《Information processing & management》2011

As an effective technique for improving retrieval effectiveness, relevance feedback (RF) has been widely studied in both monolingual and translingual information retrieval (TLIR). Studies of RF in TLIR have focused on query expansion (QE), in which queries are reformulated before and/or after they are translated. However, RF in TLIR can not only help select better query terms, but also enhance query translation by adjusting translation probabilities and even resolving some out-of-vocabulary terms. In this paper, we propose a novel relevance feedback method called translation enhancement (TE), which uses the translation relationships extracted from relevant documents to revise the translation probabilities of query terms and to identify extra available translation alternatives, so that the translated queries are more tuned to the current search. We studied TE using pseudo-relevance feedback (PRF) and interactive relevance feedback (IRF). Our results show that TE can significantly improve TLIR with both types of relevance feedback methods, and that the improvement is comparable to that of query expansion. More importantly, the effects of translation enhancement and query expansion are complementary. Their integration can produce further improvement, and makes TLIR more robust for a variety of queries.

3.

Improving the learning of Boolean queries by means of a multiobjective IQBE evolutionary algorithm

O. Cordón E. Herrera-Viedma M. Luque 《Information processing & management》2006

The Inductive Query By Example (IQBE) paradigm allows a system to automatically derive queries for a specific Information Retrieval System (IRS). Classic IRSs based on this paradigm [Smith, M., & Smith, M. (1997). The use of genetic programming to build Boolean queries for text retrieval through relevance feedback. Journal of Information Science, 23(6), 423–431] generate a single solution (Boolean query) in each run, that with the best fitness value, which is usually based on a weighted combination of the basic performance criteria, precision and recall.

4.

Learning from homologous queries and semantically related terms for query auto completion

《Information processing & management》2016,52(4):628-643

Query auto completion (QAC) models recommend possible queries to web search users when they start typing a query prefix. Most of today’s QAC models rank candidate queries by popularity (i.e., frequency), and in doing so they tend to follow a strict query matching policy when counting the queries. That is, they ignore the contributions from so-called homologous queries, queries with the same terms but ordered differently or queries that expand the original query. Importantly, homologous queries often express a remarkably similar search intent. Moreover, today’s QAC approaches often ignore semantically related terms. We argue that users are prone to combine semantically related terms when generating queries. We propose a learning to rank-based QAC approach, where, for the first time, features derived from homologous queries and semantically related terms are introduced. In particular, we consider: (i) the observed and predicted popularity of homologous queries for a query candidate; and (ii) the semantic relatedness of pairs of terms inside a query and pairs of queries inside a session. We quantify the improvement of the proposed new features using two large-scale real-world query logs and show that the mean reciprocal rank and the success rate can be improved by up to 9% over state-of-the-art QAC models.
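
The abstract above reports results in terms of mean reciprocal rank (MRR). As a rough illustration of how that metric is usually computed for query completions, here is a minimal sketch. The function names and the toy data are assumptions made for illustration and are not taken from the paper.

```python
from typing import List, Sequence, Tuple


def reciprocal_rank(ranked: Sequence[str], target: str) -> float:
    """Return 1/rank of `target` in `ranked` (1-based), or 0.0 if it is absent."""
    for rank, candidate in enumerate(ranked, start=1):
        if candidate == target:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: List[Tuple[Sequence[str], str]]) -> float:
    """Average the reciprocal rank over (ranked completions, submitted query) pairs."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, target) for ranked, target in runs) / len(runs)


# Hypothetical completion lists paired with the query the user finally submitted.
runs = [
    (["new york times", "new york weather"], "new york weather"),  # found at rank 2
    (["python tutorial", "python download"], "python tutorial"),   # found at rank 1
    (["cheap flights"], "cheap hotels"),                            # not suggested
]
print(mean_reciprocal_rank(runs))  # 0.5
```

The three toy cases contribute 0.5, 1.0 and 0.0, so the mean is 0.5.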

5.

A query term re-weighting approach using document similarity

《Information processing & management》2016,52(3):478-489

Pseudo-relevance feedback is the basis of a category of automatic query modification techniques. Pseudo-relevance feedback methods assume the initial retrieved set of documents to be relevant. Then they use these documents to extract more relevant terms for the query or just re-weight the user's original query. In this paper, we propose a straightforward, yet effective use of the pseudo-relevance feedback method in detecting more informative query terms and re-weighting them. The query-by-query analysis of our results indicates that our method is capable of identifying the most important keywords even in short queries. Our main idea is that some of the top documents may contain a closer context to the user's information need than the others. Therefore, re-examining the similarity of those top documents and weighting this set based on their context could help in identifying and re-weighting informative query terms. Our experimental results in standard English and Persian test collections show that our method improves retrieval performance, in terms of the MAP criterion, by up to 7% over traditional query term re-weighting methods.
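
The general pseudo-relevance feedback loop described above (retrieve, treat the top documents as relevant, re-weight the original query terms) can be sketched as follows. This is a deliberately simple stand-in, not the document-similarity method proposed in the paper: the overlap scoring, the choice of weight (the fraction of top-k documents containing a term), and all names are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List


def tokenize(text: str) -> List[str]:
    """Lower-case whitespace tokenizer (an illustrative simplification)."""
    return text.lower().split()


def score(query_weights: Dict[str, float], doc: str) -> float:
    """Weighted term-frequency overlap between the query and one document."""
    tf = Counter(tokenize(doc))
    return sum(weight * tf[term] for term, weight in query_weights.items())


def prf_reweight(query: str, docs: List[str], k: int = 2) -> Dict[str, float]:
    """Re-weight the original query terms from the top-k retrieved documents.

    The top-k documents are assumed relevant; each term's new weight is the
    fraction of those documents that contain the term.
    """
    weights = {term: 1.0 for term in tokenize(query)}
    top_k = sorted(docs, key=lambda d: score(weights, d), reverse=True)[:k]
    if not top_k:
        return weights
    return {
        term: sum(term in tokenize(doc) for doc in top_k) / len(top_k)
        for term in weights
    }


# Hypothetical toy collection.
docs = [
    "boolean query feedback for boolean retrieval",
    "relevance feedback for boolean retrieval",
    "query logs of a search engine",
    "cooking with herbs",
]
print(prf_reweight("boolean query feedback", docs, k=2))
# {'boolean': 1.0, 'query': 0.5, 'feedback': 1.0}
```

In this toy run the first two documents are taken as pseudo-relevant, so "query", which appears in only one of them, is down-weighted relative to "boolean" and "feedback".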

6.

Re-examining the effects of adding relevance information in a relevance feedback environment

W.S. Wong R.W.P. Luk H.V. Leong K.S. Ho D.L. Lee 《Information processing & management》2008

This paper presents an investigation about how to automatically formulate effective queries using full or partial relevance information (i.e., the terms that are in relevant documents) in the context of relevance feedback (RF). The effects of adding relevance information in the RF environment are studied via controlled experiments. The conditions of these controlled experiments are formalized into a set of assumptions that form the framework of our study. This framework is called idealized relevance feedback (IRF) framework. In our IRF settings, we confirm the previous findings of relevance feedback studies. In addition, our experiments show that better retrieval effectiveness can be obtained when (i) we normalize the term weights by their ranks, (ii) we select weighted terms in the top K retrieved documents, (iii) we include terms in the initial title queries, and (iv) we use the best query sizes for each topic instead of the average best query size where they produce at most five percentage points improvement in the mean average precision (MAP) value. We have also achieved a new level of retrieval effectiveness which is about 55–60% MAP instead of 40+% in the previous findings. This new level of retrieval effectiveness was found to be similar to a level using a TREC ad hoc test collection that is about double the number of documents in the TREC-3 test collection used in previous works.

7.

A Pseudo-relevance feedback framework combining relevance matching and semantic matching for information retrieval

《Information processing & management》2020,57(6):102342

Pseudo-relevance feedback (PRF) is a well-known method for addressing the mismatch between query intention and query representation. Most current PRF methods consider relevance matching only from the perspective of terms used to sort feedback documents, thus possibly leading to a semantic gap between query representation and document representation. In this work, a PRF framework that combines relevance matching and semantic matching is proposed to improve the quality of feedback documents. Specifically, in the first round of retrieval, we propose a reranking mechanism in which the information of the exact terms and the semantic similarity between the query and document representations are calculated by bidirectional encoder representations from transformers (BERT); this mechanism reduces the text semantic gap by using the semantic information and improves the quality of feedback documents. Then, our proposed PRF framework is constructed to process the results of the first round of retrieval by using probability-based PRF methods and language-model-based PRF methods. Finally, we conduct extensive experiments on four Text Retrieval Conference (TREC) datasets. The results show that the proposed models outperform the robust baseline models in terms of the mean average precision (MAP) and precision P at position 10 (P@10), and the results also highlight that using the combined relevance matching and semantic matching method is more effective than using relevance matching or semantic matching alone in terms of improving the quality of feedback documents.
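
Several abstracts in this list, including the one above, report mean average precision (MAP) and precision at a cutoff (P@10). A minimal sketch of the usual definitions follows. The function names and the toy ranking are assumptions for illustration, and the evaluation tooling used in the papers may differ in details such as tie handling.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """Fraction of the top-k ranked documents that are relevant."""
    if k <= 0:
        return 0.0
    return sum(doc in relevant for doc in ranked[:k]) / k


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant document occurs,
    normalized by the total number of relevant documents."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of per-query average precision over (ranking, relevant set) pairs."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical ranking: relevant documents appear at ranks 1 and 3.
ranked = ["d1", "d2", "d3", "d4"]
relevant = {"d1", "d3"}
print(precision_at_k(ranked, relevant, 2))   # 0.5
print(average_precision(ranked, relevant))   # (1/1 + 2/3) / 2, about 0.833
```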

8.

Structured queries, language modeling, and relevance modeling in cross-language information retrieval

《Information processing & management》2005,41(3):457-473

Two probabilistic approaches to cross-lingual retrieval are in wide use today, those based on probabilistic models of relevance, as exemplified by INQUERY, and those based on language modeling. INQUERY, as a query net model, allows the easy incorporation of query operators, including a synonym operator, which has proven to be extremely useful in cross-language information retrieval (CLIR), in an approach often called structured query translation. In contrast, language models incorporate translation probabilities into a unified framework. We compare the two approaches on Arabic and Spanish data sets, using two kinds of bilingual dictionaries: one derived from a conventional dictionary, and one derived from a parallel corpus. We find that structured query processing gives slightly better results when queries are not expanded. On the other hand, when queries are expanded, language modeling gives better results, but only when using a probabilistic dictionary derived from a parallel corpus. We pursue two additional issues inherent in the comparison of structured query processing with language modeling. The first concerns query expansion, and the second is the role of translation probabilities. We compare conventional expansion techniques (pseudo-relevance feedback) with relevance modeling, a new IR approach which fits into the formal framework of language modeling. We find that relevance modeling and pseudo-relevance feedback achieve comparable levels of retrieval and that good translation probabilities confer a small but significant advantage.

9.

Managing structured queries in probabilistic XML retrieval systems

Luis M. de Campos Juan M. Fernández-Luna Juan F. Huete Carlos Martín-Dancausa 《Information processing & management》2010

Focusing on the context of XML retrieval, in this paper we propose a general methodology for managing structured queries (involving both content and structure) within any given structured probabilistic information retrieval system which is able to compute posterior probabilities of relevance for structural components given a non-structured query (involving only query terms but not structural restrictions). We have tested our proposal using two specific information retrieval systems (Garnata and PF/Tijah), and the structured document collections from the last six editions of the INitiative for the Evaluation of XML Retrieval (INEX).

10.

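An automatic query expansion algorithm: word-based concept learning

Ma Wei (马巍) 《情报科学》2006,24(7):1066-1068

This paper presents an algorithm that automatically expands queries using word-based concept learning. The algorithm expands a query word by word by learning the concept-describing words that appear in the current query. Experiments show that, compared with the traditional vector space retrieval model and relevance feedback algorithms, the algorithm substantially improves both recall and precision. The method can be applied to retrieval in digital libraries, on the WWW, and in similar settings.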

11.

Interpolation of the extended Boolean retrieval model

《Information processing & management》2002,38(6):743-748

An interpolation theorem for the p-norm model, 1⩽p⩽∞, of Salton, Fox, and Wu for extended Boolean document retrieval is stated and proven. This result asserts roughly that whenever two or more documents are similarly ranked at any two points along the p-continuum with respect to this model for either an AND or an OR query containing exactly two terms, then they are similarly ranked at all points in between. An analogous result can fail for queries with more than two terms and an example is given to show this.
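
To make the p-continuum in the abstract above concrete, the sketch below evaluates two-term AND and OR queries under the p-norm model in its commonly cited equal-weight form. The formulas are the standard textbook statement of the model rather than something quoted from this abstract, and the function names and the example weights are assumptions for illustration.

```python
import math


def pnorm_or(d1: float, d2: float, p: float) -> float:
    """Similarity of a document with term weights d1, d2 to the query (t1 OR t2)."""
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity")
    if math.isinf(p):
        return max(d1, d2)  # limiting case: strict Boolean OR on fuzzy weights
    return ((d1 ** p + d2 ** p) / 2) ** (1 / p)


def pnorm_and(d1: float, d2: float, p: float) -> float:
    """Similarity of a document with term weights d1, d2 to the query (t1 AND t2)."""
    if p < 1:
        raise ValueError("p must satisfy 1 <= p <= infinity")
    if math.isinf(p):
        return min(d1, d2)  # limiting case: strict Boolean AND on fuzzy weights
    return 1 - (((1 - d1) ** p + (1 - d2) ** p) / 2) ** (1 / p)


# Hypothetical document with term weights 0.8 and 0.2.
for p in (1, 2, 10, math.inf):
    print(p, round(pnorm_or(0.8, 0.2, p), 4), round(pnorm_and(0.8, 0.2, p), 4))
```

At p = 1 the AND and OR scores coincide (both reduce to the average of the two weights), and as p grows they move toward the minimum and maximum of the weights respectively, which is the continuum along which the interpolation result is stated.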

12.

Parsimonious translation models for information retrieval

Seung-Hoon Na In-Su Kang Jong-Hyeok Lee 《Information processing & management》2007

In the KL divergence framework, the extended language modeling approach has a critical problem of estimating a query model, which is the probabilistic model that encodes the user’s information need. For query expansion in initial retrieval, the translation model had been proposed to involve term co-occurrence statistics. However, the translation model was difficult to apply, because the term co-occurrence statistics must be constructed offline. Especially in a large collection, constructing such a large matrix of term co-occurrence statistics prohibitively increases time and space complexity. In addition, reliable retrieval performance cannot be guaranteed because the translation model may comprise noisy non-topical terms in documents. To resolve these problems, this paper investigates an effective method to construct co-occurrence statistics and eliminate noisy terms by employing a parsimonious translation model. The parsimonious translation model is a compact version of a translation model that can reduce the number of terms containing non-zero probabilities by eliminating non-topical terms in documents. Through experimentation on seven different test collections, we show that the query model estimated from the parsimonious translation model significantly outperforms not only the baseline language modeling, but also the non-parsimonious models.

13.

A theoretical framework for defining similarity measures for boolean search request formulations, including some experimental results

Tadeusz Radecki 《Information processing & management》1985,21(6):501-524

Clusters of queries submitted to a given information retrieval system can be used as a basis for an effective method of clustering documents. This indirect procedure of document clustering requires the availability of a similarity measure for queries. Research carried out along these lines has resulted in the development of some methodologies for estimating such query similarities, applicable both in the case of queries characterized by sets of weighted or unweighted index terms and in the case of queries represented by Boolean combinations of index terms. This paper reports the results of further research by the author into a methodology of the latter type, i.e. a methodology for determining the similarity between queries characterized by Boolean search request formulations. The novelty of the presented approach, as compared with the methodology introduced in an earlier paper by the author, is that some relations among index terms are now taken into account. A number of similarity measures for Boolean combinations of index terms are discussed here in some detail. The rationale behind these measures is outlined, and the conditions to be met for ensuring their equivalence are identified. Moreover, the results of an experiment concerning two of the similarity measures introduced are given.

14.

New query suggestion framework and algorithms: A case study for an educational search engine

《Information processing & management》2016,52(5):733-752

Query suggestion is generally an integrated part of web search engines. In this study, we first redefine and reduce the query suggestion problem as “comparison of queries”. We then propose a general modular framework for query suggestion algorithm development. We also develop new query suggestion algorithms which are used in our proposed framework, exploiting query, session and user features. As a case study, we use query logs of a real educational search engine that targets K-12 students in Turkey. We also exploit educational features (course, grade) in our query suggestion algorithms. We test our framework and algorithms over a set of queries by an experiment and demonstrate a 66–90% statistically significant increase in relevance of query suggestions compared to a baseline method.

15.

Mix and match: combining terms and operators for successful Web searches

《Information processing & management》2005,41(4):801-817

This paper presents a detailed analysis of the structure and components of queries written by experimental participants in a study that manipulated two factors found to affect end-user information retrieval performance: training in Boolean logic and the type of search interface. As reported previously, we found that both Boolean training and the use of an assisted interface improved the participants' ability to find correct responses to information requests. Here, we examine the impact of these training and interface manipulations on the Boolean operators and search terms that comprise the submitted queries. Our analysis shows that both Boolean training and the use of an assisted interface improved the participants' ability to correctly utilize various operators. An unexpected finding is that this training also had a positive impact on term selection. The terms and, to a lesser extent, the operators comprising a query were important factors affecting the participants' performance in query tasks. Our findings demonstrate that even small training interventions can improve the users' search performance and highlight the need for additional information retrieval research into how search interfaces can provide superior support to today's untrained users of the Web.

16.

In search of query patterns: A case study of a university OPAC

Eng Pwey Lau Dion Hoe-Lian Goh 《Information processing & management》2006

A transaction log analysis of the Nanyang Technological University (NTU) OPAC was conducted to identify query and search failure patterns with the goal of identifying areas of improvement for the system. One semester’s worth of OPAC transaction logs were obtained and from these, 641,991 queries were extracted and used for this work. Issues investigated included query length, frequency and type of search options and Boolean operators used as well as their relationships with search failure. Among other findings, results indicate that a majority of the queries were simple, with short query lengths and a low usage of Boolean operators. Failure analysis revealed that on average, users had an almost equal chance of obtaining no records or at least one record to a submitted query. We propose enhancements and suggest future areas of work to improve the users’ search experience with the NTU OPAC.

17.

Bias–variance analysis in estimating true query model for information retrieval

Peng Zhang Dawei Song Jun Wang Yuexian Hou 《Information processing & management》2014

The estimation of the query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective in terms of high mean retrieval performance over all queries, but also stable in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, i.e., the bias–variance tradeoff, which is a fundamental theory in statistics. We formulate the notion of bias–variance regarding retrieval performance and estimation quality of query models. We then investigate several estimated query models, by analyzing when and why the bias–variance tradeoff will occur, and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections has been conducted to systematically evaluate our bias–variance analysis. Our approach and results will potentially form an analysis framework and a novel evaluation strategy for query language modeling.

18.

Applying query structuring in cross-language retrieval

《Information processing & management》2003,39(3):391-402

We will explore various ways to apply query structuring in cross-language information retrieval. In the first test, English queries were translated into Finnish using an electronic dictionary, and were run in a Finnish newspaper database of 55,000 articles. Queries were structured by combining the Finnish translation equivalents of the same English query key using the syn-operator of the InQuery retrieval system. Structured queries performed markedly better than unstructured queries. Second, the effects of compound-based structuring using a proximity operator for the translation equivalents of query language compound components were tested. The method was not useful in syn-based queries and instead resulted in a decrease in retrieval effectiveness. Proper names are often non-identical spelling variants in different languages. This allows n-gram based translation of names not included in a dictionary. In the third test, a query structuring method where the Boolean and-operator was used to assign more weight to keys translated through n-gram matching gave good results.

19.

How doctors search: A study of query behaviour and the impact on search results

Marianne Lykke Susan Price Lois Delcambre 《Information processing & management》2012

Professional, workplace searching is different from general searching, because it is typically limited to specific facets and targeted to a single answer. We have developed the semantic component (SC) model, which is a search feature that allows searchers to structure and specify the search to context-specific aspects of the main topic of the documents. We have tested the model in an interactive searching study with family doctors in order to explore doctors’ querying behaviour, how they applied the means for specifying a search, and how these features contributed to the search outcome. In general, the doctors were capable of exploiting system features and search tactics during the searching. Most searchers produced well-structured queries that contained appropriate search facets. When searches failed it was not due to query structure or query length. Failures were mostly caused by the well-known vocabulary problem. The problem was exacerbated by using certain filters as Boolean filters. The best working queries were structured into 2–3 main facets out of 3–5 possible search facets, and expressed with terms reflecting the focal view of the search task. The findings both support and extend previous results about query structure and exhaustivity, showing the importance of selecting central search facets and expressing them from the perspective of the search task. The SC model was applied in the highest performing queries except one. The findings suggest that the model might be a helpful feature to structure queries into central, appropriate facets, and in returning highly relevant documents.

20.

A Prospect-Guided global query expansion strategy using word embeddings

Francis C. Fernández-Reyes Jorge Hermosillo-Valadez Manuel Montes-y-Gómez 《Information processing & management》2018,54(1):1-13

The effectiveness of query expansion methods depends essentially on identifying good candidates, or prospects, semantically related to query terms. Word embeddings have been used recently in an attempt to address this problem. Nevertheless, query disambiguation is still necessary as the semantic relatedness of each word in the corpus is modeled, but choosing the right terms for expansion from the standpoint of the un-modeled query semantics remains an open issue. In this paper we propose a novel query expansion method using word embeddings that models the global query semantics from the standpoint of prospect vocabulary terms. The proposed method makes it possible to explore query-vocabulary semantic closeness in such a way that new terms, semantically related to more relevant topics, are elicited and added as a function of the query as a whole. The method includes candidate pooling strategies that address disambiguation issues without using exogenous resources. We tested our method with three topic sets over CLEF corpora and compared it across different Information Retrieval models and against another expansion technique that also uses word embeddings. Our experiments indicate that our method achieves significant results that outperform the baselines, improving both recall and precision metrics without relevance feedback.