期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

A domain-independent approach to finding related entities

Olga Vechtomova Stephen E. Robertson 《Information processing & management》2012

We propose an approach to the retrieval of entities that have a specific relationship with the entity given in a query. Our research goal is to investigate whether related entity finding problem can be addressed by combining a measure of relatedness of candidate answer entities to the query, and likelihood that the candidate answer entity belongs to the target entity category specified in the query. An initial list of candidate entities, extracted from top ranked documents retrieved for the query, is refined using a number of statistical and linguistic methods. The proposed method extracts the category of the target entity from the query, identifies instances of this category as seed entities, and computes similarity between candidate and seed entities. The evaluation was conducted on the Related Entity Finding task of the Entity Track of TREC 2010, as well as the QA list questions from TREC 2005 and 2006. Evaluation results demonstrate that the proposed methods are effective in finding related entities. 相似文献

2.

Facet-based opinion retrieval from blogs 总被引：1，自引：0，他引：1

Olga Vechtomova 《Information processing & management》2010,46(1):71-88

The paper presents methods of retrieving blog posts containing opinions about an entity expressed in the query. The methods use a lexicon of subjective words and phrases compiled from manually and automatically developed resources. One of the methods uses the Kullback–Leibler divergence to weight subjective words occurring near query terms in documents, another uses proximity between the occurrences of query terms and subjective words in documents, and the third combines both factors. Methods of structuring queries into facets, facet expansion using Wikipedia, and a facet-based retrieval are also investigated in this work. The methods were evaluated using the TREC 2007 and 2008 Blog track topics, and proved to be highly effective. 相似文献

3.

CIDER: Concept-based image diversification,exploration, and retrieval

Enamul Hoque Orland Hoeber Minglun Gong 《Information processing & management》2013

Many of the approaches to image retrieval on the Web have their basis in text retrieval. However, when searchers are asked to describe their image needs, the resulting query is often short and potentially ambiguous. The solution we propose is to perform automatic query expansion using Wikipedia as the source knowledge base, resulting in a diversification of the search results. The outcome is a broad range of images that represent the various possible interpretations of the query. In order to assist the searcher in finding images that match their specific intentions for the query, we have developed an image organization method that uses both the conceptual information associated with each image, and the visual features extracted from the images. This, coupled with a hierarchical organization of the concepts, provides an interactive interface that takes advantage of the searchers’ abilities to recognize relevant concepts, filter and focus the search results based on these concepts, and visually identify relevant images while navigating within the image space. In this paper, we outline the key features of our image retrieval system (CIDER), and present the results of a preliminary user evaluation. The results of this study illustrate the potential benefits that CIDER can provide for searchers conducting image retrieval tasks. 相似文献

4.

A generic construct based workload model for web search

Jia-Lang Seng I-Feng Ko Binshan Lin 《Information processing & management》2009,45(5):529-554

Benchmarks are vital tools in the performance measurement, evaluation, and comparison of computer hardware and software systems. Standard benchmarks such as the TREC, TPC, SPEC, SAP, Oracle, Microsoft, IBM, Wisconsin, AS³AP, OO1, OO7, XOO7 benchmarks have been used to assess the system performance. These benchmarks are domain-specific and domain-dependent in that they model typical applications and tie to a problem domain. Test results from these benchmarks are estimates of possible system performance for certain pre-determined problem types. When the user domain differs from the standard problem domain or when the application workload is divergent from the standard workload, they do not provide an accurate way to measure the system performance of the user problem domain. System performance of the actual problem domain in terms of data and transactions may vary significantly from the standard benchmarks.In this research, we address the issue of generalization and precision of benchmark workload model for web search technology. The current performance measurement and evaluation method suffers from the rough estimate of system performance which varies widely when the problem domain changes. The performance results provided by the vendors cannot be reproduced nor reused in the real users’ environment. Hence, in this research, we tackle the issue of domain boundness and workload boundness which represents the root of the problem of imprecise, ir-representative, and ir-reproducible performance results. We address the issue by presenting a domain-independent and workload-independent workload model benchmark method which is developed from the perspective of the user requirements and generic constructs. We present a user-driven workload model to develop a benchmark in a process of workload requirements representation, transformation, and generation via the common carrier of generic constructs. We aim to create a more generalized and precise evaluation method which derives test suites from the actual user domain and application setting.The workload model benchmark method comprises three main components. They are a high-level workload specification scheme, a translator of the scheme, and a set of generators to generate the test database and the test suite. They are based on the generic constructs. The specification scheme is used to formalize the workload requirements. The translator is used to transform the specification. The generator is used to produce the test database and the test workload. We determine the generic constructs via the analysis of search methods. The generic constructs form a page model, a query model, and a control model in the workload model development. The page model describes the web page structure. The query model defines the logics to query the web. The control model defines the control variables to set up the experiments.In this study, we have conducted ten baseline research experiments to validate the feasibility and validity of the benchmark method. An experimental prototype is built to execute these experiments. Experimental results demonstrate that the method based on generic constructs and driven by the perspective of user requirements is capable of modeling the standard benchmarks as well as more general benchmark requirements. 相似文献

5.

基于上下文的Web即时信息检索

郭少友《情报理论与实践》2009,32(6)

用户当前正在浏览的网页内容有助于说明用户的即时信息需求.在现有相关研究的基础上提出了一种基于上下文的Web即时信息检索方法,该方法允许用户从正在浏览的网页中选择一段文本作为原始检索条件,由检索系统从其上下文中提取一级扩展词和二级扩展词来形成新的检索条件进行检索,并将检索结果按相似度从大到小的顺序呈现给用户. 相似文献

6.

Linguistic aggregation methods in blog retrieval

Mostafa Keikha Fabio Crestani 《Information processing & management》2012

This paper addresses the blog distillation problem, that is, given a user query find the blogs that are most related to the query topic. We model each post as evidence of the relevance of a blog to the query, and use aggregation methods like Ordered Weighted Averaging (OWA) operators to combine the evidence. We show that using only highly relevant evidence (posts) for each blog can result in an effective retrieval system. We also take into account the importance of the posts in a query-based cluster and investigate its effect in the aggregation results. We use prioritized OWA operators and show that considering the importance is effective when the number of aggregated posts from each blog is high. We carry out our experiments on three different data sets (TREC07, TREC08 and TREC09) and show statistically significant improvements over state of the art model called voting model. 相似文献

7.

Knowledge based collection selection for distributed information retrieval

Baoli Han Ling Chen Xiaoxue Tian 《Information processing & management》2018,54(1):116-128

相似文献

8.

Adapting information retrieval to query contexts 总被引：1，自引：0，他引：1

Jing Bai Jian-Yun Nie 《Information processing & management》2008,44(6):1901

In current IR approaches documents are retrieved only according to the terms specified in the query. The same answers are returned for the same query whatever the user and the search goal are. In reality, many other contextual factors strongly influence document’s relevance and they should be taken into account in IR operations. This paper proposes a method, based on language modeling, to integrate several contextual factors so that document ranking will be adapted to the specific query contexts. We will consider three contextual factors in this paper: the topic domain of the query, the characteristics of the document collection, as well as context words within the query. Each contextual factor is used to generate a new query language model to specify some aspect of the information need. All these query models are then combined together to produce a more complete model for the underlying information need. Our experiments on TREC collections show that each contextual factor can positively influence the IR effectiveness and the combined model results in the highest effectiveness. This study shows that it is both beneficial and feasible to integrate more contextual factors in the current IR practice. 相似文献

9.

Combining the evidence of different relevance feedback methods for information retrieval 总被引：2，自引：0，他引：2

Joon Ho Lee 《Information processing & management》1998,34(6):681-691

It has been known that retrieval effectiveness can be significantly improved by combining multiple evidence from different query or document representations, or multiple retrieval techniques. In this paper, we combine multiple evidence from different relevance feedback methods, and investigate various aspects of the combination. We first generate multiple query vectors for a given information problem in a fully automatic way by expanding an initial query vector with various relevance feedback methods. We then perform retrieval runs for the multiple query vectors, and combine the retrieval results. Experimental results show that combining the evidence of different relevance feedback methods can lead to substantial improvements of retrieval effectiveness. 相似文献

10.

Exploring features for the automatic identification of user goals in web search

Mauro Rojas Herrera Edleno Silva de Moura Marco Cristo Thomaz Philippe Silva Altigran Soares da Silva 《Information processing & management》2010

Queries submitted to search engines can be classified according to the user goals into three distinct categories: navigational, informational, and transactional. Such classification may be useful, for instance, as additional information for advertisement selection algorithms and for search engine ranking functions, among other possible applications. This paper presents a study about the impact of using several features extracted from the document collection and query logs on the task of automatically identifying the users’ goals behind their queries. We propose the use of new features not previously reported in literature and study their impact on the quality of the query classification task. Further, we study the impact of each feature on different web collections, showing that the choice of the best set of features may change according to the target collection. 相似文献

11.

A Prospect-Guided global query expansion strategy using word embeddings

Francis C. Fernández-Reyes Jorge Hermosillo-Valadez Manuel Montes-y-Gómez 《Information processing & management》2018,54(1):1-13

The effectiveness of query expansion methods depends essentially on identifying good candidates, or prospects, semantically related to query terms. Word embeddings have been used recently in an attempt to address this problem. Nevertheless query disambiguation is still necessary as the semantic relatedness of each word in the corpus is modeled, but choosing the right terms for expansion from the standpoint of the un-modeled query semantics remains an open issue. In this paper we propose a novel query expansion method using word embeddings that models the global query semantics from the standpoint of prospect vocabulary terms. The proposed method allows to explore query-vocabulary semantic closeness in such a way that new terms, semantically related to more relevant topics, are elicited and added in function of the query as a whole. The method includes candidates pooling strategies that address disambiguation issues without using exogenous resources. We tested our method with three topic sets over CLEF corpora and compared it across different Information Retrieval models and against another expansion technique using word embeddings as well. Our experiments indicate that our method achieves significant results that outperform the baselines, improving both recall and precision metrics without relevance feedback. 相似文献

12.

Efficient query-by-example spoken document retrieval combining phone multigram representation and dynamic time warping

Paula Lopez-Otero Javier Parapar Alvaro Barreiro 《Information processing & management》2019,56(1):43-60

相似文献

13.

IntoNews: Online news retrieval using closed captions

Roi Blanco Gianmarco De Francisci Morales Fabrizio Silvestri 《Information processing & management》2015

We present IntoNews, a system to match online news articles with spoken news from a television newscasts represented by closed captions. We formalize the news matching problem as two independent tasks: closed captions segmentation and news retrieval. The system segments closed captions by using a windowing scheme: sliding or tumbling window. Next, it uses each segment to build a query by extracting representative terms. The query is used to retrieve previously indexed news articles from a search engine. To detect when a new article should be surfaced, the system compares the set of retrieved articles with the previously retrieved one. The intuition is that if the difference between these sets is large enough, it is likely that the topic of the newscast currently on air has changed and a new article should be displayed to the user. In order to evaluate IntoNews, we build a test collection using data coming from a second screen application and a major online news aggregator. The dataset is manually segmented and annotated by expert assessors, and used as our ground truth. It is freely available for download through the Webscope program.¹ Our evaluation is based on a set of novel time-relevance metrics that take into account three different aspects of the problem at hand: precision, timeliness and coverage. We compare our algorithms against the best method previously proposed in literature for this problem. Experiments show the trade-offs involved among precision, timeliness and coverage of the airing news. Our best method is four times more accurate than the baseline. 相似文献

14.

Cardinality estimation in numeric on-line databases

Kalervo Jrvelin 《Information processing & management》1986,22(6)

Numeric on-line databases (NDBs) have become essential in information retrieval (IR). NDBs differ from traditional bibliographic databases (BDBs) with respect to their content, structural complexity, data manipulation capabilities and the complexity of the user interfaces and user charging schemes. Recent trends in user charging for all on-line IR are toward charging for the information actually retrieved from the database rather than for the connect-time. However, the viability of such charging schemes depends on the user's possibilities of estimating the charges in advance, during the query negotiating phase. The problem of advance charge estimation in NDBs is addressed in this paper. In order to solve this problem, a method for estimating the numbers of data items satisfying the query expressions, i.e. the query cardinalities, is required. In this paper, an approach for advance estimation of query charges is developed and, based on it, a systematic and general method for query cardinality estimation is defined. The approach and the method are based on the well-known relational data model (RDM). The method is adapted to the new application area, user charge estimation in NDBs, and provides several improvements with respect to earlier cardinality estimation methods. Based on the method, several types of user charges can be estimated in advance. Tools based on the method are necessary components of query interfaces to NDBs if non-connect-timebased charging is used. The approach and the method are directly applicable to RDM-based NDBs. 相似文献

15.

Summarisation of the logical structure of XML documents

Zoltán Szlávik Anastasios Tombros Mounia Lalmas 《Information processing & management》2012

Summarisation is traditionally used to produce summaries of the textual contents of documents. In this paper, it is argued that summarisation methods can also be applied to the logical structure of XML documents. Structure summarisation selects the most important elements of the logical structure and ensures that the user’s attention is focused towards sections, subsections, etc. that are believed to be of particular interest. Structure summaries are shown to users as hierarchical tables of contents. This paper discusses methods for structure summarisation that use various features of XML elements in order to select document portions that a user’s attention should be focused to. An evaluation methodology for structure summarisation is also introduced and summarisation results using various summariser versions are presented and compared to one another. We show that data sets used in information retrieval evaluation can be used effectively in order to produce high quality (query independent) structure summaries. We also discuss the choice and effectiveness of particular summariser features with respect to several evaluation measures. 相似文献

16.

Geo-temporal distribution of tag terms for event-related image retrieval

Massimiliano Ruocco Heri Ramampiaro 《Information processing & management》2015

Media sharing applications, such as Flickr and Panoramio, contain a large amount of pictures related to real life events. For this reason, the development of effective methods to retrieve these pictures is important, but still a challenging task. Recognizing this importance, and to improve the retrieval effectiveness of tag-based event retrieval systems, we propose a new method to extract a set of geographical tag features from raw geo-spatial profiles of user tags. The main idea is to use these features to select the best expansion terms in a machine learning-based query expansion approach. Specifically, we apply rigorous statistical exploratory analysis of spatial point patterns to extract the geo-spatial features. We use the features both to summarize the spatial characteristics of the spatial distribution of a single term, and to determine the similarity between the spatial profiles of two terms – i.e., term-to-term spatial similarity. To further improve our approach, we investigate the effect of combining our geo-spatial features with temporal features on choosing the expansion terms. To evaluate our method, we perform several experiments, including well-known feature analyzes. Such analyzes show how much our proposed geo-spatial features contribute to improve the overall retrieval performance. The results from our experiments demonstrate the effectiveness and viability of our method. 相似文献

17.

Noun phrases as building blocks for cross-language Search Assistance

《Information processing & management》2005,41(3):549-568

This paper presents a Foreign-Language Search Assistant that uses noun phrases as fundamental units for document translation and query formulation, translation and refinement. The system (a) supports the foreign-language document selection task providing a cross-language indicative summary based on noun phrase translations, and (b) supports query formulation and refinement using the information displayed in the cross-language document summaries. Our results challenge two implicit assumptions in most of cross-language Information Retrieval research: first, that once documents in the target language are found, Machine Translation is the optimal way of informing the user about their contents; and second, that in an interactive setting the optimal way of formulating and refining the query is helping the user to choose appropriate translations for the query terms. 相似文献

18.

Using query expansion in graph-based approach for query-focused multi-document summarization

Lin Zhao Lide Wu Xuanjing Huang 《Information processing & management》2009

This paper presents a novel query expansion method, which is combined in the graph-based algorithm for query-focused multi-document summarization, so as to resolve the problem of information limit in the original query. Our approach makes use of both the sentence-to-sentence relations and the sentence-to-word relations to select the query biased informative words from the document set and use them as query expansions to improve the sentence ranking result. Compared to previous query expansion approaches, our approach can capture more relevant information with less noise. We performed experiments on the data of document understanding conference (DUC) 2005 and DUC 2006, and the evaluation results show that the proposed query expansion method can significantly improve the system performance and make our system comparable to the state-of-the-art systems. 相似文献

19.

检索语言可用性评价初探

赖茂生王婧麦晓华《情报理论与实践》2012,35(8):65-69,96

文章旨在探讨和构建检索语言的可用性评价及其指标。通过调研现有检索语言评价和可用性相关的研究,发现目前检索语言评价研究比较分散,过于强调检索效果,并依附于检索系统评价。根据检索语言和可用性评价的特点,初步构建了检索语言的可用性评价指标体系,然后运用专家调查法对该指标体系进行优化完善,利用Matlab进行层次分析以确定各指标的权重。研究结果有利于检索语言在网络环境下更好地发挥其功能,提升效率和用户满意度。相似文献

20.

Detecting verbose queries and improving information retrieval

Emanuele Di Buccio Massimo MelucciFederica Moro 《Information processing & management》2014

Although most of the queries submitted to search engines are composed of a few keywords and have a length that ranges from three to six words, more than 15% of the total volume of the queries are verbose, introduce ambiguity and cause topic drifts. We consider verbosity a different property of queries from length since a verbose query is not necessarily long, it might be succinct and a short query might be verbose. This paper proposes a methodology to automatically detect verbose queries and conditionally modify queries. The methodology proposed in this paper exploits state-of-the-art classification algorithms, combines concepts from a large linguistic database and uses a topic gisting algorithm we designed for verbose query modification purposes. Our experimental results have been obtained using the TREC Robust track collection, thirty topics classified by difficulty degree, four queries per topic classified by verbosity and length, and human assessment of query verbosity. Our results suggest that the methodology for query modification conditioned to query verbosity detection and topic gisting is significantly effective and that query modification should be refined when topic difficulty and query verbosity are considered since these two properties interact and query verbosity is not straightforwardly related to query length. 相似文献