首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
Query suggestion is generally an integrated part of web search engines. In this study, we first redefine and reduce the query suggestion problem as “comparison of queries”. We then propose a general modular framework for query suggestion algorithm development. We also develop new query suggestion algorithms which are used in our proposed framework, exploiting query, session and user features. As a case study, we use query logs of a real educational search engine that targets K-12 students in Turkey. We also exploit educational features (course, grade) in our query suggestion algorithms. We test our framework and algorithms over a set of queries by an experiment and demonstrate a 66–90% statistically significant increase in relevance of query suggestions compared to a baseline method.  相似文献   

2.
This paper proposes a novel query expansion method to improve accuracy of text retrieval systems. Our method makes use of a minimal relevance feedback to expand the initial query with a structured representation composed of weighted pairs of words. Such a structure is obtained from the relevance feedback through a method for pairs of words selection based on the Probabilistic Topic Model. We compared our method with other baseline query expansion schemes and methods. Evaluations performed on TREC-8 demonstrated the effectiveness of the proposed method with respect to the baseline.  相似文献   

3.
Recent developments have shown that entity-based models that rely on information from the knowledge graph can improve document retrieval performance. However, given the non-transitive nature of relatedness between entities on the knowledge graph, the use of semantic relatedness measures can lead to topic drift. To address this issue, we propose a relevance-based model for entity selection based on pseudo-relevance feedback, which is then used to systematically expand the input query leading to improved retrieval performance. We perform our experiments on the widely used TREC Web corpora and empirically show that our proposed approach to entity selection significantly improves ad hoc document retrieval compared to strong baselines. More concretely, the contributions of this work are as follows: (1) We introduce a graphical probability model that captures dependencies between entities within the query and documents. (2) We propose an unsupervised entity selection method based on the graphical model for query entity expansion and then for ad hoc retrieval. (3) We thoroughly evaluate our method and compare it with the state-of-the-art keyword and entity based retrieval methods. We demonstrate that the proposed retrieval model shows improved performance over all the other baselines on ClueWeb09B and ClueWeb12B, two widely used Web corpora, on the [email protected], and [email protected] metrics. We also show that the proposed method is most effective on the difficult queries. In addition, We compare our proposed entity selection with a state-of-the-art entity selection technique within the context of ad hoc retrieval using a basic query expansion method and illustrate that it provides more effective retrieval for all expansion weights and different number of expansion entities.  相似文献   

4.
5.
The anonymization of query logs is an important process that needs to be performed prior to the publication of such sensitive data. This ensures the anonymity of the users in the logs, a problem that has been already found in released logs from well known companies. This paper presents the anonymization of query logs using microaggregation. Our proposal ensures the k-anonymity of the users in the query log, while preserving its utility. We provide the evaluation of our proposal in real query logs, showing the privacy and utility achieved, as well as providing estimations for the use of such data in data mining processes based on clustering.  相似文献   

6.
In this paper, we describe a model of information retrieval system that is based on a document re-ranking method using document clusters. In the first step, we retrieve documents based on the inverted-file method. Next, we analyze the retrieved documents using document clusters, and re-rank them. In this step, we use static clusters and dynamic cluster view. Consequently, we can produce clusters that are tailored to characteristics of the query. We focus on the merits of the inverted-file method and cluster analysis. In other words, we retrieve documents based on the inverted-file method and analyze all terms in document based on the cluster analysis. By these two steps, we can get the retrieved results which are made by the consideration of the context of all terms in a document as well as query terms. We will show that our method achieves significant improvements over the method based on similarity search ranking alone.  相似文献   

7.
Both general and domain-specific search engines have adopted query suggestion techniques to help users formulate effective queries. In the specific domain of literature search (e.g., finding academic papers), the initial queries are usually based on a draft paper or abstract, rather than short lists of keywords. In this paper, we investigate phrasal-concept query suggestions for literature search. These suggestions explicitly specify important phrasal concepts related to an initial detailed query. The merits of phrasal-concept query suggestions for this domain are their readability and retrieval effectiveness: (1) phrasal concepts are natural for academic authors because of their frequent use of terminology and subject-specific phrases and (2) academic papers describe their key ideas via these subject-specific phrases, and thus phrasal concepts can be used effectively to find those papers. We propose a novel phrasal-concept query suggestion technique that generates queries by identifying key phrasal-concepts from pseudo-labeled documents and combines them with related phrases. Our proposed technique is evaluated in terms of both user preference and retrieval effectiveness. We conduct user experiments to verify a preference for our approach, in comparison to baseline query suggestion methods, and demonstrate the effectiveness of the technique with retrieval experiments.  相似文献   

8.
Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend; social community question answering websites are the second choice for students who try to get answers from other peers online. We attempt discovering possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, varying query length and based on factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.  相似文献   

9.
The study of query performance prediction (QPP) in information retrieval (IR) aims to predict retrieval effectiveness. The specificity of the underlying information need of a query often determines how effectively can a search engine retrieve relevant documents at top ranks. The presence of ambiguous terms makes a query less specific to the sought information need, which in turn may degrade IR effectiveness. In this paper, we propose a novel word embedding based pre-retrieval feature which measures the ambiguity of each query term by estimating how many ‘senses’ each word is associated with. Assuming each sense roughly corresponds to a Gaussian mixture component, our proposed generative model first estimates a Gaussian mixture model (GMM) from the word vectors that are most similar to the given query terms. We then use the posterior probabilities of generating the query terms themselves from this estimated GMM in order to quantify the ambiguity of the query. Previous studies have shown that post-retrieval QPP approaches often outperform pre-retrieval ones because they use additional information from the top ranked documents. To achieve the best of both worlds, we formalize a linear combination of our proposed GMM based pre-retrieval predictor with NQC, a state-of-the-art post-retrieval QPP. Our experiments on the TREC benchmark news and web collections demonstrate that our proposed hybrid QPP approach (in linear combination with NQC) significantly outperforms a range of other existing pre-retrieval approaches in combination with NQC used as baselines.  相似文献   

10.
Modularity in organizations can facilitate the creation and development of dynamic capabilities. Paradoxically, however, modular management can also stifle the strategic potential of such capabilities by conflicting with the horizontal integration of units. We address these issues through an examination of how modular management of information technology (IT), project teams and front-line personnel in concert with knowledge management (KM) interventions influence the creation and development of dynamic capabilities at a large Asia-based call center. Our findings suggest that a full capitalization of the efficiencies created by modularity may be closely linked to the strategic sense making abilities of senior managers to assess the long-term business value of the dominant designs available in the market. Drawing on our analysis we build a modular management-KM-dynamic capabilities model, which highlights the evolution of three different levels of dynamic capabilities and also suggests an inherent complementarity between modular and integrated approaches.  相似文献   

11.
12.
We propose in this paper an architecture for near-duplicate video detection based on: (i) index and query signature based structures integrating temporal and perceptual visual features and (ii) a matching framework computing the logical inference between index and query documents. As far as indexing is concerned, instead of concatenating low-level visual features in high-dimensional spaces which results in curse of dimensionality and redundancy issues, we adopt a perceptual symbolic representation based on color and texture concepts. For matching, we propose to instantiate a retrieval model based on logical inference through the coupling of an N-gram sliding window process and theoretically-sound lattice-based structures. The techniques we cover are robust and insensitive to general video editing and/or degradation, making it ideal for re-broadcasted video search. Experiments are carried out on large quantities of video data collected from the TRECVID 02, 03 and 04 collections and real-world video broadcasts recorded from two German TV stations. An empirical comparison over two state-of-the-art dynamic programming techniques is encouraging and demonstrates the advantage and feasibility of our method.  相似文献   

13.
Dynamic Ensemble Selection (DES) strategy is one of the most common and effective techniques in machine learning to deal with classification problems. DES systems aim to construct an ensemble consisting of the most appropriate classifiers selected from the candidate classifier pool according to the competence level of the individual classifier. Since several classifiers are selected, their combination becomes crucial. However, most of current DES approaches focus on the combination of the selected classifiers while ignoring the local information surrounding the query sample needed to be classified. In order to boost the performance of DES-based classification systems, we in this paper propose a dynamic weighting framework for the classifier fusion during obtaining the final output of an DES system. In particular, the proposed method first employs a DES approach to obtain a group of classifiers for a query sample. Then, the hypothesis vector of the selected ensemble is obtained based on the analysis of consensus. Finally, a distance-based weighting scheme is developed to adjust the hypothesis vector depending on the closeness of the query sample to each class. The proposed method is tested on 30 real-world datasets with six well-known DES approaches based on both homogeneous and heterogeneous ensemble. The obtained results, supported by proper statistical tests, show that our method outperforms, both in terms of accuracy and kappa measures, the original DES framework.  相似文献   

14.
This paper proposes a learning approach for the merging process in multilingual information retrieval (MLIR). To conduct the learning approach, we present a number of features that may influence the MLIR merging process. These features are mainly extracted from three levels: query, document, and translation. After the feature extraction, we then use the FRank ranking algorithm to construct a merge model. To the best of our knowledge, this practice is the first attempt to use a learning-based ranking algorithm to construct a merge model for MLIR merging. In our experiments, three test collections for the task of crosslingual information retrieval (CLIR) in NTCIR3, 4, and 5 are employed to assess the performance of our proposed method. Moreover, several merging methods are also carried out for a comparison, including traditional merging methods, the 2-step merging strategy, and the merging method based on logistic regression. The experimental results show that our proposed method can significantly improve merging quality on two different types of datasets. In addition to the effectiveness, through the merge model generated by FRank, our method can further identify key factors that influence the merging process. This information might provide us more insight and understanding into MLIR merging.  相似文献   

15.
The dynamic nature and size of the Internet can result in difficulty finding relevant information. Most users typically express their information need via short queries to search engines and they often have to physically sift through the search results based on relevance ranking set by the search engines, making the process of relevance judgement time-consuming. In this paper, we describe a novel representation technique which makes use of the Web structure together with summarisation techniques to better represent knowledge in actual Web Documents. We named the proposed technique as Semantic Virtual Document (SVD). We will discuss how the proposed SVD can be used together with a suitable clustering algorithm to achieve an automatic content-based categorization of similar Web Documents. The auto-categorization facility as well as a “Tree-like” Graphical User Interface (GUI) for post-retrieval document browsing enhances the relevance judgement process for Internet users. Furthermore, we will introduce how our cluster-biased automatic query expansion technique can be used to overcome the ambiguity of short queries typically given by users. We will outline our experimental design to evaluate the effectiveness of the proposed SVD for representation and present a prototype called iSEARCH (Intelligent SEarch And Review of Cluster Hierarchy) for Web content mining. Our results confirm, quantify and extend previous research using Web structure and summarisation techniques, introducing novel techniques for knowledge representation to enhance Web content mining.  相似文献   

16.
Query translation is a viable method for cross-language information retrieval (CLIR), but it suffers from translation ambiguities caused by multiple translations of individual query terms. Previous research has employed various methods for disambiguation, including the method of selecting an individual target query term from multiple candidates by comparing their statistical associations with the candidate translations of other query terms. This paper proposes a new method where we examine all combinations of target query term translations corresponding to the source query terms, instead of looking at the candidates for each query term and selecting the best one at a time. The goodness value for a combination of target query terms is computed based on the association value between each pair of the terms in the combination. We tested our method using the NTCIR-3 English–Korean CLIR test collection. The results show some improvements regardless of the association measures we used.  相似文献   

17.
This paper proposes a privacy-preserving consensus algorithm which enables all the agents in the directed network to eventually reach the weighted average of initial states, and while preserving the privacy of the initial state of each agent. A novel privacy-preserving scheme is proposed in our consensus algorithm where initial states are hidden in random values. We also develop detailed analysis based on our algorithm, including its convergence property and the topology condition of privacy leakages for each agent. It can be observed that final consensus point is independent of their initial values that can be arbitrary random values. Besides, when an eavesdropper exists and can intercept the data transmitted on the edges, we introduce an index to measure the privacy leakage degree of agents, and then analyze the degree of privacy leakage for each agent. Similarly, the degree for network privacy leakage is derived. Subsequently, we establish an optimization problem to find the optimal attacking strategy, and present a heuristic optimization algorithm based on the Sequential Least Squares Programming (SLSQP) to solve the proposed optimization problem. Finally, numerical experiments are designed to demonstrate the effectiveness of our algorithm.  相似文献   

18.
We present a novel multimodal query expansion strategy, based on genetic programming (GP), for image search in visually-oriented e-commerce applications. Our GP-based approach aims at both: learning to expand queries with multimodal information and learning to compute the “best” ranking for the expanded queries. However, different from previous work, the query is only expressed in terms of the visual content, which brings several challenges for this type of application. In order to evaluate the effectiveness of our method, we have collected two datasets containing images of clothing products taken from different online shops. Experimental results indicate that our method is an effective alternative for improving the quality of image search results when compared to a genetic programming system based only on visual information. Our method can achieve gains varying from 10.8% against the strongest learning-to-rank baseline to 54% against an adhoc specialized solution for the particular domain at hand.  相似文献   

19.
Focusing on the context of XML retrieval, in this paper we propose a general methodology for managing structured queries (involving both content and structure) within any given structured probabilistic information retrieval system which is able to compute posterior probabilities of relevance for structural components given a non-structured query (involving only query terms but not structural restrictions). We have tested our proposal using two specific information retrieval systems (Garnata and PF/Tijah), and the structured document collections from the last six editions of the INitiative for the Evaluation of XML Retrieval (INEX).  相似文献   

20.
Users of search engines express their needs as queries, typically consisting of a small number of terms. The resulting search engine query logs are valuable resources that can be used to predict how people interact with the search system. In this paper, we introduce two novel applications of query logs, in the context of distributed information retrieval. First, we use query log terms to guide sampling from uncooperative distributed collections. We show that while our sampling strategy is at least as efficient as current methods, it consistently performs better. Second, we propose and evaluate a pruning strategy that uses query log information to eliminate terms. Our experiments show that our proposed pruning method maintains the accuracy achieved by complete indexes, while decreasing the index size by up to 60%. While such pruning may not always be desirable in practice, it provides a useful benchmark against which other pruning strategies can be measured.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号