Similar Documents
20 similar documents found.
1.
To evaluate the effectiveness of Information Retrieval (IR) systems, it is key to collect relevance judgments from human assessors. Crowdsourcing has successfully been used as a method to scale up the collection of manual relevance judgments, and previous research has investigated the impact of different judgment task design elements (e.g., highlighting query keywords in the document) on judgment quality and efficiency. In this work we investigate the positive and negative impacts of presenting crowd assessors with more than just the topic and the document to be judged. We deploy different variants of crowdsourced relevance judgment tasks following a between-subjects design in which we present different types of metadata to the assessor. Specifically, we investigate the effect of human metadata (e.g., what other assessors think of the current document, such as the relevance level already selected by the majority of crowd workers) and machine metadata (e.g., how IR systems scored the document, such as its average position in ranked lists, and statistics about the document, such as term frequencies). We look at the impact of metadata on judgment quality (i.e., the level of agreement with trained assessors) and cost (i.e., the time it takes workers to complete the judgments), as well as at how metadata quality positively or negatively affects the collected judgments.

2.
A particle swarm classifier is integrated with the concept of intelligently controlling the search process of PSO to develop an efficient swarm-intelligence-based classifier, called the intelligent particle swarm classifier (IPS-classifier). The classifier finds decision hyperplanes that separate patterns of different classes in the feature space. An intelligent fuzzy controller is designed to improve the performance and efficiency of the classifier by adapting three important parameters of PSO (inertia weight, cognitive parameter and social parameter). Three pattern recognition problems with different feature vector dimensions are used to demonstrate the effectiveness of the classifier: Iris data classification, Wine data classification, and radar target classification from backscattered signals. The experimental results show that the performance of the IPS-classifier is comparable to or better than that of two conventional classifiers, the k-nearest neighbor (k-NN) and multi-layer perceptron (MLP).
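For readers unfamiliar with the three PSO parameters the fuzzy controller adapts, the standard velocity/position update can be sketched as follows (a minimal illustrative version; the default values and function name are assumptions, not taken from the paper):

```python
import random

def pso_step(x, v, pbest, gbest, w=0.7, c1=1.5, c2=1.5, rng=random):
    """One PSO update for a single particle in one dimension.

    w (inertia weight), c1 (cognitive parameter) and c2 (social
    parameter) are the three parameters the paper's fuzzy controller
    adapts; the defaults here are common choices, not the paper's.
    """
    r1, r2 = rng.random(), rng.random()  # stochastic attraction factors
    v_new = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
    return x + v_new, v_new
```

A fuzzy controller would adjust `w`, `c1` and `c2` between iterations based on the swarm's state.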

3.
This paper presents a laboratory-based evaluation study of cross-language information retrieval technologies, utilizing the partially parallel test collection NTCIR-2 (used together with NTCIR-1), which provides Japanese–English parallel document collections, parallel topic sets and their relevance judgments. These enable us to observe and compare monolingual retrieval processes in two languages as well as retrieval across languages. Our experiments focused on (1) the Rosetta stone question (does a partially parallel collection help in cross-language information access?) and (2) two aspects of retrieval difficulty, namely "collection discrepancy" and "query discrepancy". Japanese and English monolingual retrieval systems are combined through dictionary-based query translation modules, implementing a symmetrical bilingual evaluation environment.

4.
Term weighting for document ranking and retrieval has been an important research topic in information retrieval for decades. We propose a novel term weighting method based on the hypothesis that a term's role in accumulated past retrieval sessions affects its general importance. The method exploits the availability of past retrieval results, consisting of the queries that contain a particular term, the retrieved documents, and their relevance judgments. A term's evidential weight, as proposed in this paper, depends on the degree to which the mean frequency values of the relevant and non-relevant document distributions in the past differ. More precisely, it takes into account the rankings and similarity values of the relevant and non-relevant documents. Our experimental results using standard test collections show that the proposed term weighting scheme improves conventional TF*IDF and language-model-based schemes. This indicates that evidential term weights capture a new aspect of term importance and complement the collection statistics underlying TF*IDF. We also show how the proposed term weighting scheme based on evidential weights is related to the well-known weighting schemes based on language modeling and probabilistic models.
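The conventional TF*IDF baseline that the evidential weights are compared against can be sketched as follows (a minimal illustrative implementation; the exact variant used in the paper may differ):

```python
import math
from collections import Counter

def tfidf(docs):
    """TF*IDF weights for each document; docs is a list of token lists.

    Uses raw term frequency and log(N/df) inverse document frequency,
    the textbook formulation of the collection-statistics baseline.
    """
    n = len(docs)
    df = Counter()                      # document frequency per term
    for d in docs:
        df.update(set(d))
    weights = []
    for d in docs:
        tf = Counter(d)                 # term frequency within the doc
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights
```

A term appearing in every document gets weight 0, since log(N/df) vanishes.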

5.
The estimation of the query model is an important task in language modeling (LM) approaches to information retrieval (IR). The ideal estimation is expected to be not only effective, in terms of high mean retrieval performance over all queries, but also stable, in terms of low variance of retrieval performance across different queries. In practice, however, improving effectiveness can sacrifice stability, and vice versa. In this paper, we propose to study this tradeoff from a new perspective, namely the bias–variance tradeoff, a fundamental concept in statistics. We formulate the notion of bias–variance with respect to retrieval performance and the estimation quality of query models. We then investigate several estimated query models, analyzing when and why the bias–variance tradeoff occurs and how the bias and variance can be reduced simultaneously. A series of experiments on four TREC collections systematically evaluates our bias–variance analysis. Our approach and results can potentially form an analysis framework and a novel evaluation strategy for query language modeling.
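The statistical decomposition underlying the paper's perspective is the classic identity MSE = bias² + variance; a minimal sketch of that decomposition over a set of per-query estimates (the function name and interface are illustrative, not the paper's formulation):

```python
import statistics

def bias_variance(estimates, truth):
    """Decompose an estimator's mean squared error over repeated
    samples into squared bias and variance: MSE = bias^2 + variance.
    """
    mean_est = statistics.fmean(estimates)
    bias_sq = (mean_est - truth) ** 2
    var = statistics.pvariance(estimates)           # population variance
    mse = statistics.fmean((e - truth) ** 2 for e in estimates)
    return bias_sq, var, mse
```

Reducing one component at the expense of the other is exactly the tradeoff the paper studies for query models.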

6.
With the emergence of deep generative models such as variational auto-encoders (VAEs), research on topic modeling has extended to a new area: neural topic modeling, which aims to learn disentangled topics to understand the data better. However, the original VAE framework has been shown to be limited in disentanglement performance, and it brings these inherent defects to neural topic models (NTMs). In this paper, we put forward that the optimization objectives of contrastive learning are consistent with two important goals of well-disentangled topic learning, alignment and uniformity, which in turn correspond to two key evaluation measures for topic models, topic coherence and topic diversity. We therefore conclude that the alignment and uniformity of disentangled topic learning can be quantified by topic coherence and topic diversity. Accordingly, we propose the Contrastive Disentangled Neural Topic Model (CNTM). By representing both words and topics as low-dimensional vectors in the same embedding space, we apply contrastive learning to neural topic modeling to produce factorized and disentangled topics in an interpretable manner. We compare CNTM with strong baseline models on widely used metrics. Our model achieves the best topic coherence scores under the most general evaluation setting (100% of topics selected), with improvements of 25.0%, 10.9%, 24.6%, and 51.3% over the second-best models on the 20 Newsgroups, Web Snippets, Tag My News, and Reuters datasets, respectively. Our method also obtains the second-best topic diversity scores on 20 Newsgroups and Web Snippets. The experimental results show that CNTM effectively leverages the disentanglement ability of contrastive learning to address the inherent defect of neural topic modeling and obtain better topic quality.
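Of the two evaluation measures, topic diversity is the simpler to state: the fraction of unique words across the top-k words of all topics. A minimal sketch (a commonly used definition, assumed here rather than taken from the paper):

```python
def topic_diversity(topics, top_k=25):
    """Fraction of unique words among the top-k words of each topic.

    topics: list of topics, each a list of words ranked by probability.
    Returns 1.0 for fully disjoint topics, approaching 0 for redundant ones.
    """
    top_words = [t[:top_k] for t in topics]
    flat = [w for words in top_words for w in words]
    return len(set(flat)) / len(flat)
```

Higher diversity indicates topics that occupy distinct regions of the vocabulary, matching the uniformity goal described above.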

7.
Traditional Cranfield test collections represent an abstraction of a retrieval task that Sparck Jones calls the “core competency” of retrieval: a task that is necessary, but not sufficient, for user retrieval tasks. The abstraction facilitates research by controlling for (some) sources of variability, thus increasing the power of experiments that compare system effectiveness while reducing their cost. However, even within the highly-abstracted case of the Cranfield paradigm, meta-analysis demonstrates that the user/topic effect is greater than the system effect, so experiments must include a relatively large number of topics to distinguish systems’ effectiveness. The evidence further suggests that changing the abstraction slightly to include just a bit more characterization of the user will result in a dramatic loss of power or increase in cost of retrieval experiments. Defining a new, feasible abstraction for supporting adaptive IR research will require winnowing the list of all possible factors that can affect retrieval behavior to a minimum number of essential factors.

8.
This paper presents a relevance model to rank the facts of a data warehouse that are described in a set of documents retrieved with an information retrieval (IR) query. The model is based on language modeling and relevance modeling techniques. We estimate the relevance of the facts by the probability of finding their dimension values and the query keywords in the documents that are relevant to the query. The model is the core of the so-called contextualized warehouse, a new kind of decision support system that combines structured data sources and document collections. The paper evaluates the relevance model with the Wall Street Journal (WSJ) TREC test subcollection and a self-constructed fact database.

9.
A growing body of studies is developing approaches to evaluating human interaction with Web search engines, including the usability and effectiveness of Web search tools. This study explores a user-centered approach to the evaluation of the Web search engine Inquirus, a Web meta-search tool developed by researchers at the NEC Research Institute. The goal of the study was to develop a user-centered evaluation approach covering: (1) effectiveness, based on the impact of users' interactions on their information problem and information seeking stage, and (2) usability, including screen layout and system capabilities for users. Twenty-two volunteers searched Inquirus on their own personal information topics. Data analyzed included: (1) user pre- and post-search questionnaires and (2) Inquirus search transaction logs. Key findings include: (1) Inquirus was rated highly by users on various usability measures, (2) all users experienced some level of shift/change in their information problem, information seeking, and personal knowledge due to their Inquirus interaction, (3) different users experienced different levels of change/shift, and (4) the search measure precision did not correlate with other user-based measures: some users experienced major changes/shifts in user-based variables, such as information problem or information seeking stage, with a search of low precision, and vice versa. Implications for the development of user-centered approaches to the evaluation of Web and information retrieval (IR) systems and further research are discussed.

10.
The large amount of information available and the difficulty of processing it have made knowledge management a promising area of research. Several crucial topics are related to it, for example distributed and intelligent information retrieval, information filtering and information evaluation. In this paper, we focus our attention on the knowledge evaluation problem. With the aim of evaluating information coded in the standard non-proprietary format SGML (as well as in XML), we propose evaluation methods based on L-grammars, which are fuzzy grammars. In particular, we apply these methods to the evaluation of documents in SGML format and of HTML pages on the World Wide Web. L-grammars generate recursively enumerable L-languages, as proved in Gerla (1991, Information Sciences 53), and so they can be used to generate fuzzy languages based on extensions of the document type definitions (DTDs) used by SGML. Given a DTD, we extend its associated language by adding a judgement label. By selecting a particular label and taking the start symbol of the grammar associated with the DTD, we can generate any DTD-compliant document with a fuzzy degree of membership derived from the judgement label. In this way we fit the computational model underlying recursively enumerable L-languages to the process of collecting different evaluations of the same document. Finally, we outline how these evaluation methods can be generalized to different contexts and roles, for example information filtering.

11.
In this paper, results from three studies examining 1295 relevance judgments by 36 information retrieval (IR) system end-users are reported. Both the region of the relevance judgments, from non-relevant to highly relevant, and the motivations or levels for the relevance judgments are examined. Three major findings emerge. First, the frequency distributions of relevance judgments by IR system end-users tend to take on a bi-modal shape, with peaks at the extremes (non-relevant/relevant) and a flatter middle range. Second, the type of scale (interval or ordinal) used in each study did not alter the shape of the relevance frequency distributions. Third, on an interval scale, the median point of a relevance judgment distribution correlates with the point where relevant and partially relevant items begin to be retrieved. The median point of a distribution of relevance judgments may therefore provide a measure of user/IR system interaction to supplement precision/recall measures. The implications of this investigation for relevance theory and IR systems evaluation are discussed.

12.
Re-using research resources is essential for advancing knowledge and developing repeatable, empirically solid experiments in scientific fields, including interactive information retrieval (IIR). Despite recent efforts to standardize research re-use and documentation, how to quantitatively measure the reusability of IIR resources remains an open challenge. Inspired by the reusability evaluations of Cranfield experiments, our work proactively explores the problem of measuring IIR test collection reusability and makes three contributions: (1) constructing a novel usefulness-oriented framework with specific analytical methods for evaluating the reusability of IIR test collections consisting of query sets, document/page sets, and sets of task-document usefulness (tuse); (2) explaining the potential impacts of varying IIR-specific factors (e.g. search tasks, sessions, user characteristics) on test collection reusability; (3) proposing actionable methods for building reusable test collections in IIR and thereby amortizing the true cost of user-oriented evaluations. The Cranfield-inspired reusability assessment framework serves as an initial step towards accurately evaluating the reusability of IIR research resources and measuring the reproducibility of IIR evaluation results. It also demonstrates an innovative approach to integrating the insights from individual heterogeneous user studies with the evaluation techniques developed in standardized ad hoc retrieval experiments, which will facilitate the maturation of the IIR field and eventually benefit both sides of research.

13.
[Purpose/Significance] This study explores whether the number of citations a paper receives is related to its content, i.e., to the way its concepts are combined. [Method/Process] Within the immunology field in the WoS database, subject terms were extracted from three sets of papers with high, medium and low citation frequencies, and the concentration and dispersion of the term frequency distributions in each set were analyzed. Subject term co-occurrence networks were then constructed for each set, and their topological properties were analyzed to compare how the three sets combine concepts and to measure the relationship between atypical combinations and novelty. [Results/Conclusions] (1) The paper sets with different citation frequencies differ considerably in the distribution of topic types and the dispersion of subject terms. (2) The subject term co-occurrence networks of the highly cited and medium-cited sets exhibit small-world properties, while that of the low-cited set does not. (3) The co-occurrence network of the highly cited set is relatively dense, with a higher proportion of atypical term combinations than the other two sets, whereas the network of the low-cited set is relatively loose. A paper's citation count is thus related to the popularity of its topics, the closeness of the links between them, and the way they are combined.

14.
The levels of fasting glucose, fasting insulin and insulin resistance (IR), and the prevalence of metabolic syndrome (MS), were assessed in a sample of newly diagnosed, psychotropically naïve bipolar disorder (BPD) patients and compared with an age-, sex- and racially matched control population. 55 BPD-I patients (15–65 years) who were non-diabetic, non-pregnant and drug-naïve for at least 6 months were included in the study. Diagnosis was made using the structured clinical interview for DSM-IV axis I disorders (SCID IV). IR was assessed using the homeostasis model of insulin resistance (HOMA-IR); MS was defined according to the National Cholesterol Education Program Adult Treatment Panel III (NCEP-ATP III). Data were compared with 25 healthy controls. BPD patients had significantly higher mean levels of fasting plasma insulin (13.2 ± 9.2 vs. 4.68 ± 3.1 μIU/ml, p < 0.05) and postprandial plasma insulin (27.2 ± 14.5 vs. 18.1 ± 9.3 μIU/ml, p < 0.05), and a higher HOMA-IR value (3.16 ± 2.2 vs. 1.19 ± 0.8, p < 0.05), than the controls. A significantly higher proportion of BPD patients than controls had fasting plasma glucose, serum triglyceride and blood pressure values above the cut-offs, while waist circumference and serum HDL cholesterol showed no significant difference in proportion. The prevalence of IR was significantly higher among BPD cases than controls (26/55 vs. 2/25, z = 9.97, p < 0.05), while there was no significant difference in the prevalence of MS between the two groups. Within the BPD patients, logistic regression analysis showed that age, sex and current mood status (depressed/manic) were not significantly predictive of the presence or absence of MS or increased IR.
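The HOMA-IR index used above is computed from fasting insulin and fasting glucose with the standard formula, insulin (μIU/mL) × glucose (mmol/L) / 22.5; a minimal sketch (the mg/dL conversion factor of 18.0 is the standard one, giving an equivalent denominator of 405 in mg/dL units):

```python
def homa_ir(fasting_insulin_uIU_ml, fasting_glucose_mg_dl):
    """HOMA-IR = fasting insulin (uIU/mL) x fasting glucose (mmol/L) / 22.5.

    Glucose given in mg/dL is converted to mmol/L by dividing by 18.0.
    """
    glucose_mmol_l = fasting_glucose_mg_dl / 18.0
    return fasting_insulin_uIU_ml * glucose_mmol_l / 22.5
```

With the study's mean values (insulin 13.2 μIU/mL and a typical fasting glucose), values above common cut-offs (often around 2.5) indicate insulin resistance.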

15.
This paper explores the incorporation of prior knowledge into support vector machines as a means of compensating for a shortage of training data in text categorization. The prior knowledge about transformation invariance is generated by a virtual document method. The method applies a simple transformation to documents, making virtual documents by combining relevant document pairs for a topic in the training set. A virtual document thus created is not only expected to preserve the topic, but may even improve the topical representation by exploiting relevant terms that are not given high importance in individual real documents. The artificially generated documents change the distribution of the training data without randomization. Experiments with support vector machines based on linear, polynomial and radial-basis-function kernels showed the effectiveness of the approach on the Reuters-21578 set for topics with a small number of relevant documents. The proposed method achieved 131%, 34% and 12% improvements in micro-averaged F1 for the 25, 46 and 58 topics with fewer than 10, 30 and 50 relevant documents in learning, respectively. The result analysis indicates that incorporating virtual documents contributes to a steady improvement in performance.
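The pairing transformation described above can be sketched in a few lines (an illustrative reading of the method, assuming documents are token lists and concatenation is the combination step):

```python
from itertools import combinations

def virtual_documents(relevant_docs):
    """Create virtual training documents for one topic by concatenating
    every pair of its relevant documents (each document a token list).

    Each virtual document keeps the topic while pooling relevant terms
    that may be weak in either real document alone.
    """
    return [a + b for a, b in combinations(relevant_docs, 2)]
```

A topic with n relevant documents yields n*(n-1)/2 virtual documents, which is most valuable precisely for the small-n topics where the paper reports the largest gains.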

16.
Latent Semantic Indexing (LSI) uses the singular value decomposition to reduce noisy dimensions and improve the performance of text retrieval systems. Preliminary results have shown modest improvements in retrieval accuracy and recall, but these have mainly been obtained on small collections. In this paper we investigate text retrieval on a larger document collection (TREC) and focus on the distribution of word norms (magnitudes). Our results indicate the inadequacy of word representations in LSI space on large collections. We emphasize the query expansion interpretation of LSI and propose an LSI term normalization that achieves better performance on larger collections.
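A minimal sketch of LSI term vectors via truncated SVD, with each term row rescaled to unit norm (the unit-length normalization is an assumption consistent with the paper's focus on word norms, not necessarily its exact scheme):

```python
import numpy as np

def lsi_normalized_terms(term_doc, k):
    """Truncated SVD of a term-document matrix; rows of U_k * S_k are
    the LSI term vectors. Each row is then normalized to unit length
    so that term magnitude no longer dominates similarity.
    """
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    terms = U[:, :k] * s[:k]                 # term representations in LSI space
    norms = np.linalg.norm(terms, axis=1, keepdims=True)
    norms[norms == 0] = 1.0                  # guard against all-zero terms
    return terms / norms
```

Without such normalization, frequent terms acquire large norms and skew query-document similarities, the effect the paper observes on large collections.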

17.
[Research purpose] To address the data sparsity and high dimensionality of mainstream topic discovery models, this paper proposes a microblog hot-topic discovery method (BiLSTM-HBBTM) that improves on the bursty biterm topic model (BBTM), aiming at better results in mining hot topics from microblogs. [Research method] First, the microblog propagation value, the term H-index and the biterm burst probability are introduced for feature selection at both the document and the word level, addressing data sparsity and high dimensionality. Second, a bidirectional long short-term memory network (BiLSTM) is trained on the relations between words, combined with the words' inverse document frequency as prior knowledge for biterms, so that relations between words, which traditional models ignore, are taken into account. Third, a density-based method adaptively selects the optimal number of topics for BBTM, removing the traditional topic models' need for a manually specified topic number. Finally, the method is validated on real microblog datasets in terms of hot-topic discovery accuracy, topic quality and coherence. [Research conclusion] The experiments show that BiLSTM-HBBTM outperforms the comparison models on multiple evaluation metrics, verifying the effectiveness and feasibility of the proposed model.

18.
The widespread availability of the Internet and the variety of Internet-based applications have resulted in a significant increase in the number of web pages. Determining the behaviors of search engine users has become a critical step in enhancing search engine performance. Search engine user behaviors can be determined by content-based or content-ignorant algorithms. Although many content-ignorant studies have been performed to automatically identify new topics, previous results have demonstrated that spelling errors can cause significant errors in topic shift estimates. In this study, we focused on minimizing the number of wrong estimates caused by spelling errors. We developed a new hybrid algorithm combining character n-gram and neural network methodologies, and compared the experimental results with results from previous studies. For the FAST and Excite datasets, the proposed algorithm improved topic shift estimates by 6.987% and 2.639%, respectively. Moreover, we analyzed the performance of the character n-gram method in several respects, including a comparison with the Levenshtein edit-distance method. The experimental results demonstrated that the character n-gram method outperformed the Levenshtein edit-distance method in terms of topic identification.
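The reason character n-grams tolerate spelling errors is that a transposition or substitution disturbs only a few of a query's n-grams. A minimal sketch using Dice similarity over padded trigrams (an illustrative choice; the paper's exact similarity measure may differ):

```python
def char_ngrams(text, n=3):
    """Character n-gram list of a string, padded with '#' at both ends."""
    padded = "#" * (n - 1) + text + "#" * (n - 1)
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def ngram_similarity(a, b, n=3):
    """Dice coefficient over character n-gram multisets: 1.0 for
    identical strings, degrading gracefully under spelling errors."""
    ga, gb = char_ngrams(a, n), char_ngrams(b, n)
    shared = sum(min(ga.count(g), gb.count(g)) for g in set(ga))
    return 2 * shared / (len(ga) + len(gb))
```

Two consecutive queries whose similarity exceeds a threshold can be treated as the same topic even when one is misspelled, which is how such a measure reduces false topic-shift estimates.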

19.
Despite the importance of personalization in information retrieval, there is a notable lack of standard datasets and methodologies for evaluating personalized information retrieval (PIR) systems, owing to the costly process of producing such datasets. Consequently, a group of evaluation frameworks (EFs) have been proposed that use surrogates of the PIR evaluation problem, instead of addressing it directly, to make PIR evaluation more feasible. We call this group indirect evaluation frameworks. Indirect frameworks are designed to be more flexible than classic (direct) ones and much cheaper to employ. However, since there are many different settings and methods for PIR, e.g., social-network-based vs. profile-based PIR, and each needs a particular kind of data on which to base the personalization, not all evaluation frameworks are applicable to all PIR methods. In this paper, we first review and categorize the frameworks that have been introduced for evaluating PIR. We then propose a novel indirect EF based on citation networks (called PERSON), which allows repeatable, large-scale and low-cost PIR experiments. It is also more information-rich than existing EFs and can be employed in many different scenarios. The fundamental idea behind PERSON is that in each document (paper) d, the cited documents are generally related to d from the perspective of d's author(s). To investigate the effectiveness of the proposed EF, we use a large collection of scientific papers. We conduct several sets of experiments and demonstrate that PERSON is a reliable and valid EF. In the experiments, we show that PERSON is consistent with traditional Cranfield-based evaluation in comparing non-personalized IR methods, and that it correctly captures the improvements made by personalization. We also demonstrate that its results are highly correlated with those of another salient EF, and that PERSON is robust with respect to its parameter settings.

20.
In this paper, a new source selection algorithm for uncooperative distributed information retrieval environments is presented. The algorithm models each information source as an integral, using the relevance scores and intra-collection positions of its sampled documents with reference to a centralized sample index, and selects the collections that cover the largest area in the rank-relevance space. Based on this novel metric, the algorithm explicitly addresses the two goals of source selection: high recall, which is important for source recommendation applications, and high precision, which is important for distributed information retrieval, aiming to produce a high-precision final merged list.
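The "area in the rank-relevance space" covered by a source can be approximated numerically from its sampled documents' scores; a minimal trapezoidal-rule sketch (an illustrative reading of the integral-based model, not the paper's exact estimator):

```python
def rank_relevance_area(sampled_scores):
    """Approximate the area under a source's rank-relevance curve by
    the trapezoidal rule, where sampled_scores are the relevance scores
    of the source's sampled documents ordered by centralized-sample rank.
    """
    area = 0.0
    for i in range(1, len(sampled_scores)):
        area += (sampled_scores[i - 1] + sampled_scores[i]) / 2.0
    return area
```

Sources would then be ranked by this area and the top ones selected, favoring collections whose samples are both numerous and highly scored.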


Copyright©北京勤云科技发展有限公司  京ICP备09084417号