Similar Documents
20 similar documents found.
1.
We analyse the difference between the averaged (average of ratios) and globalised (ratio of averages) author-level aggregation approaches based on various paper-level metrics. We evaluate the aggregation variants in terms of (1) their field bias on the author level and (2) their ranking performance based on test data comprising researchers who have received fellowship status or won prestigious awards for their long-lasting and high-impact research contributions to their fields. We consider various direct and indirect paper-level metrics with different normalisation approaches (mean-based, percentile-based, co-citation-based) and focus on the bias and performance differences between the two aggregation variants of each metric. We execute all experiments on two publication databases which use different field categorisation schemes. The first uses author-chosen concept categories and covers the computer science literature. The second covers all disciplines and categorises papers by keywords based on their contents. In terms of bias, we find relatively little difference between the averaged and globalised variants. For mean-normalised citation counts we find no significant difference between the two approaches. However, the percentile-based metric shows less bias with the globalised approach, except for citation windows smaller than four years. On the multi-disciplinary database, PageRank has the overall least bias but shows no significant difference between the two aggregation variants. The averaged variants of most metrics have less bias for small citation windows. For larger citation windows the differences are smaller and mostly insignificant. In terms of ranking the well-established researchers who have received accolades for their high-impact contributions, we find that the globalised variant of the percentile-based metric performs better. Again, we find no significant differences between the globalised and averaged variants based on citation counts and PageRank scores.
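To make the averaged/globalised distinction concrete, here is a minimal sketch, assuming each paper carries a raw citation count and a field-expected citation count (toy data; the function names are illustrative):

```python
def averaged_score(papers):
    # Average of ratios: normalise each paper first, then average.
    return sum(p["citations"] / p["expected"] for p in papers) / len(papers)

def globalised_score(papers):
    # Ratio of averages: aggregate raw values first, then take one ratio.
    return sum(p["citations"] for p in papers) / sum(p["expected"] for p in papers)

papers = [
    {"citations": 12, "expected": 4.0},   # paper in a low-citation field
    {"citations": 30, "expected": 60.0},  # paper in a high-citation field
]
print(averaged_score(papers))    # (3.0 + 0.5) / 2 = 1.75
print(globalised_score(papers))  # 42 / 64 = 0.65625
```

The gap between 1.75 and roughly 0.66 for the same author illustrates why the choice of aggregation variant matters.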

2.
In the field of scientometrics, impact indicators and ranking algorithms are frequently evaluated using unlabelled test data comprising relevant entities (e.g., papers, authors, or institutions) that are considered important. The rationale is that the higher some algorithm ranks these entities, the better its performance. To compute a performance score for an algorithm, an evaluation measure is required to translate the rank distribution of the relevant entities into a single-value performance score. Until recently, it was simply assumed that taking the average rank (of the relevant entities) is an appropriate evaluation measure when comparing ranking algorithms or fine-tuning algorithm parameters. With this paper we propose a framework for evaluating the evaluation measures themselves. Using this framework, the following questions can now be answered: (1) which evaluation measure should be chosen for an experiment, and (2) given an evaluation measure and corresponding performance scores for the algorithms under investigation, how significant are the observed performance differences? Using two publication databases and four test data sets, we demonstrate the functionality of the framework and analyse the stability and discriminative power of the most common information retrieval evaluation measures. We find that there is no clear winner and that the performance of the evaluation measures is highly dependent on the underlying data. Our results show that the average rank is indeed an adequate and stable measure. However, we also show that relatively large performance differences are required to confidently determine if one ranking algorithm is significantly superior to another. Lastly, we list alternative measures that also yield stable results and highlight measures that should not be used in this context.
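As a sketch of the baseline measure the paper scrutinises, the average rank of the relevant entities can be computed as follows (toy data):

```python
def average_rank(ranking, relevant):
    """Mean 1-based rank of the relevant entities; lower is better."""
    positions = [ranking.index(e) + 1 for e in relevant]
    return sum(positions) / len(positions)

ranking = ["a", "b", "c", "d", "e", "f"]  # output of some ranking algorithm
relevant = ["b", "e"]                     # unlabelled test entities
print(average_rank(ranking, relevant))    # (2 + 5) / 2 = 3.5
```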

3.
A number of online marketplaces enable customers to buy or sell used products, which raises the need for ranking tools to help them find desirable items among a huge pool of choices. To the best of our knowledge, no prior work in the literature has investigated the task of used product ranking, which has unique characteristics compared with regular product ranking. While there exist a few ranking metrics (e.g., price, conversion probability) that measure the “goodness” of a product, they do not consider the time factor, which is crucial in used product trading because each used product is often unique while new products are usually abundant in supply. In this paper, we introduce a novel time-aware metric, “sellability”, defined as the time duration for a used item to be traded, to quantify its value. In order to estimate the “sellability” values of newly listed used products and to present users with a ranked list of the most relevant results, we propose a combined Poisson regression and listwise ranking model. The model fits the distribution of “sellability” well. In addition, the model is designed to optimize loss functions for regression and ranking simultaneously, which differs from previous approaches that are conventionally learned with a single cost function, i.e., regression or ranking. We evaluate our approach in the domain of used vehicles. Experimental results show that the proposed model improves both regression and ranking performance compared with non-machine-learning and machine-learning baselines.
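The abstract does not give the exact objective, so the following is only a rough sketch of how a Poisson regression loss and a listwise (ListNet-style) ranking loss might be combined into a single cost function:

```python
import numpy as np

def poisson_nll(y, s):
    """Poisson negative log-likelihood (up to a constant);
    s is the predicted log-rate for the time-to-sell y."""
    return np.mean(np.exp(s) - y * s)

def listnet_loss(relevance, scores):
    """Cross entropy between the top-one probability distributions
    induced by the labels and by the model scores."""
    p_true = np.exp(relevance - relevance.max())
    p_true /= p_true.sum()
    p_pred = np.exp(scores - scores.max())
    p_pred /= p_pred.sum()
    return -np.sum(p_true * np.log(p_pred))

def combined_loss(y, s, lam=0.5):
    # Shorter time-to-sell means a more desirable item, hence the
    # negation when the same outputs are reused as ranking scores.
    return poisson_nll(y, s) + lam * listnet_loss(-y, -s)

y = np.array([3.0, 10.0, 1.0])  # observed days until sold
s = np.array([1.2, 2.1, 0.3])   # model outputs (log-rates)
print(combined_loss(y, s))
```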

4.
Large-scale retrieval systems are often implemented as a cascading sequence of phases: a first filtering step, in which a large set of candidate documents is extracted using a simple technique such as Boolean matching and/or static document scores, followed by one or more ranking steps, in which the pool of documents retrieved by the filter is scored more precisely using dozens or perhaps hundreds of different features. The documents returned to the user are then taken from the head of the final ranked list. Here we examine methods for measuring the quality of filtering and preliminary ranking stages, and show how to use these measurements to tune the overall performance of the system. Standard top-weighted metrics used for overall system evaluation are not appropriate for assessing filtering stages, since the output is a set of documents rather than an ordered sequence of documents. Instead, we use an approach in which a quality score is computed based on the discrepancy between filtered and full evaluation. Unlike previous approaches, our methods do not require relevance judgments, and thus can be used with virtually any query set. We show that this quality score directly correlates with actual differences in measured effectiveness when relevance judgments are available. Since the quality score does not require relevance judgments, it can be used to identify queries that perform particularly poorly for a given filter. Using these methods, we explore a wide range of filtering options using thousands of queries, categorize the relative merits of the different approaches, and identify useful parameter combinations.
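The abstract does not spell out the discrepancy computation, so the snippet below is only one simple illustrative variant: score a filter by how much of the full ranking's head it preserves, with no relevance judgments involved:

```python
def filter_quality(full_ranking, filtered_set, k=10):
    """Fraction of the top-k documents of the full ranking that
    survive the filtering stage; 1.0 means the filter loses nothing
    that the final ranker would have put in the top k."""
    top_k = full_ranking[:k]
    return sum(1 for d in top_k if d in filtered_set) / k

full = ["d1", "d2", "d3", "d4"]
print(filter_quality(full, {"d1", "d3", "d4"}, k=4))  # 0.75
```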

5.
Cluster-based and passage-based document retrieval paradigms have both been shown to be effective. While the former are based on utilizing query-related corpus context manifested in clusters of similar documents, the latter address the fact that a document can be relevant even if only a very small part of it contains query-pertaining information. Hence, cluster-based approaches can be viewed as “expanding” the document representation, while passage-based approaches can be thought of as utilizing a “contracted” document representation. We present a study of the relative benefits of using each of these two approaches, and of the potential merits of their integration. To that end, we devise two methods that integrate whole-document-based, cluster-based and passage-based information. The methods are applied to the re-ranking task, that is, re-ordering documents in an initially retrieved list so as to improve precision at the very top ranks. Extensive empirical evaluation attests to the potential merits of integrating these information types. Specifically, the resultant performance substantially exceeds that of the initial ranking, and is often better than that of a state-of-the-art pseudo-feedback-based query expansion approach.
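One straightforward way to integrate the three information types is linear interpolation of the whole-document, cluster-based (“expanded”) and passage-based (“contracted”) scores; the weights below are illustrative, not the paper's tuned values:

```python
def integrated_score(doc_score, cluster_score, passage_score,
                     lam=0.6, mu=0.2):
    """Interpolate the three evidence sources for re-ranking an
    initially retrieved list (weights are hypothetical)."""
    return (lam * doc_score
            + mu * cluster_score
            + (1 - lam - mu) * passage_score)

print(integrated_score(0.8, 0.6, 0.9))  # 0.48 + 0.12 + 0.18 = 0.78
```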

6.
7.
Peer-to-peer (P2P) networks integrate autonomous computing resources without requiring a central coordinating authority, which makes them a potentially robust and scalable model for providing federated search capability to large-scale networks of text-based digital libraries. However, peer-to-peer networks have so far provided very limited support for full-text federated search with relevance-based document ranking. This paper provides solutions to full-text federated search of text-based digital libraries in hierarchical peer-to-peer networks. Existing approaches to full-text search are adapted and new methods are developed for the problems of resource representation, resource selection, and result merging according to the unique characteristics of hierarchical peer-to-peer networks. Experimental results demonstrate that the proposed approaches offer a better combination of accuracy and efficiency than more common alternatives for federated search of text-based digital libraries in peer-to-peer networks.

8.
Credibility-inspired ranking for blog post retrieval
Credibility of information refers to its believability or the believability of its sources. We explore the impact of credibility-inspired indicators on the task of blog post retrieval, following the intuition that more credible blog posts are preferred by searchers. Based on a previously introduced credibility framework for blogs, we define several credibility indicators and divide them into post-level (e.g., spelling, timeliness, document length) and blog-level (e.g., regularity, expertise, comments) indicators. The retrieval task at hand is precision-oriented, and we hypothesize that the use of credibility-inspired indicators will positively impact precision. We propose to use ideas from the credibility framework in a reranking approach to the blog post retrieval problem, and introduce two simple ways of reranking the top n of an initial run. The first approach, Credibility-inspired reranking, simply reranks the top n of a baseline based on the credibility-inspired score. The second approach, Combined reranking, multiplies the credibility-inspired score of the top n results by their retrieval score, and reranks based on this score. Results show that Credibility-inspired reranking leads to larger improvements over the baseline than Combined reranking, but both approaches are capable of improving over an already strong baseline. For Credibility-inspired reranking, the best performance is achieved using a combination of all post-level indicators. Combined reranking works best using the post-level indicators combined with comments and pronouns. The blog-level indicators expertise, regularity, and coherence do not contribute positively to the performance, although analysis shows that they can be useful for certain topics. Additional analysis shows that a relatively small value of n (15–25) leads to the best results, and that posts that move up the ranking due to reranking based on credibility-inspired indicators do indeed appear to be more credible than the ones that move down.
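A minimal sketch of the Combined reranking variant described above, multiplying the credibility-inspired score of each top-n result by its retrieval score and re-sorting, with the tail left untouched (toy run and credibility values):

```python
def combined_rerank(results, credibility, n=20):
    """results: list of (doc_id, retrieval_score) pairs, best first."""
    head, tail = results[:n], results[n:]
    head = sorted(head, key=lambda r: credibility[r[0]] * r[1], reverse=True)
    return head + tail

run = [("d1", 9.0), ("d2", 8.5), ("d3", 8.0)]
cred = {"d1": 0.4, "d2": 0.9, "d3": 0.7}
print(combined_rerank(run, cred, n=3))
# d2 (7.65) and d3 (5.6) now rank ahead of d1 (3.6)
```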

9.
We evaluate author impact indicators and ranking algorithms on two publication databases using large test data sets of well-established researchers. The test data consists of (1) ACM fellows and (2) recipients of various life-time achievement awards. We also evaluate different approaches of dividing the credit of papers among co-authors and analyse the impact of self-citations. Furthermore, we evaluate different graph normalisation approaches for when PageRank is computed on author citation graphs. We find that PageRank outperforms citation counts in identifying well-established researchers. This holds true not only when PageRank is computed on author citation graphs but also when PageRank is computed on paper graphs and paper scores are divided among co-authors. In general, the best results are obtained when co-authors receive an equal share of a paper's score, independent of which impact indicator is used to compute paper scores. The results also show that removing author self-citations improves the results of most ranking metrics. Lastly, we find that it is more important to personalise the PageRank algorithm appropriately on the paper level than to decide whether to include or exclude self-citations. However, on the author level, we find that author graph normalisation is more important than personalisation.
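A sketch of the equal-share credit division that worked best in these experiments, assuming paper scores have already been computed by some impact indicator (toy data):

```python
def divide_credit(paper_scores, authorships):
    """Give each co-author an equal share of every paper's score."""
    author_score = {}
    for paper, authors in authorships.items():
        share = paper_scores[paper] / len(authors)
        for a in authors:
            author_score[a] = author_score.get(a, 0.0) + share
    return author_score

scores = {"p1": 12.0, "p2": 9.0}
authors = {"p1": ["alice", "bob"], "p2": ["alice", "bob", "carol"]}
print(divide_credit(scores, authors))
# {'alice': 9.0, 'bob': 9.0, 'carol': 3.0}
```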

10.
11.
Multilingual retrieval (querying multiple document collections, each in a different language) can be achieved by combining several individual techniques which enhance retrieval: machine translation to cross the language barrier, relevance feedback to add words to the initial query, decompounding for languages with complex term structure, and data fusion to combine monolingual retrieval results from different languages. Using the CLEF 2001 and CLEF 2002 topics and document collections, this paper evaluates these techniques within the context of a monolingual document ranking formula based upon logistic regression. Each individual technique yields improved performance over runs which do not utilize that technique. Moreover, the techniques are complementary, in that combining the best techniques outperforms individual technique performance. An approximate but fast document translation using bilingual wordlists created from machine translation systems is presented and evaluated. The fast document translation is as effective as query translation in multilingual retrieval. Furthermore, when fast document translation is combined with query translation in multilingual retrieval, the performance is significantly better than that of query translation or fast document translation alone.
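The abstract does not name the fusion method, so as one plausible illustration, a CombSUM-style merge of monolingual result lists with per-run min-max score normalisation:

```python
def combsum(runs):
    """Each run maps doc -> score; fuse by summing normalised scores."""
    fused = {}
    for run in runs:
        lo, hi = min(run.values()), max(run.values())
        for doc, s in run.items():
            norm = (s - lo) / (hi - lo) if hi > lo else 0.0
            fused[doc] = fused.get(doc, 0.0) + norm
    return sorted(fused, key=fused.get, reverse=True)

english = {"e1": 9.1, "e2": 4.0}
german = {"g1": 3.2, "g2": 2.8}
print(combsum([english, german]))  # merged multilingual ranking
```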

12.
This paper describes a comparative case study of the capture and selection practices used to populate electronic depositories of born-digital state government publications. The three case sites illustrate differences in collection building approaches, technological infrastructures, and statutory contexts. The findings reveal two basic modes of selection practice, active selection and passive selection, and three selection models based on the locus of selection control: library selection, liaison selection, and creator selection. The findings also suggest that, in the active selection model, the power to define and select government publications for state depositories is shifting from government agencies to state libraries. The authors argue for the need to attend to Web publications in non-traditional formats (e.g., an interactive HTML document) and to include common publications produced for lay citizens (e.g., brochures, fact sheets, FAQs) in the permanent collections in order to fully document government activities for the historical record.

13.
A standard approach to Information Retrieval (IR) is to model text as a bag of words. Alternatively, text can be modelled as a graph, whose vertices represent words and whose edges represent relations between the words, defined on the basis of any meaningful statistical or linguistic relation. Given such a text graph, graph-theoretic computations can be applied to measure various properties of the graph, and hence of the text. This work explores the usefulness of such graph-based text representations for IR. Specifically, we propose a principled graph-theoretic approach of (1) computing term weights and (2) integrating discourse aspects into retrieval. Given a text graph, whose vertices denote terms linked by co-occurrence and grammatical modification, we use graph ranking computations (e.g., PageRank; Page et al., "The PageRank citation ranking: Bringing order to the Web", Technical report, Stanford Digital Library Technologies Project, 1998) to derive weights for each vertex, i.e., term weights, which we use to rank documents against queries. We reason that our graph-based term weights do not necessarily need to be normalised by document length (unlike existing term weights) because they are already scaled by their graph-ranking computation. This is a departure from existing IR ranking functions, and we experimentally show that it performs comparably to a tuned ranking baseline such as BM25 (Robertson et al., NIST Special Publication 500-236: TREC-4, 1995). In addition, we integrate into ranking graph properties, such as the average path length or clustering coefficient, which represent different aspects of the topology of the graph, and by extension of the document represented as a graph. Integrating such properties into ranking allows us to consider issues such as discourse coherence, flow and density during retrieval. We experimentally show that this type of ranking performs comparably to BM25, and can even outperform it, across different TREC (Voorhees and Harman, TREC: Experiment and Evaluation in Information Retrieval, MIT Press, 2005) datasets and evaluation measures.
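A minimal sketch of the term-weighting idea, assuming a plain co-occurrence window over tokens and using the PageRank implementation from networkx; the raw vertex scores serve as term weights without any document-length normalisation:

```python
import itertools
import networkx as nx

def graph_term_weights(tokens, window=2):
    """Link terms that co-occur within a sliding window, then use
    PageRank vertex scores as term weights."""
    g = nx.Graph()
    for i in range(len(tokens) - window + 1):
        for a, b in itertools.combinations(tokens[i:i + window], 2):
            if a != b:
                g.add_edge(a, b)
    return nx.pagerank(g)

doc = "graph based term weights rank documents against queries".split()
print(graph_term_weights(doc))  # term -> weight
```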

14.
We evaluate article-level metrics along two dimensions. Firstly, we analyse metrics’ ranking bias in terms of fields and time. Secondly, we evaluate their performance based on test data that consists of (1) papers that have won high-impact awards and (2) papers that have won prizes for outstanding quality. We consider different citation impact indicators and indirect ranking algorithms in combination with various normalisation approaches (mean-based, percentile-based, co-citation-based, and post hoc rescaling). We execute all experiments on two publication databases which use different field categorisation schemes (author-chosen concept categories and categories based on papers’ semantic information). In terms of bias, we find that citation counts are always less time biased but always more field biased compared to PageRank. Furthermore, rescaling paper scores by a constant number of similarly aged papers reduces time bias more effectively than normalising by calendar years. We also find that percentile citation scores are less field and time biased than mean-normalised citation counts. In terms of performance, we find that time-normalised metrics identify high-impact papers better shortly after their publication compared to their non-normalised variants. However, after 7 to 10 years, the non-normalised metrics perform better. A similar trend exists for the set of high-quality papers, where these performance cross-over points occur after 5 to 10 years. Lastly, we also find that personalising PageRank with papers’ citation counts reduces time bias but increases field bias. Similarly, using papers’ associated journal impact factors to personalise PageRank increases its field bias. In terms of performance, PageRank should always be personalised with papers’ citation counts and time-rescaled for citation windows smaller than 7 to 10 years.
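A sketch of the rescaling that reduced time bias most effectively above: divide a paper's score by the mean score of a fixed-size set of similarly aged papers, rather than by a calendar-year average (hypothetical numbers):

```python
def time_rescaled(score, peer_scores):
    """Rescale by the mean score of a constant number of similarly
    aged papers; peer_scores is that fixed-size comparison set."""
    mean_peer = sum(peer_scores) / len(peer_scores)
    return score / mean_peer if mean_peer > 0 else 0.0

# A paper with 18 citations whose 1000 nearest-in-age peers
# (hypothetical) average 6 citations scores 3.0.
print(time_rescaled(18, [6] * 1000))
```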

15.
The growing popularity of collaboration among researchers and the increasing information overload in big scholarly data make it imperative to develop collaborator recommendation systems that help researchers find potential partners. Existing works typically study this task as a link prediction problem in a homogeneous network with a single object type (i.e., author) and a single link type (i.e., co-authorship). However, a real-world academic social network often involves several object types, e.g., papers, terms, and venues, as well as multiple relationships among different objects. This paper proposes RWR-CR (random walk with restart based collaborator recommendation), an algorithm that addresses this problem in a heterogeneous bibliographic network. First, we construct a heterogeneous network with multiple types of nodes and links, simplifying its structure by removing citing-paper nodes. Then, two importance measures are used to weight the edges of the network, which biases the random walker's behaviors. Finally, we employ a random walk with restart to retrieve relevant authors and output an ordered recommendation list in terms of ranking scores. Experimental results on the DBLP and hep-th datasets demonstrate the effectiveness of our methodology and its promising performance in collaborator prediction.
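A minimal random-walk-with-restart sketch on a toy graph; in the paper, the transition matrix would be built from the weighted heterogeneous bibliographic network, which is omitted here:

```python
import numpy as np

def random_walk_with_restart(P, seed, alpha=0.15, iters=100):
    """P is a column-stochastic transition matrix; seed indexes the
    researcher for whom collaborators are recommended; alpha is the
    restart probability."""
    e = np.zeros(P.shape[0])
    e[seed] = 1.0
    r = e.copy()
    for _ in range(iters):
        r = (1 - alpha) * P @ r + alpha * e
    return r  # stationary scores; higher means more relevant

# Toy 4-node network (columns sum to 1).
P = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.5, 0.0, 0.5, 0.5],
              [0.5, 0.5, 0.0, 0.5],
              [0.0, 0.0, 0.0, 0.0]])
print(random_walk_with_restart(P, seed=0))
```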

16.
We use data on economics, management and political science journals to produce quantitative estimates of the (in)consistency of evaluations based on six popular bibliometric indicators (impact factor, 5-year impact factor, immediacy index, article influence score, SNIP and SJR). We advocate a new approach to the aggregation of journal rankings. Since rank aggregation is a multicriteria decision problem, ranking methods from social choice theory can solve it. We apply either a direct ranking method based on the majority rule (the Copeland rule, the Markovian method) or a sorting procedure based on a tournament solution, such as the uncovered set and the minimal externally stable set. We demonstrate that the aggregate rankings reduce the number of contradictions and represent the set of single-indicator-based rankings better than any of the six rankings themselves.
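As an illustration of the direct ranking methods mentioned, a small Copeland-rule aggregator: each journal scores its pairwise majority wins minus its losses across the indicator-based rankings (toy data):

```python
from itertools import combinations

def copeland(rankings):
    """rankings: list of rank-ordered item lists, one per indicator."""
    items = rankings[0]
    score = {i: 0 for i in items}
    for a, b in combinations(items, 2):
        a_wins = sum(r.index(a) < r.index(b) for r in rankings)
        if a_wins * 2 > len(rankings):    # a beats b by majority
            score[a] += 1
            score[b] -= 1
        elif a_wins * 2 < len(rankings):  # b beats a by majority
            score[b] += 1
            score[a] -= 1
    return sorted(items, key=score.get, reverse=True)

# Three indicator-based rankings of four journals.
print(copeland([list("ABCD"), list("BACD"), list("ACBD")]))
# ['A', 'B', 'C', 'D']
```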

17.
User interactions in Web search, clicks in particular, provide valuable hints on document relevance, but the signals are very noisy. In order to better understand user click behaviors and to infer the implied relevance, various click models have been proposed, each relying on certain hypotheses and involving different hidden events (e.g., examination). In almost all existing click models, it is assumed that clicks are the only observable evidence and that the examination of documents is deduced from them. However, with an increasing number of embedded heterogeneous components (e.g., verticals) on Search Engine Result Pages, click information is not sufficient to draw a complete picture of the user examination process, especially in a federated search scenario. In practice, we can also collect mouse movement information, which has proven to have a strong correlation with examination. In this paper, we propose to incorporate mouse movement information into existing click models to enhance the estimation of examination. The enhanced click models are shown to have a better ability to predict both user clicks and document relevance than the original models. The collection of mouse movement information has been implemented in a commercial search engine, showing the feasibility of the approach in practice.
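The abstract leaves the integration mechanism open; under the widely used examination hypothesis (P(click) = P(examined) * P(relevant)), one simple illustrative option is to blend the click-model examination estimate with a mouse-movement-based one:

```python
def click_prob(p_exam_click, p_exam_mouse, relevance, w=0.5):
    """Blend two examination estimates, then apply the examination
    hypothesis; the blending weight w is hypothetical."""
    p_exam = w * p_exam_click + (1 - w) * p_exam_mouse
    return p_exam * relevance

# Mouse data suggests the result was examined more often than the
# click-only model would infer.
print(click_prob(p_exam_click=0.6, p_exam_mouse=0.9, relevance=0.8))  # 0.6
```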

18.
The emergence of CD-ROM (compact disc/read-only memory) versions of the MEDLINE database requires experienced MEDLINE searchers to examine assumptions about searching MEDLINE, since some expectations may not be fulfilled by this new technology. When applied to a particular CD-ROM MEDLINE product, the evaluation procedure involves testing assumptions concerning database contents; mechanics of searching; display, print, and download capabilities; and user-friendly features. The extent to which a CD-ROM product preserves and exploits important MEDLINE strengths should be assessed, e.g., the MeSH controlled vocabulary, the designation of major and minor MeSH emphasis, and the use of subheadings. Search software characteristics that affect ease of searching and quality of results also need to be examined, e.g., the ability to truncate search terms and the order of precedence in which Boolean operators are evaluated. A checklist to assist in the evaluation process is presented, including search examples for use in testing search functions.

19.
The publication indicator of the Finnish research funding system is based on a manual ranking of scholarly publication channels. These ranks, which represent the evaluated quality of the channels, are continuously kept up to date and thoroughly reevaluated every four years by groups of nominated scholars belonging to different disciplinary panels. This expert-based decision-making process is informed by available citation-based metrics and other relevant metadata characterizing the publication channels. The purpose of this paper is to introduce various approaches that can explain the basis and evolution of the quality of publication channels, i.e., ranks. This is important for the academic community, whose research work is being governed using the system. Data-based models that, with sufficient accuracy, explain the level of or changes in ranks provide assistance to the panels in their multi-objective decision making, thus suggesting and supporting the need to use more cost-effective, automated ranking mechanisms. The analysis relies on novel advances in machine learning systems for classification and predictive analysis, with special emphasis on local and global feature importance techniques.
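As one plausible illustration of the modelling approach (the abstract does not name the features or learners used), a scikit-learn classifier whose global feature importances hint at which channel metadata drives the expert-assigned ranks:

```python
from sklearn.ensemble import RandomForestClassifier

# Hypothetical per-channel features: a citation metric, a
# field-normalised score, and yearly output volume.
X_train = [[2.1, 0.8, 120], [0.3, 0.1, 15], [5.7, 1.9, 640], [1.0, 0.4, 80]]
y_train = [2, 1, 3, 1]  # panel-assigned ranks

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print(model.feature_importances_)  # global feature importance
```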

20.
This paper proposes a novel ranking approach, cost-sensitive ordinal classification via regression (COCR), which respects the discrete nature of ordinal ranks in real-world data sets. In particular, COCR applies a theoretically sound method for reducing an ordinal classification to binary classification and solves the binary classification sub-tasks with point-wise regression. Furthermore, COCR allows us to specify mis-ranking costs to further improve the ranking performance; this ability is exploited by deriving a corresponding cost for a popular ranking criterion, expected reciprocal rank (ERR). The resulting ERR-tuned COCR combines the efficiency of point-wise regression with the top-rank prediction accuracy of the ERR criterion. Evaluations on four large-scale benchmark data sets, i.e., "Yahoo! Learning to Rank Challenge" and "Microsoft Learning to Rank", verify the significant superiority of COCR over commonly used regression approaches.
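ERR itself is standard (Chapelle et al., 2009), so a small reference implementation can ground the cost derivation; the grades and maximum grade below are illustrative:

```python
def expected_reciprocal_rank(grades, g_max=4):
    """ERR: the expected reciprocal rank at which a cascading user
    stops; the stop probability at each rank comes from its grade."""
    err, p_continue = 0.0, 1.0
    for rank, g in enumerate(grades, start=1):
        r = (2 ** g - 1) / (2 ** g_max)  # satisfaction probability
        err += p_continue * r / rank
        p_continue *= 1 - r
    return err

print(expected_reciprocal_rank([4, 0, 2]))  # graded top-3 result list
```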
