Similar Documents
20 similar records found (search time: 46 ms).
1.
Document Recommender Systems: A Way to Improve Information Retrieval Efficiency   Total citations: 2 (self-citations: 0, citations by others: 2)
Traditional Information Retrieval (IR) systems have limitations in improving search performance in today's information environment. The high recall and poor precision of traditional IR systems are only as good as the accuracy of the search query, which is usually difficult for the user to construct, and evaluating each search result is time-consuming for the user. The recommendation techniques developed since the early 1990s help solve these problems. This paper explains the basic process and major elements of document recommender systems, focusing on the two recommendation techniques of content-based filtering and collaborative filtering. Also discussed are evaluation issues and the problems that current document recommender systems face, which need to be taken into account in future system designs.
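To make the collaborative-filtering idea referred to in this abstract concrete, here is a minimal user-based sketch in Python: it predicts a user's rating for an unseen item as a similarity-weighted average over other users' ratings. The toy rating matrix and the cosine-similarity choice are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy user-item rating matrix (rows: users, cols: items); 0 = not rated.
# Purely illustrative data, not from the paper.
R = np.array([
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity computed over co-rated items only."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

def predict(user, item):
    """Predict R[user, item] as a similarity-weighted average of neighbours."""
    sims, ratings = [], []
    for other in range(R.shape[0]):
        if other != user and R[other, item] > 0:
            sims.append(cosine_sim(R[user], R[other]))
            ratings.append(R[other, item])
    sims, ratings = np.array(sims), np.array(ratings)
    return float(sims @ ratings / sims.sum()) if sims.sum() else 0.0

print(predict(user=0, item=2))  # predicted rating for an unseen item
```

A content-based filter would instead compare item feature vectors against a profile built from the items the user has already rated, which is why the two techniques are usually presented as complementary.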

2.
Building on an analysis of the types of diversity, this paper summarises, compares, and analyses four approaches to improving the diversity of personalised recommendations: information physics, secondary optimisation, social networks, and time awareness. It then summarises the main metrics for measuring recommendation diversity. Finally, it looks ahead to problems that deserve further study, pointing out that the diversity and novelty of mobile recommender systems, the mechanism by which information-physics methods apply to recommender systems, the temporal diversity and computational cost of recommender systems, and the effective combination of different recommendation algorithms are the key directions for future breakthroughs.

3.
Collaborative filtering is a popular recommendation technique. Although researchers have focused on the accuracy of the recommendations, real applications also need efficient algorithms. An index structure can be used to store the rating matrix and compute recommendations very fast. In this paper we study how compression techniques can reduce the size of this index structure and, at the same time, speed up recommendations. We show how coding techniques commonly used in Information Retrieval can be effectively applied to collaborative filtering, reducing the matrix size by up to 75% and almost doubling the recommendation speed. Additionally, we propose a novel identifier reassignment technique that achieves high compression rates, reducing the size of an already compressed matrix by 40%. It is a very simple approach based on assigning the smallest identifiers to the items and users with the highest number of ratings, and it can be computed efficiently using two-pass indexing. Using the proposed compression techniques can significantly reduce the storage and time costs of recommender systems, which are two important factors in many real applications.
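The identifier-reassignment idea is concrete enough to sketch. The following Python toy version (my own illustration, not the authors' code) counts ratings in a first pass and then assigns the smallest identifiers to the most frequently rated items, the property that makes gap-encoded posting lists compress well.

```python
from collections import Counter

# Toy ratings: (user_id, item_id, rating). Illustrative data only.
ratings = [(10, 7, 4.0), (11, 7, 5.0), (12, 7, 3.0), (10, 2, 2.0), (11, 9, 4.0)]

# First pass: count how many ratings each item has received.
item_freq = Counter(item for _, item, _ in ratings)

# Assign the smallest new identifiers to the items with the most ratings,
# so frequently occurring items get small numbers (cheap to gap-encode).
new_id = {item: rank for rank, (item, _) in enumerate(item_freq.most_common())}

# Second pass: rewrite the ratings with the reassigned item identifiers.
remapped = [(user, new_id[item], value) for user, item, value in ratings]
print(new_id)    # e.g. {7: 0, 2: 1, 9: 2}
print(remapped)
```

The same reassignment would be applied symmetrically to user identifiers in a real index.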

4.
Information filtering has become increasingly important as we are flooded with too much information; product brokering in e-commerce is a typical example. Systems that provide personalized product recommendations to their users (often called recommender systems) have gained a lot of interest in recent years. Collaborative filtering is one of the commonly used approaches, and it normally requires the definition of a user similarity measure. In the literature, researchers have proposed different choices for the similarity measure using different approaches, yet there is no guarantee of optimality. In this paper, we propose the use of machine learning techniques to learn the optimal user similarity measure, as well as user rating styles, for enhancing recommendation accuracy. Based on a criterion function measuring the overall prediction error, several rating transformation functions for modeling rating styles, together with their learning algorithms, are derived. With the help of this formulation and optimization framework, subjective components in user ratings are removed so that the transformed ratings can be compared. We have evaluated the proposed methods on the EachMovie dataset and obtained a significant improvement in recommendation accuracy compared with the standard correlation-based algorithm.
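The learned rating-style transformations are not specified in the abstract, so as a stand-in the sketch below shows the most common way to remove subjective rating components before comparing users: a per-user z-score normalisation. Both the data and the choice of transformation are assumptions for illustration only.

```python
import numpy as np

# Toy ratings per user (NaN = unrated). Illustrative only.
ratings = {
    "alice": np.array([5.0, 4.0, np.nan, 5.0]),
    "bob":   np.array([3.0, 2.0, 1.0, np.nan]),
}

def normalise(user_ratings):
    """Remove a user's rating style with a per-user z-score:
    subtract the user's mean rating and divide by their spread."""
    rated = ~np.isnan(user_ratings)
    mean = user_ratings[rated].mean()
    std = user_ratings[rated].std() or 1.0   # guard against zero spread
    out = user_ratings.copy()
    out[rated] = (user_ratings[rated] - mean) / std
    return out

normalised = {u: normalise(r) for u, r in ratings.items()}
print(normalised["alice"])  # comparable across users once style is removed
```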

5.
Information Retrieval Journal - In this work, we study recent advances in context-sensitive language models for the task of query expansion. We study the behavior of existing and new approaches for...

6.
In this article we introduce a novel search paradigm for microblogging sites resulting from the intersection of Information Retrieval and Social Network Analysis (SNA). This approach is based on a formal model of on-line conversations and a set of ranking measures including SNA centrality metrics, time-related conversational metrics, and other specific features of current microblogging sites. The ranking approach has been compared to other methods and tested on two well-known social network sites (Twitter and Friendfeed), showing that the inclusion of SNA metrics in the ranking function and the use of a model of conversation can improve the results of search tasks.

7.
The rapid growth of the Web has increased the difficulty of finding the information that can address users' information needs. A number of recommendation approaches have been developed to tackle this problem. The increase in the number of data providers has necessitated the development of multi-publisher recommender systems: systems that include more than one item/data provider. In such environments, preserving the privacy of both publishers and subscribers is a key and challenging point. In this paper, we propose a multi-publisher framework for recommender systems based on a client–server architecture, which preserves the privacy of both data providers and subscribers. We develop our framework as a content-based filtering system using the statistical language modeling framework. We also introduce AUTO, a simple yet effective threshold optimization algorithm, to find a dissemination threshold for making acceptance and rejection decisions for newly published documents. We further propose a language model sketching technique to reduce the network traffic between servers and clients in the proposed framework. Extensive experiments using the TREC-9 Filtering Track and the CLEF 2008-09 INFILE Track collections indicate the effectiveness of the proposed models in both single- and multi-publisher settings.

8.
Research on Collaborative-Filtering-Based Recommender Systems for Digital Libraries   Total citations: 10 (self-citations: 0, citations by others: 10)
Information recommendation is an important service of digital libraries. This paper discusses the basic principles and characteristics of collaborative-filtering-based recommender systems for digital libraries and the necessity of collaborative recommendation in digital libraries, introduces the main methods and techniques of collaborative-filtering recommender systems, and analyses several examples of applying collaborative filtering in digital library recommender systems.

9.
[Purpose/Significance] Recommender systems have become an important component of e-commerce websites, providing users with various forms of information recommendation services. The recommender systems of e-commerce websites, represented in China by Taobao, JD.com, and Amazon, adopt different technical architectures and a variety of trending recommendation techniques, and pay increasing attention to the quality of information services. A comparative study of the service quality of these recommender systems can further promote the development of e-commerce recommendation. [Method/Process] First, the influence of the systems' technical architectures on recommendation service quality is compared in terms of three technical indicators: accuracy, timeliness, and novelty. Second, taking user experience as the basis for evaluating information service quality, 182 respondents were surveyed on their acceptance of trending techniques to study the influence of these techniques on recommendation service quality. Finally, the user experience of the functional modules was surveyed and comparatively analysed. [Results/Conclusions] On the basis of these studies, surveys, and analyses, recommendations are given on the technical architectures and trending techniques for e-commerce recommender systems, as well as measures for improving the design of functional modules, in order to further improve the information service quality of recommender systems.

10.
Despite the increasing use of citation-based metrics for research evaluation purposes, we do not yet know which metrics best deliver on their promise to gauge the significance of a scientific paper or a patent. We assess 17 network-based metrics by their ability to identify milestone papers and patents in three large citation datasets. We find that traditional information-retrieval evaluation metrics are strongly affected by the interplay between the age distribution of the milestone items and the age biases of the evaluated metrics. Outcomes of these metrics are therefore not representative of the metrics' ranking ability. We argue in favor of a modified evaluation procedure that explicitly penalizes biased metrics and allows us to reveal metrics' performance patterns that are consistent across the datasets. PageRank and LeaderRank turn out to be the best-performing ranking metrics when their age bias is suppressed by a simple transformation of the scores that they produce, whereas other popular metrics, including citation count, HITS and Collective Influence, produce significantly worse ranking results.

11.
The accuracy of ad-hoc document retrieval systems has reached a stable plateau in the last few years. We are working on so-called collaborative information retrieval (CIR) systems, which have the potential to overcome the current limits. We define CIR as a task where an information retrieval (IR) system uses information gathered from previous search processes of one or several users to improve retrieval performance for the current user searching for information. We focus on a restricted setting in CIR in which only old queries and correct answer documents to these queries are available for improving a new query. For this restricted setting we propose new approaches for query expansion procedures. We show how CIR methods can improve overall IR performance.
CR Subject Classification: H.3.3

12.
13.
We describe the Eurospider component for Cross-Language Information Retrieval (CLIR) that has been employed for experiments at all three CLEF campaigns to date. The central aspect of our efforts is the use of combination approaches, effectively combining multiple language pairs, translation resources and translation methods into one multilingual retrieval system. We discuss the implications of building a system that allows flexible combination, give details of the various translation resources and methods, and investigate the impact of merging intermediate results generated by the individual steps. An analysis of the resulting combination system is given which also takes into account additional requirements when deploying the system as a component in an operational, commercial setting.

14.
To support the university's teaching reform, the literature-retrieval course was divided into two parts: Fundamentals of Information Retrieval and Information Retrieval and Utilization. To understand the effect of reforming Fundamentals of Information Retrieval, the reform process was tracked, surveyed, and analysed. The results show that the overall effect of the reform is good, although some problems remain. The paper concludes with suggestions for further improving the teaching reform.

15.
In this paper, we evaluate a number of machine learning techniques for the task of ranking answers to why-questions. We use TF-IDF together with a set of 36 linguistically motivated features that characterize questions and answers. We experiment with a number of machine learning techniques (among them several classifiers and regression techniques, Ranking SVM, and SVM-map) in various settings. The purpose of the experiments is to assess how the different machine learning approaches cope with our highly imbalanced binary relevance data, with and without hyperparameter tuning. We find that with all machine learning techniques we can obtain an MRR score that is significantly above the TF-IDF baseline of 0.25 and not significantly lower than the best score of 0.35. We provide an in-depth analysis of the effect of data imbalance and hyperparameter tuning, and we relate our findings to previous research on learning to rank for Information Retrieval.
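Mean Reciprocal Rank (MRR), the score compared here against the 0.25 TF-IDF baseline, averages the reciprocal rank of the first relevant answer per question. A minimal sketch with made-up relevance lists:

```python
def mean_reciprocal_rank(ranked_relevance):
    """ranked_relevance: one list of 0/1 relevance labels per question,
    ordered by the system's ranking. MRR averages 1/rank of the first
    relevant answer (contributing 0 when no relevant answer is returned)."""
    total = 0.0
    for labels in ranked_relevance:
        for rank, rel in enumerate(labels, start=1):
            if rel:
                total += 1.0 / rank
                break
    return total / len(ranked_relevance)

# Toy example: first relevant answer at rank 1, rank 4, and never.
print(mean_reciprocal_rank([[1, 0, 0], [0, 0, 0, 1], [0, 0]]))  # ≈ 0.417
```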

16.
We analyse the difference between the averaged (average of ratios) and globalised (ratio of averages) author-level aggregation approaches based on various paper-level metrics. We evaluate the aggregation variants in terms of (1) their field bias on the author-level and (2) their ranking performance based on test data that comprises researchers that have received fellowship status or won prestigious awards for their long-lasting and high-impact research contributions to their fields. We consider various direct and indirect paper-level metrics with different normalisation approaches (mean-based, percentile-based, co-citation-based) and focus on the bias and performance differences between the two aggregation variants of each metric. We execute all experiments on two publication databases which use different field categorisation schemes. The first uses author-chosen concept categories and covers the computer science literature. The second covers all disciplines and categorises papers by keywords based on their contents. In terms of bias, we find relatively little difference between the averaged and globalised variants. For mean-normalised citation counts we find no significant difference between the two approaches. However, the percentile-based metric shows less bias with the globalised approach, except for citation windows smaller than four years. On the multi-disciplinary database, PageRank has the overall least bias but shows no significant difference between the two aggregation variants. The averaged variants of most metrics have less bias for small citation windows. For larger citation windows the differences are smaller and are mostly insignificant. In terms of ranking the well-established researchers who have received accolades for their high-impact contributions, we find that the globalised variant of the percentile-based metric performs better. Again we find no significant differences between the globalised and averaged variants based on citation counts and PageRank scores.
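The difference between the two aggregation variants comes down to where the averaging happens. A small numeric sketch (illustrative citation counts and field baselines only):

```python
# Citation counts and field baselines (expected citations for comparable
# papers) for one author's papers. Illustrative numbers, not from the study.
citations = [10, 2, 30]
baselines = [5.0, 4.0, 20.0]

# Averaged variant: average of per-paper ratios (average of ratios).
averaged = sum(c / b for c, b in zip(citations, baselines)) / len(citations)

# Globalised variant: total citations divided by total expected citations
# (ratio of averages).
globalised = sum(citations) / sum(baselines)

print(round(averaged, 3), round(globalised, 3))  # 1.333 vs 1.448
```

The averaged variant weights every paper equally, while the globalised variant lets highly cited papers in high-baseline fields dominate, which is exactly where the field-bias differences studied in the paper arise.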

17.
Information Retrieval Journal - Ranking models are the main components of information retrieval systems. Several approaches to ranking are based on traditional machine learning algorithms using a...

18.
Choosing a publication venue for an academic paper is a crucial step in the research process. However, in many cases, decisions are based solely on the experience of researchers, which often leads to suboptimal results. Although venue recommender systems for academic papers exist, they recommend venues where the paper is expected to be published. In this study, we aim to recommend publication venues from a different perspective. We estimate the number of citations a paper will receive if it is published in each venue and recommend the venue where the paper has the most potential impact. However, there are two challenges to this task. First, a paper is published in only one venue, and thus we cannot observe the number of citations the paper would receive if it were published in another venue. Second, the contents of a paper and the publication venue are not statistically independent; that is, there exist selection biases in choosing publication venues. In this paper, we formulate the venue recommendation problem as a treatment effect estimation problem. We use a bias correction method to estimate the potential impact of choosing a publication venue effectively and to recommend venues based on the potential impact of papers in each venue. We highlight the effectiveness of our method using paper data from computer science conferences.
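The abstract does not specify the bias correction used, but a standard way to estimate such potential outcomes from observational data is inverse propensity weighting. The sketch below is a generic IPW estimate of expected citations per venue, offered as an assumed illustration of the general technique rather than the authors' estimator; the venue labels, citation counts, and propensities are toy values.

```python
import numpy as np

# Toy observational data: for each paper, the venue it was published in,
# its citation count, and the model's estimated probability (propensity)
# that a paper with its content would be published in that venue.
venues     = np.array(["A", "A", "B", "B", "B"])
citations  = np.array([12.0, 8.0, 30.0, 5.0, 7.0])
propensity = np.array([0.6, 0.4, 0.7, 0.5, 0.3])

def ipw_mean_citations(target_venue):
    """Inverse-propensity-weighted (self-normalised) estimate of the
    citations a typical paper would receive if published in target_venue."""
    mask = venues == target_venue
    weights = 1.0 / propensity[mask]
    return float((weights * citations[mask]).sum() / weights.sum())

for v in ("A", "B"):
    print(v, round(ipw_mean_citations(v), 2))
```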

19.
Building a seed set to guide new users' ratings is regarded as one way to solve the new-user cold-start problem in collaborative-filtering recommender systems. This paper introduces item relatedness into the seed-set construction strategy and proposes a seed-set strategy based on multi-attribute comprehensive evaluation. Experiments on the public MovieLens dataset simulate the new-user environment of a recommender system and compare the prediction accuracy and success rate of different seed-set strategies. The results show that, with the small seed sets that better match the needs of real recommender systems, taking the relatedness among seeds into account improves recommendation quality.
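As a rough illustration of seed-set construction (my own greedy approximation with assumed popularity and relatedness scores; the paper's multi-attribute evaluation and the direction of the relatedness term are not specified in the abstract), the sketch below picks popular items while penalising candidates that are too related to seeds already chosen:

```python
# Toy candidate items with popularity scores and pairwise relatedness.
# All values are illustrative assumptions, not from the paper.
popularity = {"m1": 0.9, "m2": 0.85, "m3": 0.8, "m4": 0.75}
relatedness = {("m1", "m2"): 0.9, ("m1", "m3"): 0.2, ("m1", "m4"): 0.3,
               ("m2", "m3"): 0.25, ("m2", "m4"): 0.35, ("m3", "m4"): 0.8}

def rel(a, b):
    """Symmetric lookup of pairwise relatedness (0 if unknown)."""
    return relatedness.get((a, b)) or relatedness.get((b, a), 0.0)

def pick_seeds(k, alpha=0.5):
    """Greedy seed selection: reward popularity, penalise relatedness
    to seeds already selected, so new users rate a varied seed set."""
    seeds = []
    while len(seeds) < k:
        best = max((i for i in popularity if i not in seeds),
                   key=lambda i: popularity[i]
                   - alpha * max((rel(i, s) for s in seeds), default=0.0))
        seeds.append(best)
    return seeds

print(pick_seeds(3))  # e.g. ['m1', 'm3', 'm2']
```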

20.
We evaluate article-level metrics along two dimensions. Firstly, we analyse metrics’ ranking bias in terms of fields and time. Secondly, we evaluate their performance based on test data that consists of (1) papers that have won high-impact awards and (2) papers that have won prizes for outstanding quality. We consider different citation impact indicators and indirect ranking algorithms in combination with various normalisation approaches (mean-based, percentile-based, co-citation-based, and post hoc rescaling). We execute all experiments on two publication databases which use different field categorisation schemes (author-chosen concept categories and categories based on papers’ semantic information). In terms of bias, we find that citation counts are always less time biased but always more field biased compared to PageRank. Furthermore, rescaling paper scores by a constant number of similarly aged papers reduces time bias more effectively compared to normalising by calendar years. We also find that percentile citation scores are less field and time biased than mean-normalised citation counts. In terms of performance, we find that time-normalised metrics identify high-impact papers better shortly after their publication compared to their non-normalised variants. However, after 7 to 10 years, the non-normalised metrics perform better. A similar trend exists for the set of high-quality papers where these performance cross-over points occur after 5 to 10 years. Lastly, we also find that personalising PageRank with papers’ citation counts reduces time bias but increases field bias. Similarly, using papers’ associated journal impact factors to personalise PageRank increases its field bias. In terms of performance, PageRank should always be personalised with papers’ citation counts and time-rescaled for citation windows smaller than 7 to 10 years.
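The time-rescaling step described here, normalising each paper's score against a fixed-size cohort of similarly aged papers, can be sketched as follows; the window size and citation counts are illustrative assumptions:

```python
import numpy as np

# Toy papers: (publication_order_index, citation_count), ordered by age.
papers = [(0, 50), (1, 10), (2, 8), (3, 40), (4, 5), (5, 12), (6, 3)]
COHORT = 4  # constant number of similarly aged papers to rescale against

def rescaled_scores(papers, cohort=COHORT):
    """Divide each paper's citation count by the mean count of the
    `cohort` papers published closest to it in time, so that old and
    recent papers become comparable."""
    counts = np.array([c for _, c in papers], dtype=float)
    out = []
    for i, (_, c) in enumerate(papers):
        lo = max(0, min(i - cohort // 2, len(papers) - cohort))
        peers = counts[lo:lo + cohort]
        out.append(c / peers.mean())
    return out

print([round(s, 2) for s in rescaled_scores(papers)])
```

Rescaling against a fixed-size cohort rather than a calendar year is what the abstract credits with reducing time bias more effectively.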
