1.

Probabilistic relevance ranking for collaborative filtering

Jun Wang Stephen Robertson Arjen P. de Vries Marcel J. T. Reinders 《Information Retrieval》2008,11(6):477-497

Collaborative filtering is concerned with making recommendations about items to users. Most formulations of the problem are specifically designed for predicting user ratings, assuming past data of explicit user ratings is available. However, in practice we may only have implicit evidence of user preference; furthermore, a better view of the task is that of generating a top-N list of items that the user is most likely to like. In this regard, we argue that collaborative filtering can be directly cast as a relevance ranking problem. We begin with the classic Probability Ranking Principle of information retrieval, proposing a probabilistic item ranking framework. In the framework, we derive two different ranking models, showing that despite their common origin, different factorizations reflect two distinctive ways to approach item ranking. For the model estimations, we limit our discussions to implicit user preference data, and adopt an approximation method introduced in the classic text retrieval model (i.e. the Okapi BM25 formula) to effectively decouple frequency counts and presence/absence counts in the preference data. Furthermore, we extend the basic formula by proposing Bayesian inference to estimate the probability of relevance (and non-relevance), which largely alleviates the data sparsity problem. Apart from a theoretical contribution, our experiments on real data sets demonstrate that the proposed methods perform significantly better than other strong baselines.
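The decoupling of frequency counts from presence/absence mentioned in this abstract can be illustrated with the standard BM25 term-frequency saturation function. The sketch below is only that standard component, not the paper's ranking model; the function name, the item names, and the play counts are hypothetical, and document-length normalization is deliberately left out.

```python
def bm25_saturation(freq: float, k1: float = 1.2) -> float:
    """Standard BM25 term-frequency component, without length normalization.

    k1 controls how much repeated observations count:
    k1 = 0 reduces the weight to pure presence/absence (always 1.0),
    while larger k1 lets the raw frequency matter more.
    """
    if freq <= 0:
        return 0.0
    return freq * (k1 + 1.0) / (freq + k1)


# Hypothetical implicit-feedback counts (e.g. play counts) for one user.
play_counts = {"item_a": 1, "item_b": 5, "item_c": 50}

# The weight grows with the count but saturates below k1 + 1.
weights = {item: bm25_saturation(count) for item, count in play_counts.items()}
```

With the default `k1`, a single observation gets weight 1.0 and further observations add progressively less, which is one way to keep a heavily repeated item from dominating the ranking.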

2.

Ontology-based retrieval framework for scientific and technical literature and its technical implementation

王莉梁冰白海燕《数字图书馆论坛》2012,(7):37-44

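From the perspective of improving the quality of scientific and technical literature retrieval, this article proposes an ontology-based retrieval framework, studies ontology construction, the document semantic space, query reformulation, and the retrieval process, and presents the key algorithms. Compared with existing research, the framework has three main features: it automatically generates a semantic expansion model of document resources from rules; it constructs a document information space with a three-layer subnetwork structure of "feature term - document - concept"; and it introduces a user interest model, emphasizing that this knowledge about users will influence the creation and development of new retrieval strategies.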

3.

Scale and Translation Invariant Collaborative Filtering Systems (Total citations: 1; self-citations: 0; citations by others: 1)

Daniel Lemire 《Information Retrieval》2005,8(1):129-150

Collaborative filtering systems are prediction algorithms over sparse data sets of user preferences. We modify a wide range of state-of-the-art collaborative filtering systems to make them scale and translation invariant and generally improve their accuracy without increasing their computational cost. Using the EachMovie and the Jester data sets, we show that learning-free constant time scale and translation invariant schemes outperform other learning-free constant time schemes by at least 3% and perform as well as expensive memory-based schemes (within 4%). Over the Jester data set, we show that a scale and translation invariant Eigentaste algorithm outperforms Eigentaste 2.0 by 20%. These results suggest that scale and translation invariance is a desirable property.
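To make the notion of scale and translation invariance concrete, here is a minimal sketch of one way to normalize a user's ratings so that any positive rescaling or shift of that user's rating scale yields the same normalized vector. The normalization used here (subtract the mean, divide by the largest absolute deviation) is an assumption chosen for simplicity; the abstract does not specify the paper's exact scheme, and the function name is hypothetical.

```python
def normalize_ratings(ratings):
    """Map ratings to a form unchanged by r -> a * r + b with a > 0.

    Assumed scheme: center on the user's mean, then divide by the
    largest absolute deviation from that mean.
    """
    mean = sum(ratings) / len(ratings)
    centered = [r - mean for r in ratings]
    scale = max(abs(c) for c in centered)
    if scale == 0:
        # A user who gave the same rating everywhere carries no preference signal.
        return [0.0] * len(ratings)
    return [c / scale for c in centered]


# Two hypothetical users with the same tastes but different rating habits:
# the second user's ratings are 2 * r + 10 of the first user's.
user_a = [1, 2, 3, 4, 5]
user_b = [2 * r + 10 for r in user_a]

normalized_a = normalize_ratings(user_a)
normalized_b = normalize_ratings(user_b)
```

Both users end up with the same normalized vector, so any predictor built on the normalized values treats them identically.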

4.

A synchronous collaborative information retrieval model and its mechanisms

徐树维齐惠颖刘兰《图书情报工作》2009,53(21):114-117

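By analyzing current theories and system practice in collaborative information retrieval among users, this article summarizes some existing problems and identifies three key problems to be solved in synchronous collaborative retrieval: allocation of search tasks, reuse of the group's query history, and collaboration awareness. It proposes a synchronous collaborative information retrieval model for users and explains how the model addresses these three problems.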

5.

Incremental Relevance Feedback in Japanese Text Retrieval

Gareth Jones Tetsuya Sakai Masahiro Kajiura Kazuo Sumita 《Information Retrieval》2000,2(4):361-384

The application of relevance feedback techniques has been shown to improve retrieval performance for a number of information retrieval tasks. This paper explores incremental relevance feedback for ad hoc Japanese text retrieval, examining, separately and in combination, the utility of term reweighting and query expansion using a probabilistic retrieval model. Retrieval performance is evaluated in terms of standard precision-recall measures, and also using number-to-view graphs. Experimental results, on the standard BMIR-J2 Japanese language retrieval collection, show that both term reweighting and query expansion improve retrieval performance. This is reflected in improvements in both precision and recall, and also in a reduction in the average number of documents which must be viewed to find a selected number of relevant items. In particular, using a simple simulation of user searching, incremental application of relevance information is shown to lead to progressively improved retrieval performance and an overall reduction in the number of documents that a user must view to find relevant ones.
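As an illustration of term reweighting under relevance feedback, the sketch below uses the classic Robertson/Sparck Jones relevance weight with 0.5 smoothing. The abstract does not name the weighting function, so its use here is an assumption; the collection statistics are made up for the example.

```python
import math


def rsj_weight(r: int, n: int, R: int, N: int) -> float:
    """Robertson/Sparck Jones relevance weight with 0.5 smoothing.

    r: known relevant documents that contain the term
    n: documents in the collection that contain the term
    R: known relevant documents
    N: documents in the collection
    """
    numerator = (r + 0.5) * (N - n - R + r + 0.5)
    denominator = (n - r + 0.5) * (R - r + 0.5)
    return math.log(numerator / denominator)


# Hypothetical statistics: the term occurs in 50 of 1000 documents.
N, n = 1000, 50

# Before any feedback (R = r = 0) the weight is an idf-like quantity.
weight_no_feedback = rsj_weight(r=0, n=n, R=0, N=N)

# After the user marks 5 documents relevant, 4 of which contain the term,
# the weight is recomputed; this step repeats as more judgments arrive.
weight_after_feedback = rsj_weight(r=4, n=n, R=5, N=N)
```

In an incremental setting the same function is simply called again after each new batch of judgments, so a term that keeps appearing in relevant documents gains weight over successive iterations.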

6.

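Recommendation algorithm based on predicting online consumers' preferences (Total citations: 1; self-citations: 0; citations by others: 1)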

刘枚莲刘同存吴伟平《图书情报工作》2012,56(4):120-125

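Traditional recommendation algorithms provide recommendations based only on online consumers' existing preference information, ignoring their current shopping state and possible changes in preference. To address this shortcoming, the article analyzes the characteristics of online consumers' preference changes and proposes a recommendation algorithm based on preference prediction. The algorithm evaluates a consumer's preference for products by jointly considering existing preference information and current shopping actions, and combines this with the idea of collaborative filtering to provide targeted recommendations. Experimental results show that the algorithm predicts preference tendencies during shopping reasonably well and significantly improves recommendation quality and accuracy.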

7.

Exploiting entity relationship for query expansion in enterprise search

Xitong Liu Fei Chen Hui Fang Min Wang 《Information Retrieval》2014,17(3):265-294

Enterprise search is important, and the search quality has a direct impact on the productivity of an enterprise. Enterprise data contain both structured and unstructured information. Since these two types of information are complementary and the structured information such as relational databases is designed based on ER (entity-relationship) models, there is a rich body of information about entities in enterprise data. As a result, many information needs of enterprise search center around entities. For example, a user may formulate a query describing a problem that she encounters with an entity, e.g., the web browser, and want to retrieve relevant documents to solve the problem. Intuitively, information related to the entities mentioned in the query, such as related entities and their relations, would be useful to reformulate the query and improve the retrieval performance. However, most existing studies on query expansion are term-centric. In this paper, we propose a novel entity-centric query expansion framework for enterprise search. Specifically, given a query containing entities, we first utilize both unstructured and structured information to find entities that are related to the ones in the query. We then discuss how to adapt existing feedback methods to use the related entities and their relations to improve search quality. Experimental results over two real-world enterprise collections show that the proposed entity-centric query expansion strategies are more effective and robust in improving search performance than the state-of-the-art pseudo feedback methods for long natural language-like queries with entities. Moreover, results over a TREC ad hoc retrieval collection show that the proposed methods can also work well for short keyword queries in the general search domain.

8.

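Representation of personalized profiles in intelligent information retrieval (Total citations: 3; self-citations: 2; citations by others: 3)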

田萱孟祥光刘希玉《情报学报》2004,23(1):21-26

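In intelligent information retrieval, how the personalized profile is described and updated determines the efficiency of document filtering. Based on the characteristics of the Huffman tree, this article proposes organizing the user's personalized profile as a Huffman tree and gives the corresponding document filtering algorithm. Compared with other personalized-profile filtering algorithms, it uses less space and filters faster.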

9.

Automatic semantic retrieval and visualization model based on the integration of WordNet and the SUMO ontology

胡泽文《国家图书馆学刊》2012,21(2):23-32,91

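Semantic retrieval faces several problems in practice: difficulty in capturing the user's query intent, the computational complexity of latent semantic indexing, the narrow coverage of domain ontologies, a limited range of concept semantic types, and a low degree of automation. To address them, this article proposes an automatic semantic retrieval and visualization model based on the integration of WordNet and the SUMO ontology. Experiments show that the model filters out a large amount of information irrelevant to the user's query, improves the precision of the information retrieval system, and satisfies users' needs for visual and personalized retrieval well.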

10.

Construction of an ontology-based knowledge retrieval platform for national history

王颖张智雄孙辉雷枫《图书情报工作》2015,59(16):119-128

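[Purpose/significance] To build a national history knowledge retrieval platform that improves the efficiency with which users obtain national history knowledge and promotes publicity and education about national history. [Method/process] The article proposes the approach and overall framework for building an ontology-based national history knowledge retrieval platform. On the basis of a national history ontology knowledge base, it uses the Neo4j database as the RDF data store, creates Solr-based instance, triple, and entry indexes, designs and implements the search engine's execution flow, query construction methods, and query processing algorithms for multiple retrieval needs, and designs visualizations for presenting national history knowledge. [Result/conclusion] The resulting platform provides retrieval and browsing services including entity search, question answering, association search, temporal search, and semantic resource browsing. The platform framework and the implementation of its key technologies can serve as an important reference for deep retrieval services over domain knowledge.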

11.

Identifying top relevant dates for implicit time sensitive queries

Ricardo Campos Gaël Dias Alípio Mário Jorge Célia Nunes 《Information Retrieval》2017,20(4):363-398

Despite a clear improvement of search and retrieval temporal applications, current search engines are still mostly unaware of the temporal dimension. Indeed, in most cases, systems are limited to offering the user the chance to restrict the search to a particular time period or to simply rely on an explicitly specified time span. If the user is not explicit in his/her search intents (e.g., “philip seymour hoffman”), search engines are likely to fail to present an overall historic perspective of the topic. In most such cases, they are limited to retrieving the most recent results. One possible solution to this shortcoming is to understand the different time periods of the query. In this context, most state-of-the-art methodologies consider any occurrence of temporal expressions in web documents and other web data as equally relevant to an implicit time sensitive query. To approach this problem in a more adequate manner, we propose in this paper the detection of temporal expressions relevant to the query. Unlike previous metadata and query log-based approaches, we show how to achieve this goal based on information extracted from document content. However, instead of simply focusing on the detection of the most obvious date, we are also interested in retrieving the set of dates that are relevant to the query. Towards this goal, we define a general similarity measure that makes use of co-occurrences of words and years based on corpus statistics and a classification methodology that is able to identify the set of top relevant dates for a given implicit time sensitive query, while filtering out the non-relevant ones. Through extensive experimental evaluation, we mean to demonstrate that our approach offers promising results in the field of temporal information retrieval (T-IR), as demonstrated by the experiments conducted over several baselines on web corpora collections.

12.

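Personalized information recommendation based on social tagging systems (Total citations: 4; self-citations: 0; citations by others: 4)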

田莹颖《图书情报工作》2010,54(1):50-120

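Personalized information recommendation, which targets users' individual characteristics and provides them with accurate and appropriate information, has long been a focus of both academia and industry. Using a post-controlled vocabulary, the article processes users' scattered, personalized tags and represents user interests as vectors; then, borrowing the idea of collaborative filtering, it finds the set of similar users and the resources within that set. On this basis, it adopts a relative matching strategy and proposes a personalized recommendation method based on social tagging systems.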

13.

Improving search via personalized query expansion using social media

Dong Zhou Séamus Lawless Vincent Wade 《Information Retrieval》2012,15(3-4):218-242

Social tagging systems have gained increasing popularity as a method of annotating and categorizing a wide range of different web resources. Web search that utilizes social tagging data suffers from an extreme example of the vocabulary mismatch problem encountered in traditional information retrieval (IR). This is due to the personalized, unrestricted vocabulary that users choose to describe and tag each resource. Previous research has proposed the utilization of query expansion to deal with search in this rather complicated space. However, non-personalized approaches based on relevance feedback and personalized approaches based on co-occurrence statistics only showed limited improvements. This paper proposes a novel query expansion framework based on individual user profiles mined from the annotations and resources the user has marked. The underlying theory is to regularize the smoothness of word associations over a connected graph using a regularizer function on terms extracted from top-ranked documents. The intuition behind the model is the prior assumption of term consistency: the most appropriate expansion terms for a query are likely to be associated with, and influenced by, terms extracted from the documents ranked highly for the initial query. The framework also simultaneously incorporates annotations and web documents through a Tag-Topic model in a latent graph. The experimental results suggest that the proposed personalized query expansion method can produce better results than both the classical non-personalized search approach and other personalized query expansion methods. Hence, the proposed approach significantly benefits personalized web search by leveraging users’ social media data.

14.

BioSYNTHESIS: bridging the information gap

N C Broering H R Gault H Epstein 《Bulletin of the Medical Library Association》1989,77(1):19-25

BioSYNTHESIS is a prototype intelligent retrieval system under development as part of the IAIMS project at Georgetown University. The aim is to create an integrated system that can retrieve information located on disparate computer systems. The project work has been divided into two phases: BioSYNTHESIS I, development of a single menu to access various databases which reside on different computers; and BioSYNTHESIS II, development of a search component that facilitates complex searching for the user. BioSYNTHESIS II will accept a user's query and conduct a search for appropriate information in the IAIMS databases at Georgetown. For information not available at Georgetown, such as full text, it will access selected remote systems and translate the search query as appropriate for the target system. The search through various computer systems and different databases with unique storage and retrieval structures will be transparent to the user. BioSYNTHESIS I is complete and available to users. The design work for BioSYNTHESIS II is under development and will continue as a multiyear technical research effort of the proposed Georgetown IAIMS implementation project.

15.

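Construction of an active information filtering system in digital libraries (Total citations: 6; self-citations: 0; citations by others: 6)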

焦玉英王娜《中国图书馆学报》2007,33(4)

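The article designs an experimental active information filtering system that combines collaborative filtering and content-based filtering. The main parts of its architecture are an intelligent agent, a retrieval server, a user-requirement document database, a filtering server, a result processor, and a push server. It uses a machine learning mechanism to predict users' new interests.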

16.

An analysis of the evolution of information retrieval based on statistical language models

李进华周朴雄《图书情报知识》2010,(3)

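Introducing statistical language models, a natural language processing technique, into information retrieval has produced a series of new retrieval models, typically including the query likelihood model, the generative relevance model, term dependence models, the statistical translation model, the Poisson distribution model, and the risk minimization framework. This article analyzes the evolution of these retrieval models from the perspective of statistical models and N-gram techniques. Finally, it summarizes the development of retrieval models based on statistical language models, along with future trends and challenges.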
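Of the models listed in this abstract, the query likelihood model is the simplest to show in code. The following is a minimal unigram sketch with Dirichlet prior smoothing, a common textbook formulation rather than anything taken from the article; the toy documents, the function name, and the small value of `mu` are assumptions made for illustration.

```python
import math
from collections import Counter


def query_log_likelihood(query, doc, collection_tf, collection_len, mu=10.0):
    """log P(query | doc) under a unigram model with Dirichlet smoothing.

    P(t | doc) = (tf(t, doc) + mu * P(t | collection)) / (len(doc) + mu)
    """
    doc_tf = Counter(doc)
    score = 0.0
    for term in query:
        p_collection = collection_tf.get(term, 0) / collection_len
        if p_collection == 0.0:
            # Assumption: terms unseen in the whole collection are ignored.
            continue
        p_term = (doc_tf[term] + mu * p_collection) / (len(doc) + mu)
        score += math.log(p_term)
    return score


# Hypothetical toy collection of tokenized documents.
docs = {
    "d1": "language model retrieval".split(),
    "d2": "genetic algorithm recommendation".split(),
}
collection_tf = Counter(term for tokens in docs.values() for term in tokens)
collection_len = sum(collection_tf.values())

query = ["language", "retrieval"]
scores = {
    name: query_log_likelihood(query, tokens, collection_tf, collection_len)
    for name, tokens in docs.items()
}
ranking = sorted(scores, key=scores.get, reverse=True)
```

Smoothing with the collection model keeps documents that lack a query term from receiving a probability of zero, so they are ranked lower rather than excluded.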

17.

A group recommender system based on genetic algorithms

朱国玮杨玲《情报学报》2009,28(6)

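With the realization of one-to-one customized communication, "recommendation information" that best meets consumers' needs is becoming increasingly important. Traditional online recommendation techniques are very effective in supporting individual decision making but are difficult to apply to group decision making. In this study, we propose a new method for recommending products to group members. The method takes into account that group decision making is affected by interactions among group members and that different opinions carry different weight within the group. Building on item-based collaborative filtering, it uses a genetic algorithm (GA) to learn group preferences and so solve the problem of unknown ratings for subgroups. Experimental results show that the proposed method provides high-quality group recommendations and can be widely applied to group recommendation.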

18.

Extracting contextual query information from a single document

杭月芹姚滢沈洁《现代图书情报技术》2006,1(10):30-33

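The article proposes an algorithm that combines global analysis and local analysis to extract query information from a single document. Global analysis is used to extract the user's query interests, and local analysis to resolve the ambiguity of query terms. Experimental results show that the method reflects the contextual information of the user's query fairly comprehensively and improves the relevance of query results.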

19.

Improved Query Matching Using kd-Trees: A Latent Semantic Indexing Enhancement

M.K. Hughey M.W. Berry 《Information Retrieval》2000,2(4):287-302

Efficient information searching and retrieval methods are needed to navigate the ever-increasing volumes of digital information. Traditional lexical information retrieval methods can be inefficient and often return inaccurate results. To overcome problems such as polysemy and synonymy, concept-based retrieval methods have been developed. One such method is Latent Semantic Indexing (LSI), a vector-space model, which uses the singular value decomposition (SVD) of a term-by-document matrix to represent terms and documents in k-dimensional space. As with other vector-space models, LSI is an attempt to exploit the underlying semantic structure of word usage in documents. During the query matching phase of LSI, a user's query is first projected into the term-document space, and then compared to all terms and documents represented in the vector space. Using some similarity measure, the nearest (most relevant) terms and documents are identified and returned to the user. The current LSI query matching method requires that the similarity measure be computed between the query and every term and document in the vector space. In this paper, the kd-tree searching algorithm is used within a recent LSI implementation to reduce the time and computational complexity of query matching. The kd-tree data structure stores the term and document vectors in such a way that only those terms and documents that are most likely to qualify as nearest neighbors to the query will be examined and retrieved.
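The pruning idea behind kd-tree query matching can be sketched in a few lines. The example below is a generic kd-tree nearest-neighbor search over small k-dimensional vectors, not the implementation described in the paper; it assumes Euclidean distance (for unit-length vectors this gives the same ranking as cosine similarity), and the two-dimensional document vectors and labels are made up.

```python
import math


def build_kdtree(points, depth=0):
    """Build a kd-tree from a list of (vector, label) pairs."""
    if not points:
        return None
    axis = depth % len(points[0][0])  # cycle through the dimensions
    points = sorted(points, key=lambda p: p[0][axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, query, best=None):
    """Return (distance, label) of the stored vector closest to query."""
    if node is None:
        return best
    vector, label = node["point"]
    dist = math.dist(vector, query)
    if best is None or dist < best[0]:
        best = (dist, label)
    # Descend first into the side of the splitting plane that holds the query.
    diff = query[node["axis"]] - vector[node["axis"]]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, query, best)
    # Visit the other side only if the plane is closer than the best match so far.
    if abs(diff) < best[0]:
        best = nearest(far, query, best)
    return best


# Hypothetical 2-dimensional document vectors (k = 2 for readability).
docs = [
    ((0.9, 0.1), "doc_a"),
    ((0.2, 0.8), "doc_b"),
    ((0.5, 0.5), "doc_c"),
    ((0.1, 0.2), "doc_d"),
]
tree = build_kdtree(docs)
distance, label = nearest(tree, (0.25, 0.7))
```

The test on `abs(diff)` is what lets the search skip whole subtrees, which is the source of the savings over comparing the query with every stored vector.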

20.

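Document recommender systems: a way to improve information retrieval efficiency (Total citations: 2; self-citations: 0; citations by others: 2)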

刘婧婧张向民《图书情报工作》2007,51(12):11-11

Traditional Information Retrieval (IR) systems have limitations in improving search performance in today’s information environment. Traditional IR systems tend toward high recall and poor precision, and their results are only as good as the accuracy of the search query, which is usually difficult for the user to construct. It is also time-consuming for the user to evaluate each search result. Recommendation techniques, developed since the early 1990s, help solve the problems that traditional IR systems have. This paper explains the basic process and major elements of document recommender systems, especially the two recommendation techniques of content-based filtering and collaborative filtering. Also discussed are the evaluation issue and the problems that current document recommender systems are facing, which need to be taken into account in future system designs.