Similar articles
1.
Scale and Translation Invariant Collaborative Filtering Systems   Total citations: 1, self-citations: 0, citations by others: 1
Collaborative filtering systems are prediction algorithms over sparse data sets of user preferences. We modify a wide range of state-of-the-art collaborative filtering systems to make them scale and translation invariant and generally improve their accuracy without increasing their computational cost. Using the EachMovie and the Jester data sets, we show that learning-free constant time scale and translation invariant schemes outperform other learning-free constant time schemes by at least 3% and perform as well as expensive memory-based schemes (within 4%). Over the Jester data set, we show that a scale and translation invariant Eigentaste algorithm outperforms Eigentaste 2.0 by 20%. These results suggest that scale and translation invariance is a desirable property.
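
As a rough illustration of the property this abstract describes, the sketch below normalizes one user's ratings so that the result does not depend on where that user's rating scale starts (translation) or how wide it is (positive scale factor). The abstract does not give the paper's normalization, so the mean-centering and the division by the mean absolute deviation used here are assumptions, and `normalize` is a hypothetical helper name.

```python
from statistics import mean


def normalize(ratings):
    """Map one user's ratings to a scale- and translation-invariant form.

    Assumption: center on the user's mean and divide by the mean absolute
    deviation. The source abstract does not specify the exact normalization.
    """
    center = mean(ratings)
    deviations = [r - center for r in ratings]
    spread = mean(abs(d) for d in deviations)
    if spread == 0:
        # A user who gives every item the same rating carries no preference signal.
        return [0.0 for _ in ratings]
    return [d / spread for d in deviations]


# The same tastes expressed on two different scales: b = 2 * a + 3.
a = [1, 2, 3, 4, 5]
b = [2 * r + 3 for r in a]

# Compare the two normalized profiles element by element.
same = all(abs(x - y) < 1e-9 for x, y in zip(normalize(a), normalize(b)))
print(same)
```

Any prediction scheme fed with the normalized values, rather than the raw ratings, inherits the same invariance for positive scale factors.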

2.
Information Filtering in TREC-9 and TDT-3: A Comparative Analysis   Total citations: 2, self-citations: 0, citations by others: 2
Much work on automated information filtering has been done in the TREC and TDT domains, but differences in corpora, the nature of TREC topics vs. TDT events, the constraints imposed on training and testing, and the choices of performance measures confound any meaningful comparison between these domains. We attempt to bridge the gap between them by evaluating the performance of the k-nearest-neighbor (kNN) classification system on the corpus and categories from one domain using the constraints of the other. To maximize comparability and understand the effect of the evaluation metrics specific to each domain, we optimize the performance of kNN separately for the F1, T9P (preferred metric for TREC-9) and Ctrk (official metric for TDT-3) metrics. A thorough comparison of our within-domain and cross-domain results demonstrates that the corpus used for TREC-9 is more challenging for an information filtering system than the TDT-3 corpus and strongly suggests that the TDT-3 event tracking task itself is more difficult than the TREC batch filtering task. We also show that optimizing performance in TREC-9 and TDT-3 tends to result in systems with different performance characteristics, confounding any meaningful comparison between the two domains, and that T9P and Ctrk both have properties that make them undesirable as general information filtering metrics.

3.
When speaking of information retrieval, we often mean text retrieval. But there exist many other forms of information retrieval applications. A typical example is collaborative filtering, which suggests interesting items to a user by taking into account other users’ preferences or tastes. Due to the uniqueness of the problem, it has been modeled and studied differently in the past, mainly drawing on the preference prediction and machine learning viewpoints. A few attempts have been made to bring collaborative filtering back to information (text) retrieval modeling, and new collaborative filtering techniques have been derived as a result. In this paper, we show that from the algorithmic viewpoint, there is an even closer relationship between collaborative filtering and text retrieval. Specifically, major collaborative filtering algorithms, such as the memory-based ones, essentially calculate the dot product between the user vector (as the query vector in text retrieval) and the item rating vector (as the document vector in text retrieval). Thus, if we properly structure user preference data and employ the target user’s ratings as query input, major text retrieval algorithms and systems can be directly used without any modification. In this regard, we propose a unified formulation under a common notational framework for memory-based collaborative filtering, and a technique to use any text retrieval weighting function with collaborative filtering preference data. Besides confirming the rationale of the framework, our preliminary experimental results also demonstrate the effectiveness of the approach in using text retrieval models and systems to perform item ranking tasks in collaborative filtering.
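
The dot-product view in this abstract can be sketched as follows. The target user's ratings act as the query vector, and each candidate item is a "document" vector over the same dimensions. How the item vector is built is not stated in the abstract; the item-to-item similarity weights below are an assumption, and all names and numbers are made up for illustration.

```python
def dot(query, document):
    """Sparse dot product over the keys the two vectors share."""
    return sum(weight * document[key] for key, weight in query.items() if key in document)


def rank_items(user_ratings, item_vectors):
    """Score each unrated item against the user's ratings and sort by score."""
    scores = {
        item: dot(user_ratings, vector)
        for item, vector in item_vectors.items()
        if item not in user_ratings
    }
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)


# Query: the target user's known ratings (hypothetical data).
user_ratings = {"item_a": 5.0, "item_b": 1.0}

# Documents: each candidate item as weights over already-rated items
# (assumed here to be item-to-item similarities).
item_vectors = {
    "item_c": {"item_a": 0.5, "item_b": 0.0},
    "item_d": {"item_a": 0.0, "item_b": 1.0},
}

ranking = rank_items(user_ratings, item_vectors)
print(ranking)
```

Because the scoring step is just a dot product, swapping in a text retrieval weighting function amounts to changing how the entries of the two vectors are weighted before they are multiplied.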

4.
Large-scale retrieval systems are often implemented as a cascading sequence of phases: a first filtering step, in which a large set of candidate documents is extracted using a simple technique such as Boolean matching and/or static document scores; and then one or more ranking steps, in which the pool of documents retrieved by the filter is scored more precisely using dozens or perhaps hundreds of different features. The documents returned to the user are then taken from the head of the final ranked list. Here we examine methods for measuring the quality of filtering and preliminary ranking stages, and show how to use these measurements to tune the overall performance of the system. Standard top-weighted metrics used for overall system evaluation are not appropriate for assessing filtering stages, since the output is a set of documents, rather than an ordered sequence of documents. Instead, we use an approach in which a quality score is computed based on the discrepancy between filtered and full evaluation. Unlike previous approaches, our methods do not require relevance judgments, and thus can be used with virtually any query set. We show that this quality score directly correlates with actual differences in measured effectiveness when relevance judgments are available. Since the quality score does not require relevance judgments, it can be used to identify queries that perform particularly poorly for a given filter. Using these methods, we explore a wide range of filtering options using thousands of queries, categorize the relative merits of the different approaches, and identify useful parameter combinations.

5.
6.
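Advances in information technology have catalyzed changes in the scholarly communication system and transformed library service models. Researchers increasingly rely on digital tools such as e-mail, mailing lists, websites, and cloud computing to communicate, and use these tools to organize related information and exchanges comprehensively. As an effective form of informal communication, the collaboratory plays an important role in networked scholarly communication. This paper analyzes its origins, process, and characteristics.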

7.
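To improve the efficiency of multi-keyword queries and reduce their overhead, a multi-keyword query algorithm based on semantic clustering, MKQBSC, is proposed. The algorithm groups semantically similar nodes into the same cluster; when a node joins, leaves, or changes its semantics, the clustering changes accordingly. A query request is forwarded between adjacent semantic clusters until it reaches a semantically similar cluster. Simulation results show that, compared with the traditional multi-keyword query algorithm based on intersecting inverted lists, MKQBSC requires fewer routing hops and generates fewer messages.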
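
For context, the baseline this abstract compares against, intersecting inverted lists, can be sketched in a few lines. This shows only that baseline on a single machine, not MKQBSC itself, whose clustering and routing details are not given here. The index contents and the `multi_keyword_query` name are hypothetical.

```python
def multi_keyword_query(index, keywords):
    """Return the documents that contain every keyword.

    `index` maps a keyword to the set of document ids containing it.
    """
    postings = [index.get(word, set()) for word in keywords]
    if not postings:
        return set()
    # Start from the shortest list so the intermediate result stays small.
    postings.sort(key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return result


# Hypothetical inverted index.
index = {
    "semantic": {1, 2, 4},
    "clustering": {2, 3, 4},
    "query": {2, 4, 5},
}

hits = multi_keyword_query(index, ["semantic", "clustering", "query"])
print(sorted(hits))
```

In a distributed setting each posting list may live on a different node, so every intersection step can cost a network transfer, which is the overhead the abstract says the clustering approach reduces.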

8.
To prepare for a career in which they keep up to date with current physical therapy procedures and health care trends, it is imperative that students become life-long learners. Four core competencies have been identified as skills to promote life-long learning: e-mail, professional electronic mailing lists (listservs), online database searching, and searching the World Wide Web. This paper discusses integrating the core competencies into the curriculum of a physical therapist assistant program through a collaborative effort between the physical therapist assistant program faculty and librarians.

9.
Many libraries are grappling with the new freedom to link to the Internet directly from a Web-based catalog. A conservative approach that locates only Web sites replicating a library's print holdings is described. Methods for locating URLs include reviewing front matter in reference materials, scouring the Internet for material produced both in print and full text on the Web, electronic mailing lists, publishers' promotional materials, conferences, and journal articles. Readers are encouraged to make links from the catalog to any original material produced in Web-readable form by the university, including dissertations and conference proceedings.

10.
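The size of a parallel corpus plays an important role in improving statistical machine translation performance, but building parallel corpora manually is costly. To address this problem, this paper proposes a low-cost, efficient method for building parallel corpora: using a pivot language as a bridge, it draws on existing machine translation technology and incorporates active learning to build a large-scale, high-quality parallel corpus for the target language pair. Through a case study of building a Japanese-Chinese parallel corpus with English as the pivot language, and using mature phrase-based statistical machine translation technology, the paper describes a method for selecting good translations based on automatic translation evaluation, an active-learning-based method for selecting corpus material, and the iterative updating and evaluation experiments of the translation system. Experimental results show that the proposed method can rapidly build a Japanese-Chinese parallel corpus and effectively improve the performance of the Japanese-Chinese translation system.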

11.
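Targeting bilingual term extraction as the application, a scheme for building domain-specific comparable corpora is proposed and validated experimentally. For a given subject domain, Chinese and English specialized texts are collected separately and Chinese and English keywords are extracted from them; other related keywords for the domain are obtained from word co-occurrence statistics. Using these keywords as query entry points, candidate comparable texts are retrieved from the web through academic search engines. The comparable texts are then evaluated quantitatively to discard those that do not meet the requirements, finally yielding a comparable corpus for the specific subject domain.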

12.
Keeping current is essential for both patrons and librarians in the health sciences. We receive electronic and photocopied tables of contents. We subscribe to relevant mailing lists, newspapers and magazines. We review Web sites, books and journals. Our desks are littered with current awareness items that more often get old than read. RSS, or "Really Simple Syndication," is a means of organizing and simplifying current awareness efforts. Using RSS feeds from a variety of sources along with aggregator software, librarians can keep up to date without the clutter. This article will provide a starting point from which to take advantage of RSS and continue the process of active learning.

13.
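An Evaluation System for Information Service Quality in the Network Environment   Total citations: 10, self-citations: 0, citations by others: 10
Wang Hao (王浩). 《图书情报工作》 (Library and Information Service), 2003, 47(11): 22-24
Discusses methods for measuring information service quality in the network environment, lists the survey items and indicators used when measuring quality in networked information services, and attempts to construct an evaluation system for networked information service quality.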

14.
An investigation of memorable messages as guides to self-assessment of daily behavior was conducted. Respondents were asked to keep diaries for five days. Each day participants were asked to recall one behavior that violated and one behavior that exceeded their personal expectations for themselves. After recalling the situation, participants were asked to recall the memorable messages, if any, which came to mind when self-assessing these behaviors. This method used the self-assessment of prior behavior as the entry point to a feedback loop. Control theory predicts that within the feedback loop behaviors are compared with internal principles that come from memorable messages. This comparison is predicted to result in either a positively or negatively valenced evaluation of the behavior if it either exceeds or violates personal standards represented as internal principles. The findings include the categories of behaviors that exceeded or violated personal expectations, the memorable messages, and the primary sources of the memorable messages that were recalled during the comparison process. In addition, comparisons were made between this research effort and a previous study that asked participants to self-assess more extreme cases of behavior and the memorable messages associated with that process.

15.
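Given that sentences in patent documents tend to be long, the training corpus for statistical machine translation is segmented into clauses to obtain bilingual clause sequences, and a strategy combining statistics and rules is then used to generate clause alignments. A bilingual corpus of simple clauses is built to retrain the statistical machine translation system. This improves, to some extent, the phrase and word alignments of the original bilingual training corpus and allows the translation information in the parallel corpus to be exploited more deeply. Applied to patent statistical machine translation and compared experimentally on the NTCIR-9 test set, the approach achieves fairly satisfactory translation quality.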

16.
Distributed top-k query processing is increasingly becoming an essential functionality in a large number of emerging application classes. This paper addresses the efficient algebraic optimization of top-k queries in wide-area distributed data repositories where the index lists for the attribute values (or text terms) of a query are distributed across a number of data peers and the computational costs include network latency, bandwidth consumption, and local peer work. We use a dynamic programming approach to find the optimal execution plan using compact data synopses for selectivity estimation that is the basis for our cost model. The optimized query is executed in a hierarchical way involving a small and fixed number of communication phases. We have performed experiments on real web data that show the benefits of distributed top-k query optimization both in network resource consumption and query response time.

17.
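E-mail is a widely used tool in online reference services, but the flood of spam seriously disrupts reference work, making the application of anti-spam technology an important issue. This paper discusses in detail how to apply anti-spam technology in online reference services.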

18.
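A Comparative Study of Web Information Filtering Methods   Total citations: 14, self-citations: 0, citations by others: 14
Systematically examines the main methods of web information filtering, including rating systems, URL address lists, automatic text analysis, and image recognition, points out the main advantages and disadvantages of each, and on this basis discusses the main problems facing methods for filtering harmful web content and their directions of development.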

19.
Communication Monographs, 2012, 79(2): 154-168
An investigation of memorable messages as guides to behavior from a Control Theory perspective was conducted. Respondents were asked to recall behaviors that either exceeded or violated their personal expectations for themselves, then to recall the memorable messages that came to mind when self-assessing these behaviors. This method uses the self-assessment of prior behavior as the entry point to a feedback loop. Control Theory predicts that within the feedback loop behaviors should be compared with internal principles that come from memorable messages. This comparison should result in either a positively or negatively valenced evaluation of the behavior if it either exceeds or violates personal standards represented as internal principles. The findings include the categories of behaviors that exceeded or violated personal expectations, the co-participants and the site of the behaviors, the memorable messages, and the sources and the timing of the memorable messages that were recalled during the comparison process. In addition, significant relationships of association were found between the behaviors, their valence, and the memorable messages associated with the self-assessment of behaviors. Thus, it was possible to examine the comparison process of any of the seven classes of behaviors that were found in terms of the memorable messages that respondents recalled when self-assessing these behaviors.

20.
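[Purpose/Significance] This study analyzes the underlying causes of abnormal indicator discrimination and skewed data distributions in science and technology evaluation, arguing that these are essentially a divergence between evaluation indicator values and evaluation attributes: the indicator values fail to reflect the essential meaning of the attributes they are meant to measure. [Method/Process] A new method for reducing this divergence, log-median standardization, is proposed and tested empirically on the mathematics journals in JCR 2016. [Results/Conclusions] The results show that citation indicators are more prone to divergence between indicator values and evaluation attributes; that the divergence can be assessed from several angles, such as analysis of the indicator's meaning, the pass rate, the coefficient of variation, the ratio of the median to the maximum, and the concentration index HHI; and that log-median standardization substantially reduces the divergence. When indicator values diverge from attributes in an evaluation, it is recommended to evaluate using the data after log-median processing.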
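
Several of the diagnostics this abstract lists are standard statistics and can be computed directly. The sketch below covers the coefficient of variation, the median-to-maximum ratio, and the HHI concentration index in their usual textbook forms. Whether the paper uses exactly these forms (for example, population rather than sample standard deviation) is an assumption, and the sample data are invented. The log-median standardization itself is not shown because its formula is not given in the abstract.

```python
from statistics import mean, median, pstdev


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    return pstdev(values) / mean(values)


def median_to_max_ratio(values):
    """Median divided by the maximum; small values suggest a long right tail."""
    return median(values) / max(values)


def hhi(values):
    """Herfindahl-Hirschman index: sum of squared shares of the total."""
    total = sum(values)
    return sum((v / total) ** 2 for v in values)


# Hypothetical citation counts for five journals, one of them dominant.
citations = [1, 2, 2, 5, 90]

print(coefficient_of_variation(citations))
print(median_to_max_ratio(citations))
print(hhi(citations))
```

For evenly spread data the HHI approaches 1/n and the median-to-maximum ratio approaches 1, so a high HHI together with a low ratio points to the kind of skew the abstract describes.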
