Similar Literature
 20 similar documents found (search time: 171 ms)
1.
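Research and Practice on Ranking Indicators for Literature Retrieval Systems   Total citations: 1, self-citations: 0, citations by others: 1
Building on a study of PageRank, the HITS algorithm, and the ranking indicators of four professional literature retrieval systems, this article analyzes in detail how ranking indicators for professional literature retrieval systems can be selected and optimized. It proposes the D-Rank (Document-Rank) ranking method and describes in detail how the algorithm is applied in the Wanfang Data Knowledge Service Platform.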

2.
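Identifying key documents within a massive body of literature, and thereby making researchers' work easier, is one of the important goals of library and information work, yet existing bibliometric methods lack a strong solution to this problem. Taking the large body of literature on small-world research as an example, this paper uses the HITS algorithm and the MPA algorithm to identify key documents in the overall network according to different analysis indicators. The study finds that combining the two algorithms identifies key documents quickly and lets researchers grasp the main content of a research topic fairly quickly.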

3.
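Using the information visualization techniques of Citespace, this study analyzes core journal articles indexed in Web of Science. Following scientometric approaches, including citation analysis, co-occurrence analysis, word frequency analysis, the LLR algorithm, and the PageRank algorithm, it analyzes 3,287 articles published in 2014-2015 in 27 library and information science journals indexed in the SSCI database of Web of Science, and draws knowledge maps of research hotspots and frontiers. It finds that webometrics, information needs, indexing, information retrieval, tissue engineering, name-matching algorithms, Web 2.0, non-source items, the technology acceptance model, and triple helix theory have been topics of sustained research in international library and information science over the past two years. Topics drawn from traditional library and information science, management science, and computer science, such as knowledge management, webometrics, the h-index, core activities, international collaboration, the unified theory of acceptance and use of technology, and IT governance, are expected to become the research trends and focal points of the field.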

4.
王建雄 《图书情报工作》2012,56(21):114-118
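Building on the traditional PageRank algorithm, this paper makes several optimizations and improvements and proposes a new topic-sensitive PageRank algorithm. It computes the similarity between a hyperlink and a domain vector to distinguish how much each hyperlink contributes to a page, which effectively suppresses topic drift. It also adds a time factor to PageRank to counter its bias toward older pages, and an on-site/off-site discrimination factor to guard against attempts to game the PageRank algorithm. The improved algorithm makes up for the shortcomings of the original and improves the efficiency of topic search.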

5.
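Given how multimedia links are distributed within web pages, this paper modifies the relevant parameters of two typical topic-search algorithms, PageRank and Shark-Search, and uses the two modified algorithms to compute the similarity between multimedia pages and a topic from the perspectives of page content and page links. Experimental results show that the improved Shark-Search multimedia topic-search algorithm raises the efficiency of multimedia topic search more effectively than the improved PageRank search algorithm, and is also better suited to topic search over multimedia resources on the web.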

6.
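[Purpose/Significance] With the rapid development of Internet technology, the Zhihu platform has gradually become a venue for debating topics of public interest and for sharing knowledge and experience. Analyzing the influence of key users on Zhihu and mining its key opinion leaders therefore plays a very important role in studying how information spreads through social networks. [Method/Process] By proposing improved PageRank and HITS algorithms, the study builds user-influence mining models on the Zhihu user social network and the question-and-answer network respectively, which can identify key users and opinion leaders accurately and objectively. [Result/Conclusion] Experimental results show that the proposed PageRank and HITS algorithms can effectively mine key opinion leaders with relatively prominent characteristics on Zhihu, converge fairly quickly, and are reusable and transferable. By processing and analyzing a Zhihu user dataset, the study builds models for mining user influence and key opinion leaders, and validates them on specific topics. The authors therefore infer that the model has great application value and good prospects for commercialization.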

7.
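Drawing on the concepts, methods, and mature applications of Web 2.0, together with the concept of the information portal, and focusing on user-centered information integration, this paper proposes a design for a Web 2.0-based personal academic information portal. It analyzes and discusses the portal's core mechanisms: information search and discovery, knowledge creation and accumulation, knowledge collaboration and sharing, and knowledge classification and management.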

8.
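The paper first analyzes several existing algorithms for finding related web pages and then, starting from the standard HITS algorithm, proposes an algorithm for finding related pages based on a modified HITS. Finally, experiments are used to analyze the respective characteristics and shortcomings of these algorithms. Research on hyperlink-analysis-based algorithms for finding related pages can give users a new way to retrieve and obtain information.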
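For reference, the standard HITS iteration that such a modification starts from can be sketched as below. This is the textbook hub/authority update, not the modified algorithm the abstract describes; the function name `hits`, the fixed iteration count, and the small link graph are assumptions made for illustration.

```python
import math


def hits(links, iterations=50):
    """Textbook HITS. `links` maps each page to the list of pages it links to."""
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Authority update: sum of hub scores of the pages linking in.
        auth = {v: 0.0 for v in nodes}
        for src, targets in links.items():
            for t in targets:
                auth[t] += hub[src]
        norm = math.sqrt(sum(a * a for a in auth.values())) or 1.0
        auth = {v: a / norm for v, a in auth.items()}
        # Hub update: sum of authority scores of the pages linked to.
        hub = {v: sum(auth[t] for t in links.get(v, [])) for v in nodes}
        norm = math.sqrt(sum(h * h for h in hub.values())) or 1.0
        hub = {v: h / norm for v, h in hub.items()}
    return hub, auth


# Hypothetical graph: three pages point at "a", only one also points at "b".
hub, auth = hits({"p1": ["a"], "p2": ["a", "b"], "p3": ["a"]})
```

In this toy graph `a` ends up with the higher authority score and `p2` with the highest hub score, because `p2` links to both authorities.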

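9.
Link farm spam and duplicate links seriously degrade the performance of link-based ranking algorithms such as HITS. To detect and suppress Web link farm spam and duplicate links, this paper makes full use of replication information between pages, in particular using complete hyperlink information to identify suspicious link targets. It proposes a bipartite graph structure composed of page documents and complete links. By building and analyzing the bipartite graph, it searches for Web pages that share anchor text and link targets, marks link farms and duplicate links in the process, and reduces the influence of suspicious links through a weighted adjacency matrix with a penalty factor. Results of real-time experiments and simulated user tests show that the algorithm significantly improves the information search quality of traditional HITS-type methods.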

10.
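A Study of Citation Networks Based on Network Structure Mining Algorithms   Total citations: 1, self-citations: 0, citations by others: 1
Based on a comparative analysis of two typical network structure mining algorithms (HITS and PageRank), this paper applies the PageRank algorithm to a large-scale citation network. For a citation network of 236,517 SCI articles, it computes the PageRank value of every document and analyzes in depth the relationship between a document's PageRank value and the commonly used citation count. The analysis shows that PageRank values correlate strongly with citation counts and follow a similar power-law distribution, but that PageRank is better at distinguishing the potential importance of documents among highly cited ones and largely reduces the effect of author self-citation on the objectivity of document evaluation.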
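Computing a PageRank value for every document in a citation network can be sketched with plain power iteration, as below. This is a minimal illustration, not the authors' implementation: the damping factor of 0.85, the uniform handling of papers that cite nothing, the function name `pagerank`, and the three-paper example are all assumptions.

```python
def pagerank(citations, damping=0.85, tol=1e-10, max_iter=200):
    """Power-iteration PageRank. `citations` maps each paper to the papers it cites."""
    nodes = sorted(set(citations) | {t for ts in citations.values() for t in ts})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by papers with no outgoing citations is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not citations.get(v))
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for src, targets in citations.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new[t] += share
        converged = sum(abs(new[v] - rank[v]) for v in nodes) < tol
        rank = new
        if converged:
            break
    return rank


# Hypothetical network: papers A and B both cite C; C cites nothing.
ranks = pagerank({"A": ["C"], "B": ["C"], "C": []})
```

The scores sum to 1, and `C` ranks highest here because it receives both citations. Comparing such scores with raw citation counts is the kind of analysis the abstract describes.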

11.
The collective feedback of the users of an Information Retrieval (IR) system has been shown to provide semantic information that, while hard to extract using standard IR techniques, can be useful in Web mining tasks. In the last few years, several approaches have been proposed to process the logs stored by Internet Service Providers (ISP), Intranet proxies or Web search engines. However, the solutions proposed in the literature only partially represent the information available in the Web logs. In this paper, we propose to use a richer data structure, which is able to preserve most of the information available in the Web logs. This data structure consists of three groups of entities: users, documents and queries, which are connected in a network of relations. Query refinements correspond to separate transitions between the corresponding query nodes in the graph, while users are linked to the queries they have issued and to the documents they have selected. The classical query/document transitions, which connect a query to the documents selected by the users in the returned result page, are also considered. The resulting data structure is a complete representation of the collective search activity performed by the users of a search engine or of an Intranet. The experimental results show that this more powerful representation can be successfully used in several Web mining tasks like discovering semantically relevant query suggestions and Web page categorization by topic.

12.
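Building on existing research, this paper analyzes the basic principles of deep-web surfacing methods based on general-purpose search engines, and discusses several key issues in deep-web surfacing, including determining the value ranges of form fields, query processing, and setting up hyperlinks for query results.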

13.
严海兵  崔志明 《情报学报》2007,26(3):361-365
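Search engines based on keyword matching consider only page importance when ranking pages and ignore classification, while search engines based on classified directories have difficulty analyzing Web information dynamically. After analyzing these shortcomings, this paper proposes using fuzzy clustering to classify search engine results dynamically, ranking within categories by combining the hyperlink analysis algorithm PageRank with the membership degree of Web documents, and gives a combination formula with an adjustment value. Experiments show that the algorithm meets users' needs more effectively and improves retrieval efficiency.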

14.
15.
Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do not, however, consider the additional information and rich annotations provided by the structure of XML documents and their element names. This article presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java classes and servlets. Experiments in the context of the INEX benchmark demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.

16.
Measuring Search Engine Quality   Total citations: 12, self-citations: 3, citations by others: 9
The effectiveness of twenty public search engines is evaluated using TREC-inspired methods and a set of 54 queries taken from real Web search logs. The World Wide Web is taken as the test collection and a combination of crawler and text retrieval system is evaluated. The engines are compared on a range of measures derivable from binary relevance judgments of the first seven live results returned. Statistical testing reveals a significant difference between engines and high intercorrelations between measures. Surprisingly, given the dynamic nature of the Web and the time elapsed, there is also a high correlation between results of this study and a previous study by Gordon and Pathak. For nearly all engines, there is a gradual decline in precision at increasing cutoff after some initial fluctuation. Performance of the engines as a group is found to be inferior to the group of participants in the TREC-8 Large Web task, although the best engines approach the median of those systems. Shortcomings of current Web search evaluation methodology are identified and recommendations are made for future improvements. In particular, the present study and its predecessors deal with queries which are assumed to derive from a need to find a selection of documents relevant to a topic. By contrast, real Web search reflects a range of other information need types which require different judging and different measures.
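Precision at a cutoff, one measure derivable from binary relevance judgments of the first seven results, can be sketched as follows. The function name `precision_at_k` and the judgment list are made up for illustration and are not data from the study.

```python
def precision_at_k(judgments, k):
    """Fraction of the top-k results judged relevant (1 = relevant, 0 = not)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Results missing below the cutoff count as non-relevant.
    return sum(judgments[:k]) / k


# Hypothetical binary judgments for the first seven results of one query.
judged = [1, 1, 0, 1, 0, 0, 1]
curve = [precision_at_k(judged, k) for k in range(1, 8)]
```

For this list the values at cutoffs 1 through 7 are 1, 1, 2/3, 3/4, 3/5, 1/2, and 4/7. Averaging such curves over many queries gives precision as a function of cutoff for an engine.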

17.
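Measuring the Timeliness of Scientific Information on the Web   Total citations: 3, self-citations: 0, citations by others: 3
Timeliness is an important factor affecting the quality of online information. Taking publicly accessible scientific information on the Web as its subject, this paper uses the analytic hierarchy process to assign weights to the indicators for measuring timeliness, runs tracking queries on 32 subject terms from 8 disciplines, including mathematics, life sciences, physics, and materials science, and takes the first 50 pages returned by the Google, Yahoo, and Altavista search engines as the measurement sample. The results are as follows: the average timeliness score of scientific information on the Web is 2.6482 (total sample of 2,814), and only 34.90% of pages score above the average. Among domains, .gov performs best; among resource types, virtual research communities and blogs are the most timely. However, timeliness is only one quality characteristic of Web information, and the quality of information cannot be judged by timeliness alone. Overall, the timeliness of scientific information on the Web needs improvement. The timeliness evaluation framework and method proposed in this study can help researchers and the public make a preliminary judgment about timeliness when searching for information.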
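The analytic hierarchy process derives indicator weights from a matrix of pairwise comparisons. A minimal sketch follows, assuming the geometric-mean (row) approximation rather than whatever procedure the authors actually used; the three indicators and their comparison values are hypothetical.

```python
import math


def ahp_weights(matrix):
    """Approximate AHP priority weights via the row geometric-mean method.

    matrix[i][j] states how much more important indicator i is than j,
    with matrix[j][i] == 1 / matrix[i][j].
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical comparisons among three timeliness indicators.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparisons)
```

Because this example matrix is perfectly consistent, the weights come out as 4/7, 2/7, and 1/7. With inconsistent judgments a consistency check would normally accompany the weights.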

18.
User queries to the Web tend to have more than one interpretation due to their ambiguity and other characteristics. How to diversify the ranking results to meet users’ various potential information needs has attracted considerable attention recently. This paper is aimed at mining the subtopics of a query either indirectly from the returned results of retrieval systems or directly from the query itself to diversify the search results. For the indirect subtopic mining approach, clustering the retrieval results and summarizing the content of clusters is investigated. In addition, labeling topic categories and concept tags on each returned document is explored. For the direct subtopic mining approach, several external resources, such as Wikipedia, Open Directory Project, search query logs, and the related search services of search engines, are consulted. Furthermore, we propose a diversified retrieval model to rank documents with respect to the mined subtopics for balancing relevance and diversity. Experiments are conducted on the ClueWeb09 dataset with the topics of the TREC09 and TREC10 Web Track diversity tasks. Experimental results show that the proposed subtopic-based diversification algorithm significantly outperforms the state-of-the-art models in the TREC09 and TREC10 Web Track diversity tasks. The best performance our proposed algorithm achieves is α-nDCG@5 0.307, IA-P@5 0.121, and α#-nDCG@5 0.214 on the TREC09, as well as α-nDCG@10 0.421, IA-P@10 0.201, and α#-nDCG@10 0.311 on the TREC10. The results indicate that the subtopic mining technique with the up-to-date users’ search query logs is the most effective way to generate the subtopics of a query, and the proposed subtopic-based diversification algorithm can select the documents covering various subtopics.

19.
Social tagging systems have gained increasing popularity as a method of annotating and categorizing a wide range of different web resources. Web search that utilizes social tagging data suffers from an extreme example of the vocabulary mismatch problem encountered in traditional information retrieval (IR). This is due to the personalized, unrestricted vocabulary that users choose to describe and tag each resource. Previous research has proposed the utilization of query expansion to deal with search in this rather complicated space. However, non-personalized approaches based on relevance feedback and personalized approaches based on co-occurrence statistics only showed limited improvements. This paper proposes a novel query expansion framework based on individual user profiles mined from the annotations and resources the user has marked. The underlying theory is to regularize the smoothness of word associations over a connected graph using a regularizer function on terms extracted from top-ranked documents. The intuition behind the model is the prior assumption of term consistency: the most appropriate expansion terms for a query are likely to be associated with, and influenced by, terms extracted from the documents ranked highly for the initial query. The framework also simultaneously incorporates annotations and web documents through a Tag-Topic model in a latent graph. The experimental results suggest that the proposed personalized query expansion method can produce better results than both the classical non-personalized search approach and other personalized query expansion methods. Hence, the proposed approach significantly benefits personalized web search by leveraging users’ social media data.

20.
Web Searching and Online Searching   Total citations: 7, self-citations: 0, citations by others: 7
伍宪 《图书馆论坛》2001,20(1):27-29,47
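The article presents the views of some professional searchers on Web searching versus online searching, along with a comparative study of the retrieval performance of online services and Web search engines. Given that a large amount of traditional fee-based information can now be obtained free of charge via the Web, the author focuses on how to search all kinds of traditional information for free on the Web, and points out the impact that free Web searching of traditional information has on online searching.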
