20 similar documents found (search time: 15 ms)
1.
This paper is concerned with a framework to compute the importance of webpages by using real browsing behaviors of Web users.
In contrast, many previous approaches like PageRank compute page importance through the use of the hyperlink graph of the
Web. Recently, people have realized that the hyperlink graph is incomplete and inaccurate as a data source for determining
page importance, and proposed using the real behaviors of Web users instead. In this paper, we propose a formal framework
to compute page importance from user behavior data (which covers some previous works as special cases). First, we use a stochastic
process to model the browsing behaviors of Web users. Based on an analysis of hundreds of millions of real records of
user behaviors, we argue that the process is in fact a continuous-time, time-homogeneous Markov process, and that its stationary
probability distribution can be used as the measure of page importance. Second, we propose a number of ways to estimate parameters
of the stochastic process from real data, which result in a group of algorithms for page importance computation (all referred
to as BrowseRank). Our experimental results have shown that the proposed algorithms can outperform the baseline methods such
as PageRank and TrustRank in several tasks, demonstrating the advantage of using our proposed framework.
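The idea of scoring pages by the stationary distribution of a continuous-time Markov process can be illustrated with a small sketch. It is not the BrowseRank estimation procedure, which the abstract does not spell out. It assumes that a transition matrix `P` between pages and a mean staying time per page have already been estimated from browsing logs (both names are hypothetical), and it relies on the standard result that the stationary distribution of such a process is proportional to the stationary distribution of the embedded jump chain multiplied by the mean staying time.

```python
def embedded_stationary(P, iters=1000, tol=1e-12):
    """Stationary distribution of the embedded (jump) chain by power iteration.

    P is a row-stochastic matrix given as a list of lists; the chain is
    assumed to be irreducible and aperiodic so that the iteration converges.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(pi, nxt)) < tol:
            return nxt
        pi = nxt
    return pi


def ctmc_stationary(P, mean_stay):
    """Weight the jump-chain distribution by mean staying time, then normalize."""
    pi_emb = embedded_stationary(P)
    weighted = [p * t for p, t in zip(pi_emb, mean_stay)]
    total = sum(weighted)
    return [w / total for w in weighted]


if __name__ == "__main__":
    # Hypothetical two-page example: page 0 is visited more often,
    # but users stay twice as long on page 1.
    P = [[0.5, 0.5],
         [1.0, 0.0]]
    mean_stay = [1.0, 2.0]
    print(ctmc_stationary(P, mean_stay))
```

In this toy example the longer staying time on the second page offsets its lower visit frequency, which is the kind of signal a purely link-based score cannot capture.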
2.
This paper examines how South Korean pages on the World Wide Web link to Taiwan. The Web may represent a new channel for communication among a global society's members and a reflection of international relations. Thus, it is necessary to explore the distribution of relations formed and maintained on the Web as well as the contents of those relations. This paper traced South Korean Web pages hyperlinking to pages hosted in Taiwan, using a search engine. The context in which Taiwan appears in South Korean pages was also examined. Specifically, the structure of hyperlink connectivity from South Korea to Taiwan was analyzed. It was found that the hyperlink network was very sparsely connected in terms of the number of South Korean Web pages hyperlinking to the pages of the other country. The contents of hyperlink-connected information were categorized and analyzed. The most frequently occurring content category was ‘Computers & Internet’ in Taiwan. This suggests that South Korean Web users, including organizations, are more interested in Taiwan's computer-related products than in anything else. The paper thus sheds light on the state and form of international information flow from South Korea to Taiwan, based on the patterns of hyperlink relations inscribed on South Korean Web pages and the type and content of information.
3.
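This paper introduces a core algorithm in the design and development of a Web topic-focused information collection system: the hyperlink topic prediction algorithm. Building on existing theory, experimental analysis shows that the topic of a hyperlink depends mainly on three factors: the topical relevance of the parent page, the topical relevance of the anchor text, and the link-structure characteristics of the Web subgraph. On this basis, a hyperlink topic prediction algorithm based on Web page content and link structure is proposed. System evaluation results show that the algorithm performs well.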
4.
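This paper proposes the design of an automatic recognition system for talent web pages, which automatically identifies talent-description pages among the university website pages crawled by a Nutch focused-crawling system. Recognition uses automatically extracted URL features, Title tag features, link text features, and page text content features. A personal-name word list, a positive-feature word list, and a negative-feature word list are matched against each feature to compute feature values, and the open-source software LibSVM is used to perform automatic talent-page recognition based on multiple feature values.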
5.
6.
Fitzpatrick RB 《Medical reference services quarterly》2007,26(4):65-74
The Institute for Scientific Information (ISI), part of Thomson Scientific, produces the Web of Science database as part of its Web of Knowledge. Recently, ISI introduced some new features, among them a new Author Finder feature, which allows users to zero in on a specific author in a very guided way. In addition, search results may be analyzed and reports created by users, both at the click of a button. This column focuses on these recently introduced features.
7.
Adding multiple sources of information in the display of Web search results may negatively affect users’ perceptual experience and information-seeking behavior. This claim was established by investigating the impact of different Web search compositions on users’ ability to extract specific information. In this article, we assumed that the quantity and order of different compositions (areas) in the Web search results page may contribute to an individual’s ability to find information relevant to their search queries. An eye-tracking device was used to observe and compare the perceptual behavior of 14 users in an information-seeking task. The results showed that the use of different compositions in the display of the Web results page significantly influenced users’ perceptual experience by reducing their attention to the organic results area. The quantity of these compositions was found to greatly increase the cognitive load of users when attempting to retrieve information from the organic area, which negatively affects their information-seeking performance. Our finding provides a rationale for further studies to consider the impact of the quantity and order of Web page compositions on individuals’ perceptual attention and cognitive load in information-seeking task settings.
8.
The collective feedback of the users of an Information Retrieval (IR) system has been shown to provide semantic information
that, while hard to extract using standard IR techniques, can be useful in Web mining tasks. In the last few years, several
approaches have been proposed to process the logs stored by Internet Service Providers (ISP), Intranet proxies or Web search
engines. However, the solutions proposed in the literature only partially represent the information available in the Web logs.
In this paper, we propose to use a richer data structure, which is able to preserve most of the information available in the
Web logs. This data structure consists of three groups of entities: users, documents and queries, which are connected in a
network of relations. Query refinements correspond to separate transitions between the corresponding query nodes in the graph,
while users are linked to the queries they have issued and to the documents they have selected. The classical query/document
transitions, which connect a query to the documents selected by the users in the returned result page, are also considered.
The resulting data structure is a complete representation of the collective search activity performed by the users of a search
engine or of an Intranet. The experimental results show that this more powerful representation can be successfully used in
several Web mining tasks like discovering semantically relevant query suggestions and Web page categorization by topic.
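The three-entity structure described in this abstract (users, documents, and queries connected by refinement, issue, and selection relations) could be sketched roughly as follows. The class name, method names, and the frequency-based suggestion rule are assumptions made for illustration only; the abstract does not specify how the authors store the graph or rank suggestions.

```python
from collections import Counter, defaultdict


class SearchLogGraph:
    """Hypothetical container for user/query/document relations from a search log."""

    def __init__(self):
        self.refinements = defaultdict(Counter)  # query -> Counter of follow-up queries
        self.query_docs = defaultdict(Counter)   # query -> Counter of selected documents
        self.user_queries = defaultdict(set)     # user -> queries issued
        self.user_docs = defaultdict(set)        # user -> documents selected

    def add_session(self, user, events):
        """Record one session; events is a list of (query, clicked_docs) in order."""
        prev = None
        for query, clicked in events:
            self.user_queries[user].add(query)
            for doc in clicked:
                self.user_docs[user].add(doc)
                self.query_docs[query][doc] += 1
            # A change of query inside a session is treated as a refinement.
            if prev is not None and prev != query:
                self.refinements[prev][query] += 1
            prev = query

    def suggest(self, query, k=3):
        """Return up to k follow-up queries, most frequent refinement first."""
        counts = self.refinements.get(query, Counter())
        return [q for q, _ in counts.most_common(k)]


if __name__ == "__main__":
    g = SearchLogGraph()
    g.add_session("u1", [("jaguar", []), ("jaguar car", ["d1"])])
    g.add_session("u2", [("jaguar", []), ("jaguar car", ["d2"]), ("jaguar price", [])])
    print(g.suggest("jaguar"))
```

Keeping the four relations in separate maps preserves which user issued which query and selected which document, information that a plain query-to-document click table would discard.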
9.
10.
11.
12.
丁一 《现代图书情报技术》2005,21(6):26-29
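Web information retrieval research applies the results of text retrieval research and, combined with ideas from Web graph theory, studies information retrieval on the Web; it is an effective path to Web knowledge discovery. The traditional HITS method yields information of rather low precision, while PageRank, as a general-purpose search method, cannot be applied to acquiring information on a specific topic. Based on a thorough analysis of existing algorithms such as PageRank and HITS and of similarity computation methods for Web documents, this paper proposes the RG-HITS algorithm for discovering information relevant to a specific queried topic on the Web. It combines Web hyperlinks, the information relevance of web-page knowledge representation, and the HITS method to search the Web for knowledge related to a specific topic.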
13.
《Library & information science research》2023,45(1):101222
Searches with learning intent typically require users to interact with the searching environment and perform knowledge acquisition activities such as scanning, reading, and processing online content to fulfill their information needs. To capture indicators from searching behaviors that could account for the knowledge gained during a Web search, a qualitative study was performed using the Concurrent Think-Aloud protocol to observe the mechanisms of transfer and map knowledge flows during 78 search sessions. Findings indicate evidence of transfer of learning in the form of sixteen online information searching strategy indicators. This research aids the understanding of how knowledge is gained during search sessions and how to identify behaviors that could indicate that learning has occurred, which could be used to represent knowledge gain on Web search engines. In this way, it can help search engines become not only better tools for searching, but also tools for learning.
14.
《Mass Communication and Society》2013,16(1):21-44
Political candidates have responded to the public's desire to use the Internet as an interactive information source by creating their own online presence. This study is a content analysis of the Web sites and blogs of the 10 Americans vying to be the Democratic candidate for the 2004 presidential election. Focusing on interactivity, data indicated that front pages hyperlinked to participation areas such as Donation or Volunteer sections and rarely linked to external content. Blogs used hyperlinks at a lower rate than Web sites. Interactivity was encouraged through text, as 83.7% of Web sites asked voters to become more involved. Blog posts discussed issues and attacked the opponents, including President Bush. For the most part, blog posts were personal in nature and used direct address. The tactical use of advanced Web site features showed a technological progression of political campaigning and an overall increase in interactivity through technology and text.
15.
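A Hybrid Web Information Retrieval System (total citations: 5; self-citations: 0; citations by others: 5)
This paper first analyzes the drawbacks of three common types of search engine: those based on content analysis, those based on hyperlink analysis, and those based on feedback analysis. It then proposes a hybrid Web information retrieval system that combines the strengths of all three, and describes in detail how the system is implemented.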
16.
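Automatic Web Resource Discovery Based on Hyperlink Analysis (total citations: 2; self-citations: 0; citations by others: 2)
Traditional automatic discovery of Web resources is based on the content of Web pages. This paper explores automatic Web resource discovery from the perspective of hyperlink analysis. Hyperlink analysis originates in social network analysis and scientific citation analysis; it analyzes only the relationships between pages and ignores the attributes of the pages themselves. Experiments show that, using hyperlinks alone and starting from example pages supplied by the user, websites related to a subject's resources can be discovered automatically. The technique can effectively reduce needless crawling by Web crawlers, improve collection efficiency, and lighten network load, and it plays an important role in building subject resources.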
17.
We investigate temporal factors in assessing the authoritativeness of web pages. We present three different metrics related
to time: age, event, and trend. These metrics measure recentness, special event occurrence, and trend in revisions, respectively.
An experimental dataset is created by crawling selected web pages for a period of several months. This data is used to compare
page rankings by human users with rankings computed by the standard PageRank algorithm (which does not include temporal factors)
and three algorithms that incorporate temporal factors, including the Time-Weighted PageRank (TWPR) algorithm introduced here. Analysis of the rankings shows that all three temporal-aware algorithms produce rankings more
like those of human users than does the PageRank algorithm. Of these, the TWPR algorithm produces rankings most similar to human users’, indicating that all three temporal factors are relevant in page
ranking. In addition, analysis of the parameter values used to weight the three temporal factors reveals that the age factor has the
most impact on page rankings, while the trend and event factors have the second-most and the least impact, respectively. Proper weighting of the
three factors in the TWPR algorithm provides the best ranking results.
18.
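A Brief Introduction to the Principles of the PageRank Algorithm (total citations: 9; self-citations: 0; citations by others: 9)
After introducing the basic idea, basic formula, and a worked example of the PageRank algorithm, this paper describes methods for using the algorithm to raise a web page's PR, then points out the algorithm's shortcomings and analyzes its development trends.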
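For readers unfamiliar with the basic formula this entry refers to, the commonly cited form is PR(p) = (1 - d)/N + d * Σ PR(q)/L(q), summed over the pages q that link to p, where N is the number of pages, L(q) is the number of outgoing links of q, and d is a damping factor. The sketch below iterates that formula. The damping value 0.85, the uniform redistribution of rank from pages without outgoing links, and the example graph are assumptions for illustration and are not taken from the listed paper.

```python
def pagerank(links, d=0.85, iters=100):
    """Iterate the basic PageRank formula on a small link graph.

    links maps each page to the list of pages it links to. Rank held by
    pages with no outgoing links is spread evenly over all pages, which is
    one common convention (an assumption here).
    """
    nodes = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(nodes)
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iters):
        dangling = sum(rank[p] for p in nodes if not links.get(p))
        nxt = {p: (1 - d) / n + d * dangling / n for p in nodes}
        for p in nodes:
            outs = links.get(p, [])
            for q in outs:
                # Each outgoing link passes on an equal share of p's rank.
                nxt[q] += d * rank[p] / len(outs)
        rank = nxt
    return rank


if __name__ == "__main__":
    # Hypothetical three-page graph: C is linked from both A and B.
    graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
    for page, score in sorted(pagerank(graph).items()):
        print(page, round(score, 4))
```

Because a page's score grows with the number and rank of the pages linking to it, adding inbound links from well-ranked pages is the lever that PR-raising methods typically rely on.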
19.
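Research on Automatic Classification of Web Pages in Virtual Libraries (total citations: 1; self-citations: 0; citations by others: 1)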
臧国全 《现代图书情报技术》2002,18(3):28-31
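This paper summarizes domestic and international research and experiments on the automatic classification of electronic texts and Web pages, discusses how automatic classification of Web pages in a virtual library differs from that in general search engines, proposes a method for automatically classifying Web pages in a virtual library, describes the automatic classification system of a "Library and Information Science" virtual library built with this method, and analyzes the classification results.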
20.
Significant progress has been made in information retrieval covering text semantic indexing and multilingual analysis. However,
developments in Arabic information retrieval have not kept pace with the extraordinary growth of Arabic usage on the Web during the
last ten years. In tasks relating to semantic analysis, it is preferable to deal directly with texts in their original
language. Studies on topic models, which provide a good way to automatically deal with the semantics embedded in texts, are not
complete enough to assess the effectiveness of the approach on Arabic texts. This paper investigates several text stemming
methods for Arabic topic modeling. A new lemma-based stemmer is described and applied to newspaper articles. The Latent Dirichlet
Allocation model is used to extract latent topics from three Arabic real-world corpora. For supervised classification in the
topic space, experiments show an improvement compared with classification in the full word space or with a root-based
stemming approach. In addition, topic modeling with lemma-based stemming allows us to discover interesting subjects in the
press articles published during the 2007–2009 period.