Similar Literature
20 similar documents found (search time: 15 ms)
1.
This research tracked web sites posting or linking to software known as DeCSS over a 26-month period coinciding with a U.S. lawsuit that found posting and linking to the DeCSS software to be illegal. Results showed a decrease in the number of web pages posting the DeCSS software and a decrease in the number of web pages linking to DeCSS. Seven web sites retained their DeCSS posting for the entire 26-month study period. An increasing number of sites posted nonexecutable forms of DeCSS, and a large percentage of web sites contained political speech. The persistence of DeCSS linking and posting was surprising given the prohibition on linking and posting within the United States and the obsolescence of DeCSS as a DVD decrypter. We suggest that DeCSS linking and posting persists primarily as a political symbol of protest.

2.
This research is part of an ongoing study to better understand citation analysis on the Web. It builds on Kleinberg's observation (J. Kleinberg, R. Kumar, P. Raghavan, S. Rajagopalan, A. Tomkins, Invited survey at the International Conference on Combinatorics and Computing, 1999) that hyperlinks between web pages constitute a web graph structure, and tries to classify different web graphs in a new coordinate space: (out-degree, in-degree). The out-degree coordinate of a web page is the number of outgoing links from that page; the in-degree coordinate is the number of web pages that point to it. In this coordinate space a metric is built to measure how close or far apart different web graphs are. Kleinberg's algorithm (J. Kleinberg, Proceedings of the ACM-SIAM Symposium on Discrete Algorithms, 1998, pp. 668–677) for discovering "hub" and "authority" web pages is applied in this new coordinate space. Some very uncommon phenomena were discovered and new, interesting results interpreted. This study does not look at enhancing web retrieval by adding context information; it only considers web hyperlinks as a source for analyzing citations on the web. The author believes that understanding the underlying web as a graph will help design better web algorithms, enhance retrieval and web performance, and recommends using graphs as part of the visual aids for search engine designers.
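To make the (out-degree, in-degree) coordinates and the hub/authority notions concrete, here is a minimal Python sketch on a toy link graph; the graph, iteration count, and normalization are illustrative assumptions, not details from the cited work.

```python
from collections import defaultdict

# Toy web graph: each key links to the pages in its list (illustrative only).
links = {
    "A": ["B", "C"],
    "B": ["C"],
    "C": ["A"],
    "D": ["C", "A"],
}

# (out-degree, in-degree) coordinates for every page.
out_degree = {p: len(targets) for p, targets in links.items()}
in_degree = defaultdict(int)
for targets in links.values():
    for t in targets:
        in_degree[t] += 1

# A few iterations of Kleinberg-style HITS updates to score hubs and authorities.
hub = {p: 1.0 for p in links}
auth = {p: 1.0 for p in links}
for _ in range(20):
    auth = {p: sum(hub[q] for q in links if p in links[q]) for p in links}
    hub = {p: sum(auth[t] for t in links[p]) for p in links}
    # Normalize so scores stay comparable between iterations.
    a_norm = sum(v * v for v in auth.values()) ** 0.5
    h_norm = sum(v * v for v in hub.values()) ** 0.5
    auth = {p: v / a_norm for p, v in auth.items()}
    hub = {p: v / h_norm for p, v in hub.items()}

print(out_degree, dict(in_degree))
print({p: round(v, 3) for p, v in auth.items()})
```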

3.
This study presents an inverse chi-square based web content classification system that works with an incremental update mechanism for incremental generation of a pornographic blacklist. The proposed system, as indicated by the experimental results, can classify bilingual (English and Chinese) web pages at an average precision rate of 97.11% while maintaining a favorably low false positive rate. This satisfactory performance was obtained under a cost-effective parameter configuration used in the inverse chi-square calculations. The proposed incremental update mechanism operates on the linking structure of pornographic hubs to locate newly added pornographic sites. The resulting blacklist has been empirically verified to be more responsive to the growth dynamics of pornography sites than three public-domain blacklists.
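The abstract does not give the paper's formulas or parameters, but the general inverse chi-square combining technique it names can be sketched as follows: per-token "pornographic" probabilities (here invented placeholders from a hypothetical token model) are combined into a single page score.

```python
import math

def inv_chi_square(chi, df):
    """Survival function of a chi-square variable at value chi with an even
    number of degrees of freedom df (standard series used in text combiners)."""
    m = chi / 2.0
    term = prob = math.exp(-m)
    for i in range(1, df // 2):
        term *= m / i
        prob += term
    return min(prob, 1.0)

def pornographic_score(token_probs):
    """Combine per-token probabilities (each the chance a token indicates a
    pornographic page) into one page-level score in [0, 1]."""
    n = len(token_probs)
    s = inv_chi_square(-2.0 * sum(math.log(p) for p in token_probs), 2 * n)        # high if tokens look pornographic
    h = inv_chi_square(-2.0 * sum(math.log(1.0 - p) for p in token_probs), 2 * n)  # high if tokens look innocent
    return (1.0 + s - h) / 2.0  # near 1 => pornographic, near 0 => innocent

# Illustrative use with made-up token probabilities.
print(pornographic_score([0.99, 0.92, 0.87, 0.60, 0.05]))
```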

4.
This case study analyzes the Internet-based resources that a software engineer uses in his daily work. Methodologically, we studied the participant's web browser history, classifying all the web pages he had viewed over a period of 12 days into web genres. We interviewed him before and after the analysis of the browser history. In the first interview, he spoke about his general information behavior; in the second, he commented on each web genre, explaining why and how he used them. As a result, three approaches allow us to describe the resulting set of 23 web genres: (a) the purposes they serve for the participant; (b) the role they play in the various work and search phases; and (c) the way they are used in combination with each other. Further observations concern the way the participant assesses the quality of web-based resources, and his information behavior as a software engineer.

5.
Traditional web pages are designed and implemented by professional programmers according to specific requirements, so non-specialists have no way to build personalized websites themselves. This paper proposes a visual web page design system: basic and advanced tool modules are built with ExtJS and combined with modular and object-oriented design ideas to realize a simple, easy-to-understand web design system. The system provides an interactive interface similar to desktop application software and is simple to operate. Developing websites with this system not only reduces development costs and improves the efficiency of web page development, but also allows users to build their own websites quickly and conveniently, thereby realizing a personalized web service mechanism.

6.
Over the past decade, worldwide Internet usage has grown tremendously, with the most rapid growth in emerging regions such as Latin America and the Middle East, where people speaking different languages actively seek information on the web. Global search engines may not adequately address local users' needs, while regional web portals may lack rich web content. Unlike search engines, web directories organize sites and pages into intuitive hierarchical structures to facilitate browsing. However, high-quality web directories in users' native languages often do not exist, and their development requires domain knowledge that is not readily available. In this research, we propose a novel semi-automatic approach to facilitate web repository management. We applied the approach to developing web directories in the business and health-care domains for the Spanish-speaking and Arabic-speaking communities, respectively. The two directories contain 4735 and 5107 unique sites and pages, respectively, with a maximum depth of 5 levels. Results of experiments involving 37 native speakers show that these directories outperformed existing benchmark directories in terms of browsing effectiveness and efficiency, providing strong implications for information professionals and multinational enterprise managers.

7.
The goal of the study presented in this article is to investigate to what extent the classification of a web page by a single genre matches the users' perspective. The extent of agreement on a single genre label for a web page can help us understand whether there is a need for a different classification scheme that overrides single-genre labelling. My hypothesis is that a single genre label does not account for the users' perspective. In order to test this hypothesis, I submitted a restricted number of web pages (25 web pages) to a large number of web users (135 subjects), asking them to assign only a single genre label to each web page. Users could choose from a list of 21 genre labels, or select one of two 'escape' options, i.e. 'Add a label' and 'I don't know'. The rationale was to observe the level of agreement on a single genre label per web page and draw some conclusions about the appropriateness of limiting the assignment to a single label when classifying web pages by genre. Results show that users largely disagree on the label to be assigned to a web page.

8.
A fast and efficient page ranking mechanism for web crawling and retrieval remains a challenging issue. Recently, several link-based ranking algorithms such as PageRank, HITS and OPIC have been proposed. In this paper, we propose a novel recursive method based on reinforcement learning, called "DistanceRank", which treats the distance between pages as a punishment and uses it to compute the ranks of web pages. The distance is defined as the number of "average clicks" between two pages. The objective is to minimize the punishment, or distance, so that a page with a smaller distance receives a higher rank. Experimental results indicate that DistanceRank outperforms other ranking algorithms in page ranking and crawl scheduling. Furthermore, the complexity of DistanceRank is low. We used the University of California at Berkeley's web for our experiments.
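As a rough illustration of the distance-as-punishment idea (not the paper's actual reinforcement-learning recursion), the sketch below treats one hop out of page q as costing log10(out-degree of q) "average clicks" and ranks pages by their resulting distance from a seed; the toy graph, cost floor, and iteration count are assumptions.

```python
import math

# Toy link graph (illustrative only); every linked page is also a key.
links = {
    "seed": ["a", "b"],
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["seed"],
}

def distance_rank(links, seed, iters=20):
    """Estimate each page's distance, in 'average clicks', from the seed;
    a smaller distance means a higher rank."""
    dist = {p: 0.0 if p == seed else float("inf") for p in links}
    for _ in range(iters):
        for q, targets in links.items():
            if math.isinf(dist[q]) or not targets:
                continue
            # One hop out of q costs log10(out-degree); floored at log10(2)
            # so a hop is never free (an illustrative choice).
            cost = math.log10(max(len(targets), 2))
            for t in targets:
                if dist[q] + cost < dist[t]:
                    dist[t] = dist[q] + cost
    return sorted(dist.items(), key=lambda kv: kv[1])  # smallest distance first

print(distance_rank(links, "seed"))
```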

9.
With the rapid development of the Internet, the number of web pages has expanded dramatically, growing exponentially in recent years. Search engines face increasingly severe challenges: it is difficult to find, accurately and quickly, the pages that meet users' needs among this massive volume of pages. Web page classification is one effective means of addressing this problem. Classification by topic and classification by genre are its two main branches, and both effectively improve the retrieval efficiency of search engines. Web genre classification refers to classifying web pages according to their form of presentation and their intended use. This paper introduces the definition of web genre, the classification features commonly used in web genre classification research, several common feature selection methods, classification models, and methods for evaluating classifiers, giving researchers an overview of web genre classification.

10.
All over the world, the Internet is used by millions of people every day for information retrieval. Even for small tasks such as fixing a fan, cooking food or ironing clothes, people choose to search the web. To fulfil these information needs there are billions of web pages, each with a different degree of relevance to the topic of interest (TOI), scattered throughout the web, but this huge size makes manual information retrieval impossible. The page ranking algorithm is an integral part of a search engine, since it arranges the web pages associated with a queried TOI in order of their relevance. It therefore plays an important role in determining search quality and the user experience of information retrieval. PageRank, HITS and SALSA are well-known page ranking algorithms based on link-structure analysis of a seed set, but the rankings they produce are not yet efficient. In this paper, we propose a variant of SALSA, called sNorm(p), for more efficient ranking of web pages. Our approach relies on a p-norm from the vector-norm family in a novel way: vector norms can efficiently reduce the impact of low authority weights in the hub weight calculation. Our study then compares the rankings given by PageRank, HITS, SALSA and sNorm(p) on the same pages for the same queries. The effectiveness of the proposed approach over state-of-the-art methods is shown using the performance measures Mean Reciprocal Rank (MRR), Precision, Mean Average Precision (MAP), Discounted Cumulative Gain (DCG) and Normalized DCG (NDCG). The experiments are performed on a dataset obtained by pre-processing the results collected from the first few pages returned for a query by the Google search engine. Thirty queries were designed based on the type and amount of in-hand domain expertise. Extensive evaluation and result analysis are performed using MRR, Precision, MAP, DCG and NDCG as the statistical performance metrics, and the results are statistically verified using a significance test. Findings show that our approach outperforms state-of-the-art methods, attaining an MRR value of 0.8666 and a MAP value of 0.7957, thus ranking web pages more efficiently than its counterparts.
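The exact sNorm(p) update is not given in the abstract; the sketch below only illustrates the underlying idea on a HITS-style iteration, replacing the plain sum in the hub update with a p-norm so that low authority weights contribute less. The toy graph, p value, and normalization are assumptions.

```python
# Toy link graph (illustrative only).
links = {
    "A": ["B", "C"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": ["A"],
}

def pnorm_rank(links, p=2.0, iters=30):
    """HITS-style iteration where the hub update uses a p-norm instead of a
    plain sum, damping the influence of low-authority neighbours."""
    hub = {u: 1.0 for u in links}
    auth = {u: 1.0 for u in links}
    for _ in range(iters):
        auth = {u: sum(hub[v] for v in links if u in links[v]) for u in links}
        hub = {u: sum(auth[t] ** p for t in links[u]) ** (1.0 / p) for u in links}
        a_max = max(auth.values()) or 1.0
        h_max = max(hub.values()) or 1.0
        auth = {u: x / a_max for u, x in auth.items()}
        hub = {u: x / h_max for u, x in hub.items()}
    return sorted(auth.items(), key=lambda kv: -kv[1])  # best authority first

print(pnorm_rank(links))
```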

11.
This article analyzes the determinants of public engagement on the Facebook pages of municipalities. The sample consists of 170 Italian and Spanish municipalities that used Facebook in 2014. The findings show that posting a lot of information on municipal Facebook pages does not increase citizens' engagement, and frequent posting per se does not generate engagement either. However, if posts are published when the public can pay attention to them (e.g., on days off), the likelihood of public engagement increases. Furthermore, engagement on a municipal Facebook page depends on citizens' income: there is a negative relationship between citizens' income and the level of participation.

12.
Aiming at cleaning web pages and extracting their topical content, this paper proposes a topic-content extraction algorithm based on web page layout. Following the layout of the original page, the algorithm builds a tag tree to segment the page into blocks and classify them, then computes the topic relevance of each content block to identify the page's topic, discard irrelevant information, and extract the topical content. Experiments show that the algorithm is suitable for "denoising" and content extraction of topic-oriented web pages and performs well in practical applications.
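Assuming the page has already been segmented into candidate content blocks, the block-scoring step can be sketched as below; the relevance measure, topic terms, and threshold are placeholders rather than the paper's exact definitions.

```python
import re

def topic_relevance(block_text, topic_terms):
    """Fraction of the block's words that are topic terms (a simple stand-in
    for a topic-relevance measure)."""
    words = re.findall(r"\w+", block_text.lower())
    if not words:
        return 0.0
    return sum(1 for w in words if w in topic_terms) / len(words)

def extract_topic_blocks(blocks, topic_terms, threshold=0.05):
    """Keep only blocks whose relevance to the topic exceeds the threshold."""
    return [b for b in blocks if topic_relevance(b, topic_terms) >= threshold]

blocks = [
    "Copyright 2024 Example Inc. All rights reserved.",                 # boilerplate
    "Genetic algorithms evolve crawler frontiers toward topical pages.",  # content
    "Home | About | Contact | Sitemap",                                 # navigation
]
topic = {"genetic", "algorithms", "crawler", "topical", "pages"}
print(extract_topic_blocks(blocks, topic))
```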

13.
14.
Through in-depth analysis and identification of web page structure and content features, this paper studies and experiments with methods for filtering noise web pages. First, a threshold is used to filter out noise pages with obvious features; then feature vectors are built for the remaining pages and an SVM is used to classify them. Experiments are conducted on web page data collected from the Web, conclusions are drawn, and future work is outlined.
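A minimal sketch of the two-stage idea described above, with a threshold filter for obviously noisy pages followed by an SVM on page feature vectors; the features, threshold, and tiny training set are invented for illustration and are not the paper's experimental setup.

```python
from sklearn.svm import SVC

def obvious_noise(features):
    # Stage 1: pages with extreme link-to-text ratio or almost no text are
    # discarded immediately (illustrative thresholds).
    link_ratio, text_len, ad_keyword_count = features
    return link_ratio > 0.9 or text_len < 20

# Hypothetical training data: [link_ratio, text_length, ad_keyword_count] per page.
X_train = [[0.1, 800, 0], [0.2, 500, 1], [0.8, 60, 9], [0.7, 90, 12]]
y_train = [0, 0, 1, 1]  # 0 = normal page, 1 = noise page

clf = SVC(kernel="rbf", gamma="scale")
clf.fit(X_train, y_train)

def classify_page(features):
    if obvious_noise(features):
        return 1                              # stage 1: threshold filter
    return int(clf.predict([features])[0])   # stage 2: SVM classification

print(classify_page([0.95, 10, 3]), classify_page([0.15, 700, 0]))
```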

15.
谷斌 《情报科学》2002,20(3):320-323
This paper discusses several current approaches to developing Web databases and briefly introduces their working principles, with emphasis on the ASP-based approach to Web database development; on this basis, two examples of developing dynamic web pages are given.

16.
The World Wide Web is growing quickly and being applied to many new types of communication. As a basis for studying organizational communication, Yates and Orlikowski (1992; Orlikowski & Yates, 1994) proposed using genres. They defined genres as "typified communicative actions characterized by similar substance and form and taken in response to recurrent situations" (Yates & Orlikowski, 1992, p. 299). They further suggested that communication in a new medium would show both reproduction and adaptation of existing communicative genres as well as the emergence of new genres. We studied these phenomena on the World Wide Web by examining 1000 randomly selected Web pages and categorizing the type of genre represented. Although many pages recreated genres familiar from traditional media, we also saw examples of genres being adapted to take advantage of the linking and interactivity of the new medium, and novel genres emerging to fit the unique communicative needs of the audience. We suggest that Web-site designers consider the genres that are appropriate for their situation and attempt to reproduce or adapt familiar genres.

17.
With the rapid development of the Internet, the damage caused by malicious web pages has grown ever larger. This paper analyzes and categorizes typical malicious web pages, compares and classifies existing malicious web page detection techniques, and analyzes the advantages and disadvantages of each detection technique.

18.
Research on a topic information search system based on genetic algorithms
罗长寿  康丽  刘国靖 《现代情报》2009,29(3):176-178
To address the "disorientation" and "overload" problems of networked information resources, this paper applies genetic algorithms to build a topic information search system consisting of three parts: a genetic-algorithm-based topic crawler, information processing, and query services. Experimental results show that the system can retrieve web pages highly relevant to the topic.
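One step a genetic-algorithm-based topic crawler typically needs is selection: biasing the next crawl generation toward URLs whose fitness (topical relevance) is high. The sketch below shows roulette-wheel selection with invented URLs and fitness values; it is a generic GA step, not the paper's specific system.

```python
import random

def select_next_generation(candidates, k):
    """candidates: list of (url, fitness); returns k URLs chosen with
    probability proportional to fitness (the selection step of a GA)."""
    urls = [u for u, _ in candidates]
    weights = [max(f, 1e-6) for _, f in candidates]  # avoid zero weights
    return random.choices(urls, weights=weights, k=k)

# Hypothetical crawl frontier with made-up relevance scores.
frontier = [
    ("http://example.org/ga-tutorial", 0.92),
    ("http://example.org/sports-news", 0.05),
    ("http://example.org/evolutionary-search", 0.81),
    ("http://example.org/cooking", 0.02),
]
print(select_next_generation(frontier, k=2))
```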

19.
The crawler is an important component of a search engine and determines the search engine's performance; Larbin is an efficient web crawler. This paper first analyzes Larbin's design and architecture, then studies its core algorithm, the Bloom filter, and proposes an improvement to it. Finally, the implementation of the Larbin optimization is described.
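A Bloom filter lets a crawler remember, in a fixed amount of memory, which URLs it has probably already seen, at the cost of occasional false positives. Below is a generic Python sketch of the data structure; the bit-array size, hash count, and hashing scheme are illustrative choices, not Larbin's C++ implementation.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits=1 << 20, num_hashes=4):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).hexdigest()
            yield int(digest, 16) % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

seen = BloomFilter()
seen.add("http://example.org/page1")
print("http://example.org/page1" in seen)   # True
print("http://example.org/page2" in seen)   # almost certainly False
```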

20.
To address the low topical relevance of the results returned by topic-focused search engines, this paper proposes a search strategy that combines a genetic algorithm with a content-based vector space model. The vector space model is used to determine the relevance of a web page to the topic, and the genetic algorithm is applied to the relevance judgment, improving the precision and recall of topic information search. The corresponding functionality was implemented with Eclipse 3.3 on top of the Heritrix framework. Experimental results show that, with the improved search strategy, the proportion of topic pages crawled by the system increased by about 30% compared with the original system.
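The vector-space relevance step can be sketched as follows: the topic and a candidate page are both represented as term-frequency vectors and compared with cosine similarity; the example texts and topic terms are placeholders, not the paper's data.

```python
import math
from collections import Counter

def cosine_relevance(page_text, topic_text):
    """Cosine similarity between term-frequency vectors of a page and a topic."""
    page = Counter(page_text.lower().split())
    topic = Counter(topic_text.lower().split())
    dot = sum(page[t] * topic[t] for t in topic)
    norm = (math.sqrt(sum(v * v for v in page.values()))
            * math.sqrt(sum(v * v for v in topic.values())))
    return dot / norm if norm else 0.0

topic = "genetic algorithm topic crawler relevance"
print(cosine_relevance("a genetic algorithm guides the topic crawler", topic))
print(cosine_relevance("today's football scores and weather", topic))
```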
