Similar Literature
 Found 20 similar articles (search time: 250 ms)
1.
A Multidimensional Web Analysis Model and Its Application   Total citations: 1 (self-citations: 0, citations by others: 1)
朱家稷  闫宏飞 《情报学报》2004,23(5):553-560
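Web pages are growing and changing at an astonishing rate, posing many new problems and challenges for the efficiency and quality of traditional search engines. A research method is urgently needed that can effectively analyze the massive number of pages collected by search engines, so as to maintain a complete and clear picture of the Web and guide search engines toward more effective service. This paper proposes a three-dimensional Web analysis model based on time, space, and content, which supports multidimensional, multilevel analysis of massive web page data and offers a new perspective for understanding the Web. In the experiments, we built a simple implementation of the model and analyzed three batches of web page data, obtaining data such as page change rates, the spatial distribution of pages, and the characteristics of heavily replicated pages, as well as some characteristics of the Internet as the "fourth medium" in information dissemination.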

2.
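Design of a Deep Web Crawler Based on the ID3 Classification Algorithm   Total citations: 1 (self-citations: 0, citations by others: 1)
To address the low information coverage found in current Web information mining, this paper studies web crawler systems and proposes a Web page collection method for the deep Web based on the ID3 classification algorithm. The features of Web pages are analyzed, processed, and classified; forms that lead to deep Web pages are extracted; and these forms are submitted automatically to retrieve pages more deeply and more broadly. Experiments show that the method effectively reduces the blind spots of existing search engines and improves search results.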
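
The abstract does not give the features or training data used, so the following is only a minimal sketch of the attribute-selection step at the heart of ID3 (choosing the split with the largest information gain). The feature names `has_form` and `has_text_input` and the four sample pages are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, labels, attribute):
    """Entropy reduction obtained by splitting the rows on one attribute."""
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attribute], []).append(label)
    remainder = sum(len(part) / len(labels) * entropy(part) for part in partitions.values())
    return entropy(labels) - remainder

def best_attribute(rows, labels, attributes):
    """ID3 splits each node on the attribute with the largest information gain."""
    return max(attributes, key=lambda a: information_gain(rows, labels, a))

# Hypothetical page features (assumptions for illustration only).
pages = [
    {"has_form": True, "has_text_input": True},
    {"has_form": True, "has_text_input": False},
    {"has_form": False, "has_text_input": False},
    {"has_form": False, "has_text_input": False},
]
# Hypothetical labels: does the page act as an entry point to deep Web content?
is_deep_web_entry = [True, False, False, False]

root_split = best_attribute(pages, is_deep_web_entry, ["has_form", "has_text_input"])
```

In this toy data the text-input feature separates the classes perfectly, so it would be chosen as the root split; a full ID3 implementation would then recurse on each partition.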

3.
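The text associated with multimedia resources in Web multimedia pages is important for describing those resources. A Web multimedia page collector is used to gather pages that contain multimedia resources, and region analysis is performed on these pages. Based on how multimedia resources are embedded in the pages, a system for extracting text related to Web multimedia resources is designed, which accurately extracts that text from Web pages. Experimental results show that the system extracts the related text with high accuracy and helps improve the recall and precision of multimedia information retrieval systems.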

4.
阮光册 《图书情报工作》2011,55(11):121-124
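Most research on network user behavior relies on Web user log mining. This paper first introduces the traditional approach to applying Web association rules and points out that it neglects user interest, clustering mainly on the basis of frequently occurring pages. To address this problem, it proposes a method that combines Web association rule mining, page content, and session similarity to cluster the groups of pages that users visit frequently, in order to discover patterns in network user behavior. In a case application, a study of the online behavior of students at a university in Shanghai, relevant conclusions are drawn.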

5.
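Based on how multimedia links are distributed within web pages, the relevant parameters of two typical topic-focused search algorithms, PageRank and Shark-Search, are modified, and the two modified algorithms are used to compute the similarity between multimedia pages and a topic from the perspectives of page content and page links. Experimental results show that the modified Shark-Search multimedia topic search algorithm improves the efficiency of multimedia topic search more effectively than the modified PageRank algorithm and is also better suited to topic-focused search for multimedia resources on the Web.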

6.
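Research on Web Page Recognition Algorithms   Total citations: 7 (self-citations: 1, citations by others: 6)
Text mining on the WWW is a new topic in the field of network information processing. This paper studies the application of two machine learning algorithms, the Rocchio algorithm and the Widrow-Hoff algorithm, to Web page recognition, and presents a comparative analysis of several web page recognition algorithms.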
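
The abstract names the two algorithms without giving their parameters, so the sketch below shows only their textbook forms: a Rocchio-style prototype vector (weighted difference of class centroids) and a single Widrow-Hoff (least-mean-squares) weight update. The weights `beta`, `gamma`, the learning rate `eta`, and the two-term vectors are assumptions for illustration, not values from the paper.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def rocchio_prototype(positives, negatives, beta=1.0, gamma=0.5):
    """Rocchio-style prototype: beta * mean(positives) - gamma * mean(negatives)."""
    dim = len(positives[0])
    pos_mean = [sum(v[i] for v in positives) / len(positives) for i in range(dim)]
    neg_mean = [sum(v[i] for v in negatives) / len(negatives) for i in range(dim)]
    return [beta * p - gamma * n for p, n in zip(pos_mean, neg_mean)]

def widrow_hoff_step(weights, x, y, eta=0.1):
    """One Widrow-Hoff (LMS) update: w <- w - 2 * eta * (w.x - y) * x."""
    error = dot(weights, x) - y
    return [w - 2 * eta * error * xi for w, xi in zip(weights, x)]

# Hypothetical term-frequency vectors over a two-term vocabulary.
on_topic_pages = [[1, 0], [1, 1]]
off_topic_pages = [[0, 1], [0, 1]]

prototype = rocchio_prototype(on_topic_pages, off_topic_pages)
updated = widrow_hoff_step([0.0, 0.0], x=[1, 0], y=1)
```

A page would then be scored by `dot(prototype, page_vector)` in the Rocchio case, or by `dot(weights, page_vector)` after repeated Widrow-Hoff updates over the training pages.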

7.
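This paper mainly introduces a core task of the Web topical information collection system we designed: identifying the topic of Web information. The topic identification algorithm starts by constructing a highly specialized topic dictionary and fully analyzes and takes into account the characteristics of Web page text, thereby greatly improving the efficiency and precision of topical information collection. The algorithm is equally applicable to topical information identification in other domains.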

8.
严海兵  崔志明 《情报学报》2007,26(3):361-365
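When ranking pages, search engines based on keyword matching consider only page importance and ignore classification, while search engines based on category directories have difficulty analyzing Web information dynamically. After analyzing these shortcomings, this paper proposes using fuzzy clustering to classify search engine results dynamically, ranking pages within categories by combining the link analysis algorithm PageRank with the membership degree of Web documents, and gives a combination formula with an adjustment value. Experiments show that the algorithm meets user needs more effectively and improves retrieval efficiency.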
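
The paper's actual combination formula is not reproduced in the abstract. As a hedged illustration of the general idea only, the sketch below computes standard power-iteration PageRank and blends it with a fuzzy membership degree through a hypothetical linear weight `alpha`; the linear form, the value of `alpha`, and the sample graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Every link target must also appear as a key of `links`.
    """
    pages = list(links)
    rank = {p: 1.0 / len(pages) for p in pages}
    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / len(pages) for p in pages}
        for page, outlinks in links.items():
            targets = outlinks or pages  # a dangling page spreads its rank evenly
            for target in targets:
                new_rank[target] += damping * rank[page] / len(targets)
        rank = new_rank
    return rank

def combined_score(rank, membership, alpha=0.5):
    """Hypothetical blend: alpha * PageRank + (1 - alpha) * cluster membership."""
    return {p: alpha * rank[p] + (1.0 - alpha) * membership[p] for p in rank}

# Hypothetical three-page link graph and fuzzy membership degrees for one cluster.
links = {"a": ["b"], "b": ["a"], "c": ["a"]}
membership = {"a": 0.2, "b": 0.9, "c": 0.4}

rank = pagerank(links)
scores = combined_score(rank, membership, alpha=0.5)
ordering = sorted(scores, key=scores.get, reverse=True)
```

Because PageRank values and membership degrees live on different scales, a practical version would likely normalize one of them before blending; that step is left out here to keep the sketch short.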

9.
黎雨铭 《大观周刊》2012,(36):72-72
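Most of the dynamic web pages we see today when we open a browser are implemented through the application of Java on the Web. In 1994, the Java language, with its simplicity, security, high performance, multithreading, and dynamic nature, began to become the most popular development and programming language on the Internet. After 18 years of development, the increasingly mature Java language is now applied on the Web in education, commerce, government, and even medicine, all areas closely tied to our daily lives. By surveying Web applications of the Java language, this article gives readers an overall impression of Java on web pages.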

10.
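Most traditional data mining algorithms discover associations between attributes, that is, association rules, at a single level in transaction databases with binary values, yet most databases contain a large number of quantitative values. Quantitative values are usually handled by partitioning, but this introduces the problem of overly sharp partition boundaries. This paper uses a fuzzy approach to discover fuzzy generalized association rules among Web pages from Web logs; these rules capture associations between pages with fuzzy browsing times. An example analysis shows that the algorithm can effectively extract fuzzy generalized association rules for user access patterns in Web logs within acceptable computation time.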

11.
Despite a clear improvement of search and retrieval temporal applications, current search engines are still mostly unaware of the temporal dimension. Indeed, in most cases, systems are limited to offering the user the chance to restrict the search to a particular time period or to simply rely on an explicitly specified time span. If the user is not explicit in his/her search intents (e.g., “philip seymour hoffman”) search engines may likely fail to present an overall historic perspective of the topic. In most such cases, they are limited to retrieving the most recent results. One possible solution to this shortcoming is to understand the different time periods of the query. In this context, most state-of-the-art methodologies consider any occurrence of temporal expressions in web documents and other web data as equally relevant to an implicit time sensitive query. To approach this problem in a more adequate manner, we propose in this paper the detection of relevant temporal expressions to the query. Unlike previous metadata and query log-based approaches, we show how to achieve this goal based on information extracted from document content. However, instead of simply focusing on the detection of the most obvious date we are also interested in retrieving the set of dates that are relevant to the query. Towards this goal, we define a general similarity measure that makes use of co-occurrences of words and years based on corpus statistics and a classification methodology that is able to identify the set of top relevant dates for a given implicit time sensitive query, while filtering out the non-relevant ones. Through extensive experimental evaluation, we mean to demonstrate that our approach offers promising results in the field of temporal information retrieval (T-IR), as demonstrated by the experiments conducted over several baselines on web corpora collections.

12.
This study proposes a temporal analysis method to utilize heterogeneous resources such as papers, patents, and web news articles in an integrated manner. We analyzed the time gap phenomena between three resources and two academic areas by conducting text mining-based content analysis. To this end, a topic modeling technique, Latent Dirichlet Allocation (LDA) was used to estimate the optimal time gaps among three resources (papers, patents, and web news articles) in two research domains. The contributions of this study are summarized as follows: firstly, we propose a new temporal analysis method to understand the content characteristics and trends of heterogeneous multiple resources in an integrated manner. We applied it to measure the exact time intervals between academic areas by understanding the time gap phenomena. The results of temporal analysis showed that the resources of the medical field had more up-to-date property than those of the computer field, and thus prompter disclosure to the public. Secondly, we adopted a power-law exponent measurement and content analysis to evaluate the proposed method. With the proposed method, we demonstrate how to analyze heterogeneous resources more precisely and comprehensively.

13.
Applying the framework of Construal Level Theory (CLT), this study tested the effects of an environmental ad describing the distant-future (i.e. end of the twenty-first century) vs. near-future (i.e. next summer) consequences of climate change using a sample of college students in the U.S. and South Korea. Consistent with the proposed empirical model in this study, lower perceived temporal distance of climate change generally led to higher perceived relevance of the event and more positive attitude and greater intention toward the sustainable consumption suggested in the ad (i.e. using Energy Star® qualified bulbs). However, the effects of temporal framing on the variables were moderated by the culture-specific ways in which the participants represented time and interpreted temporal information. In response to the distant-future frame, South Korean participants tended to report significantly shorter perceived temporal distance, thus presenting higher levels of perceived relevance, stronger pro-environmental attitudes, and stronger behavioral intention than their U.S. counterparts. Overall, the findings of this study have meaningful implications for the external validity of CLT and for the development of effective climate change awareness campaigns targeting different audiences around the world.

14.
15.
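Innovating Library Information Services with Core Web 2.0 Technologies   Total citations: 8 (self-citations: 0, citations by others: 8)
Web 2.0 is a user-centered set of network technologies and services with its own core technologies and models, and Lib2.0 is the application of Web 2.0 in libraries. Core Web 2.0 technologies such as Blog, Wiki, RSS (news aggregation), and Folksonomy are already widely used and can be applied to innovate library information services, including reading guides, information push, collaborative digital reference, subject navigation, and distance education.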

16.
Corporate web sites have significant roles in building a positive image with stakeholders, particularly in a host market environment with different cultural backgrounds and values. A content analysis was conducted to study the glocalization strategies of corporate web sites and depiction of cultural values of 47 international brands which were identified as having Indonesian web sites. The four types of glocalization strategies of corporate web site content differed in the depiction of cultural values on their web sites. The differences could be found in overall analysis and four of five cultural dimensions such as collectivism, uncertainty avoidance, power distance and high context communication. It integrates the theory of glocalization strategy and cultural values in the context of cyberspace, which represents a pioneering attempt in investigating the aforementioned issue.

17.
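Borrowing the concept of "swappers" (换客), the Shantou Library (汕头市图书馆) set up a shared book room that gives readers a platform for donating and exchanging books, so that books are reused. The shared book room emphasizes public outreach, adopts user-friendly management, meets readers' demand for an "every day is book-swap day" service, and compensates for gaps in the library's collection. The shared book room should make full use of cultural volunteers, take part in community cultural development, promote reading for all, and ensure its own sound development.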

18.
On July 27, 2009 the United States Government Office of Management and Budget (OMB) publicized its intent to review the nine-year-old prohibition of web tracking technologies such as cookies on Federal agency web sites. OMB cited its need to continue to protect the public's privacy while visiting Federal Government web sites, while at the same time “making these web sites more user friendly, providing better customer service, and allowing for enhanced web analytics” (Federal Register, 2009, p. 37062). In this paper, we review the history of the Federal government's position on cookies, and describe exactly how the technology works and why this shift in policy toward the use of cookies is logical and necessary for the evolution of electronic-government and government 2.0 services in terms of accessibility and capability. We review two major issues with which Federal agencies must contend related to the use of cookies: privacy and records management. It is interesting to note that, despite earlier research on the implications of privacy and records management concerning other adopted technologies such as e-mail, these issues continue to be complex and misunderstood. We discuss the implications of cookies as records for future e-Government services and for long-term records management.
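
The abstract refers to how cookies work without giving details. As a rough sketch of the basic mechanism only, the following uses Python's standard `http.cookies` module to build the `Set-Cookie` value a server would send and to parse the `Cookie` header a browser would return on later requests. The cookie name `visitor_id`, its value, and the one-hour lifetime are hypothetical and do not come from the paper or from any OMB guidance.

```python
from http.cookies import SimpleCookie

def build_set_cookie_value(visitor_id, max_age_seconds):
    """Return the value of a Set-Cookie header for a persistent cookie."""
    cookie = SimpleCookie()
    cookie["visitor_id"] = visitor_id                  # hypothetical cookie name
    cookie["visitor_id"]["max-age"] = max_age_seconds  # a lifetime makes it persistent
    cookie["visitor_id"]["path"] = "/"
    cookie["visitor_id"]["secure"] = True              # only send over HTTPS
    cookie["visitor_id"]["httponly"] = True            # hide from page scripts
    return cookie["visitor_id"].OutputString()

def parse_cookie_header(header_value):
    """Parse the Cookie header that a browser sends back on later requests."""
    cookie = SimpleCookie()
    cookie.load(header_value)
    return {name: morsel.value for name, morsel in cookie.items()}

set_cookie_value = build_set_cookie_value("abc123", max_age_seconds=3600)
returned = parse_cookie_header("visitor_id=abc123; theme=dark")
```

A cookie set without `Max-Age` or `Expires` lasts only for the browser session, while one with a lifetime persists across visits, which is the property that makes cookies relevant to both privacy and records management.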

19.
Continual changes in information technologies over the past three decades have wrought substantial changes in library services and in information-seeking behavior among the general public. Thus the necessity for libraries to utilize the internet to communicate with stakeholders is even more important for academic libraries, as the rate of internet usage among those with college degrees continues to outpace that of the general population. The online availability of a well-crafted mission statement is therefore crucial. This analysis of the web sites of 113 ARL academic libraries, an update of Kuchi's (2006) study, considers the inclusion (availability) and placement (accessibility) of mission statements on library web sites and provides insights into the academic library's use of such statements for communicating mission and purpose to different stakeholders.

20.
Orientation for distance students often does not equal the ideal teachable moment for learning about library resources and services. Seeking ways to provide learning objects for students to use at the point of need, academic health sciences librarians have created printable guides, Flash video demonstrations and simulations, and interactive tutorials. This electronic poster demonstrates how the learning objects were created, modified, and inserted into various delivery platforms, such as the library's web page, course management system, and CD-ROM.
