Research On the Weighting of Indexing Sources for Web Concept Mining

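Cite this article:	Hou Hanqing, Zhang Chengzhi, Zheng Hong. Research On the Weighting of Indexing Sources for Web Concept Mining[J]. Journal of the China Society for Scientific and Technical Information, 2005, 24(1): 87-92.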

Authors:	Hou Hanqing, Zhang Chengzhi, Zheng Hong

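Affiliations:	1. Department of Information Management, Nanjing Agricultural University, Nanjing 210095; 2. Department of Information Management, Nanjing University, Nanjing 210093; 3. Library, Nanjing University of Information Science and Technology, Nanjing 210044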

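Abstract:	A random sample of 1,800 web pages covering four subject categories (economics, psychology, literature, and education) was subjected to manual free indexing, manual scoring, and term-frequency counting. Statistical analysis of the resulting data yields the relationship between the subject content of a web page and 12 indexing sources, including the page title, the article heading, the first and last sentences of the first paragraph, the first and last sentences of the second paragraph, the first paragraph, the last paragraph, and HTML tags. On this basis, the paper analyzes the subject expression ability of different parts of Chinese web pages and assigns each an appropriate weight for weighted indexing. Comparative weighting experiments in our Web text mining system show that this weighting scheme outperforms earlier schemes.
Keywords:	weighting; automatic indexing; indexing sources; web pages; subject expression ability
Revised manuscript received:	April 2, 2004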
Research On the Weighting of Indexing Sources for Web Concept Mining

Hou Hanqing. Research On the Weighting of Indexing Sources for Web Concept Mining[J]. Journal of the China Society for Scientific and Technical Information, 2005, 24(1): 87-92.

Authors:	Hou Hanqing

Abstract:	To distinguish the subject expression ability of different parts of a text, including web pages, we carried out a statistical investigation and propose a location-based weighting algorithm for information extraction. We applied the weighting method in a Web concept mining system, and comparative experiments show that it outperforms earlier weighting schemes.

Keywords:	weighting; automatic indexing; indexing sources; web pages; subject expression ability
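
The location-based weighting described in the abstract can be illustrated with a minimal sketch: each occurrence of a term contributes the weight of the indexing source it appears in, and terms are ranked by their accumulated score. The source names, the weight values, and the function names below are all hypothetical, chosen only for illustration; the abstract does not state the weights the paper derives, and the sketch assumes the page has already been split into sources and segmented into terms.

```python
from collections import Counter

# Hypothetical weights per indexing source (illustrative values only;
# the weights derived in the paper are not given in the abstract).
SOURCE_WEIGHTS = {
    "page_title": 5.0,
    "article_heading": 4.0,
    "first_paragraph": 2.0,
    "last_paragraph": 1.5,
    "body": 1.0,
}


def weighted_term_scores(segments, weights=SOURCE_WEIGHTS, default_weight=1.0):
    """Sum location weights per term.

    segments: iterable of (source_name, list_of_terms) pairs, where the
    terms are assumed to be already tokenized.
    """
    scores = Counter()
    for source, terms in segments:
        w = weights.get(source, default_weight)
        for term in terms:
            # Every occurrence adds the weight of the source it sits in.
            scores[term] += w
    return scores


def top_terms(segments, k=3):
    """Return the k highest-scoring (term, score) pairs."""
    scores = weighted_term_scores(segments)
    # Sort by score descending, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


page = [
    ("page_title", ["inflation", "policy"]),
    ("first_paragraph", ["inflation", "bank"]),
    ("body", ["bank", "bank", "rate"]),
]
print(top_terms(page, k=2))
```

In this toy page, "bank" is the most frequent term, yet "inflation" and "policy" rank higher because they occur in the title, which is the effect a location-weighting scheme is meant to produce.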
This article is indexed in CNKI, Wanfang Data, and other databases.