

Research on a Text Clustering Algorithm Based on Sample Weighting

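Cite this article:	Zhang Chengzhi, Shi Qinghui, Xue Dejun. Research on a Text Clustering Algorithm Based on Sample Weighting [J]. Journal of the China Society for Scientific and Technical Information (情报学报), 2008, 27(1): 42-48.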

Authors:	Zhang Chengzhi (章成志), Shi Qinghui (师庆辉), Xue Dejun (薛德军)

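Affiliations:	1. Department of Information Management, Nanjing University, Nanjing 210093; 2. China Academic Journal (CD) Electronic Publishing House, Beijing 100084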

Funding:	Key Project of the National Key Technology R&D Program; Jiangsu Province Graduate Education Innovation Project

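Abstract:	Sample-weighted clustering has only recently begun to attract attention, and several questions remain open. For example, does structural information among the objects being clustered help sample-weighted clustering, and how can that structural information be converted automatically into weights for samples or objects? To address these questions, this paper takes academic papers as the clustering objects and K-Means as the base clustering algorithm, computes a PageRank value for each paper from the citation relations among the papers, and uses that value as the paper's weight, yielding a new sample-weighted text clustering algorithm. Experimental results show that clustering weighted by the papers' PageRank values improves text clustering quality. The algorithm can be extended to web page clustering, where each page's PageRank serves as its weight, to improve the clustering of web pages.
Keywords:	text clustering; sample-weighted clustering; PageRank; citation frequency
Revised:	December 11, 2006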
Document Clustering Algorithm Based on Sample Weighting

Zhang Chengzhi, Shi Qinghui, Xue Dejun. Document Clustering Algorithm Based on Sample Weighting [J]. Journal of the China Society for Scientific and Technical Information, 2008, 27(1): 42-48.

Authors:	Zhang Chengzhi, Shi Qinghui, Xue Dejun

Institution:	Zhang Chengzhi (1), Shi Qinghui (2), Xue Dejun (2). 1. Department of Information Management, Nanjing University, Nanjing 210093; 2. China Academic Journal (CD) Electronic Publishing House, Beijing 100084

Abstract:	Sample-weighted clustering algorithms have attracted attention only recently, and some problems remain unsolved. For example, is the structural information among the clustering objects helpful to sample-weighted clustering? How can that structural information be transformed into sample weights? To address these problems, a novel sample-weighted clustering algorithm based on the K-Means algorithm is presented. The algorithm uses academic documents as the clustering objects. The PageRank value of each document is calcula...

Keywords:	document clustering; sample weighted clustering; PageRank; cited frequency
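
The abstract describes the approach only at a high level: compute a PageRank value for each paper from the citation relations, then use that value as the paper's weight in K-Means. It does not say how the weight enters the algorithm. The following is a minimal sketch, assuming the weights act on the centroid update (each centroid becomes the weighted mean of its members); it is not the authors' implementation. The function names `pagerank` and `weighted_kmeans`, the damping factor 0.85, the toy citation graph, and the two-dimensional document vectors are all hypothetical and chosen for illustration.

```python
import random


def pagerank(citations, damping=0.85, iterations=50):
    """Power-iteration PageRank over a citation graph (illustrative helper).

    citations maps each paper id to the list of paper ids it cites.
    """
    papers = sorted(set(citations) | {q for cited in citations.values() for q in cited})
    n = len(papers)
    rank = {p: 1.0 / n for p in papers}
    for _ in range(iterations):
        # Rank held by papers that cite nothing is spread uniformly.
        dangling = sum(rank[p] for p in papers if not citations.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in papers}
        for p in papers:
            cited = citations.get(p, [])
            for q in cited:
                new_rank[q] += damping * rank[p] / len(cited)
        rank = new_rank
    return rank


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def weighted_kmeans(vectors, weights, k, iterations=20, seed=0):
    """K-Means whose centroid update is a weighted mean (assumed weighting scheme)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    dim = len(vectors[0])
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda c: squared_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: high-weight papers pull their centroid harder.
        for c in range(k):
            members = [i for i, label in enumerate(labels) if label == c]
            total = sum(weights[i] for i in members)
            if total == 0:
                continue  # empty cluster: keep the previous centroid
            centroids[c] = [
                sum(weights[i] * vectors[i][d] for i in members) / total
                for d in range(dim)
            ]
    return labels, centroids


# Toy data (hypothetical): two groups of papers that cite within their own group.
citations = {"A": ["B"], "B": ["C"], "C": ["B"], "D": ["E"], "E": ["D"], "F": ["E"]}
vectors = {
    "A": [1.0, 0.0], "B": [0.9, 0.1], "C": [0.8, 0.2],
    "D": [0.1, 0.9], "E": [0.0, 1.0], "F": [0.2, 0.8],
}

ranks = pagerank(citations)
ids = sorted(vectors)
labels, centroids = weighted_kmeans(
    [vectors[p] for p in ids], [ranks[p] for p in ids], k=2
)
for paper, label in zip(ids, labels):
    print(paper, round(ranks[paper], 3), label)
```

In this sketch a frequently cited paper such as B moves its cluster centroid more than an uncited paper such as A. In practice the document vectors would come from a text representation such as TF-IDF, which the abstract does not specify.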
This article is indexed in CNKI, Wanfang Data, and other databases.
