Research on Multi-level Clustering for Web Search Results


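Cite as:	Pang Guansong, Jiang Shengyi, Zhang Lisha, Ou Xiongfa, Lai Xuming. Research on Multi-level Clustering for Web Search Results [J]. Journal of the China Society for Scientific and Technical Information, 2011, 30(5).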

Authors:	Pang Guansong, Jiang Shengyi, Zhang Lisha, Ou Xiongfa, Lai Xuming

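Affiliations:	1. School of Management, Guangdong University of Foreign Studies, Guangzhou 510006; 2. School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006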

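Funding:	National Natural Science Foundation of China (60673191); Natural Science Foundation of Guangdong Province (9151026005000002); Key Natural Science Research Project of Guangdong Higher Education Institutions (06Z012)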

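Abstract:	To make the results returned by search engines easier for users to browse, this paper proposes a new TFIDF-based method for computing text similarity, together with a strategy for multi-level clustering of texts using an incremental clustering algorithm with approximately linear time complexity. It also proposes a strategy for extracting keywords from multiple texts: the nouns or noun phrases in a cluster are taken as candidate keywords; a weighting function that jointly considers each candidate's term frequency, position of occurrence, length, and the text length computes its weight; and the candidate with the largest weight is automatically extracted as the cluster keyword, with no manual intervention and no supporting corpus. Experimental results on corpora collected from Baidu and ODP, as well as public testing, demonstrate the effectiveness of the proposed method.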
Keywords:	text clustering; multi-level clustering; cluster keyword extraction; weighting function
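
The clustering strategy in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the paper's new similarity formula, so this sketch substitutes standard smoothed TF-IDF with cosine similarity; the function names and both thresholds are hypothetical. Each document is scanned once and either joins its most similar cluster or opens a new one, and a second pass inside each cluster with a stricter threshold produces the second level.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Map tokenized documents to sparse TF-IDF dicts (standard smoothed IDF,
    an assumption; not the paper's own similarity weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm = (math.sqrt(sum(w * w for w in a.values()))
            * math.sqrt(sum(w * w for w in b.values())))
    return dot / norm if norm else 0.0


def one_pass_cluster(vectors, threshold):
    """Single scan: join the most similar cluster, or open a new one."""
    centroids, members = [], []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for k, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = k, sim
        if best is None:
            centroids.append(dict(vec))
            members.append([i])
        else:
            # Cosine ignores scale, so a running sum serves as the centroid.
            for t, w in vec.items():
                centroids[best][t] = centroids[best].get(t, 0.0) + w
            members[best].append(i)
    return members


def two_level_cluster(vectors, coarse, fine):
    """Re-cluster each top-level cluster with a stricter (hypothetical) threshold."""
    result = []
    for group in one_pass_cluster(vectors, coarse):
        sub = one_pass_cluster([vectors[i] for i in group], fine)
        result.append([[group[j] for j in s] for s in sub])
    return result
```

Each document is compared only against the current cluster centroids, so the cost grows with the number of documents times the number of clusters, which is roughly linear when the cluster count stays small.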
Research on Multi-level Clustering for Web Search Results

Pang Guansong, Jiang Shengyi, Zhang Lisha, Ou Xiongfa, Lai Xuming. Research on Multi-level Clustering for Web Search Results [J]. Journal of the China Society for Scientific and Technical Information, 2011, 30(5).

Authors:	Pang Guansong, Jiang Shengyi, Zhang Lisha, Ou Xiongfa, Lai Xuming

Institution:	Pang Guansong¹, Jiang Shengyi², Zhang Lisha², Ou Xiongfa² and Lai Xuming² (1. School of Management, Guangdong University of Foreign Studies, Guangzhou 510006; 2. School of Informatics, Guangdong University of Foreign Studies, Guangzhou 510006)

Abstract:	To make the search results returned by search engines easier to browse, this paper proposes a new TFIDF-based method for calculating document similarity and performs multi-level clustering of Web search results using a one-pass clustering algorithm with approximately linear time complexity. We also propose a strategy for extracting cluster keywords from multiple texts: nouns or noun phrases are selected as candidate cluster keywords, and term frequency, the position at which a term occurs, the length of the term and text ...

Keywords:	text clustering; multi-level clustering; cluster keyword extraction; weighting function
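
The keyword-extraction strategy can be sketched in the same way. The record names the factors in the weighting function (term frequency, position, term length, text length) but not the function itself, so the formula below is purely illustrative: `title_boost`, the logarithmic length factor, and the function names are all assumptions. Candidates are assumed to be nouns or noun phrases already extracted by a part-of-speech tagger, which is not shown.

```python
import math


def candidate_weight(term, texts, title_boost=2.0):
    """Illustrative weight for one candidate keyword (not the paper's formula).

    texts: list of (title_terms, body_terms) pairs, one pair per document
    in the cluster, each a list of candidate nouns or noun phrases.
    """
    score = 0.0
    for title, body in texts:
        text_length = len(title) + len(body)
        if text_length == 0:
            continue
        # Position: occurrences in the title count more than in the body.
        hits = title_boost * title.count(term) + body.count(term)
        # Text length: normalize so long texts do not dominate.
        score += hits / text_length
    # Term length: longer phrases tend to be more specific labels.
    return score * math.log(1 + len(term))


def pick_cluster_keyword(candidates, texts):
    """Return the candidate with the largest weight as the cluster keyword."""
    return max(candidates, key=lambda term: candidate_weight(term, texts))
```

Because the weight depends only on the texts inside the cluster, no external corpus or manual labeling is needed, which matches the claim in the abstract.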
This article is indexed in CNKI, Wanfang Data, and other databases.
