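A Framework for Learning Ontology Non-taxonomic Relationships Based on a Focused Crawler

Cite this article:	Qiao Jianzhong.A Framework for Learning Ontology Non-taxonomic Relationships Based on a Focused Crawler[J].Library and Information Service,2010,54(18):120-129.

Author:	Qiao Jianzhong

Affiliation:	National Science Library, Chinese Academy of Sciences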

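Abstract:	This paper proposes a framework and method for automatically learning ontology non-taxonomic relationships from the relevant web pages returned by a focused crawler. Given the characteristics of ontology learning from the Web, the main methods used are word frequency, co-occurrence statistics, and the partitioning clustering algorithm K-Means. Complex syntactic structure analysis and semi-supervised clustering algorithms such as EM, BIRCH, and SOM are not used, so the degree of automation and the efficiency are relatively high. The learning results are used to guide the focused crawler in judging the relevance of web pages. The quality of the learned non-taxonomic relationships is in turn evaluated objectively by the focused crawler's performance in practical use.
Keywords:	ontology learning; non-taxonomic relationship; focused crawler; partitioning clustering algorithm; relevance
Received:	2010-04-09
Revised:	2010-06-21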
Learning Non-taxonomic Relationships Based on Focused Crawler

Qiao Jianzhong.Learning Non-taxonomic Relationships Based on Focused Crawler[J].Library and Information Service,2010,54(18):120-129.

Authors:	Qiao Jianzhong

Institution:	National Science Library, Chinese Academy of Sciences

Abstract:	This paper presents a framework and methodology for learning non-taxonomic relationships based on a focused crawler. Given the characteristics of ontology learning from the Web, the main methods used are word frequency, co-occurrence statistics, and K-Means, a partitioning clustering algorithm. Complex syntactic analysis and semi-supervised clustering algorithms such as EM, BIRCH, and SOM are not used, so the approach achieves a high degree of automation and efficiency. The learning results are used to guide the focused crawler in judging the relevance of web pages. The quality of the learned relationships is evaluated objectively by the performance of the focused crawler in practical application.

Keywords:	ontology learning; non-taxonomic relation; focused crawler; partitioning clustering algorithm; relevance
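
The abstract names three building blocks: word frequency, co-occurrence statistics, and K-Means. The abstract does not describe how the paper combines them, so the following is only a minimal sketch under stated assumptions: sentences are already tokenized, co-occurrence is counted per sentence, each term is represented by its co-occurrence counts with the other vocabulary terms (with its own frequency on the diagonal), and K-Means is seeded with the first k terms so the result is deterministic. All function names and the toy corpus are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import combinations
import math


def term_frequencies(sentences):
    """Count how often each term occurs across all tokenized sentences."""
    return Counter(term for sent in sentences for term in sent)


def cooccurrence_counts(sentences, vocabulary):
    """Count, per sentence, each unordered pair of vocabulary terms that appear together."""
    counts = Counter()
    for sent in sentences:
        present = sorted(set(sent) & vocabulary)
        for a, b in combinations(present, 2):  # pairs come out in sorted order
            counts[(a, b)] += 1
    return counts


def kmeans(points, k, iterations=20):
    """Plain Lloyd-style K-Means. Assumption: the first k points seed the centroids."""
    if k > len(points):
        raise ValueError("k must not exceed the number of points")
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_terms(sentences, vocabulary, k):
    """Group vocabulary terms whose co-occurrence profiles are similar."""
    tf = term_frequencies(sentences)
    counts = cooccurrence_counts(sentences, vocabulary)
    terms = sorted(vocabulary)
    # One vector per term: own frequency on the diagonal, pair counts elsewhere.
    vectors = [
        [tf[t] if t == u else counts[tuple(sorted((t, u)))] for u in terms]
        for t in terms
    ]
    labels = kmeans(vectors, k)
    clusters = [[] for _ in range(k)]
    for term, label in zip(terms, labels):
        clusters[label].append(term)
    return clusters


# Toy corpus (hypothetical): two topics that never share a sentence.
sentences = [
    ["crawler", "fetches", "page"],
    ["crawler", "ranks", "page"],
    ["ontology", "defines", "concept"],
    ["ontology", "links", "concept"],
]
vocabulary = {"crawler", "page", "ontology", "concept"}

print(cluster_terms(sentences, vocabulary, k=2))
# → [['concept', 'ontology'], ['crawler', 'page']]
```

On this toy input, terms that share sentences end up in the same cluster, which is the kind of signal a crawler could use as a candidate for a non-taxonomic relation. Real web text would need tokenization, stop-word filtering, and a less naive choice of initial centroids.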
This article is indexed in Wanfang Data and other databases.