
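A Domain-Ontology-Based Method for Extracting Topic Features from Chinese Web Text
Cite this article: Zhu Hengmin, Ma Jing, Huang Weidong. A Domain-Ontology-Based Method for Extracting Topic Features from Chinese Web Text [J]. Information Studies: Theory & Application (情报理论与实践), 2008, 31(2): 286-288, 285.
Authors: Zhu Hengmin, Ma Jing, Huang Weidong
Affiliations: 1. School of Economics and Management, Nanjing University of Posts and Telecommunications, Nanjing, Jiangsu 210003
2. School of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing, Jiangsu 210016
Funding: Natural Science Foundation of Jiangsu Higher Education Institutions; National Defense Science and Technology Applied Basic Research Fund
Abstract: To process Chinese Web text automatically, quickly and effectively, this paper proposes a topic feature extraction method based on a domain ontology. Taking the characteristics of Web text into account, it first describes a semi-automatic method for building a domain dictionary. Text is segmented using this dictionary, and each resulting term is mapped to a topic, so that the text vector is expressed in concepts of the domain ontology. This effectively reduces the dimensionality of the text feature vector and improves the quality of topic extraction. The weight of each topic feature is computed from the position and frequency of the corresponding text, and the weights of the topic concepts are then adjusted and ranked according to the structure of the domain ontology. A worked example verifies the effectiveness of the method.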

Keywords: topic extraction; domain ontology; text mining
Received: 2007-10-29
Revised: 2007-10-29

Topic Extracting Method of Chinese Web Documents Based on Domain Ontology
Zhu Hengmin et al. Topic Extracting Method of Chinese Web Documents Based on Domain Ontology [J]. Information Studies: Theory & Application, 2008, 31(2): 286-288, 285.
Authors: Zhu Hengmin, Ma Jing, Huang Weidong
Abstract: In order to process Chinese Web documents rapidly, effectively and automatically, a topic extraction method based on domain ontology is proposed. Considering the characteristics of Web documents, the paper presents a semi-automatic method for constructing a domain dictionary. Documents are first segmented using the domain dictionary. The resulting words are then mapped to concepts of the domain ontology and each document is represented by these concepts, which effectively reduces the dimension of the feature vector and improves the quality of topic extraction. Topic weights are computed from the positions and frequencies of document features and then adjusted according to the structure of the domain ontology. An example shows that the method is effective.
Keywords: topic extraction; domain ontology; text mining
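
The pipeline summarized in the abstract (dictionary-based segmentation, mapping terms to ontology concepts, weighting by position and frequency, adjusting by ontology structure) can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not the authors' implementation: the abstract gives no formulas, so the forward-maximum-matching segmenter, the toy dictionary and ontology, the position weights and the parent-share factor below are all hypothetical choices made for this example.

```python
from collections import defaultdict

# Hypothetical toy data; none of these entries or numbers come from the paper.
DOMAIN_DICT = {  # dictionary term -> ontology concept
    "笔记本": "Laptop",
    "笔记本电脑": "Laptop",
    "台式机": "Desktop",
    "电脑": "Computer",
}
PARENT = {"Laptop": "Computer", "Desktop": "Computer"}  # child -> parent concept
POSITION_WEIGHT = {"title": 3.0, "body": 1.0}  # assumed position weights
PARENT_SHARE = 0.5  # assumed share of a child's weight credited to its parent


def segment(text, dictionary):
    """Forward maximum matching: take the longest dictionary term at each position."""
    max_len = max(len(term) for term in dictionary)
    terms, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                terms.append(candidate)
                i += size
                break
        else:
            i += 1  # no dictionary term starts here; skip one character
    return terms


def concept_weights(fields):
    """Map terms to concepts and sum position-weighted occurrence counts."""
    weights = defaultdict(float)
    for position, text in fields.items():
        for term in segment(text, DOMAIN_DICT):
            weights[DOMAIN_DICT[term]] += POSITION_WEIGHT[position]
    return dict(weights)


def adjust_by_ontology(weights):
    """Credit each parent concept with a share of its children's direct weight."""
    adjusted = dict(weights)
    for concept, weight in weights.items():
        parent = PARENT.get(concept)
        if parent is not None:
            adjusted[parent] = adjusted.get(parent, 0.0) + PARENT_SHARE * weight
    return adjusted


def rank_topics(fields):
    """Return (concept, weight) pairs sorted by descending weight."""
    adjusted = adjust_by_ontology(concept_weights(fields))
    return sorted(adjusted.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    page = {"title": "笔记本电脑评测", "body": "这款笔记本比台式机轻,电脑性能好"}
    for concept, weight in rank_topics(page):
        print(concept, weight)
```

Because several dictionary terms map to the same concept (here both 笔记本 and 笔记本电脑 map to `Laptop`), the vector is indexed by concepts rather than by surface terms, which is the dimensionality reduction the abstract refers to. With the toy data above, the title match dominates and `Laptop` ranks first, followed by `Computer` (raised by the share it receives from its child concepts) and `Desktop`.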
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据) and other databases.