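Research on Text-Content-Based Information Extraction and Classification of Agricultural Web Pages
Cite this article: ZHU Xue-fang, FENG Xi-xi. Research on Text-Content-Based Information Extraction and Classification of Agricultural Web Pages [J]. Information Science (情报科学), 2012(7): 1012-1015.
Authors: ZHU Xue-fang (朱学芳), FENG Xi-xi (冯曦曦)
Affiliations: Department of Information Management, Nanjing University; Institute of Multimedia Information, Nanjing University
Funding: Key Project of the National Social Science Fund of China, 2008 (08ATQ003)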
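Abstract: By studying the HTML structure and features of agricultural web pages, this paper describes an experimental study of text-content-based information extraction and classification for agricultural web pages. In the experiments, the DOM structure is used to extract and preprocess information from agricultural web pages; the category attributes of a text are computed automatically from its content to obtain feature words; and new documents are classified automatically by summarizing the features of sample documents. The experimental results show that the information extraction described here has relatively low time complexity and high precision, and that it improves classification accuracy.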

Keywords: text; agricultural web pages; information extraction; classification

Text Oriented Information Extraction and Classification Technology for Agricultural Webs
ZHU Xue-fang, FENG Xi-xi. Text Oriented Information Extraction and Classification Technology for Agricultural Webs [J]. Information Science, 2012(7): 1012-1015.
Authors: ZHU Xue-fang, FENG Xi-xi
Institution: 1. Dept. of Information Management, Nanjing University, Nanjing 210093, China; 2. Institute of Multimedia Information Processing, Nanjing University, Nanjing 210093, China
Abstract: Through investigation and analysis of the HTML structures and features of agricultural websites, the paper describes methods for information extraction and classification of agricultural web pages. The main contents include: information extraction and classification for agricultural web pages based on document object model (DOM) structure; automatic calculation of text classification attributes according to text content; obtaining feature words; and automatic classification of new documents through a summary of sample document features. The experimental results show that the time cost of web information extraction is low while its precision remains high, with satisfactory classification rates.
Keywords:text  agricultural web  information extraction  classification
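The pipeline outlined in the abstract (structure-based text extraction, then feature words per class, then classification of new documents) might be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation. The noise-tag list, the function names, the term-frequency profiles, and the cosine similarity measure are all assumptions. Python's event-based `html.parser` stands in for a real DOM traversal, and the ASCII tokenizer is a simplification: Chinese pages would need a word segmenter.

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical set of elements treated as noise; the paper does not list its rules.
SKIP_TAGS = {"script", "style", "nav", "footer"}


class TextExtractor(HTMLParser):
    """Collect visible text, skipping everything nested inside noise elements."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside at least one noise element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html):
    """Return the page's content text with noise elements removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


def tokenize(text):
    """Simplified tokenizer (ASCII words only); an assumption for this sketch."""
    return re.findall(r"[a-z]+", text.lower())


def train(samples):
    """Build one term-frequency profile per class from (label, text) samples."""
    profiles = {}
    for label, text in samples:
        profiles.setdefault(label, Counter()).update(tokenize(text))
    return profiles


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(profiles, text):
    """Assign the class whose profile is most similar to the new document."""
    vec = Counter(tokenize(text))
    return max(profiles, key=lambda label: cosine(vec, profiles[label]))


# Made-up sample data, for illustration only.
page = (
    "<html><head><style>p{}</style></head><body><nav>Home</nav>"
    "<p>Wheat rust disease control</p><script>x=1</script></body></html>"
)
profiles = train([
    ("crops", "wheat rice maize planting disease"),
    ("livestock", "pig cattle feed vaccine disease"),
])
label = classify(profiles, extract_text(page))
```

Here the navigation text, stylesheet, and script are dropped before tokenization, so only the paragraph text contributes to the comparison against the per-class profiles.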
This article is indexed in CNKI and other databases.