

A Rule-Based Method for Fast Automatic Identification of Web Text Resource Titles

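Cite this article:	Liu Jianhua, Zhang Zhixiong, Xie Jing, Zou Yimin. A Rule-Based Method for Fast Automatic Identification of Web Text Resource Titles[J]. New Technology of Library and Information Service, 2011(6).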

Authors:	Liu Jianhua, Zhang Zhixiong, Xie Jing, Zou Yimin

Affiliations:	National Science Library, Chinese Academy of Sciences; Graduate University of Chinese Academy of Sciences

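Funding:	One of the research outcomes of the Chinese Academy of Sciences-funded project "Automatic Monitoring Service System for Science and Technology Institutions"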

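Abstract:	Taking title identification for web text resources as its entry point, this paper considers not only the features most existing studies focus on, such as formatting information (e.g., font) and positional information, but also the relevance between a title and the main content of the web page. Using the large volume of historical data collected by a science and technology monitoring project as the basis for statistical analysis, it builds a rule-based method for fast title identification of web text resources from two angles, the possible sources of candidate titles and their features, and reports evaluation results for the method's time efficiency and identification accuracy.
Keywords:	web text resources; title identification; title source; title feature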
Automatic Identify Title of Web Text Resource Based on Rules

Liu Jianhua, Zhang Zhixiong, Xie Jing, Zou Yimin. Automatic Identify Title of Web Text Resource Based on Rules[J]. New Technology of Library and Information Service, 2011(6).

Authors:	Liu Jianhua, Zhang Zhixiong, Xie Jing, Zou Yimin

Institution:	Liu Jianhua(1), Zhang Zhixiong(1), Xie Jing(1), Zou Yimin(1,2); (1) National Science Library, Chinese Academy of Sciences, Beijing 100190, China; (2) Graduate University of Chinese Academy of Sciences, Beijing 100049, China

Abstract:	Because the titles of web resources play an important role in information retrieval, text clustering, and similar tasks, this paper proposes a method for identifying such titles automatically and quickly. Like many other studies, it draws on the style information (such as font) and location information of the text; in addition, it considers the relevance between candidate titles and the text content. Finally, the paper implements a title identification component and reports experiments that demonstrate the effectiveness of the method.

Keywords:	Web text resources; Title identification; Title source; Title feature
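The abstract names three kinds of evidence for picking a title: formatting (such as font), position on the page, and relevance between a candidate and the page's main text, applied to candidates drawn from several possible sources. The paper's actual rules, sources, and thresholds are not given here, so the following is only a minimal Python sketch under stated assumptions: the candidate sources (`h1`, `title_tag`, `h2`, `bold`), the `Candidate` fields, the length limits, the relevance cut-off, and the scoring weights are all hypothetical. Relevance is approximated as the share of a candidate's distinct tokens that also appear in the body text, with each CJK character treated as one token.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Optional

# All constants below are hypothetical illustrations, not values from the paper.
SOURCE_PRIOR = {"h1": 1.0, "title_tag": 0.8, "h2": 0.6, "bold": 0.4}
MIN_TOKENS, MAX_TOKENS = 2, 40   # length rule: reject very short/long candidates
MIN_RELEVANCE = 0.2              # relevance rule: reject text unrelated to the body


@dataclass
class Candidate:
    text: str          # candidate title string
    source: str        # where it came from; should be a key of SOURCE_PRIOR
    font_size: float   # rendered font size (hypothetical unit: px)
    position: float    # relative position in the page: 0.0 = top, 1.0 = bottom


def tokenize(text: str) -> list[str]:
    """Lowercased Latin words/digits as whole tokens, each CJK character as one token."""
    return re.findall(r"[a-z0-9]+|[\u4e00-\u9fff]", text.lower())


def relevance(title: str, body: str) -> float:
    """Fraction of distinct title tokens that also occur in the body text."""
    title_tokens = set(tokenize(title))
    if not title_tokens:
        return 0.0
    return len(title_tokens & set(tokenize(body))) / len(title_tokens)


def score(c: Candidate, body: str, body_font_size: float) -> Optional[float]:
    """Weighted sum of source prior, font, position and relevance,
    or None if a hard rule rejects the candidate."""
    n = len(tokenize(c.text))
    if not (MIN_TOKENS <= n <= MAX_TOKENS) or c.source not in SOURCE_PRIOR:
        return None
    rel = relevance(c.text, body)
    if rel < MIN_RELEVANCE:
        return None
    font = min(c.font_size / body_font_size, 2.0) / 2.0  # larger font -> up to 1.0
    top = 1.0 - min(max(c.position, 0.0), 1.0)           # nearer the top -> up to 1.0
    return 0.3 * SOURCE_PRIOR[c.source] + 0.2 * font + 0.2 * top + 0.3 * rel


def identify_title(cands: list[Candidate], body: str,
                   body_font_size: float = 14.0) -> Optional[str]:
    """Return the text of the best-scoring surviving candidate, or None."""
    best_text, best_score = None, float("-inf")
    for c in cands:
        s = score(c, body, body_font_size)
        if s is not None and s > best_score:
            best_text, best_score = c.text, s
    return best_text


if __name__ == "__main__":
    body = ("The library launched a monitoring service that tracks research "
            "institutions. The monitoring service collects news pages every day.")
    cands = [
        Candidate("Example Portal | Home", "title_tag", 14, 0.0),
        Candidate("Library launches monitoring service for research institutions",
                  "h1", 24, 0.1),
        Candidate("Login", "bold", 14, 0.02),
        Candidate("Related news", "h2", 18, 0.8),
    ]
    print(identify_title(cands, body))
    # prints: Library launches monitoring service for research institutions
```

In this sketch, the hard rules (length, known source, minimum relevance) reject candidates such as site-name `<title>` strings and navigation links before any scoring, so each page needs only one pass over its candidates; the weighted sum then ranks whatever survives. Weights of this kind would normally be tuned against labelled historical pages, in the spirit of the statistical analysis the abstract describes.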
This article is indexed in CNKI and other databases.
