
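Chinese Word Segmentation Evaluation Methodology Based on Web Search Engines (基于搜索引擎的中文分词评估方法)
Cite this article: 王华栋, 饶培伦. 基于搜索引擎的中文分词评估方法[J]. 情报科学 (Information Science), 2007, 25(1): 108-112.
Authors: 王华栋 (WANG Hua-dong), 饶培伦 (RAO Pei-lun)
Affiliation: Department of Industrial Engineering, Tsinghua University, Beijing 100084
Abstract (translated from the Chinese): The quality of Chinese word segmentation is an important factor in the quality of Chinese retrieval results returned by search engines; accurate and effective segmentation is critical to improving both the relevance of search results and user satisfaction. This paper reviews and organizes the theoretical basis on which Chinese word segmentation evaluation rests, and establishes a complete search-engine-based method for evaluating Chinese word segmentation. The method covers the extraction of evaluation samples, the selection of evaluators, the definition of evaluation criteria, and the design of the evaluation procedure. A case analysis indicates that the method is effective. On this basis, the authors discuss the results of the experimental evaluation in depth and offer several suggestions for improving evaluation effectiveness, including how to account for evaluators' backgrounds and how to decide which evaluation items to keep or discard.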

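Keywords (translated from the Chinese): Chinese word segmentation; search engine; information retrieval; evaluation method
Article number: 1007-7634(2007)01-0108-05
Received: 2006-09-25
Revised: 2006-09-25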

Chinese Word Segmentation Evaluation Methodology Based on Web Search Engines
WANG Hua-dong, RAO Pei-lun. Chinese Word Segmentation Evaluation Methodology Based on Web Search Engines[J]. Information Science, 2007, 25(1): 108-112.
Authors: WANG Hua-dong, RAO Pei-lun
Institution: Department of Industrial Engineering, Tsinghua University, Beijing 100084, China
Abstract: Chinese word segmentation is one of the determinants of result quality in Chinese search engines. Whether Chinese words are segmented effectively and correctly is vital to improving the relevance of search results and enhancing user satisfaction. This paper first reviews the fundamental theories upon which Chinese segmentation evaluation methods are built, and then develops an integrated methodology for measuring the quality of Chinese segmentation in web search engines. A set of methods and guidelines is proposed, addressing sampling issues, selection of evaluators, definition and selection of metrics, the evaluation procedure, etc. The methodology was then applied in an evaluation of a real search engine and proved to be effective. The results of the evaluation are analyzed, and suggestions concerning evaluator screening and item rejection are provided, with the aim of achieving better evaluation performance.
Keywords: Chinese word segmentation; web search engine; information retrieval; evaluation methodology
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.