
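Automatic Identification of News Intent Based on Analyzing Query Features
Cite this article: Zhang Xiaojuan, Lu Wei, Lei Shengwei. Automatic Identification of News Intent Based on Analyzing Query Features[J]. Library and Information Service, 2014, 58(20): 82-90.
Authors: Zhang Xiaojuan  Lu Wei  Lei Shengwei
Affiliations: 1. School of Computer and Information Science, Southwest University; 2. Centre for Studies of Information Resources, Wuhan University
Funding: This paper is one of the research outcomes of the National Natural Science Foundation of China General Program project "Research on Modeling and Framework Implementation of General Entity Retrieval Based on Language Models" (Grant No. 71173164) and the National Social Science Foundation of China Youth Project "Research on a Dynamic Regulation Mechanism for Emergency Management of Online Public Opinion Events Based on Scenario Analysis" (Grant No. 13CGL132).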
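Abstract: Sample queries are selected from the Sogou query log and manually labeled. Through analysis of the labeled news queries, new features that can be used to identify news intent are proposed, namely query expression features, features of the query's distribution over time, and clicked-result features. Based on these three features, a decision tree classifier is used to automatically identify news intent in queries. The results show that: (1) the search goals of news queries concentrate mainly on information about specific topics and on entertainment information, and their topics are mostly entertainment, politics, sports and economy; (2) compared with non-news queries, news queries are more likely to contain entities, fluctuate more in their distribution over time, and show higher similarity among clicked results; (3) the method identifies news intent in queries well, with macro-averaged precision, recall and F-score of 0.76, 0.73 and 0.74, respectively.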

Keywords: query intent  news queries  news intent  query classification
Received: 2014-07-10
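
The abstract names the temporal-distribution and clicked-result features but does not say how they are computed. As a minimal sketch under stated assumptions, the block below measures temporal fluctuation as the coefficient of variation of a query's daily frequency and clicked-result similarity as the mean pairwise Jaccard overlap of result titles. Both formulas, the function names, and the whitespace tokenization are illustrative choices, not the authors' definitions; Chinese titles would need a word segmenter in place of `split()`.

```python
from itertools import combinations
from statistics import mean, pstdev


def temporal_fluctuation(daily_counts):
    """Coefficient of variation of daily query counts (assumed measure).

    Returns 0.0 for an empty series or a series whose mean is zero.
    """
    if not daily_counts:
        return 0.0
    avg = mean(daily_counts)
    return pstdev(daily_counts) / avg if avg else 0.0


def click_similarity(clicked_titles):
    """Mean pairwise Jaccard similarity of clicked result titles (assumed measure).

    Titles are lower-cased and split on whitespace; fewer than two
    titles yields 0.0 because no pair exists.
    """
    token_sets = [set(title.lower().split()) for title in clicked_titles]
    scores = [
        len(a & b) / len(a | b) if (a | b) else 0.0
        for a, b in combinations(token_sets, 2)
    ]
    return mean(scores) if scores else 0.0
```

Under these assumptions, a query that spikes on a single day scores higher on `temporal_fluctuation` than one issued at a steady rate, which matches the abstract's observation that news queries fluctuate more over time.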

Automatic Identification of News Intent Based on Analyzing Query Features
Zhang Xiaojuan, Lu Wei, Lei Shengwei. Automatic Identification of News Intent Based on Analyzing Query Features[J]. Library and Information Service, 2014, 58(20): 82-90.
Authors: Zhang Xiaojuan  Lu Wei  Lei Shengwei
Institution: 1. School of Computer and Information Science, Southwest University, Chongqing 400715; 2. Centre for Studies of Information Resources, Wuhan University, Wuhan 430072
Abstract: This paper selects sample queries from the Sogou query log and has them labeled manually. Based on an analysis of the labeled news queries, we propose three novel features for news intent prediction: query expression, query distribution over time, and clicked results. We then apply a decision tree classifier to these three features to identify news queries automatically. Experimental results show that: (1) the goals of news queries are mainly to obtain information on a particular topic or entertainment information, and the search topics of news queries tend to be entertainment, economy, politics and sports; (2) compared with non-news queries, news queries are more likely to contain named entities, fluctuate more in their distribution over time, and show a higher degree of similarity among clicked results; (3) encouraging results are achieved for news intent identification, with precision, recall and F-score for the query classification of 0.76, 0.73 and 0.74, respectively.
Keywords: query intent  news queries  news intent  query classification
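
The reported scores are macro-averaged: each class (news and non-news) contributes equally, whatever its size. The sketch below shows one common way to compute them. Taking F as the harmonic mean of the macro-averaged precision and recall is an assumption, since the abstract does not say how F was derived. The harmonic mean of the reported 0.76 and 0.73 is about 0.745, which is consistent with the reported 0.74. The function names and the toy labels are hypothetical.

```python
from collections import Counter


def harmonic_mean(p, r):
    """Harmonic mean of two scores; 0.0 when both are zero."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def macro_scores(gold, predicted):
    """Return macro-averaged (precision, recall, F) over all observed labels.

    F is computed from the macro-averaged precision and recall
    (an assumption; averaging per-class F values is another convention).
    """
    labels = sorted(set(gold) | set(predicted))
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, predicted):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[g] += 1  # gold label g was missed
    precisions, recalls = [], []
    for label in labels:
        n_predicted = tp[label] + fp[label]
        n_gold = tp[label] + fn[label]
        precisions.append(tp[label] / n_predicted if n_predicted else 0.0)
        recalls.append(tp[label] / n_gold if n_gold else 0.0)
    macro_p = sum(precisions) / len(labels)
    macro_r = sum(recalls) / len(labels)
    return macro_p, macro_r, harmonic_mean(macro_p, macro_r)


# Toy labels, invented purely for illustration.
gold = ["news", "news", "news", "other", "other", "other"]
predicted = ["news", "news", "other", "other", "other", "news"]
macro_p, macro_r, macro_f = macro_scores(gold, predicted)
```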