
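Research on a Hadoop-Based Microblog Public Opinion Analysis and Early-Warning System
Cite as: ZHOU Jian-hua. Research on a Hadoop-Based Microblog Public Opinion Analysis and Early-Warning System [J]. Journal of Xi'an University of Arts & Science, 2014(4):75-81.
Author: ZHOU Jian-hua
Affiliation: Department of Information Technology, Hunan Police College; School of Information Science and Engineering, Hunan University
Funding: Hunan Provincial Philosophy and Social Science Fund project (11YBA123); Hunan Police College research project (2011YB01); Hunan Provincial Science and Technology Plan project (2013GK3088); Hunan Provincial Department of Education research project (13C281); Ministry of Public Security science and technology innovation project (2013YYCXHNST035); 2014 Hunan Province teaching reform project
Abstract: Massive data volumes pose a severe challenge to real-time monitoring and early warning of microblog public opinion. This paper introduces key Hadoop technologies into microblog public opinion analysis in order to explore efficient methods for querying and reasoning over short-text data in a distributed environment. Based on the structure of microblog data and the characteristics of Hadoop's key cloud computing technologies, it proposes an analysis and early-warning framework for massive microblog data. HDFS provides storage for the microblog data, while MapReduce provides fast computation over it. Using Map (mapping) and Reduce (reduction) rules, large data sets of microblog user relationships and content are processed in parallel to achieve efficient parallel preprocessing, in-depth analysis, and real-time five-level public opinion warning. To examine the relationship between computational efficiency and the number of Reduce tasks, experiments were run with different numbers of Reduce tasks. The results show that, with the Map configuration held fixed, once the microblog data set grows to 2 GB, jobs with more Reduce tasks run in substantially less time than jobs with fewer Reduce tasks.

Keywords: Hadoop; microblog public opinion; analysis and early warning; HDFS; Map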

On Hadoop-Based Micro-Blog Public Opinion Analysis and Pre-Warning System
ZHOU Jian-hua. On Hadoop-Based Micro-Blog Public Opinion Analysis and Pre-Warning System [J]. Journal of Xi'an University of Arts & Science: Natural Science Edition, 2014(4):75-81.
Authors:ZHOU Jian-hua
Institution:ZHOU Jian-hua (Department of Information Technology, Hunan Police College, Changsha 410138, China; School of Information Science and Engineering, Hunan University, Changsha 410082, China)
Abstract: Massive data volumes pose a serious challenge to real-time monitoring and early warning of micro-blog public opinion. To address this, key Hadoop technologies are introduced into micro-blog public opinion research in order to explore high-efficiency querying and reasoning over short text in a distributed computing context. Based on the micro-blog data structure, and taking into account the characteristics of Hadoop's key cloud computing technologies, we propose an analysis and pre-warning framework for massive micro-blog data. HDFS provides storage for the data, while MapReduce provides fast computation over it. Using the Map (mapping) and Reduce (reduction) rules, large-scale data sets of micro-blog user relationships and content are processed in parallel to achieve efficient parallel preprocessing, in-depth analysis, and five-level real-time warning. To examine the relationship between computing efficiency and the number of Reduce tasks, experiments were run with different numbers of Reduce tasks. The results show that, with the Map configuration held fixed, once the micro-blog data set reaches 2 GB, jobs with more Reduce tasks take much less time to execute than jobs with fewer Reduce tasks.
Keywords:Hadoop  micro-blog opinion  analysis and pre-warning  HDFS  Map
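
The Map/Reduce division of work described in the abstract can be illustrated with a minimal single-process sketch in Python. This is not the paper's implementation and does not use Hadoop: the function names (`map_post`, `partition`, `reduce_counts`, `run_job`), the whitespace tokenization, the CRC32 partitioner, and the sample posts are all assumptions chosen for illustration. It only shows how keyed pairs emitted by a Map step are split across a configurable number of Reduce tasks, which is the parameter the experiments vary.

```python
from collections import defaultdict
from zlib import crc32


def map_post(post):
    """Map step (illustrative): emit (token, 1) for each whitespace-separated token."""
    for token in post.lower().split():
        yield token, 1


def partition(key, num_reducers):
    """Assign a key to one reducer using a stable hash (assumed partitioner)."""
    return crc32(key.encode("utf-8")) % num_reducers


def reduce_counts(pairs):
    """Reduce step (illustrative): sum the values emitted for each key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(posts, num_reducers):
    """Run map, shuffle into num_reducers buckets, then reduce each bucket."""
    buckets = [[] for _ in range(num_reducers)]
    for post in posts:
        for key, value in map_post(post):
            buckets[partition(key, num_reducers)].append((key, value))
    merged = {}
    for bucket in buckets:
        # Each key lands in exactly one bucket, so the partial results never collide.
        merged.update(reduce_counts(bucket))
    return merged


# Hypothetical sample posts, not data from the paper.
posts = ["flood warning city", "city traffic flood", "flood relief"]
result = run_job(posts, num_reducers=3)
```

Changing `num_reducers` alters how the work is split but not the final counts. In a real cluster each bucket would be processed by a separate Reduce task in parallel, which is where the reported reduction in execution time on larger data sets would come from.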