
Design and Implementation of Microblog User Information Acquisition and Analysis System
Original title: 微博用户信息采集分析系统设计与实现
Cite this article: ZHANG Yang, FAN Yan, XIA Ling-ling, CHEN Jun-an, WANG Qin. Design and Implementation of Microblog User Information Acquisition and Analysis System[J]. Introduction of Educational Technology, 2019, 18(9): 125-129.
Authors: ZHANG Yang  FAN Yan  XIA Ling-ling  CHEN Jun-an  WANG Qin
Institution: Department of Computer Information and Cyber Security, Jiangsu Police Institute, Nanjing 210031, Jiangsu, China
Funding: Jiangsu Province College Students' Innovation and Entrepreneurship Training Program (201810329027Y)
Abstract: The system uses Python to work around Sina Weibo's anti-crawler measures and the Scrapy framework to implement an efficient, stable crawler that comprehensively collects the basic profile information of microblog users. The collected data is imported into the Neo4j graph database and visualized with the Echarts library to analyze and mine the relationships between users. Because Weibo hosts a large number of paid posters (the "Internet water army"), the system also provides a filtering option that effectively excludes the influence of their abnormal behavior on the analysis results. Debugging results show that the system can collect and analyze, in real time and in a stable and efficient manner, the information of users who forward or comment on a specific microblog post. It helps people extract complex relationships from massive data and analyze the interactions between microblog users concisely and intuitively.

Keywords: Sina Weibo  web crawler  simulated login  data analysis
Received: 2019-03-15
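
The abstract describes filtering out paid posters before loading forward and comment relationships into Neo4j, but it does not state the filtering criteria or the graph schema. The sketch below is therefore only an illustration under stated assumptions: the `User` and `Interaction` records, the follower and post thresholds, the `User` node label, and the `FORWARD`/`COMMENT` relationship types are all hypothetical and are not taken from the paper. It uses only the Python standard library and produces parameterized Cypher `MERGE` statements rather than connecting to a database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    uid: str
    followers: int  # follower count from the crawled profile
    posts: int      # number of posts from the crawled profile


@dataclass(frozen=True)
class Interaction:
    src: str   # uid of the user who forwarded or commented
    dst: str   # uid of the author of the original post
    kind: str  # "FORWARD" or "COMMENT" (hypothetical relationship types)


ALLOWED_KINDS = ("FORWARD", "COMMENT")


def looks_like_paid_poster(user, min_followers=10, min_posts=5):
    """Hypothetical heuristic, not the paper's rule: flag accounts that
    have both very few followers and very few posts."""
    return user.followers < min_followers and user.posts < min_posts


def filter_interactions(users, interactions, drop_paid_posters=True):
    """Return the interactions to keep; the flag plays the role of the
    optional filter mentioned in the abstract."""
    if not drop_paid_posters:
        return list(interactions)
    suspects = {u.uid for u in users if looks_like_paid_poster(u)}
    return [i for i in interactions if i.src not in suspects]


def to_cypher(interaction):
    """Build a (query, params) pair for one edge.

    Cypher cannot take a relationship type as a parameter, so the type is
    checked against a whitelist before being placed in the query text.
    """
    if interaction.kind not in ALLOWED_KINDS:
        raise ValueError(f"unsupported interaction kind: {interaction.kind}")
    query = (
        "MERGE (a:User {uid: $src}) "
        "MERGE (b:User {uid: $dst}) "
        f"MERGE (a)-[:{interaction.kind}]->(b)"
    )
    return query, {"src": interaction.src, "dst": interaction.dst}


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    users = [User("u1", 500, 120), User("u2", 2, 0), User("u3", 80, 30)]
    interactions = [
        Interaction("u1", "u3", "FORWARD"),
        Interaction("u2", "u3", "COMMENT"),
    ]
    for item in filter_interactions(users, interactions):
        print(to_cypher(item))
```

Using `MERGE` rather than `CREATE` keeps repeated crawls of the same post from duplicating user nodes or edges, which seems a reasonable fit for a crawler that may revisit the same accounts.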