
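Practice of a Hadoop-Based Text Analysis Platform
Cite this article: ZHANG Jiliang, YIN Lan. Practice of a Hadoop-Based Text Analysis Platform[J]. Journal of Anshun College, 2020(1): 132-136.
Authors: ZHANG Jiliang  YIN Lan
Affiliation: School of Big Data and Computer Science, Guizhou Normal University
Funding: Guizhou Provincial Department of Education Young Scientific and Technological Talent Growth Project, "Research on Fragmented Learning Based on Knowledge Networks in the Big Data Context" (黔教合KY字[2016]132号)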
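Abstract: Analyzing large volumes of unstructured text data has become an important task in many kinds of research and data analysis. Using the Hadoop distributed computing platform, this paper builds a text analysis application framework based on the open-source tool IKAnalyzer. The Web application platform is built on the Spring Boot architecture, and node.js is used to construct a data-driven Web front-end UI. The study puts into practice a preliminary workflow covering document collection, document preprocessing, distributed computing, Chinese word segmentation and word frequency analysis, and visualization. With this platform, statistical text analysis was carried out on two corpora: the text of Jin Yong's novels and collected basic education data from extremely impoverished townships in Guizhou Province.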

Keywords: Hadoop  IKAnalyzer  Spring Boot  node.js  Chinese information processing

Text Analysis Application of Distributed Computing on Hadoop
ZHANG Jiliang, YIN Lan. Text Analysis Application of Distributed Computing on Hadoop[J]. Journal of Anshun College, 2020(1): 132-136.
Authors: ZHANG Jiliang  YIN Lan
Institution: School of Big Data and Computer Science, Guizhou Normal University, Guiyang 550001, Guizhou, China
Abstract: Large amounts of unstructured text data are collected for data tasks and social studies, and analyzing these domain texts has become a fundamental issue. In this paper, a text analysis application is built on the Hadoop distributed computing platform with the open-source tool IKAnalyzer, and a web application is implemented with the Spring Boot framework and node.js. Chinese word segmentation and analysis are practiced. The system implements the process of text gathering, filtering, distributed computing, analysis, and results visualization. Texts gathered from Jin Yong's kung fu novels and from school documents in rural villages and towns of Guizhou are processed on this platform. With the development of Natural Language Processing, interdisciplinary research will meet more opportunities and challenges with the support of text mining and domain knowledge.
Keywords: Hadoop  IKAnalyzer  Spring Boot  node.js  Chinese Processing
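
The workflow named in the abstract (distributed computing followed by Chinese word segmentation and word frequency analysis) can be illustrated with a minimal in-process sketch of the map, shuffle, and reduce steps. This is not the authors' implementation: it runs in plain Python rather than on Hadoop, the names TOY_DICT, segment, map_phase, shuffle, reduce_phase, and word_frequencies are hypothetical, and a toy forward-maximum-matching segmenter over a five-word dictionary stands in for IKAnalyzer.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical toy dictionary; a real system would rely on IKAnalyzer's lexicon.
TOY_DICT = {"贵州", "教育", "数据", "分析", "文本"}
MAX_WORD_LEN = max(len(word) for word in TOY_DICT)


def segment(text):
    """Forward maximum matching: a stand-in for IKAnalyzer, not its algorithm."""
    tokens, i = [], 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in TOY_DICT:
                tokens.append(piece)
                i += size
                break
    return tokens


def map_phase(line):
    """Mapper: emit (word, 1) for every dictionary word found in one line."""
    return [(token, 1) for token in segment(line) if token in TOY_DICT]


def shuffle(pairs):
    """Group values by key, as a MapReduce framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducer: sum the emitted counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}


def word_frequencies(lines):
    """Run map, shuffle, and reduce over an iterable of text lines."""
    pairs = chain.from_iterable(map_phase(line) for line in lines)
    return reduce_phase(shuffle(pairs))
```

Calling word_frequencies on a list of lines returns a dictionary mapping each dictionary word to its count. On a real cluster the mapper and reducer would run as separate tasks over file splits, with the framework performing the grouping step.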
This article is indexed in VIP (维普) and other databases.