Similar literature
 20 similar documents found (search time: 15 ms)
1.
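Zhang Xiaodan, Journal of Intelligence (《情报杂志》), 2021, (1): 184-188
[Purpose/Significance] With the rapid growth of digital resources on the Internet, mining valuable information from massive data has become a central research question in data mining, and large-scale text classification is one of the key problems in this field. Advances in deep learning have made deep-learning-based classification of large text collections feasible. [Method/Process] To address the low efficiency of recently proposed graph neural network text classifiers, an improved method is proposed. Texts, sentences, and keywords are used to build a topological relation graph and its topological relation matrix; a Markov chain sampling algorithm samples the nodes of each layer; a multi-level dimensionality reduction method reduces the feature dimension; and finally inductive inference is used to classify the texts. [Result/Conclusion] To evaluate the proposed method, experiments were run on commonly used public corpora and on a self-built NSTL corpus of scientific and technical journal literature, comparing accuracy and inference time against widely used text classification models. The results show that the proposed method effectively improves classification efficiency while maintaining accuracy on large collections of texts and literature.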

2.
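Application of Least Squares Support Vector Machines in Data Mining   Cited by: 6 (self-citations: 0; cited by others: 6)
Cai Dongsong, Jing Jipeng, Information Science (《情报科学》), 2005, 23(12): 1877-1880
With the development of data warehouse and online analytical processing technologies, database-based data mining has become an important means of data processing. The least squares support vector machine (LS-SVM), a relatively new machine learning method, offers global convergence and good generalization ability. This paper applies it to classification and prediction in data mining. Through kernel function selection and parameter optimization, and through comparison with support vector machines, multilayer perceptron neural network models, and discriminant analysis, the study shows that LS-SVM is an effective data mining algorithm with high accuracy.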
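For readers unfamiliar with the method named in this abstract, the block below gives the standard LS-SVM classifier as it usually appears in the general literature. It is not taken from this paper, whose kernel and parameter choices are not stated here. The symbols $\gamma$ (regularization parameter), $K$ (kernel function), $\varphi$ (feature map), $\alpha$ and $b$ are the conventional ones and are assumptions with respect to this abstract. The point it illustrates is that equality constraints with a squared-error penalty reduce training to one linear system, which is where kernel selection and the choice of $\gamma$ enter.

```latex
% Primal problem: equality constraints with squared errors e_i
\min_{w,b,e}\ \tfrac{1}{2} w^{\top} w + \tfrac{\gamma}{2}\sum_{i=1}^{N} e_i^{2}
\quad \text{s.t.} \quad y_i\bigl(w^{\top}\varphi(x_i) + b\bigr) = 1 - e_i,\quad i = 1,\dots,N

% Optimality conditions: a single linear system in (b, \alpha)
\begin{bmatrix} 0 & y^{\top} \\ y & \Omega + \gamma^{-1} I \end{bmatrix}
\begin{bmatrix} b \\ \alpha \end{bmatrix}
=
\begin{bmatrix} 0 \\ \mathbf{1}_N \end{bmatrix},
\qquad \Omega_{ij} = y_i y_j K(x_i, x_j)

% Resulting classifier
f(x) = \operatorname{sign}\Bigl(\sum_{i=1}^{N} \alpha_i y_i K(x, x_i) + b\Bigr)
```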

3.
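As information technology continues to develop, applying business intelligence techniques for data mining and analysis is becoming increasingly important for businesses. Classification and regression trees (CART) and neural networks are classic data mining algorithms that are widely used in data analysis, prediction, and evaluation. This article applies both CART and neural network algorithms to data on revenue changes after promotion schemes were adopted for retail goods, and builds corresponding models to predict the effect of the promotion schemes.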

4.
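[Purpose/Significance] To extend the application of big data mining technology to library learning support services and to improve the effectiveness of those services in a smart environment. [Method/Process] The paper reviews the current state of research on library learning support services and their characteristics in a smart environment, builds a big data mining model for learning support services from three aspects (learning behavior, knowledge association, and learning context), and analyzes the data sources and big data mining paths. [Result/Conclusion] On the basis of the big data mining model, a hierarchical framework for library learning support services is constructed with four layers: intelligent perception, big data analysis, core business, and smart terminals. The framework provides a basis and reference for research on the many application scenarios of library learning support services in a smart environment.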

5.
This paper presents a semantically rich document representation model for automatically classifying financial documents into predefined categories utilizing deep learning. The model architecture consists of two main modules: document representation and document classification. In the first module, a document is enriched with semantics using background knowledge provided by an ontology and through the acquisition of its relevant terminology. Acquisition of terminology integrated into the ontology extends the capabilities of semantically rich document representations with in-depth coverage of concepts, thereby capturing the whole conceptualization involved in documents. Semantically rich representations obtained from the first module serve as input to the document classification module, which aims at finding the most appropriate category for that document through deep learning. Three different deep learning networks, each belonging to a different category of machine learning techniques for ontological document classification using a real-life ontology, are used. Multiple simulations are carried out with various deep neural network configurations, and our findings reveal that a three-hidden-layer feedforward network with 1024 neurons obtains the highest document classification performance on the INFUSE dataset. The performance in terms of F1 score is further increased by almost five percentage points to 78.10% for the same network configuration when the relevant terminology integrated into the ontology is applied to enrich document representation. Furthermore, we conducted a comparative performance evaluation using various state-of-the-art document representation approaches and classification techniques, including shallow and conventional machine learning classifiers.

6.
Nowadays, online word-of-mouth has an increasing impact on people's views and decisions and has attracted wide attention. The classification and sentiment analysis of online consumer reviews have attracted significant research interest. In this thesis, we propose and implement a new method for extracting and classifying comments on online dating services (ODS). Unlike traditional sentiment analysis, which mainly focuses on product attributes, we attempt to infer and extract the emotion concept of each emotional review by introducing social cognitive theory. In this study, we selected 4,300 comments with extremely negative or positive emotions published on dating websites as a sample and used three machine learning algorithms to analyze emotions. To test and compare their efficiency for user behavior research, we used various sentiment analysis techniques, including machine learning techniques and dictionary-based sentiment analysis. We found that combining machine learning with the lexicon-based method achieves higher accuracy than any single type of sentiment analysis. This research provides a new perspective for research on user behavior.

7.
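XML-Based Web Page Data Mining   Cited by: 1 (self-citations: 0; cited by others: 1)
With the rapid development of the Internet, the contradiction between abundant data and scarce information has become increasingly prominent. Data mining emerged in response to this need as a new technology combining machine learning, pattern recognition, statistics, artificial intelligence, neural networks, and other disciplines; Web data mining is its application to the processing of network information. This paper describes the concepts, categories, and techniques of Web data mining and focuses on Web data mining based on XML, which addresses the difficulty of mining Web data caused by the fact that most information on the Internet is unstructured or even structureless and Web information is poorly organized.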

8.
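Wang Qian, Zeng Jin, Liu Jiawei, Qi Yue, Information Science (《情报科学》), 2020, 38(3): 64-69
[Purpose/Significance] Against the background of academic big data applications, there is an increasingly urgent need for finer-grained, semantic analysis and mining of academic texts, and identifying the structural function of academic text has become a research hotspot. [Method/Process] This paper identifies the structural function of sections at the paragraph level: paragraphs of academic text are represented with features that combine convolutional and recurrent neural networks and are then classified. [Result/Conclusion] The proposed deep learning method outperforms traditional machine learning methods in overall classification results and greatly reduces the manual effort required by traditional feature engineering.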

9.
Big data generated by social media is a valuable source of information and offers an excellent opportunity to mine valuable insights. In particular, user-generated content such as reviews, recommendations, and users’ behavior data is useful for supporting the marketing activities of many companies. Knowing what users are saying, through reviews in social media, about the products they bought or the services they used is a key factor in making decisions. Sentiment analysis is one of the fundamental tasks in Natural Language Processing. Although deep learning for sentiment analysis has achieved great success and allowed several firms to analyze and extract relevant information from their textual data, a model that runs in a traditional environment cannot remain effective as the volume of data grows, which implies the importance of efficient distributed deep learning models for social big data analytics. Besides, social media analysis is known to be a complex process that involves a set of complex tasks. Therefore, it is important to address the challenges and issues of social big data analytics and to enhance the classification accuracy of deep learning techniques in order to obtain better decisions. In this paper, we propose an approach for sentiment analysis that adopts fastText with Recurrent Neural Network (RNN) variants to represent textual data efficiently and then employs the new representations to perform the classification task. Its main objective is to enhance the classification accuracy of well-known RNN variants and to handle large-scale data. In addition, we propose a distributed intelligent system for real-time social big data analytics. It is designed to ingest, store, process, index, and visualize huge amounts of information in real time. The proposed system adopts distributed machine learning with our proposed method for enhancing decision-making processes. Extensive experiments conducted on two benchmark data sets demonstrate that our proposal for sentiment analysis outperforms well-known distributed recurrent neural network variants (i.e., Long Short-Term Memory (LSTM), Bidirectional Long Short-Term Memory (BiLSTM), and Gated Recurrent Unit (GRU)). Specifically, we tested the efficiency of our approach using the three different deep learning models. The results show that our proposed approach is able to enhance the performance of the three models. The current work can provide several benefits for researchers and practitioners who want to collect, handle, analyze, and visualize several sources of information in real time. Also, it can contribute to a better understanding of public opinion and user behaviors using our proposed system with the improved variants of the most powerful distributed deep learning and machine learning algorithms. Furthermore, it is able to increase the classification accuracy of several existing works based on RNN models for sentiment analysis.

10.
Most existing research on applying machine learning techniques to document summarization explores either classification models or learning-to-rank models. This paper presents our recent study on how to apply a different kind of learning model, namely regression models, to query-focused multi-document summarization. We choose to use Support Vector Regression (SVR) to estimate the importance of a sentence in a document set to be summarized through a set of pre-defined features. In order to learn the regression models, we propose several methods to construct the “pseudo” training data by assigning each sentence a “nearly true” importance score calculated with the human summaries that have been provided for the corresponding document set. A series of evaluations on the DUC data sets are conducted to examine the efficiency and the robustness of the proposed approaches. When compared with classification models and ranking models, regression models are consistently preferable.

11.
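This paper presents the overall design approach, overall architecture, and module functions of an intrusion detection system based on ensemble learning, with a focus on the working principle of a genetic-algorithm-based ensemble learning classification engine. Simulation experiments show that an ensemble of neural networks can overcome the shortcomings of a single neural network and provides high-speed data processing and self-learning capabilities.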

12.
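[Purpose/Significance] Exploring the mechanism by which smart city policy pilots affect e-government development helps move e-government toward smarter services. [Method/Process] Using event history analysis, this paper collects data for 244 prefecture-level cities from 2011 to 2016 and builds static and dynamic panel models of smart city policy pilots and e-government development levels to examine the pilots' effect on e-government development. [Result/Conclusion] The results show that smart city pilots have a short-term positive effect and a long-term negative effect on local e-government development. In addition, the previous year's level of e-government development has a significant positive effect on current performance; peer effects have a significant and sustained positive effect on local e-government development; and public factors have no significant effect. Local governments should therefore seize the opportunity of smart city construction and, based on each city's state of development, stimulate the potential of e-government and improve innovative governance capacity through policy mix, goal selection, and learning from experience.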

13.
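[Purpose/Significance] To effectively fuse multi-source data in citation networks, such as citation relations and text attributes, and to strengthen the semantic associations between document nodes, thereby providing strong support for tasks such as data mining and knowledge discovery. [Method/Process] A knowledge representation method for citation networks is proposed. A neural network model first learns the k-order proximity structure of the citation network; a doc2vec model then learns text attributes such as titles and abstracts; finally, a cross-learning mechanism based on vector sharing fuses the multi-source data. [Result/Conclusion] Tests on a CNKI citation dataset for the stem cell field show good link prediction performance, demonstrating the effectiveness and soundness of the method.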

14.
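[Purpose/Significance] Entity semantic relation classification is one of the key tasks in information extraction. It converts unstructured text into structured knowledge and is the groundwork for building domain ontologies and knowledge graphs and for developing question answering and information retrieval systems. [Method/Process] This paper reviews in detail the development of entity semantic relation classification, surveys and summarizes the latest research in China and abroad over the past five years in terms of technical methods and application domains, and points out shortcomings of existing research and directions for future work. [Result/Conclusion] Popular deep learning methods dispense with the laborious feature engineering of traditional shallow machine learning methods and learn text features automatically. Experiments show that incorporating lexical and syntactic features into neural network models and introducing attention mechanisms can effectively improve relation classification performance.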

15.
Government agencies often face trade-offs in developing initiatives that address a public good given competing concerns of various constituent groups. Efforts to construct data warehouses that enable data mining of citizens’ personal information obtained from other organizations (including sister agencies) create a complex challenge, since privacy concerns may vary across constituent groups whose priorities diverge from agencies’ e-government goals. In addition to privacy concerns, participating government agencies’ priorities related to the use of the information may also be in conflict. This article reports on a case study of the Integrated Non-Filer Compliance System used by the California Franchise Tax Board for which data are collected from federal, state, and municipal agencies and other organizations in a data mining application that aims to identify residents who under-report income or fail to file tax returns. This system pitted the public good (ensuring owed taxes are paid) against citizen concerns about privacy. Drawing on stakeholder theory, the authors propose a typology of four stakeholder groups (data controllers, data subjects, data providers, and secondary stakeholders) to address privacy concerns and argue that by ensuring procedural fairness for the data subjects, agencies can reduce some barriers that impede the successful adoption of e-government applications and policies. The article concludes that, when e-government data mining applications rely on data shared across organizational boundaries, data controllers can reduce adoption and implementation barriers by taking four steps: identifying legitimate stakeholders and their concerns prior to implementation; enacting procedures to ensure procedural fairness when data are captured, shared, and used; explaining to each constituency how the data mining application helps to ensure distributive fairness; and continuing to gauge stakeholders’ responses and ongoing concerns as long as the application is in use.

16.
Automated legal text classification is a prominent research topic in the legal field. It lays the foundation for building an intelligent legal system. Current literature focuses on international legal texts, such as Chinese cases, European cases, and Australian cases. Little attention is paid to text classification for U.S. legal texts. Deep learning has been applied to improving text classification performance. Its effectiveness needs further exploration in domains such as the legal field. This paper investigates legal text classification with a large collection of labeled U.S. case documents by comparing the effectiveness of different text classification techniques. We propose a machine learning algorithm using domain concepts as features and random forests as the classifier. Our experimental results on 30,000 full U.S. case documents in 50 categories demonstrate that our approach significantly outperforms a deep learning system built on multiple pre-trained word embeddings and deep neural networks. In addition, applying only the top 400 domain concepts as features for building the random forests could achieve the best performance. This study provides a reference for selecting machine learning techniques when building high-performance text classification systems in the legal domain or other fields.

17.
The initial learning experience is crucial for understanding digital services adoption and usage diffusion. Using a UTAUTv2 model, we explore the effect of process- and content-oriented knowledge on behavioral intentions to use e-government services. The adoption of e-government systems is lower than desired in general and faces considerable resistance in many developing countries. Scholars suggest that more knowledge and better training are critical to increasing adoption and usage rates. We conducted a survey of 262 citizens in Lebanon to investigate how consumers cope with high and moderate levels of complexity during their initial learning experience with a technology-based product. The results show that a moderate degree of content- and process-oriented knowledge about e-government services during an initial learning experience improves usage habits, performance expectancy, effort expectancy, and facilitating conditions. The challenge for service providers is to understand consumers’ learning experience and coping strategies and to provide mechanisms that make the transition to e-services easier and more intuitive. This can be achieved by developing new infrastructure for e-services to facilitate easier access to e-government websites and to improve site performance. Marketers can also develop more effective communications that offer easy and flexible specific steps for using the portal.

18.
This paper examines the relative importance and significance of the four technology enablers introduced by Davis (1989) in the technology acceptance model (TAM) (perceived ease of use, perceived usefulness, attitude towards using and behavioural intention) for use on four different levels of citizen engagement in e-government (null, publish, interact and transact). An extended TAM is developed to test citizen engagement with online e-government services on a sample of 307 citizens who used the benefits advisor tool within a Spanish City Hall. Although the proposed model follows TAM and explains the intention towards the actual use of e-government by postulating four direct determinants, these determinants (A, PU, PEOU and BI) are treated as parallel processes, meaning that each can have a separate influence at different levels of citizen engagement. To achieve this goal, a multinomial logistic regression is developed and tested to confirm the explanatory power of the four technology enablers on the four different levels of e-government. Our findings further suggest that, in order to implement e-government, some of the enablers matter more than others in moving from one level of citizen engagement to another. The main contribution of the paper is to question the use of existing models which seek to represent the relationship between technology enablers and the adoption of e-government services without considering their impacts on citizens’ engagement. The implications of the findings are discussed and useful insights are provided in relation to policy recommendations geared to creating appropriate conditions for building citizens’ engagement with, and intention to use, e-government services.

19.
Eliminating noisy information and extracting informative content have become important issues for web mining, search and accessibility. This extraction process can employ automatic techniques and hand-crafted rules. Automatic extraction techniques focus on various machine learning methods, but implementing these techniques increases the time complexity of the extraction process. Conversely, extraction through hand-crafted rules is an efficient technique that uses string manipulation functions, but preparing these rules is difficult and cumbersome for users. In this paper, we present a hybrid approach that contains two steps that can invoke each other. The first step discovers informative content using Decision Tree Learning as an appropriate machine learning method and creates rules from the results of this learning method. The second step extracts informative content using rules obtained from the first step. However, if the second step does not return an extraction result, the first step is invoked. In our experiments, the first step achieves a high accuracy of 95.76% in extraction of the informative content. Moreover, 71.92% of the rules can be used in the extraction process, and rule-based extraction is approximately 240 times faster than the first step.
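The control flow described in this abstract (try the cheap rules first, fall back to the learning step only when no rule returns a result, and keep the rule that step produces) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `HybridExtractor`, `make_class_rule`, and `toy_learner` are hypothetical names, and `toy_learner` is a crude stand-in that picks the longest classed element rather than a Decision Tree Learning model. The regular expressions assume flat, non-nested markup.

```python
import re
from typing import Callable, List, Optional, Tuple

Rule = Callable[[str], Optional[str]]
Learner = Callable[[str], Tuple[Optional[str], Optional[Rule]]]


def make_class_rule(tag: str, css_class: str) -> Rule:
    """Build a string-matching rule returning the body of <tag class="css_class">."""
    pattern = re.compile(
        r'<{0}[^>]*class="{1}"[^>]*>(.*?)</{0}>'.format(
            re.escape(tag), re.escape(css_class)
        ),
        re.DOTALL,
    )

    def rule(html: str) -> Optional[str]:
        match = pattern.search(html)
        return match.group(1).strip() if match else None

    return rule


def toy_learner(html: str) -> Tuple[Optional[str], Optional[Rule]]:
    """Hypothetical stand-in for the learning step (NOT a decision tree):
    choose the classed element with the longest body and emit a rule for it."""
    blocks = re.findall(
        r'<(\w+)[^>]*class="([^"]+)"[^>]*>(.*?)</\1>', html, re.DOTALL
    )
    if not blocks:
        return None, None
    tag, css_class, body = max(blocks, key=lambda block: len(block[2]))
    return body.strip(), make_class_rule(tag, css_class)


class HybridExtractor:
    """Rule-first extraction with a fallback to the (slower) learning step."""

    def __init__(self, learner: Learner) -> None:
        self.learner = learner
        self.rules: List[Rule] = []
        self.learner_calls = 0  # counts how often the slow path was taken

    def extract(self, html: str) -> Optional[str]:
        # Second step of the abstract: apply the rules gathered so far.
        for rule in self.rules:
            content = rule(html)
            if content:
                return content
        # No rule produced a result: invoke the first step and keep its rule.
        self.learner_calls += 1
        content, new_rule = self.learner(html)
        if new_rule is not None:
            self.rules.append(new_rule)
        return content


extractor = HybridExtractor(toy_learner)
page_one = '<div class="nav">Home</div><div class="article">Long informative text here</div>'
page_two = '<div class="nav">Home | About | Contact | Sitemap</div><div class="article">Short</div>'
first = extractor.extract(page_one)   # no rules yet, so the learner runs
second = extractor.extract(page_two)  # the cached rule answers directly
```

On the second page the cached rule is used, so the learner is not called again even though its longest-element heuristic would have chosen the navigation block there.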

20.
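Remote Sensing Classification of Land Cover Based on Support Vector Machines   Cited by: 4 (self-citations: 0; cited by others: 4)
Classification of remote sensing images is the basis for studying land change. Traditional remote sensing image classification suffers from low accuracy and high uncertainty. This paper classifies remote sensing images with a support vector machine (SVM) and compares the results experimentally with traditional maximum likelihood classification. The results show that, across different parameter combinations, the overall accuracy and Kappa index of the SVM are generally higher than those of maximum likelihood classification, with the best SVM overall accuracy exceeding that of maximum likelihood classification by 0.9779%. Both SVM and maximum likelihood results show confusion between classes, but the confusion is much smaller for the SVM, whose accuracy stays within an acceptable range: for low-density grass, for example, the user's accuracy of maximum likelihood classification drops to 84.68%, whereas that of the SVM, although also lower, remains at 92.31%. The SVM shows excellent learning ability with very few samples and is a promising method in machine learning.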
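The accuracy measures named in this abstract (overall accuracy, Kappa index, user's accuracy) are all computed from a confusion matrix. The sketch below shows the usual definitions, assuming rows hold the mapped (classified) class and columns hold the reference class. The function name `accuracy_report` and the 2×2 example matrix are illustrative assumptions, not data from the paper.

```python
from typing import Dict, List


def accuracy_report(confusion: List[List[int]]) -> Dict[str, object]:
    """Compute accuracy measures from a square confusion matrix.

    Assumed layout: confusion[i][j] = number of samples mapped to class i
    whose reference (ground truth) class is j.
    """
    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)
    row_totals = [sum(row) for row in confusion]
    col_totals = [
        sum(confusion[r][c] for r in range(n_classes)) for c in range(n_classes)
    ]
    diagonal = [confusion[k][k] for k in range(n_classes)]

    # Overall accuracy: share of samples on the diagonal.
    overall = sum(diagonal) / total
    # Agreement expected by chance, from the row and column marginals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (total * total)
    # Cohen's kappa: agreement beyond chance, scaled to at most 1.
    kappa = (overall - expected) / (1 - expected)

    # User's accuracy: correct samples / all samples mapped to the class.
    users = [d / r if r else float("nan") for d, r in zip(diagonal, row_totals)]
    # Producer's accuracy: correct samples / all reference samples of the class.
    producers = [d / c if c else float("nan") for d, c in zip(diagonal, col_totals)]

    return {
        "overall_accuracy": overall,
        "kappa": kappa,
        "users_accuracy": users,
        "producers_accuracy": producers,
    }


# Made-up two-class example (not from the paper).
report = accuracy_report([[45, 5], [10, 40]])
```

A drop in user's accuracy for one class, as reported for low-density grass, means that more of the pixels mapped to that class actually belong to other classes.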
