Similar Articles
20 similar articles found (search time: 31 ms)
1.
The breeding and spreading of negative emotion during public emergencies pose severe challenges to social governance, yet traditional government information-release strategies ignore the evolution mechanism of negative emotion. Focusing on government information-release policies during public emergencies and using cognitive big data analytics, this research applies deep learning to the construction of a news-framing framework and explores how government information-release strategies influence the contagion-evolution of negative emotion. Specifically, the paper first uses Word2Vec, cosine word-vector similarity, and the SO-PMI algorithm to build an emotion lexicon oriented to public emergencies; it then proposes an emotion computing method based on dependency parsing, designing an emotion binary tree and dependency-based emotion calculation rules; finally, experiments show that the proposed lexicon has wider coverage and higher accuracy than existing ones. An emotion-evolution analysis of an actual public event, based on the lexicon and the proposed emotion computing method, confirms that the algorithm is feasible and effective: the model conducts fine-grained emotion computing effectively and improves the accuracy and computational efficiency of sentiment classification. The final empirical analysis finds that, owing to defects such as slow release, opaque content, poor pertinence, and weak inter-departmental coordination, existing government information-release strategies have a significant negative impact on the contagion-evolution of anxiety and disgust and cannot regulate negative emotions effectively. These results provide theoretical implications and technical support for social governance, help establish a negative-emotion management mode, and contribute to a new pattern of public opinion guidance.
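As a rough illustration of the SO-PMI step described above, the following sketch scores a candidate word by its pointwise mutual information with positive and negative seed words. The seed words, co-occurrence counts, and corpus size here are invented toy values, not the paper's data:

```python
from math import log2

def so_pmi(cooccur, freq, total, pos_seeds, neg_seeds, word):
    """SO-PMI: sum of PMI(word, positive seeds) minus sum of PMI(word, negative seeds).
    cooccur[(a, b)] = co-occurrence count, freq[w] = word count, total = corpus size."""
    def pmi(a, b):
        co = cooccur.get((a, b), 0) or cooccur.get((b, a), 0)
        if co == 0 or freq.get(a, 0) == 0 or freq.get(b, 0) == 0:
            return 0.0
        return log2(co * total / (freq[a] * freq[b]))
    return sum(pmi(word, s) for s in pos_seeds) - sum(pmi(word, s) for s in neg_seeds)

# Toy corpus statistics (hypothetical counts)
freq = {"good": 50, "bad": 40, "calm": 20, "panic": 30}
cooccur = {("calm", "good"): 15, ("panic", "bad"): 20, ("panic", "good"): 2}
print(so_pmi(cooccur, freq, 1000, ["good"], ["bad"], "calm"))   # positive score
print(so_pmi(cooccur, freq, 1000, ["good"], ["bad"], "panic"))  # negative score
```

Words scoring above zero lean positive, below zero negative; the real system would compute these counts over the public-emergency corpus.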

2.
盛宇  刘俊熙  龙怡  郭金兰 《现代情报》2010,30(1):159-161
The system, built on the Windows platform in VB under the Microsoft .NET Framework, designs and implements a text classification system based on "non-standard wording" and a "typical-case dictionary". Its classification strategy follows a psychological principle of how humans understand natural language: people judge by the most familiar and most typical known examples, resorting to frequency only when that approach fails, and then only to very simple frequency measures. The paper describes in detail the subsystem modules for word segmentation and statistics, lexicon design, parent-category word matching, and child-category word matching.

3.
Research on Chinese Text Classification Based on Word Frequency (total citations: 1; self: 0; others: 1)
姚兴山 《现代情报》2009,29(2):179-181
This paper describes the design and implementation of a Chinese text classification system, detailing its architecture, feature extraction, training algorithm, and classification algorithm. A word-frequency statistics approach is applied to text classification, and a Chinese text classification method based on the statistical properties of single-character and two-character words is proposed: without a pre-built lexicon, single-character and two-character word tables are constructed by statistics and used to classify texts, with good results.
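The single-character/two-character frequency idea can be sketched as a naive frequency-based classifier. The training samples and add-one smoothing below are illustrative assumptions, not the paper's actual design:

```python
from collections import Counter
from math import log

def ngrams(text, n):
    """All contiguous n-character units of the text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def train(samples):
    """samples: list of (text, label). Count 1- and 2-character units per class."""
    model = {}
    for text, label in samples:
        model.setdefault(label, Counter()).update(ngrams(text, 1) + ngrams(text, 2))
    return model

def classify(model, text):
    units = ngrams(text, 1) + ngrams(text, 2)
    def score(counts):
        total = sum(counts.values())
        # add-one smoothing so unseen units do not zero out the score
        return sum(log((counts[u] + 1) / (total + 1)) for u in units)
    return max(model, key=lambda label: score(model[label]))

model = train([("股票市场大涨", "finance"), ("基金收益上升", "finance"),
               ("球队赢得比赛", "sports"), ("比赛进球精彩", "sports")])
print(classify(model, "市场基金大涨"))  # finance
```

No segmentation lexicon is needed: the character unigrams and bigrams themselves serve as features, matching the paper's no-wordlist setting.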

4.
Sentiment lexicons are essential tools for polarity classification and opinion mining. In contrast to machine learning methods that only leverage text features or raw text for sentiment analysis, methods that use sentiment lexicons offer higher interpretability. Although a number of domain-specific sentiment lexicons are available, it is impractical to build an ex ante lexicon that fully reflects the characteristics of language usage in endless domains. In this article, we propose a novel approach that simultaneously trains a vanilla sentiment classifier and adapts word polarities to the target domain. Specifically, we sequentially track the wrongly predicted sentences and use them as supervision, instead of addressing the gold standard as a whole, to emulate the life-long cognitive process of lexicon learning. An exploration-exploitation mechanism is designed to trade off between searching for new sentiment words and updating the polarity score of a known word. Experimental results on several popular datasets show that our approach significantly improves sentiment classification performance across a variety of domains by improving the quality of the sentiment lexicons. Case studies also illustrate how different polarity scores of the same words are discovered for different domains.
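The exploration-exploitation trade-off over wrongly predicted sentences might be sketched as follows; the update rule, epsilon, and learning rate are hypothetical stand-ins for the paper's mechanism:

```python
import random

def predict(lexicon, words):
    """Classify by the sign of the summed word polarities."""
    return "pos" if sum(lexicon.get(w, 0.0) for w in words) >= 0 else "neg"

def update_lexicon(lexicon, words, true_label, epsilon=0.3, lr=0.5, rng=random):
    """On a wrongly predicted sentence: explore (admit a new candidate sentiment
    word) with probability epsilon, else exploit (nudge a known word's score)."""
    direction = 1.0 if true_label == "pos" else -1.0
    known = [w for w in words if w in lexicon]
    unknown = [w for w in words if w not in lexicon]
    if unknown and (not known or rng.random() < epsilon):
        lexicon[rng.choice(unknown)] = direction * lr          # explore: new word
    elif known:
        w = rng.choice(known)
        lexicon[w] += lr * (direction - lexicon[w])            # exploit: adapt score

lexicon = {"great": 1.0}
rng = random.Random(0)
sentence = ["battery", "dies", "fast"]
for _ in range(20):  # repeatedly observe the same mislabeled sentence
    if predict(lexicon, sentence) != "neg":
        update_lexicon(lexicon, sentence, "neg", rng=rng)
print(predict(lexicon, sentence))  # neg
```

After a few corrections driven only by misclassifications, the lexicon acquires a domain-specific negative word and the sentence is classified correctly.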

5.
孙瑞英  李杰茹 《情报科学》2021,39(11):157-166
[Purpose/Significance] As the legal basis for protecting citizens' personal information, personal information protection policy merits study. Applying word-frequency analysis, social network analysis, and content analysis to policy interpretation helps advance personal information protection in China. [Method/Process] Taking the text of the Personal Information Protection Law of the People's Republic of China (second-review draft, released April 29, 2021) as the research object, this paper applies word-frequency analysis, social network analysis, and content analysis to the draft, mining the meaning of its provisions from multiple perspectives and describing the current state of personal information protection in China on the basis of the legal text. [Result/Conclusion] Combining quantitative and qualitative methods, the study interprets the draft, summarizes and reveals the operating mechanism of personal information protection embodied in the legal text, and clarifies the progress of personal information protection legislation in China. [Innovation/Limitation] The study combines word-frequency analysis, social network analysis, and content analysis to interpret the draft, but it analyzes only the current state of personal information protection in China and does not discuss strategies or improvements.
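The combination of word-frequency analysis and a co-occurrence (social) network over legal clauses can be illustrated with a minimal sketch; the clause keyword lists are invented examples, not drawn from the draft law:

```python
from collections import Counter
from itertools import combinations

clauses = [  # hypothetical simplified keyword lists, one per clause
    ["personal information", "consent", "processing"],
    ["personal information", "processing", "security"],
    ["consent", "withdrawal"],
]

# Word frequency across all clauses
freq = Counter(term for clause in clauses for term in clause)

# Co-occurrence network: terms appearing in the same clause share an edge
edges = Counter()
for clause in clauses:
    for a, b in combinations(sorted(set(clause)), 2):
        edges[(a, b)] += 1

# Weighted degree as a simple centrality measure
degree = Counter()
for (a, b), w in edges.items():
    degree[a] += w
    degree[b] += w

print(freq.most_common(2))          # most frequent terms
print(degree.most_common(1)[0][0])  # most central term in the network
```

High-frequency, high-centrality terms point at the core concepts around which the law's provisions are organized.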

6.
王仁武  孟现茹  孔琦 《现代情报》2018,38(10):57-64
[Purpose/Significance] This study uses a deep learning GRU recurrent neural network combined with a conditional random field (CRF) to predict labels for annotated Chinese text sequences, extracting entity-attribute pairs from online review texts. [Method/Process] First, following a designed sequence-annotation scheme, the segmented review corpus is annotated with named entities and their attributes, yielding word, part-of-speech, and label sequences; the word and part-of-speech sequences are then converted to distributed word vectors and fed into the GRU network; finally, a CRF output layer emits labels marking entities or attributes. [Result/Conclusion] Experiments show that the method reduces entity-attribute extraction to named-entity annotation; the GRU captures the contextual semantics of the input while the CRF captures dependencies among output labels, giving the approach a clear advantage over traditional rule-based or generic machine learning methods.
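The CRF output layer's decoding step, which picks the best label sequence given per-token scores (as a GRU would produce) and label-transition scores, can be sketched with a plain Viterbi search. The label set, scores, and transition penalty below are illustrative:

```python
def viterbi(emissions, transitions, labels):
    """Highest-scoring label sequence given per-token emission scores and
    label-to-label transition scores (the CRF part of a GRU+CRF tagger)."""
    best = {lab: emissions[0][lab] for lab in labels}
    back = []
    for em in emissions[1:]:
        ptr, nxt = {}, {}
        for lab in labels:
            prev = max(labels, key=lambda p: best[p] + transitions[(p, lab)])
            ptr[lab] = prev
            nxt[lab] = best[prev] + transitions[(prev, lab)] + em[lab]
        back.append(ptr)
        best = nxt
    last = max(labels, key=lambda lab: best[lab])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    return list(reversed(path))

labels = ["O", "B-ENT", "I-ENT"]
transitions = {(a, b): 0.0 for a in labels for b in labels}
transitions[("O", "I-ENT")] = -10.0  # forbid I- without a preceding B-/I-
emissions = [{"O": 1.0, "B-ENT": 2.0, "I-ENT": 0.5},   # toy per-token scores
             {"O": 0.5, "B-ENT": 0.1, "I-ENT": 1.5},
             {"O": 2.0, "B-ENT": 0.0, "I-ENT": 0.2}]
print(viterbi(emissions, transitions, labels))  # ['B-ENT', 'I-ENT', 'O']
```

The transition scores are what let the CRF enforce label-order constraints that a per-token softmax alone cannot.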

7.
Vital to the task of Sentiment Analysis (SA), or automatically mining sentiment expression from text, is a sentiment lexicon. This fundamental lexical resource comprises the smallest sentiment-carrying units of text, words, annotated for their sentiment properties, and aids in SA tasks on larger pieces of text. Unfortunately, digital dictionaries do not readily include information on the sentiment properties of their entries, and manually compiling sentiment lexicons is tedious in terms of annotator time and effort. This has resulted in the emergence of a large number of research works concentrated on automated sentiment lexicon generation. The dictionary-based approach leverages digital dictionaries, while the corpus-based approach exploits co-occurrence statistics embedded in text corpora. Although the former approach has been exhaustively investigated, the majority of works focus on terms. The few state-of-the-art models concentrated on the finer-grained term sense level still exhibit several prominent limitations; e.g., the proposed semantic relations algorithm retrieves only senses that are in close proximity to the seed senses in the semantic network, prohibiting the retrieval of remote sentiment-carrying senses beyond the 'radius' defined by the number of iterations of semantic relations expansion. The proposed model aims to overcome the issues inherent in dictionary-based sense-level sentiment lexicon generation models using: (1) null seed sets, and a morphological approach inspired by Marking Theory in Linguistics to populate them automatically; (2) a dual-step context-aware gloss expansion algorithm that 'mines' human-defined gloss information from a digital dictionary, ensuring that senses overlooked by the semantic relations expansion algorithm are identified; and (3) a fully unsupervised sentiment categorization algorithm based on Network Theory.
The results demonstrate that context-aware in-gloss matching successfully retrieves senses beyond the reach of the semantic relations expansion algorithm used by prominent, well-known models. Evaluation of the proposed model's ability to accurately assign polarity to senses demonstrates that it is on par with state-of-the-art models against the same gold-standard benchmarks. The model has theoretical implications for future work that seeks to exploit the readily available human-defined gloss information in a digital dictionary when assigning polarity to term senses. Extrinsic evaluation in a real-world sentiment classification task on multiple publicly available varying-domain datasets demonstrates its practical application in sentiment analysis, as well as in related fields such as information science, opinion retrieval, and computational linguistics.

8.
This study attempted to use semantic relations expressed in text, in particular cause-effect relations, to improve information retrieval effectiveness. The study investigated whether the information obtained by matching cause-effect relations expressed in documents with the cause-effect relations expressed in users' queries can be used to improve document retrieval results, in comparison with using just keyword matching without considering relations. An automatic method for identifying and extracting cause-effect information in Wall Street Journal text was developed. Causal relation matching was found to yield a small but significant improvement in retrieval results when the weights used for combining the scores from different types of matching were customized for each query. Causal relation matching did not perform better than word proximity matching (i.e., matching pairs of causally related words in the query with pairs of words that co-occur within document sentences), but the best results were obtained when causal relation matching was combined with word proximity matching. The best kind of causal relation matching was found to be one in which one member of the causal relation (either the cause or the effect) was represented as a wildcard that could match any word.
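The wildcard variant of causal relation matching can be sketched as follows; the extraction pattern and sample sentences are simplified illustrations, not the study's actual method:

```python
import re

def causal_pair(sentence):
    """Very crude cause-effect extraction: 'X causes Y' / 'X leads to Y'."""
    m = re.search(r"(\w+)\s+(?:causes|leads to)\s+(\w+)", sentence)
    return (m.group(1).lower(), m.group(2).lower()) if m else None

def match(query_pair, doc_pair):
    """Wildcard matching: '*' on either side of the query matches any word."""
    if doc_pair is None:
        return False
    return all(q == "*" or q == d for q, d in zip(query_pair, doc_pair))

docs = ["Inflation leads to unrest in several regions.",
        "Smoking causes cancer according to the study.",
        "The market rallied on good news."]
query = ("*", "cancer")  # retrieve anything described as causing cancer
hits = [d for d in docs if match(query, causal_pair(d))]
print(hits)
```

Fixing one member of the relation and leaving the other open is exactly the configuration the study found most effective.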

9.
Research on Chinese Information Processing Technology for Full-Text Search Engines (total citations: 2; self: 0; others: 2)
唐培丽  胡明  解飞  刘钢 《情报科学》2006,24(6):895-899,909
This paper analyzes the key technologies of full-text Chinese search engines and proposes a Chinese word segmentation scheme suited to them, which improves segmentation accuracy while recognizing out-of-vocabulary words. For the vector space retrieval model, a term-weighting function is designed that considers the position, length, and frequency of Chinese words in Web text and expresses their importance quantitatively, reflecting a term's importance in a Web document fairly accurately. Tests of the segmentation algorithm show that the method improves segmentation accuracy to a practical level.
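A term-weight function combining position, length, and frequency, in the spirit of the one described above, might look like this; the weights and the exact combination are assumptions, since the paper's formula is not reproduced here:

```python
def term_weight(freq, length, positions, doc_len, w_f=1.0, w_l=0.5, w_p=2.0):
    """Hypothetical weighting: raw frequency, word length (longer Chinese words
    tend to be more informative), and position (earlier occurrences count more)."""
    pos_score = sum(1.0 - p / doc_len for p in positions) / len(positions)
    return w_f * freq + w_l * length + w_p * pos_score

# A two-character term occurring 3 times in a 100-token document:
early = term_weight(freq=3, length=2, positions=[0, 5, 10], doc_len=100)
late = term_weight(freq=3, length=2, positions=[80, 90, 95], doc_len=100)
print(early, late)  # the early occurrences receive the higher weight
```

With everything else equal, a term appearing near the beginning of the document outweighs the same term appearing near the end, capturing the position factor the paper emphasizes.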

10.
[Purpose/Significance] Compared with traditional professional reviews, social online reviews spread faster and exert greater influence. The emotional factors in textual reviews cannot be fully captured by numeric ratings alone; measuring and analyzing them helps identify and reveal online reviews from all angles and reflect their value more objectively and accurately. [Method/Process] Using online review texts posted by users, supervised machine learning is applied to compute each review's sentiment-classification score, sentiment-orientation score, and composite sentiment score, which are then cross-compared with overall ratings across product type, region, and number of reviewers. [Result/Conclusion] The results show that the emotional factors in review texts partially influence overall ratings, and that users' cognitive preferences, sociocultural background, and the proportion of reviewers affect the usefulness of these emotional factors.

11.
余辉  梁镇涛  谢豪 《情报科学》2021,39(7):177-185
[Purpose/Significance] With the development of network information technology and national policy support for technology transfer, a large volume of online technology-trading demand has emerged. On online technology-transfer platforms, which mediate between technology suppliers and seekers, both sides publish large amounts of supply and demand text; improving the efficiency of matching these texts helps raise transaction success rates and promotes technology transfer. [Method/Process] Building on an analysis of traditional text-matching methods, this paper reviews current matching methods for online technology supply and demand texts in four directions: keyword-based matching, matching based on syntactic analysis and text structure, deep learning-based matching, and multi-dimensional matching. [Result/Conclusion] Most studies combine several matching methods, and multi-dimensional matching is the research trend. Future work should not only continue to integrate deep learning into existing methods but also improve matching efficiency along multi-dimensional, cross-modal, and interpretability directions. [Innovation/Limitation] This paper summarizes technology-text matching methods and provides a reference for the field, but real-world technology matching should also consider other factors affecting technology transfer.
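The keyword-based matching direction can be illustrated with a minimal cosine-similarity matcher over supply and demand texts; the sample texts and whitespace tokenization are illustrative simplifications:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_match(demand, supplies):
    """Return the supply text whose keyword vector is closest to the demand."""
    dv = Counter(demand.split())
    scored = [(cosine(dv, Counter(s.split())), s) for s in supplies]
    return max(scored)[1]

demand = "lithium battery recycling process"
supplies = ["lithium battery recycling patented process",
            "wind turbine blade manufacturing",
            "solar panel recycling equipment"]
print(best_match(demand, supplies))
```

A real platform would segment Chinese text, weight terms (e.g., TF-IDF), and layer the syntactic, deep learning, and multi-dimensional methods surveyed above on top of this keyword baseline.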

12.
Since previous studies in cognitive psychology show that individuals’ affective states can help analyze and predict their future behaviors, researchers have explored emotion mining for predicting online activities, firm profitability, and so on. Existing emotion mining methods are divided into two categories: feature-based approaches that rely on handcrafted annotations and deep learning-based methods that thrive on computational resources and big data. However, neither category can effectively detect emotional expressions captured in text (e.g., social media postings). In addition, the utilization of these methods in downstream explanatory and predictive applications is also rare. To fill the aforementioned research gaps, we develop a novel deep learning-based emotion detector named DeepEmotionNet that can simultaneously leverage contextual, syntactic, semantic, and document-level features and lexicon-based linguistic knowledge to bootstrap the overall emotion detection performance. Based on three emotion detection benchmark corpora, our experimental results confirm that DeepEmotionNet outperforms state-of-the-art baseline methods by 4.9% to 29.8% in macro-averaged F-score. For the downstream application of DeepEmotionNet to a real-world financial application, our econometric analysis highlights that top executives’ emotions of fear and anger embedded in their social media postings are significantly associated with corporate financial performance. Furthermore, these two emotions can significantly improve the predictive power of corporate financial performance when compared to sentiments. To the best of our knowledge, this is the first study to develop a deep learning-based emotion detection method and successfully apply it to enhance corporate performance prediction.

13.
Automatic text classification is the task of organizing documents into pre-determined classes, generally using machine learning algorithms. It is one of the most important methods for organizing and making use of the gigantic amounts of information that exist in unstructured textual form, and it is a widely studied research area of language processing and text mining. In traditional text classification, a document is represented as a bag of words, in which words (terms) are cut off from their finer context, i.e., their location in a sentence or in a document. Only the broader context of the document is used, with some form of term-frequency information in the vector space. Consequently, the semantics of words that can be inferred from the finer context of their location in a sentence and their relations with neighboring words are usually ignored. However, the meaning of words and the semantic connections between words, documents, and even classes are obviously important, since methods that capture semantics generally reach better classification performance. Several surveys have analyzed diverse approaches to traditional text classification. Most of these surveys cover the application of different semantic term-relatedness methods in text classification to a certain degree, but they do not specifically target semantic text classification algorithms and their advantages over traditional text classification. To fill this gap, we undertake a comprehensive discussion of semantic versus traditional text classification. This survey explores past and recent advancements in semantic text classification and organizes existing approaches under five fundamental categories: domain knowledge-based approaches, corpus-based approaches, deep learning-based approaches, word/character sequence-enhanced approaches, and linguistically enriched approaches.
Furthermore, this survey highlights the advantages of semantic text classification algorithms over traditional text classification algorithms.

14.
Mining the Health Information Needs of Users in an Online Community for Older Adults (total citations: 4; self: 0; others: 4)
[Purpose/Significance] Studying the health information needs of users in online communities for older adults provides a basis for delivering targeted medical education and science popularization over the Internet, optimizing community services, and encouraging more older adults to share and obtain health information online. [Method/Process] Using web text mining, 5,296 health-related posts from the seniors' forum 老年人之家 were taken as the corpus; keywords were extracted from each post with the TextRank and TF-IDF algorithms, a keyword co-occurrence network was built, and social network analysis was applied to identify important keywords and topics. [Result/Conclusion] Users' health information needs fall into four types: principles and methods of traditional Chinese medicine health preservation, lifestyle adjustment, disease prevention and coping with aging, and the nutritional value and effects of foods, with complex interrelations among the types. The needs users express remain at the level of physical health, while mental health and social adaptability are latent needs. Web text mining makes effective use of user-generated text to reveal health information needs and the problems within them.
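The TextRank keyword-extraction step can be sketched as power iteration over a word co-occurrence graph; the window size, damping factor, and token list are illustrative choices:

```python
def textrank(tokens, window=2, d=0.85, iters=30):
    """Score words by power iteration over a co-occurrence graph (TextRank idea):
    words co-occurring within `window` positions are linked, then ranked."""
    neighbors = {}
    for i in range(len(tokens)):
        for j in range(i + 1, min(i + window + 1, len(tokens))):
            a, b = tokens[i], tokens[j]
            if a != b:
                neighbors.setdefault(a, set()).add(b)
                neighbors.setdefault(b, set()).add(a)
    score = {w: 1.0 for w in neighbors}
    for _ in range(iters):
        score = {w: (1 - d) + d * sum(score[n] / len(neighbors[n])
                                      for n in neighbors[w])
                 for w in neighbors}
    return sorted(score, key=score.get, reverse=True)

# Toy token stream standing in for a segmented forum post
tokens = "blood pressure diet blood pressure exercise sleep diet".split()
print(textrank(tokens)[0])  # one of the well-connected words
```

The top-ranked words from each post would then feed the keyword co-occurrence network used for the social network analysis.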

15.
Using topic detection to organize blog posts by the topics they express makes blog information more effective and accurate for users. This paper studies word-frequency statistics, weight computation, and similarity computation in a topic detection model, combines a simple clustering algorithm with the ISODATA algorithm, and applies the result to a hot-topic detection system for Chinese blogs. Experiments show that text classification performance improves further.
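The simple (single-pass) clustering component can be sketched as follows; the similarity threshold and sample posts are invented, and the ISODATA refinement stage is omitted:

```python
from collections import Counter
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b.get(t, 0) for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def single_pass(docs, threshold=0.3):
    """Assign each post to the closest cluster, or start a new one."""
    clusters = []  # each: {"centroid": Counter, "members": [...]}
    for doc in docs:
        vec = Counter(doc.split())
        best, best_sim = None, 0.0
        for c in clusters:
            sim = cosine(vec, c["centroid"])
            if sim > best_sim:
                best, best_sim = c, sim
        if best is not None and best_sim >= threshold:
            best["centroid"].update(vec)  # merged term counts as a crude centroid
            best["members"].append(doc)
        else:
            clusters.append({"centroid": Counter(vec), "members": [doc]})
    return clusters

posts = ["house price policy", "new house price rules", "football match tonight"]
print(len(single_pass(posts)))  # 2 topic clusters
```

ISODATA would then iteratively split or merge these initial clusters, which is the combination the paper applies to hot-topic detection.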

16.
[Purpose/Significance] Clustering the demand information of users in online medical communities supports more efficient information recommendation services and has practical value for information optimization and improved search efficiency. [Method/Process] A word-frequency matrix is constructed from the hot demand terms of nearly 23 quarters of the 丁香园 forum, and the demand terms are sorted along two dimensions by combining a logistic growth model with a word-frequency change-rate model. [Result/Conclusion] Through logistic curve fitting and the computation of the word-frequency change-rate Z value, ...
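The two models named above can be sketched as follows; the parameter values, the Z-score formulation, and the quarterly series are illustrative assumptions, not the paper's fitted results:

```python
from math import exp
from statistics import mean, stdev

def logistic(t, K, r, t0):
    """Logistic growth: cumulative frequency approaching carrying capacity K."""
    return K / (1 + exp(-r * (t - t0)))

def change_rate_z(series):
    """Z-score of the latest quarter-over-quarter frequency change rate."""
    rates = [(b - a) / a for a, b in zip(series, series[1:]) if a > 0]
    return (rates[-1] - mean(rates)) / stdev(rates)

# Hypothetical quarterly frequencies of a demand term: growth, then saturation
quarters = list(range(8))
fitted = [round(logistic(t, K=100, r=1.2, t0=3)) for t in quarters]
print(fitted)  # S-shaped curve

emerging = [2, 3, 4, 5, 6, 8, 10, 30]  # sudden surge in the latest quarter
print(change_rate_z(emerging))         # well above the historical average
```

Terms can then be placed on a two-dimensional grid: where they sit on the fitted logistic curve (maturity) versus how anomalous their latest change rate is (momentum).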

17.
Effective learning schemes such as fine-tuning, zero-shot, and few-shot learning have been widely used to obtain considerable performance with only a handful of annotated training data. In this paper, we present a unified benchmark for zero-shot text classification in Turkish. We evaluate three methods, namely Natural Language Inference (NLI), Next Sentence Prediction, and our proposed model based on Masked Language Modeling and pre-trained word embeddings, on nine Turkish datasets covering three main categories: topic, sentiment, and emotion. We use pre-trained Turkish monolingual and multilingual transformer models: BERT, ConvBERT, DistilBERT, and mBERT. The results show that ConvBERT with the NLI method yields the best results (79%) and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by applying different, previously unattempted transformer models to Turkish and by showing that monolingual models improve zero-shot text classification performance over multilingual models.

18.
Although deep learning breakthroughs in NLP are based on learning distributed word representations with neural language models, these methods suffer from a classic drawback of unsupervised learning techniques. Furthermore, the performance of general-purpose word embeddings has been shown to be heavily task-dependent. To tackle this issue, recent studies have proposed learning sentiment-enhanced word vectors for sentiment analysis. However, the common limitation of these approaches is that they require external sentiment lexicon sources, and the construction and maintenance of these resources involve a set of complex, time-consuming, and error-prone tasks. In this regard, this paper proposes a sentiment lexicon embedding method that represents the semantic relationships of sentiment words better than existing word embedding techniques, without a manually annotated sentiment corpus. The major distinguishing factor of the proposed framework is that it jointly encodes morphemes and their POS tags and trains only important lexical morphemes in the embedding space. To verify the effectiveness of the proposed method, we conducted experiments comparing it with two baseline models. The revised embedding approach mitigated the problems of conventional context-based word embedding and, in turn, improved sentiment classification performance.

19.
Social emotion refers to the emotion evoked in the reader by a textual document. In contrast to the emotion cause extraction task, which analyzes the causes of the author's sentiments based on the expressions in the text, identifying the causes of the social emotion evoked in the reader has not been explored previously. Social emotion mining and its cause analysis is not only an important research topic in Web-based social media analytics and text mining but also has applications in multiple domains. Because social emotion cause identification analyzes the causes of reader emotions that are not explicitly or implicitly expressed in the text, it is a challenging task fundamentally different from previous research, and it requires a deeper understanding of the cognitive process underlying the inference of social emotion and its causes. In this paper, we propose the new task of social emotion cause identification (SECI). Inspired by the cognitive structure of emotions (OCC) theory, we present a Cognitive Emotion model Enhanced Sequential (CogEES) method for SECI. Specifically, based on the implications of the OCC model, our method first establishes the correspondence between words/phrases in text and the emotional dimensions identified in OCC, building emotional dimension lexicons with 1,676 distinct words/phrases. Then, our method uses the lexicon information and discourse coherence for semantic segmentation of the document and enhancement of clause representation learning. Finally, our method combines text segmentation and clause representation in a sequential model for cause clause prediction. We construct the SECI dataset for this new task and conduct experiments to evaluate CogEES. Our method outperforms the baselines, achieving over 10% F1 improvement on average with better interpretability of the prediction results.

20.
With the advent of Web 2.0, many online platforms, such as social networks, online blogs, and magazines, produce massive amounts of textual data. This textual data carries information that can be used for the betterment of humanity, so there is a dire need to extract the potential information within it. This study presents an overview of approaches for extracting and presenting these valuable information nuggets in a brief, clear, and concise way. In this regard, two major tasks are reviewed: automatic keyword extraction and text summarization. To compile the literature, scientific articles were collected from major digital computing research repositories. The survey covers early approaches up to recent advancements using machine learning solutions. Its findings conclude that annotated benchmark datasets for various textual data generators, such as Twitter and social forums, are not available, and this scarcity of datasets has resulted in relatively little progress in many domains. Applications of deep learning techniques to automatic keyword extraction are also relatively unaddressed, so the impact of various deep architectures stands as an open research direction. For text summarization, deep learning techniques have been applied since the advent of word vectors and currently govern the state of the art for abstractive summarization. One of the major current challenges in these tasks is the semantics-aware evaluation of generated results.


Copyright © 北京勤云科技发展有限公司 (Beijing Qinyun Technology Development Co., Ltd.)  京ICP备09084417号