Similar literature
20 similar documents found.
1.
To improve multimodal negative sentiment recognition in online public opinion on public health emergencies, we constructed a novel multimodal fine-grained negative sentiment recognition model based on graph convolutional networks (GCN) and ensemble learning. The model comprises BERT- and ViT-based multimodal feature representation, GCN-based feature fusion, multiple classifiers, and ensemble learning-based decision fusion. First, image-text data about COVID-19 is collected from Sina Weibo, and text and image features are extracted with BERT and ViT, respectively. Second, image-text fused features are generated by a GCN over the constructed microblog graph. Finally, AdaBoost is trained to decide the final sentiment from the outputs of the best classifiers on the image, text, and image-text fused features. The results show that the model achieves an F1-score of 84.13% in sentiment polarity recognition and 82.06% in fine-grained negative sentiment recognition, improvements of 4.13% and 7.55%, respectively, over the best result obtained with image-text feature fusion.

2.
In an environment full of disordered information, the media spreads fake or harmful information into the public arena faster than ever before. A news report should ideally be neutral and factual, without excessive personal emotions or viewpoints. News articles ought not to be written with intentional malice or to create a media framing. Harmful news is defined as news text containing explicit or implicit harmful speech that harms people or affects readers’ perception. However, it is currently difficult to identify and predict fake or harmful news in advance, especially harmful news. Therefore, in this study, we propose a Bidirectional Encoder Representations from Transformers (BERT) based model that applies ensemble learning methods with text sentiment analysis to identify harmful news, aiming to give readers a way to identify harmful news content and to help them judge whether the information is presented in a neutral manner. The proposed system works in two phases. The first phase collects harmful news and establishes a development model for analyzing the correlation between text sentiment and harmful news. The second phase identifies harmful news by analyzing text sentiment with an ensemble learning technique and the BERT model, in order to determine whether the news has harmful intentions. Our experimental results show that the F1-score of the proposed model reaches 66.3%, an increase of 7.8% over the previous term frequency-inverse document frequency approach, which adopts a Lagrangian Support Vector Machine (LSVM) model without text sentiment. Moreover, the proposed method achieves better performance in recognizing various cases of information disorder.

3.
Detecting sentiment in natural language is tricky even for humans, which makes automated detection more complicated still. This research proposes a hybrid deep learning model for fine-grained sentiment prediction in real-time multimodal data. It combines the strengths of deep learning networks with machine learning to deal with two specific semiotic systems, namely the textual (written text) and the visual (still images), and their combination within online content, using decision-level multimodal fusion. The proposed contextual ConvNet-SVMBoVW model has four modules: discretization, text analytics, image analytics, and decision. The input to the model is multimodal text, m ∈ {text, image, info-graphic}. The discretization module uses Google Lens to separate the text from the image; the two are then processed as discrete entities and sent to the text analytics and image analytics modules, respectively. The text analytics module determines the sentiment using a hybrid of a convolutional neural network (ConvNet) enriched with the contextual semantics of SentiCircle. An aggregation scheme is introduced to compute the hybrid polarity. A support vector machine (SVM) classifier is trained using bag-of-visual-words (BoVW) to predict the sentiment of the visual content. A Boolean decision module with a logical OR operation is added to the architecture; it validates and categorizes the output into five fine-grained sentiment categories (truth values), namely ‘highly positive,’ ‘positive,’ ‘neutral,’ ‘negative’ and ‘highly negative.’ The accuracy achieved by the proposed model is nearly 91%, an improvement over the accuracy obtained by the text and image modules individually.

4.
Effective learning schemes such as fine-tuning, zero-shot, and few-shot learning have been widely used to obtain considerable performance with only a handful of annotated training data. In this paper, we present a unified benchmark to facilitate research on zero-shot text classification in Turkish. For this purpose, we evaluated three methods, namely Natural Language Inference (NLI), Next Sentence Prediction, and our proposed model based on Masked Language Modeling and pre-trained word embeddings, on nine Turkish datasets covering three main categories: topic, sentiment, and emotion. We used pre-trained Turkish monolingual and multilingual transformer models: BERT, ConvBERT, DistilBERT, and mBERT. The results show that ConvBERT with the NLI method yields the best results with 79% and outperforms the previously used multilingual XLM-RoBERTa model by 19.6%. The study contributes to the literature by using different, previously unattempted transformer models for Turkish and by showing that monolingual models improve zero-shot text classification performance over multilingual models.

5.
Multimodal sentiment analysis aims to judge the sentiment of multimodal data uploaded by Internet users on various social media platforms. On the one hand, existing studies focus on the fusion mechanism of multimodal data such as text, audio, and visual, but ignore the similarity between text and audio and between text and visual, and the heterogeneity of audio and visual, which biases sentiment analysis. On the other hand, multimodal data brings noise irrelevant to sentiment analysis, which reduces the effectiveness of fusion. In this paper, we propose a Polar-Vector and Strength-Vector mixer model called PS-Mixer, based on MLP-Mixer, to achieve better communication between different modalities for multimodal sentiment analysis. Specifically, we design a Polar-Vector (PV) and a Strength-Vector (SV) for judging the polarity and the strength of sentiment separately. PV is obtained from the communication between text and visual features and decides whether the sentiment is positive, negative, or neutral. SV is obtained from the communication between text and audio features and measures the sentiment strength in the range of 0 to 3. Furthermore, we devise an MLP-Communication module (MLP-C), composed of several fully connected layers and activation functions, that lets the features of different modalities interact fully in both the horizontal and the vertical directions; this is a novel attempt to use MLPs for multimodal information communication. Finally, we mix PV and SV to obtain a fusion vector that is used to judge the sentiment state. The proposed PS-Mixer is tested on two publicly available datasets, CMU-MOSEI and CMU-MOSI, and achieves state-of-the-art (SOTA) performance on CMU-MOSEI compared with baseline methods. The code is available at: https://github.com/metaphysicser/PS-Mixer.

6.
Social media users increasingly use both images and text to express their opinions and share their experiences, rather than text alone as in conventional social media. Consequently, conventional text-based sentiment analysis has evolved into the more complicated study of multimodal sentiment analysis. To tackle the challenge of effectively exploiting the information in both the visual and the textual content of image-text posts, this paper proposes a new image-text consistency driven multimodal sentiment analysis approach. The proposed approach explores the correlation between the image and the text, followed by a multimodal adaptive sentiment analysis method. More specifically, the mid-level visual features extracted by the conventional SentiBank approach are used to represent visual concepts and are integrated with other features, including textual, visual, and social features, to develop a machine learning sentiment analysis approach. Extensive experiments are conducted to demonstrate the superior performance of the proposed approach.

7.
An idiom is a common phrase that means something other than its literal meaning. Detecting idioms automatically is a serious challenge in natural language processing (NLP) applications such as information retrieval (IR), machine translation, and chatbots, and it plays an important role in all of them. Text classification, also known as text labeling or categorization, is a fundamental NLP task that assigns text to structured categories. This paper treats idiom identification as a text classification task. Pre-trained deep learning models have been used for several text classification tasks, though models like BERT and RoBERTa have not been used specifically for idiom and literal classification. We propose a predictive ensemble model to classify idioms and literals using BERT and RoBERTa, fine-tuned on the TroFi dataset. The model is tested on a newly created in-house dataset of idioms and literal expressions, numbering 1470 in all and annotated by domain experts. Our model outperforms the baseline models on the metrics considered, such as F-score and accuracy, with a 2% improvement in accuracy.

8.
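Using natural language processing techniques such as text classification and sentiment analysis, we develop a method for evaluating regional environmental image based on Internet text. To meet the analysis needs of ecological and environmental big data, environmental image categories are defined, and regional environmental image is evaluated from three perspectives: text genre and source, sentiment polarity, and environmental elements. An environmental text corpus is manually annotated, and three algorithms (support vector machine, naive Bayes, and convolutional neural network) are compared; the final regional environmental image evaluation model is built with the convolutional neural network as its core algorithm. The method classifies well, and the F1 values of all three classification tasks meet the analysis needs: between 0.8 and 0.9 for environmental elements, above 0.8 for sentiment analysis, and about 0.9 for genre and source. Applied to cities in the Yangtze River Delta, the method can process regional hot environmental public opinion in real time, analyze regional environmental image, and provide precise, intuitive evaluation results, offering basic information support for regional environmental management.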

9.
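唐樾, 马静. 《情报科学》 (Information Science), 2022, 40(6): 108-114
[Purpose/Significance] As social networks grow more complex, rumors now often consist of text describing an event together with corresponding images or videos, and multimodal rumors more easily convey a false impression to users. Existing rumor detection research often uses only rumor text features and fails to fully exploit the connection between rumors and events. [Method/Process] This paper therefore proposes a rumor detection method based on an enhanced adversarial network and multimodal fusion. BERT and Text-CNN extract text features, a VGG-19 network extracts image features, an attention mechanism captures feature interactions across modalities, and finally an enhanced adversarial network mines the connection between rumors and events. [Result/Conclusion] In comparative experiments on a public Weibo multimodal dataset, the method reaches a detection accuracy of 92.5%, an improvement of about 10% to 20% over traditional single-modal and existing multimodal models. [Innovation/Limitation] This paper incorporates adversarial networks and multimodal features into rumor detection and effectively improves detection performance, but so far only the combination of text and image modalities has been tried; how to fuse features from more modalities remains for future research.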

10.
Pre-trained language models (PLMs), such as BERT, have been successfully employed in two-phase ranking pipelines for information retrieval (IR). Meanwhile, recent studies have reported that BERT is vulnerable to imperceptible textual perturbations on quite a few natural language processing (NLP) tasks. In IR, the established BERT re-ranker is mainly trained on large-scale and relatively clean datasets, such as MS MARCO, whereas noisy text is more common in real-world scenarios such as web search. In addition, the impact of within-document textual noise (perturbations) on retrieval effectiveness remains to be investigated, especially for the ranking quality of the BERT re-ranker, given its contextualized nature. To fill this gap, we carry out exploratory experiments on the MS MARCO dataset to examine whether the BERT re-ranker can still perform well when ranking noisy text. Unfortunately, we observe non-negligible effectiveness degradation of the BERT re-ranker across a total of ten different types of synthetic within-document textual noise. Furthermore, to address the effectiveness losses caused by textual noise, we propose a novel noise-tolerant model, De-Ranker, which is learned by minimizing the distance between the noisy text and its original clean version. Our evaluation on the MS MARCO and TREC 2019–2020 DL datasets demonstrates that De-Ranker deals with synthetic textual noise more effectively, with a 3%–4% performance improvement over the vanilla BERT re-ranker. Meanwhile, extensive zero-shot transfer experiments on a total of 18 widely used IR datasets show that De-Ranker can not only tackle natural noise in real-world text but also achieve a 1.32% improvement on average in cross-domain generalization ability on the BEIR benchmark.
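To make the idea of synthetic within-document noise concrete, the following is a minimal Python sketch of one kind of perturbation: swapping two adjacent inner characters of a word. The abstract does not list its ten noise types, so the function `swap_noise`, its parameters, and the sample passage are all hypothetical illustrations rather than the authors' procedure.

```python
import random


def swap_noise(text, rate, seed=0):
    """Swap two adjacent inner characters in a word with probability `rate`.

    Hypothetical noise type for illustration; the first and last character
    of every word are left untouched.
    """
    rng = random.Random(seed)  # seeded so the perturbation is reproducible
    noisy_words = []
    for word in text.split(" "):
        if len(word) > 3 and rng.random() < rate:
            i = rng.randrange(1, len(word) - 2)  # inner position only
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        noisy_words.append(word)
    return " ".join(noisy_words)


# Hypothetical passage, not taken from MS MARCO.
clean = "neural ranking models score each passage against the query"
noisy = swap_noise(clean, rate=1.0)
```

A noise-tolerant ranker in the spirit of the abstract would then be trained so that its representation of `noisy` stays close to its representation of `clean`; that training step needs a model and is left out of this sketch.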

11.
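马达, 卢嘉蓉, 朱侯. 《情报科学》 (Information Science), 2023, 41(2): 60-68
[Purpose/Significance] To explore effective deep learning-based emotion classification methods for Weibo text, and to study the relationship between the emotion types of users' reposted comments and the spread of private information during trending Weibo events. [Method/Process] Four deep learning classification models (BERT, BERT+CNN, BERT+RNN, and ERNIE) are compared in controlled experiments, and the better-performing model is validated on a newly constructed seven-class emotion corpus. Four trending Weibo cases are selected for empirical analysis along four dimensions: emotion distribution, emotion word frequency, repost time, and repost count. [Result/Conclusion] The empirical study finds that users spread private information rapidly and briefly; negative emotions such as "anger" and "disgust" dominate during spreading, and emotion types and modes of expression differ depending on the subject of the private information. [Innovation/Limitation] The study examines the emotional characteristics of users when they spread private information and the link between the two, providing valuable theoretical and practical grounds for protecting the privacy of social network users. However, the constructed corpus is still too small to train a high-accuracy deep learning model, and the model performs poorly on ironic and sarcastic text.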

12.
As an emerging task in opinion mining, End-to-End Multimodal Aspect-Based Sentiment Analysis (MABSA) aims to extract all the aspect-sentiment pairs mentioned in a sentence-image pair. Most existing MABSA methods do not explicitly incorporate aspect and sentiment information in their textual and visual representations and fail to consider the different contributions of visual representations to each word or aspect in the text. To tackle these limitations, we propose a multi-task learning framework named Cross-Modal Multitask Transformer (CMMT), which incorporates two auxiliary tasks to learn aspect/sentiment-aware intra-modal representations and introduces a Text-Guided Cross-Modal Interaction Module to dynamically control the contribution of the visual information to the representation of each word in the inter-modal interaction. Experimental results demonstrate that CMMT consistently outperforms the state-of-the-art approach JML by 3.1, 3.3, and 4.1 absolute percentage points, respectively, on three Twitter datasets for the End-to-End MABSA task. Moreover, further analysis shows that CMMT is superior to the comparison systems in both aspect extraction (AE) and sentiment classification (SC), which should move the development of multimodal AE and SC algorithms forward.

13.
Recently, sentiment classification has received considerable attention within the natural language processing research community. However, since most recent work on sentiment classification has been done in English, sentiment resources in other languages remain scarce. Manual construction of reliable sentiment resources is a very difficult and time-consuming task. Cross-lingual sentiment classification aims to utilize annotated sentiment resources in one language (typically English) for sentiment classification of text documents in another language. Most existing research relies on automatic machine translation services to directly project information from one language to another. However, differing term distributions between original and translated documents, together with translation errors, are the two main problems of using machine translation alone. To overcome these problems, we propose a novel learning model based on active learning and semi-supervised co-training that incorporates unlabelled data from the target language into the learning process in a bi-view framework. The model enriches the training data in an iterative process by adding the most confident automatically labelled examples, as well as a few of the most informative manually labelled examples, from the unlabelled data. Further, the model considers the density of the unlabelled data so as to select more representative unlabelled examples and avoid selecting outliers in active learning. The proposed model was applied to book review datasets in three different languages. Experiments showed that our model can effectively improve cross-lingual sentiment classification performance and reduce labelling effort in comparison with some baseline methods.
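The selection step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' algorithm: it uses a single classifier's positive-class probabilities, measures confidence as distance from 0.5, and omits the bi-view setup and the density weighting. The function `select_examples`, its parameters, and the example probabilities are hypothetical.

```python
def select_examples(probabilities, n_confident=2, n_informative=1):
    """Pick examples to auto-label and examples to send to a human annotator.

    probabilities: dict mapping example id -> P(positive) from the current
    classifier (hypothetical input format).
    """
    # Confidence is measured as distance from the 0.5 decision boundary.
    by_confidence = sorted(
        probabilities,
        key=lambda ex: abs(probabilities[ex] - 0.5),
        reverse=True,
    )
    # Most confident examples receive the classifier's own label.
    confident = by_confidence[:n_confident]
    auto_labels = {ex: int(probabilities[ex] >= 0.5) for ex in confident}
    # Least confident remaining examples are the most informative to annotate.
    remaining = by_confidence[n_confident:]
    informative = remaining[-n_informative:] if n_informative else []
    return auto_labels, informative


# Hypothetical classifier outputs for four unlabelled target-language reviews.
probs = {"d1": 0.97, "d2": 0.52, "d3": 0.08, "d4": 0.61}
auto_labels, to_annotate = select_examples(probs)
print(auto_labels)   # {'d1': 1, 'd3': 0}
print(to_annotate)   # ['d2']
```

In each iteration both groups would be added to the training set and the classifier retrained; a density-aware variant would additionally down-weight uncertain examples that lie far from the rest of the unlabelled data.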

14.
Constructing ensemble models has become a common method for corporate credit risk early warning. Deep learning models have better predictive ability, but no fixed theoretical model has yet formed around them in corporate credit risk early warning, because such models often fail to support further qualitative analysis of their results. This article therefore builds a new two-stage ensemble model using a variety of machine learning methods, represented by deep learning, for corporate credit risk early warning. The model can not only effectively improve prediction performance but also qualitatively analyze the sources of corporate credit risk from multiple angles according to the results. In the first stage, an improved entropy method is used to re-assign instance weights according to the correlation degree obtained from grey correlation analysis. In the second stage, the study adopts the Bagging method to integrate multiple one-dimensional convolutional neural networks and borrows the idea of N-fold cross-validation to increase the diversity of the base classifiers. Empirically, the article uses listed companies in the Chinese manufacturing industry between 2012 and 2021 as the dataset, comprising 467 samples with 51 financial indicators. The new ensemble model has the highest F1-score (87.29%) and G-mean (89.47%) among the compared models, and qualitatively analyzes corporate risk sources. It further analyzes how to improve the early warning effect in terms of the number of indicators and the time span.
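For readers unfamiliar with the two reported metrics, the short Python sketch below computes the F1-score and the G-mean from a binary confusion matrix, using the standard definitions (G-mean as the geometric mean of sensitivity and specificity). The function name `f1_and_gmean` and the counts are hypothetical and are not taken from the paper.

```python
import math


def f1_and_gmean(tp, fp, fn, tn):
    """Return (F1-score, G-mean) for a binary confusion matrix."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    gmean = math.sqrt(recall * specificity)
    return f1, gmean


# Hypothetical counts: 50 risky and 50 healthy firms.
f1, gmean = f1_and_gmean(tp=40, fp=10, fn=10, tn=40)
```

G-mean is often reported alongside F1 on imbalanced data such as credit-risk samples because it drops toward zero when either class is poorly recognized.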

15.
Aspect-based sentiment analysis allows one to compute the sentiment for an aspect in a certain context. One problem in this analysis is that words possibly carry different sentiments for different aspects. Moreover, an aspect’s sentiment might be highly influenced by the domain-specific knowledge. In order to tackle these issues, in this paper, we propose a hybrid solution for sentence-level aspect-based sentiment analysis using A Lexicalized Domain Ontology and a Regularized Neural Attention model (ALDONAr). The bidirectional context attention mechanism is introduced to measure the influence of each word in a given sentence on an aspect’s sentiment value. The classification module is designed to handle the complex structure of a sentence. The manually created lexicalized domain ontology is integrated to utilize the field-specific knowledge. Compared to the existing ALDONA model, ALDONAr uses BERT word embeddings, regularization, the Adam optimizer, and different model initialization. Moreover, its classification module is enhanced with two 1D CNN layers providing superior results on standard datasets.

16.
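张国标, 李洁, 胡潇戈. 《情报科学》 (Information Science), 2021, 39(10): 126-132
[Purpose/Significance] While changing news dissemination and the way people obtain information, social media has also become the main channel for spreading fake news. Quickly identifying fake news on social media and curbing the spread of false information is therefore crucial for cleaning up cyberspace and maintaining public safety. [Method/Process] To identify fake news posted on social media effectively, this paper, based on an in-depth analysis of the content features of fake news, designs representations for text word vectors, text sentiment, low-level image features, and image semantic features, which are used to extract image and text feature information from fake news on social networks. A fake news detection model with multimodal feature fusion is built, and its performance is validated on the MediaEval2015 dataset. [Result/Conclusion] Comparing experimental results across feature combinations and classification methods shows that a multimodal model fusing text and image features can effectively improve fake news detection. [Innovation/Limitation] The study designs a fake news detection model from a multimodal perspective, fusing multiple text and image features. However, fusing features by vector concatenation cannot make the features fully complementary and is prone to the curse of dimensionality.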
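The limitation noted at the end of this abstract, feature fusion by vector concatenation, can be illustrated with a minimal Python sketch. The four feature groups follow the abstract, but the function `concat_fusion`, the variable names, and the toy vectors are hypothetical; real feature vectors would be far longer.

```python
def concat_fusion(*feature_vectors):
    """Early fusion by concatenation: join per-modality vectors end to end."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


# Hypothetical, tiny feature vectors for one post.
text_embedding = [0.12, -0.40, 0.88]
text_sentiment = [0.75]
image_low_level = [0.30, 0.10]
image_semantic = [0.05, 0.91, 0.33, 0.27]

fused = concat_fusion(text_embedding, text_sentiment, image_low_level, image_semantic)
print(len(fused))  # 10: the fused dimension is the sum of the input dimensions
```

Because the fused dimension is the sum of all input dimensions and the features never interact before classification, adding modalities inflates the input space, which is the dimensionality problem the abstract points out.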

17.
Vital to the task of Sentiment Analysis (SA), or automatically mining sentiment expression from text, is a sentiment lexicon. This fundamental lexical resource comprises the smallest sentiment-carrying units of text, words, annotated for their sentiment properties, and aids in SA tasks on larger pieces of text. Unfortunately, digital dictionaries do not readily include information on the sentiment properties of their entries, and manually compiling sentiment lexicons is costly in annotator time and effort. This has led to a large body of research on automated sentiment lexicon generation. The dictionary-based approach leverages digital dictionaries, while the corpus-based approach exploits co-occurrence statistics embedded in text corpora. Although the former approach has been exhaustively investigated, the majority of works focus on terms. The few state-of-the-art models that work at the finer-grained term sense level still exhibit several prominent limitations; for example, the proposed semantic relations algorithm retrieves only senses that are close to the seed senses in the semantic network, which prevents the retrieval of remote sentiment-carrying senses beyond the ‘radius’ defined by the number of iterations of semantic relations expansion. The proposed model aims to overcome the issues inherent in dictionary-based sense-level sentiment lexicon generation models using: (1) null seed sets, and a morphological approach inspired by the Marking Theory in Linguistics to populate them automatically; (2) a dual-step context-aware gloss expansion algorithm that ‘mines’ human-defined gloss information from a digital dictionary, ensuring that senses overlooked by the semantic relations expansion algorithm are identified; and (3) a fully unsupervised sentiment categorization algorithm based on Network Theory.
The results demonstrate that context-aware in-gloss matching successfully retrieves senses beyond the reach of the semantic relations expansion algorithm used by prominent, well-known models. Evaluation of the proposed model's ability to assign polarity to senses accurately demonstrates that it is on par with state-of-the-art models on the same gold standard benchmarks. The model has theoretical implications for future work on effectively exploiting the readily available human-defined gloss information in a digital dictionary for the task of assigning polarity to term senses. Extrinsic evaluation in a real-world sentiment classification task on multiple publicly available datasets from varying domains demonstrates its practical implications and application in sentiment analysis, as well as in related fields such as information science, opinion retrieval, and computational linguistics.

18.
Sentiment lexicons are essential tools for polarity classification and opinion mining. In contrast to machine learning methods that only leverage text features or raw text for sentiment analysis, methods that use sentiment lexicons embrace higher interpretability. Although a number of domain-specific sentiment lexicons are made available, it is impractical to build an ex ante lexicon that fully reflects the characteristics of the language usage in endless domains. In this article, we propose a novel approach to simultaneously train a vanilla sentiment classifier and adapt word polarities to the target domain. Specifically, we sequentially track the wrongly predicted sentences and use them as the supervision instead of addressing the gold standard as a whole to emulate the life-long cognitive process of lexicon learning. An exploration-exploitation mechanism is designed to trade off between searching for new sentiment words and updating the polarity score of one word. Experimental results on several popular datasets show that our approach significantly improves the sentiment classification performance for a variety of domains by means of improving the quality of sentiment lexicons. Case-studies also illustrate how polarity scores of the same words are discovered for different domains.

19.
Sarcasm is a pervasive literary technique in which people intentionally express the opposite of what they mean. Accurate detection of sarcasm in a text can facilitate the understanding of speakers’ true intentions and benefit other natural language processing tasks, especially sentiment analysis. Since sarcasm is a kind of implicit sentiment expression in which speakers deliberately confuse the audience, it is challenging to detect sarcasm from text alone. Existing approaches based on machine learning and deep learning perform unsatisfactorily on sarcastic text that uses complex expressions or requires specific background knowledge to understand. Sarcasm detection in Chinese is especially difficult because of the characteristics of the Chinese language itself. To alleviate this dilemma in Chinese sarcasm detection, we propose a sememe and auxiliary enhanced attention neural model, SAAG. At the word level, we introduce sememe knowledge to enhance the representation learning of Chinese words. A sememe is the minimum unit of meaning and gives a fine-grained portrayal of a word. At the sentence level, we leverage auxiliary information, such as the news title, to learn a representation of the context and background of the sarcastic expression. Then, we construct the representation of the text expression progressively and dynamically. Evaluation on a sarcasm dataset consisting of comments on news text reveals that our proposed approach is effective and outperforms the state-of-the-art models.

20.
Aspect-based sentiment analysis aims to determine sentiment polarities toward specific aspect terms within the same sentence or document. Most recent studies adopt attention-based neural network models to implicitly connect aspect terms with context words. However, these studies are limited by insufficient interaction between aspect terms and opinion words, leading to poor performance on robustness test sets. In addition, we have found that robustness test sets create new sentences that interfere with the original information of a sentence, which often makes the text too long and leads to the problem of long-distance dependence. At the same time, these new sentences produce more non-target aspect terms, which mislead the model because of the lack of relevant knowledge guidance. This study proposes a knowledge guided multi-granularity graph convolutional neural network (KMGCN) to solve these problems. The multi-granularity attention mechanism is designed to enhance the interaction between aspect terms and opinion words. To address the long-distance dependence, KMGCN uses a graph convolutional network that relies on a semantic map based on fine-tuning pre-trained models. In particular, KMGCN uses a mask mechanism guided by conceptual knowledge to encounter more aspect terms (including target and non-target aspect terms). Experiments are conducted on 12 SemEval-2014 variant benchmarking datasets, and the results demonstrate the effectiveness of the proposed framework.
