Similar literature
 20 similar documents found (search time: 15 ms)
1.
Most previous work on feature selection has emphasized only the reduction of the high dimensionality of the feature space. But in cases where many features are highly redundant with each other, we must utilize other means, for example, more complex dependence models such as Bayesian network classifiers. In this paper, we introduce a new information gain and divergence-based feature selection method for statistical machine learning-based text categorization without relying on more complex dependence models. Our feature selection method strives to reduce redundancy between features while maintaining information gain in selecting appropriate features for text categorization. Empirical results are given on a number of datasets, showing that our feature selection method is more effective than Koller and Sahami’s method [Koller, D., & Sahami, M. (1996). Toward optimal feature selection. In Proceedings of ICML-96, 13th international conference on machine learning], which is one of the greedy feature selection methods, and than conventional information gain, which is commonly used in feature selection for text categorization. Moreover, our feature selection method sometimes improves conventional machine learning algorithms to the point that they outperform support vector machines, which are known to give the best classification accuracy.
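
As a point of reference for the abstract above, conventional information gain (the baseline it compares against, not the authors' divergence-based method) can be sketched as follows. This is a minimal illustration under stated assumptions: documents are modeled as sets of terms, the feature is the binary event "term occurs in the document", and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(docs, labels, term):
    """Information gain of the binary feature 'term occurs in the document'.

    docs is a list of term sets and labels the matching class labels
    (both hypothetical inputs chosen for this sketch).
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    # Class entropy that remains after splitting on the term, weighted by split size.
    conditional = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - conditional


# Toy corpus: "goal" separates the classes perfectly, "team" not at all.
docs = [{"goal", "match"}, {"goal", "team"}, {"vote", "party"}, {"vote", "team"}]
labels = ["sport", "sport", "politics", "politics"]
```

Ranking terms by this score and keeping the top ones ignores redundancy between the selected terms, which is the gap the abstract says its method targets.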

2.
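To remove redundant and noisy features from network intrusion datasets, reduce the difficulty of data processing, and improve detection performance, an intrusion detection method based on feature selection and support vector machines is proposed. The method uses the proposed feature selection algorithm to select the optimal feature combination, builds a model with a support vector machine as the classifier, and applies it to an intrusion detection system. Simulation results show that the method not only reduces the feature dimensionality and shortens training and testing time, but also improves the classification accuracy of intrusion detection.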

3.
To improve multimodal negative sentiment recognition in online public opinion on public health emergencies, we constructed a novel multimodal fine-grained negative sentiment recognition model based on graph convolutional networks (GCN) and ensemble learning. This model comprises BERT and ViT-based multimodal feature representation, GCN-based feature fusion, multiple classifiers, and ensemble learning-based decision fusion. First, image-text data about COVID-19 is collected from Sina Weibo, and the text and image features are extracted through BERT and ViT, respectively. Second, the image-text fused features are generated through GCN in the constructed microblog graph. Finally, AdaBoost is trained to decide the final sentiments recognized by the best classifiers in image, text, and image-text fused features. The results show that the F1-score of this model is 84.13% in sentiment polarity recognition and 82.06% in fine-grained negative sentiment recognition, improvements of 4.13% and 7.55%, respectively, over the best recognition result of image-text feature fusion.

4.
Sentiment analysis concerns the study of opinions expressed in a text. Due to the huge number of reviews, sentiment analysis plays a basic role in extracting significant information and the overall sentiment orientation of reviews. In this paper, we present a deep-learning-based method (called RNSA) to classify a user's opinion expressed in reviews. To the best of our knowledge, a deep-learning-based method that uses a unified feature set representing word embedding, sentiment knowledge, sentiment shifter rules, and statistical and linguistic knowledge has not been thoroughly studied for sentiment analysis. RNSA employs a Recurrent Neural Network (RNN) composed of Long Short-Term Memory (LSTM) units to take advantage of sequential processing and overcome several flaws of traditional methods, in which word order and word information are lost. Furthermore, it uses sentiment knowledge, sentiment shifter rules, and multiple strategies to overcome the following drawbacks: words with similar semantic context but opposite sentiment polarity; contextual polarity; sentence types; the word coverage limit of an individual lexicon; and word sense variations. To verify the effectiveness of our work, we conduct sentence-level sentiment classification on large-scale review datasets and obtain encouraging results. Experimental results show that (1) feature vectors in terms of (a) statistical, linguistic, and sentiment knowledge, (b) sentiment shifter rules, and (c) word embedding can improve the classification accuracy of sentence-level sentiment analysis; (2) our method, which learns from this unified feature set, performs significantly better than one that learns from a feature subset; (3) our neural model yields superior performance in comparison with other well-known approaches in the literature.

5.
The appearance attribute and pose are two important and complementary features, so integrating them can effectively alleviate the impact of misalignment and occlusion on re-identification. In this paper, we investigate in depth the inner relation between attribute features and the spatial semantic relation between key-point region features of the pose in a person image, and propose a person re-identification method based on discriminative feature mining with relation regularization. First, an attribute relation detector based on nonlinear graph convolution is built to mine the inner correlation between attribute features of a person, providing relational attribute features for more effectively distinguishing persons with a similar appearance. Then, we construct a hierarchical pose pyramid to model the multi-grained semantic features of key-point regions of the pose and propose intra-graph and cross-graph node relation information propagation structures to infer the spatial semantic relation between node features within and between graphs. This module is robust to complex pose changes and can suppress the noisy background redundancy caused by inaccurate key-point detection and occlusion. Finally, a refined feature model is proposed to effectively fuse the global appearance feature with the relational attribute and multi-grained pose features, thus providing a more discriminative fusion feature for person re-identification. Extensive experiments on three large-scale datasets verify the effectiveness and state-of-the-art performance of the proposed method.

6.
Aspect-based sentiment analysis aims to predict the sentiment polarities of specific targets in a given text. Recent research shows great interest in modeling the target and context with attention networks to obtain more effective feature representations for the sentiment classification task. However, using an average vector of the target to compute the attention score for the context is unfair. Besides, the interaction mechanism is simple and thus needs to be further improved. To solve the above problems, this paper first proposes a coattention mechanism which models target-level and context-level attention alternately so as to focus on the key words of targets and learn a more effective context representation. On this basis, we implement a Coattention-LSTM network which learns nonlinear representations of context and target simultaneously and can extract more effective sentiment features from the coattention mechanism. Further, a Coattention-MemNet network which adopts a multi-hop coattention mechanism is proposed to improve the sentiment classification result. Finally, we propose a new location-weighted function which considers location information to enhance the performance of the coattention mechanism. Extensive experiments on two public datasets demonstrate the effectiveness of all proposed methods, and our findings in the experiments provide new insight for future developments in using attention mechanisms and deep neural networks for aspect-based sentiment analysis.

7.
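For spatial-domain steganography, the complementarity among three important groups of steganographic features is analyzed: statistical moments of wavelet characteristic functions, higher-order statistical features, and the subtractive pixel adjacency matrix. Feature fusion is performed using a method based on the mutual information criterion and boosting feature selection. Analysis and experiments show that the three feature groups are complementary and that fusing them yields a higher accuracy.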

8.
Unsupervised feature selection is very attractive in many practical applications, as it needs no semantic labels during the learning process. However, the absence of semantic labels makes unsupervised feature selection more challenging, as the method can be affected by noise, redundancy, or missing values in the originally extracted features. Currently, most methods consider either the influence of noise for sparse learning or the internal structure information of the data, leading to suboptimal results. To relieve these limitations and improve the effectiveness of unsupervised feature selection, we propose a novel method named Adaptive Dictionary and Structure Learning (ADSL) that conducts spectral learning and sparse dictionary learning in a unified framework. Specifically, we adaptively update the dictionary based on sparse dictionary learning. We also introduce a spectral learning method that adaptively updates the affinity matrix. While redundant features are removed, the intrinsic structure of the original data can be retained. In addition, we adopt matrix completion in our framework so that it can handle the missing data problem. We validate the effectiveness of our method on several public datasets. Experimental results show that our model not only outperforms some state-of-the-art methods on complete datasets but also achieves satisfying results on incomplete datasets.

9.
张国标, 李洁, 胡潇戈. 《情报科学》 (Information Science), 2021, 39(10): 126-132
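[Purpose/Significance] While social media has changed news dissemination and the way people obtain information, it has also become a major channel for spreading fake news. Quickly identifying fake news on social media and curbing the spread of false information is therefore essential to cleaning up cyberspace and maintaining public safety. [Method/Process] To effectively identify fake news published on social media, this paper, based on an in-depth analysis of the content characteristics of fake news, designs representations for text word-vector, text sentiment, low-level image, and image semantic features, which are used to extract image and text feature information from fake news in social networks. A fake news detection model based on multimodal feature fusion is constructed, and its performance is validated on the MediaEval2015 dataset. [Results/Conclusion] A comparative analysis of experimental results for different feature combinations and different classification methods shows that a multimodal model fusing text and image features can effectively improve fake news detection. [Innovation/Limitations] The study designs a fake news detection model from a multimodal perspective, fusing multiple text and image features. However, feature fusion is implemented by vector concatenation, which not only fails to make the features fully complementary but also easily leads to the curse of dimensionality.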

10.
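[Purpose/Significance] Intention classification of short texts from social networks using statistical natural language processing alone suffers from sparse features, ambiguous semantics, and insufficient labeled data. To address these problems, a Co-training intention classification method incorporating psycholinguistic information is proposed. [Method/Process] First, to enrich semantic information, psycholinguistic cues carrying sentiment orientation are fused in while text features are extracted, expanding the feature dimensions. Second, to address the limited labeled data, a semi-supervised ensemble method is used in the training stage to co-train two machine learning classifiers (an event-content-expression classifier and an emotional-event-expression classifier). Finally, classification is performed by a voting scheme based on the product of confidences. [Conclusion/Results] Experimental results show that corpora enriched with psycholinguistic information and then co-trained yield better classification performance.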
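
The final voting step in the abstract above (voting by the product of confidences) might look roughly like the sketch below. The abstract gives no details, so everything concrete here is an assumption: each of the two co-trained classifiers is assumed to return a per-class confidence dictionary, and the names `product_vote`, `content_conf`, and `emotion_conf` are hypothetical.

```python
def product_vote(conf_a, conf_b):
    """Pick the class whose two confidences have the largest product.

    conf_a and conf_b map class labels to confidences in [0, 1], one
    dictionary per classifier (an assumed interface, not the paper's).
    """
    shared = conf_a.keys() & conf_b.keys()
    if not shared:
        raise ValueError("classifiers share no class labels")
    scores = {label: conf_a[label] * conf_b[label] for label in shared}
    return max(scores, key=scores.get)


# Hypothetical outputs of the two classifiers for one short text.
content_conf = {"purchase": 0.6, "none": 0.4}
emotion_conf = {"purchase": 0.3, "none": 0.7}
decision = product_vote(content_conf, emotion_conf)
```

A product rewards classes on which both classifiers agree, so a class that one classifier strongly rejects is unlikely to win even if the other favors it.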

11.
Nowadays, stress has become a growing problem for society due to its high impact not only on individuals but also on health care systems and companies. To overcome this problem, early detection of stress is a key factor. Previous studies have shown the effectiveness of text analysis in the detection of sentiment, emotion, and mental illness. However, existing solutions for stress detection from text are focused on a specific corpus. There is still a lack of well-validated methods that provide good results on different datasets. We aim to advance the state of the art by proposing a method to detect stress in textual data and evaluating it using multiple public English datasets. The proposed approach combines lexicon-based features with distributional representations to enhance classification performance. To help organize features for stress detection in text, we propose a lexicon-based feature framework that exploits affective, syntactic, social, and topic-related features. Also, three different word embedding techniques are studied for exploiting distributional representation. Our approach has been implemented with three machine learning models that have been evaluated in terms of performance through several experiments. This evaluation has been conducted using three public English datasets and provides a baseline for other researchers. The obtained results identify the combination of FastText embeddings with a selection of lexicon-based features as the best-performing model, achieving F-scores above 80%.

12.
A hot topic in recent years, cross-domain sentiment classification aims to learn a reliable classifier using labeled data from a source domain and evaluate the classifier on a target domain. In this vein, most approaches have utilized domain adaptation that maps data from different domains into a common feature space. To further improve model performance, several methods aimed at mining domain-specific information have been proposed. However, most of them utilize only a limited part of the domain-specific information. In this study, we first develop a method of extracting domain-specific words based on the topic information derived from topic models. Then, we propose a Topic Driven Adaptive Network (TDAN) for cross-domain sentiment classification. The network consists of two sub-networks: a semantics attention network and a domain-specific word attention network, the structures of which are based on transformers. These sub-networks take different forms of input and their outputs are fused as the feature vector. Experiments validate the effectiveness of our TDAN on sentiment classification across domains. Case studies also indicate that topic models have the potential to add value to cross-domain sentiment classification by discovering interpretable and low-dimensional subspaces.

13.
Financial decisions are often based on classification models which are used to assign a set of observations to predefined groups. Different data classification models have been developed to foresee the financial crisis of an organization using its historical data. One important step towards the development of an accurate financial crisis prediction (FCP) model involves the selection of appropriate variables (features) which are relevant to the problem at hand. This is termed the feature selection problem, and solving it helps to improve classification performance. This paper proposes an Ant Colony Optimization (ACO) based FCP model which incorporates two phases: an ACO based feature selection (ACO-FS) algorithm and an ACO based data classification (ACO-DC) algorithm. The proposed ACO-FCP model is validated using a set of five benchmark datasets, including both qualitative and quantitative ones. For feature selection, the developed ACO-FS method is compared with three existing feature selection algorithms, namely the genetic algorithm (GA), the Particle Swarm Optimization (PSO) algorithm, and the Grey Wolf Optimization (GWO) algorithm. In addition, classification results are compared between ACO-DC and state-of-the-art methods. Experimental analysis shows that the ACO-FCP ensemble model is superior to and more robust than its counterparts. Consequently, this study finds the proposed ACO-FCP model highly competitive compared with traditional and other artificial intelligence techniques.

14.
Misinformation on social media is a non-negligible phenomenon that causes successive adverse impacts. Numerous scholarly efforts have been devoted to automatic misinformation detection to address this problem. Effective features are the key to achieving high identification performance. However, the effectiveness of a feature may change across issues and over time, given the manifold social contextual reasons. Most extant literature on misinformation detection does not differentiate between topics, issues, or domains. Although some studies compare detection across domains, they concentrate on the model's overall performance, neglecting the effectiveness of individual features. Furthermore, the comparison studies mainly incorporate single-domain issues rather than issues that cover multiple domains. It is still difficult to determine which domain's misinformation characteristics will match those of multi-dimensional issues. Since misinformation nowadays covers multiple domains, finding features that are robust across issues and over time is an urgent research agenda. In this study, we collected datasets on two issues, climate change and genetically modified organisms (GMOs), posted on Weibo between January 1st, 2010 and December 31st, 2020, manually annotated the veracity status of the posts, and compared the performance of the proposed features in identifying misinformation by applying logistic regression. The results demonstrate that (1) the predictive power of content-based features, including topic and sentiment, is relatively robust compared to user-based and propagation-based features across issues and time; (2) feature effectiveness varies at different time points. Our findings imply that future research could consider focusing more on content-based features, especially implicit features from the content, in misinformation detection. Moreover, researchers should evaluate feature effectiveness at different time stages to improve the efficiency of misinformation detection.

15.
Ideation is an important phase in the new product development process in which product designers innovate and select novel ideas that can be added as features to an existing product. One way to find novel ideas is to transfer uncommon features of products from other domains and integrate them into the product to be improved. However, before such targeted features are incorporated into the product, they need to be evaluated against customers’ acceptance in social media using sentiment aggregation tools. Despite the many studies in sentiment analysis, mapping customers’ opinions, extracted from social media, on both high-level and technical features of a product to the best corresponding component of that product is still a challenge. Furthermore, none of the existing approaches ascertains the sentiment value of a targeted feature by capturing its dependencies on other features. In this paper, to address these drawbacks, we propose the sentiment aggregation framework for targeted features (SA-TF). SA-TF determines the sentiment of a targeted feature by assisting product designers in the tasks of mapping the features discussed in the reviews to the right product components, aggregating sentiment, and considering feature dependencies to determine their polarity. The superiority of the different phases of SA-TF is demonstrated through experiments and comparison with an existing approach.

16.
Multimodal fake news detection methods based on semantic information have achieved great success. However, these methods only exploit the deep features of multimodal information, which leads to a large loss of valid information at the shallow level. To address this problem, we propose a progressive fusion network (MPFN) for multimodal disinformation detection, which captures the representational information of each modality at different levels and, by means of a mixer, fuses modalities both at the same level and across levels to establish a strong connection between them. Specifically, we use a transformer structure, which is effective in computer vision tasks, as a visual feature extractor to gradually sample features at different levels, and combine them with features obtained from a text feature extractor and image frequency domain information at different levels for fine-grained modeling. In addition, we design a feature fusion approach to better establish connections between modalities, which further improves performance and thus surpasses other network structures in the literature. We conducted extensive experiments on two real datasets, Weibo and Twitter. Our method achieved 83.3% accuracy on the Twitter dataset, an improvement of at least 4.3% over other state-of-the-art methods. This demonstrates the effectiveness of MPFN for identifying fake news; the method reaches a relatively advanced level by combining different levels of information from each modality with a powerful modality fusion method.

17.
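[Purpose/Significance] A competitive intelligence mining framework based on online product reviews is proposed to provide a reference for enterprises in improving product design and formulating competitive strategies. [Method/Process] Word2vec is used to build a set of product feature words and identify the topic features of user reviews. Sentiment analysis is then used to classify the review texts, yielding review sentiment along each feature dimension. Finally, the data are analyzed in terms of both product topic features and sentiment attitude features, and the results are presented visually. [Results/Conclusion] Experiments on review data from the automotive industry show that the method can effectively extract product intelligence, help enterprises identify the strengths and weaknesses of their own brands and of their competitors, and provide methodological guidance for competitive intelligence mining in a big data environment.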

18.
This article describes in-depth research on machine learning methods for sentiment analysis of Czech social media. Whereas in English, Chinese, or Spanish this field has a long history and evaluation datasets for various domains are widely available, in the case of the Czech language no systematic research has yet been conducted. We tackle this issue and establish a common ground for further research by providing a large human-annotated Czech social media corpus. Furthermore, we evaluate state-of-the-art supervised machine learning methods for sentiment analysis. We explore different preprocessing techniques and employ various features and classifiers. We also experiment with five different feature selection algorithms and investigate the influence of named entity recognition and preprocessing on sentiment classification performance. Moreover, in addition to our newly created social media dataset, we also report results for other popular domains, such as movie and product reviews. We believe that this article will not only extend the current sentiment analysis research to another family of languages, but will also encourage competition, potentially leading to the production of high-end commercial solutions.

19.
Irony as a literary technique is widely used in online texts such as Twitter posts. Accurate irony detection is crucial for tasks such as effective sentiment analysis. A text’s ironic intent is defined by its context incongruity. For example, in the phrase “I love being ignored”, the irony is defined by the incongruity between the positive word “love” and the negative context of “being ignored”. Existing studies mostly formulate irony detection as a standard supervised learning text categorization task, relying on explicit expressions for detecting context incongruity. In this paper, we instead formulate irony detection as a transfer learning task in which supervised learning on irony-labeled text is enriched with knowledge transferred from external sentiment analysis resources. Importantly, we focus on identifying the hidden, implicit incongruity without relying on explicit incongruity expressions, as in “I like to think of myself as a broken down Justin Bieber – my philosophy professor.” We propose three transfer learning-based approaches that use sentiment knowledge to improve the attention mechanism of recurrent neural models for capturing hidden patterns of incongruity. Our main findings are: (1) using sentiment knowledge from external resources is a very effective approach to improving irony detection; (2) for detecting implicit incongruity, transferring deep sentiment features seems to be the most effective way. Experiments show that our proposed models outperform state-of-the-art neural models for irony detection.

20.
Detecting collusive spammers who collaboratively post fake reviews is extremely important to guarantee the reliability of review information on e-commerce platforms. In this research, we formulate collusive spammer detection as an anomaly detection problem and propose a novel detection approach based on a heterogeneous graph attention network. First, we analyze the review dataset from different perspectives and use the statistical distribution to model each user's review behavior. By introducing the Bhattacharyya distance, we calculate the user-user and product-product correlation degrees to construct a multi-relation heterogeneous graph. Second, we combine the biased random walk strategy and the multi-head self-attention mechanism to propose a heterogeneous graph attention network model that learns node embeddings from the multi-relation heterogeneous graph. Finally, we propose an improved community detection algorithm to acquire candidate spamming groups and employ an autoencoder-based anomaly detection model to identify collusive spammers. Experiments show that the average improvements of precision@k and recall@k of the proposed approach over the best baseline method on the Amazon, Yelp_Miami, Yelp_New York, Yelp_San Francisco, and YelpChi datasets are [13%, 3%], [32%, 12%], [37%, 7%], [42%, 10%], and [18%, 1%], respectively.
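
The Bhattacharyya distance mentioned in the abstract above has a standard definition for discrete distributions, D(p, q) = -ln Σ_i sqrt(p_i q_i). The sketch below implements that textbook formula only. How the authors turn review behavior into distributions is not stated in the abstract, so the rating histograms and the name `bhattacharyya_distance` are assumptions made for illustration.

```python
import math


def bhattacharyya_distance(p, q):
    """Bhattacharyya distance between two discrete distributions.

    p and q are equal-length sequences of probabilities that each sum to 1.
    Returns 0 for identical distributions and infinity for disjoint ones.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Bhattacharyya coefficient: the overlap between the two distributions.
    coefficient = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    if coefficient == 0:
        return math.inf
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, -math.log(coefficient))


# Hypothetical 5-star rating distributions of three users (assumed data).
user_a = [0.0, 0.0, 0.1, 0.2, 0.7]
user_b = [0.0, 0.0, 0.1, 0.3, 0.6]
user_c = [0.6, 0.3, 0.1, 0.0, 0.0]
```

Under these assumed histograms, `user_a` is much closer to `user_b` than to `user_c`, which is the kind of signal a user-user correlation degree could be built from.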
