Similar literature
 20 similar articles found
1.
Recently, the Transformer model architecture and the pre-trained Transformer-based language models have shown impressive performance when used in solving both natural language understanding and text generation tasks. Nevertheless, there is little research done on using these models for text generation in Arabic. This research aims at leveraging and comparing the performance of different model architectures, including RNN-based and Transformer-based ones, and different pre-trained language models, including mBERT, AraBERT, AraGPT2, and AraT5 for Arabic abstractive summarization. We first built an Arabic summarization dataset of 84,764 high-quality text-summary pairs. To use mBERT and AraBERT in the context of text summarization, we employed a BERT2BERT-based encoder-decoder model where we initialized both the encoder and decoder with the respective model weights. The proposed models have been tested using ROUGE metrics and manual human evaluation. We also compared their performance on out-of-domain data. Our pre-trained Transformer-based models give a large improvement in performance with ~79% less data. We found that AraT5 scores ~3 ROUGE higher than a BERT2BERT-based model that is initialized with AraBERT, indicating that an encoder-decoder pre-trained Transformer is more suitable for summarizing Arabic text. Also, both of these models perform better than AraGPT2 by a clear margin, which we found to produce summaries with high readability but with relatively lower quality. On the other hand, we found that both AraT5 and AraGPT2 are better at summarizing out-of-domain text. We released our models and dataset publicly.

2.
High-quality evaluation of generated summaries is needed if we are to improve automatic summarization systems. Although human evaluation provides better results than automatic evaluation methods, its cost is huge and it is difficult to reproduce the results. Therefore, we need an automatic method that simulates human evaluation if we are to improve our summarization system efficiently. Although automatic evaluation methods have been proposed, they are unreliable when used for individual summaries. To solve this problem, we propose a supervised automatic evaluation method based on a new regression model called the voted regression model (VRM). VRM has two characteristics: (1) model selection based on ‘corrected AIC’ to avoid multicollinearity, (2) voting by the selected models to alleviate the problem of overfitting. Evaluation results obtained for TSC3 and DUC2004 show that our method achieved error reductions of about 17–51% compared with conventional automatic evaluation methods. Moreover, our method obtained the highest correlation coefficients in several different experiments.

3.
Extractive summarization for academic articles in natural sciences and medicine has attracted attention for a long time. However, most existing extractive summarization models process academic articles with sentence classification models, which struggle to produce comprehensive summaries. To address this issue, we explore a new view to solve the extractive summarization of academic articles in natural sciences and medicine by taking it as a question-answering process. We propose a novel framework, MRC-Sum, where the extractive summarization for academic articles in natural sciences and medicine is cast as an MRC (Machine Reading Comprehension) task. To instantiate MRC-Sum, article-summary pairs in the summarization datasets are first reconstructed into (Question, Answer, Context) triples in the MRC task. Several questions are designed to cover the main aspects (e.g. Background, Method, Result, Conclusion) of the articles in natural sciences and medicine. A novel strategy is proposed to solve the problem of the non-existence of the ground truth answer spans. Then MRC-Sum is trained on the reconstructed datasets and large-scale pre-trained models. During the inference stage, four answer spans of the predefined questions are given by MRC-Sum and concatenated to form the final summary for each article. Experiments on three publicly available benchmarks, i.e., the Covid, PubMed, and arXiv datasets, demonstrate the effectiveness of MRC-Sum. Specifically, MRC-Sum outperforms advanced extractive summarization baselines on the Covid dataset and achieves competitive results on the PubMed and arXiv datasets. We also propose a novel metric, COMPREHS, to automatically evaluate the comprehensiveness of the system summaries for academic articles in natural sciences and medicine. Extensive experiments were conducted and verified the reliability of the proposed metric. The results of the COMPREHS metric show that MRC-Sum is able to generate more comprehensive summaries than the baseline models.

4.
Sentiment analysis concerns the study of opinions expressed in a text. Due to the huge number of reviews, sentiment analysis plays a basic role in extracting significant information and the overall sentiment orientation of reviews. In this paper, we present a deep-learning-based method, called RNSA, to classify a user's opinion expressed in reviews. To the best of our knowledge, a deep-learning-based method that uses a unified feature set representative of word embedding, sentiment knowledge, sentiment shifter rules, and statistical and linguistic knowledge has not been thoroughly studied for sentiment analysis. RNSA employs a Recurrent Neural Network (RNN) composed of Long Short-Term Memory (LSTM) units to take advantage of sequential processing and overcome several flaws in traditional methods, where word order and information about the word are lost. Furthermore, it uses sentiment knowledge, sentiment shifter rules and multiple strategies to overcome the following drawbacks: words with similar semantic context but opposite sentiment polarity; contextual polarity; sentence types; word coverage limit of an individual lexicon; word sense variations. To verify the effectiveness of our work, we conduct sentence-level sentiment classification on large-scale review datasets. We obtained encouraging results. Experimental results show that (1) feature vectors in terms of (a) statistical, linguistic and sentiment knowledge, (b) sentiment shifter rules and (c) word-embedding can improve the classification accuracy of sentence-level sentiment analysis; (2) our method that learns from this unified feature set can obtain significantly better performance than one that learns from a feature subset; (3) our neural model yields superior performance improvements in comparison with other well-known approaches in the literature.

5.
Quickly and accurately summarizing representative opinions is a key step for assessing microblog sentiments. The Ortony-Clore-Collins (OCC) model of emotion can offer a rule-based emotion export mechanism. In this paper, we propose an OCC model and a Convolutional Neural Network (CNN) based opinion summarization method for Chinese microblogging systems. We test the proposed method using real world microblog data. We then compare the accuracy of manual sentiment annotation to the accuracy using our OCC-based sentiment classification rule library. Experimental results from analyzing three real-world microblog datasets demonstrate the efficacy of our proposed method. Our study highlights the potential of combining emotion cognition with deep learning in sentiment analysis of social media data.

6.
In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries compared to human summaries, i.e., the Pyramid method, our system ranked first out of 22 systems in terms of overall mean Pyramid score; and in the human evaluation of summary responsiveness to the topic, our system ranked third out of 35 systems.

7.
Multi-Document Summarization of Scientific articles (MDSS) is a challenging task that aims to generate concise and informative summaries for multiple scientific articles on a particular topic. However, despite recent advances in abstractive models for MDSS, grammatical correctness and contextual coherence remain challenging issues. In this paper, we introduce EDITSum, a novel abstractive MDSS model that leverages sentence-level planning to guide summary generation. Our model incorporates neural topic model information as explicit guidance and sequential latent variables information as implicit guidance under a variational framework. We propose a hierarchical decoding strategy that generates the sentence-level planning by a sentence decoder and then generates the final summary conditioned on the planning by a word decoder. Experimental results show that our model outperforms previous state-of-the-art models by a significant margin on ROUGE-1 and ROUGE-L metrics. Ablation studies demonstrate the effectiveness of the individual modules proposed in our model, and human evaluations provide strong evidence that our model generates more coherent and error-free summaries. Our work highlights the importance of high-level planning in addressing intra-sentence errors and inter-sentence incoherence issues in MDSS.

8.
Sentiment analysis on Twitter has attracted much attention recently due to its wide applications in both commercial and public sectors. In this paper we present SentiCircles, a lexicon-based approach for sentiment analysis on Twitter. Different from typical lexicon-based approaches, which offer fixed and static prior sentiment polarities of words regardless of their context, SentiCircles takes into account the co-occurrence patterns of words in different contexts in tweets to capture their semantics and update their pre-assigned strength and polarity in sentiment lexicons accordingly. Our approach allows for the detection of sentiment at both entity-level and tweet-level. We evaluate our proposed approach on three Twitter datasets using three different sentiment lexicons to derive word prior sentiments. Results show that our approach significantly outperforms the baselines in accuracy and F-measure for entity-level subjectivity (neutral vs. polar) and polarity (positive vs. negative) detections. For tweet-level sentiment detection, our approach performs better than the state-of-the-art SentiStrength by 4–5% in accuracy in two datasets, but falls marginally behind by 1% in F-measure in the third dataset.

9.
Centroid-based summarization of multiple documents   Total citations: 2 (self-citations: 0, citations by others: 2)
We present a multi-document summarizer, MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We describe two new techniques, a centroid-based summarizer, and an evaluation scheme based on sentence utility and subsumption. We have applied this evaluation to both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.
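
The idea of ranking sentences against a cluster centroid can be illustrated with a minimal sketch. This is not MEAD itself: the abstract does not give the scoring details, so the function name `centroid_scores`, the whitespace tokenization, and the use of plain average term frequency as the centroid weight are all assumptions made for illustration.

```python
from collections import Counter


def centroid_scores(documents):
    """Rank sentences by how much centroid weight their words carry.

    documents: a list of documents, each given as a list of sentence strings.
    Returns (score, sentence) pairs sorted from highest to lowest score.
    Hypothetical simplification: the centroid is the average term frequency
    per document, with no IDF weighting and no positional features.
    """
    if not documents:
        return []

    # Build the centroid: average count of each word per document.
    totals = Counter()
    for doc in documents:
        for sentence in doc:
            totals.update(sentence.lower().split())
    centroid = {word: count / len(documents) for word, count in totals.items()}

    # Score each sentence by summing the centroid weights of its distinct words.
    scored = []
    for doc in documents:
        for sentence in doc:
            words = set(sentence.lower().split())
            scored.append((sum(centroid[w] for w in words), sentence))
    return sorted(scored, key=lambda pair: pair[0], reverse=True)
```

Because scores are sums over words, this sketch favors longer sentences; a practical system would normalize for length and filter low-weight words.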

10.
General graph random walk has been successfully applied in multi-document summarization, but processing documents this way has some limitations. In this paper, we propose a novel hypergraph-based vertex-reinforced random walk framework for multi-document summarization. The framework first exploits the Hierarchical Dirichlet Process (HDP) topic model to learn a word-topic probability distribution in sentences. Then the hypergraph is used to capture both cluster relationship based on the word-topic probability distribution and pairwise similarity among sentences. Finally, a time-variant random walk algorithm for hypergraphs is developed to rank sentences, ensuring sentence diversity in summaries through vertex reinforcement. Experimental results on a publicly available dataset demonstrate the effectiveness of our framework.

11.
The use of domain-specific concepts in biomedical text summarization   Total citations: 3 (self-citations: 0, citations by others: 3)
Text summarization is a method for data reduction. The use of text summarization enables users to reduce the amount of text that must be read while still assimilating the core information. The data reduction offered by text summarization is particularly useful in the biomedical domain, where physicians must continuously find clinical trial study information to incorporate into their patient treatment efforts. Such efforts are often hampered by the high volume of publications. This paper presents two independent methods (BioChain and FreqDist) for identifying salient sentences in biomedical texts using concepts derived from domain-specific resources. Our semantic-based method (BioChain) is effective at identifying thematic sentences, while our frequency-distribution method (FreqDist) removes information redundancy. The two methods are then combined to form a hybrid method (ChainFreq). An evaluation of each method is performed using the ROUGE system to compare system-generated summaries against a set of manually-generated summaries. The BioChain and FreqDist methods outperform some common summarization systems, while the ChainFreq method improves upon the base approaches. Our work shows that the best performance is achieved when the two methods are combined. The paper also presents a brief physician’s evaluation of three randomly-selected papers from an evaluation corpus to show that the author’s abstract does not always reflect the entire contents of the full-text.

12.
Structured sentiment analysis is a newly proposed task, which aims to summarize the overall sentiment and opinion status on given texts, i.e., the opinion expression, the sentiment polarity of the opinion, the holder of the opinion, and the target of the opinion. In this work, we investigate a transition-based model for the end-to-end structured sentiment analysis task. We design a transition architecture which supports the recognition of all the possible opinion quadruples in one shot. Based on the transition backbone, we then propose a Dual-Pointer module for more accurate term boundary detection. In addition, we introduce a global graph reasoning mechanism, which helps to learn the global-level interactions between the overlapped quadruples. The high-order features are navigated into the transition system to enhance the final predictions. Extensive experimental results on five benchmarks demonstrate both the prominent efficacy and efficiency of our system. Our model outperforms all baselines in terms of all metrics, especially achieving a 10.5% point gain over the current best-performing system only detecting the holder-target-opinion triplets. Further analyses reveal that our framework is also effective in solving the overlapping structure and long-range dependency issues.

13.
We show some limitations of the ROUGE evaluation method for automatic summarization. We present a method for automatic summarization based on a Markov model of the source text. By a simple greedy word selection strategy, summaries with high ROUGE-scores are generated. These summaries would, however, not be considered good by human readers. The method can be adapted to trick different settings of the ROUGEeval package.
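
The kind of weakness described above can be seen even in the simplest ROUGE variant. The sketch below computes unigram recall (ROUGE-1 recall) and shows that it ignores word order. It is not the Markov-model method from the abstract and not the official ROUGE package; the function name `rouge1_recall`, the whitespace tokenization, and the example sentences are assumptions chosen for illustration.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    """Fraction of reference unigrams that also appear in the candidate.

    Counts are clipped, so a word repeated in the candidate is credited
    at most as many times as it occurs in the reference.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    if not ref_counts:
        return 0.0
    overlap = sum(min(count, cand_counts[word]) for word, count in ref_counts.items())
    return overlap / sum(ref_counts.values())


reference = "the cat sat on the mat"
fluent = "the cat sat on a mat"          # readable, one word differs
word_salad = "mat the on sat cat the"    # unreadable, same bag of words

fluent_score = rouge1_recall(fluent, reference)
salad_score = rouge1_recall(word_salad, reference)
```

Here `salad_score` is 1.0 while `fluent_score` is 5/6: the scrambled candidate outscores the readable one because unigram overlap carries no information about fluency.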

14.
Although statistical learning methods have achieved success in e-commerce platform product review sentiment classification, two problems have limited their practical application: 1) the computational efficiency to process large-scale reviews; 2) the ability to continuously learn from increasing reviews and multiple domains. This paper presents a continuous naïve Bayes learning framework for large-scale and multi-domain e-commerce platform product review sentiment classification. While keeping the high computational efficiency of the traditional naïve Bayes model, we extend the parameter estimation mechanism in naïve Bayes to a continuous learning style. We furthermore propose ways to fine-tune the learned distribution based on three kinds of assumptions to better adapt to different domains. Experimental results on the Amazon product and movie review sentiment datasets show that our model can use the knowledge learned from past domains to guide learning in new domains, and has a better capacity of dealing with reviews that are continuously updated and come from different domains.
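
Naïve Bayes lends itself to continuous learning because its parameters are plain counts that can be incremented as new reviews arrive. The sketch below shows only that counting idea. It does not reproduce the framework from the abstract, in particular not its domain fine-tuning assumptions; the class name `IncrementalNB`, the whitespace tokenization, and the add-one smoothing with one extra slot for unseen words are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Multinomial naive Bayes whose counts can be updated one review at a time."""

    def __init__(self):
        self.class_docs = Counter()               # reviews seen per label
        self.word_counts = defaultdict(Counter)   # word counts per label
        self.vocab = set()                        # all words seen so far

    def update(self, text, label):
        """Fold one labelled review into the counts; no retraining pass is needed."""
        words = text.lower().split()
        self.class_docs[label] += 1
        self.word_counts[label].update(words)
        self.vocab.update(words)

    def predict(self, text):
        """Return the label with the highest log posterior, or None if untrained."""
        words = text.lower().split()
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            total_words = sum(self.word_counts[label].values())
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            denominator = total_words + len(self.vocab) + 1
            score = math.log(n_docs / total_docs)
            for word in words:
                score += math.log((self.word_counts[label][word] + 1) / denominator)
            if score > best_score:
                best_label, best_score = label, score
        return best_label
```

A usage pattern would be to call `update` on each new labelled review as it arrives and call `predict` at any point in between; each update costs time proportional to the length of the review.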

15.
A critical challenge for Web search engines concerns how they present relevant results to searchers. The traditional approach is to produce a ranked list of results with title and summary (snippet) information, and these snippets are usually chosen based on the current query. Snippets play a vital sensemaking role, helping searchers to efficiently make sense of a collection of search results, as well as determine the likely relevance of individual results. Recently researchers have begun to explore how snippets might also be adapted based on searcher preferences as a way to better highlight relevant results to the searcher. In this paper we focus on the role of snippets in collaborative web search and describe a technique for summarizing search results that harnesses the collaborative search behaviour of communities of like-minded searchers to produce snippets that are more focused on the preferences of the searchers. We go on to show how this so-called social summarization technique can generate summaries that are significantly better adapted to searcher preferences and describe a novel personalized search interface that combines result recommendation with social summarization.

16.
A well-known challenge for multi-document summarization (MDS) is that a single best or “gold standard” summary does not exist, i.e. it is often difficult to secure a consensus among reference summaries written by different authors. It therefore motivates us to study what the “important information” is in multiple input documents that will guide different authors in writing a summary. In this paper, we propose the notions of macro- and micro-level information. Macro-level information refers to the salient topics shared among different input documents, while micro-level information consists of sentences that elaborate on or provide complementary details for those salient topics. Experimental studies were conducted to examine the influence of macro- and micro-level information on summarization and its evaluation. Results showed that human subjects relied heavily on macro-level information when writing a summary. The length allowed for summaries is the leading factor that affects the summary agreement. Meanwhile, our summarization evaluation approach based on the proposed macro- and micro-structure information also suggested that micro-level information offered complementary details for macro-level information. We believe that both levels of information form the “important information” which affects the modeling and evaluation of automatic summarization systems.

17.
Stance is defined as the expression of a speaker's standpoint towards a given target or entity. To date, the most reliable method for measuring stance is opinion surveys. However, people's increased reliance on social media makes these online platforms an essential source of complementary information about public opinion. Our study contributes to the discussion surrounding replicable methods through which to conduct reliable stance detection by establishing a rule-based model, which we replicated for several targets independently. To test our model, we relied on a widely used dataset of annotated tweets - the SemEval Task 6A dataset, which contains 5 targets with 4,163 manually labelled tweets. We relied on “off-the-shelf” sentiment lexica to expand the scope of our custom dictionaries, while also integrating linguistic markers and using word-pairs dependency information to conduct stance classification. While positive and negative evaluative words are the clearest markers of expression of stance, we demonstrate the added value of linguistic markers to identify the direction of the stance more precisely. Our model achieves an average classification accuracy of 75% (ranging from 67% to 89% across targets). This study is concluded by discussing practical implications and outlooks for future research, while highlighting that each target poses specific challenges to stance detection.

18.
We propose a topic-dependent attention model for sentiment classification and topic extraction. Our model assumes that a global topic embedding is shared across documents and employs an attention mechanism to derive local topic embedding for words and sentences. These are subsequently incorporated in a modified Gated Recurrent Unit (GRU) for sentiment classification and extraction of topics bearing different sentiment polarities. Those topics emerge from the words’ local topic embeddings learned by the internal attention of the GRU cells in the context of a multi-task learning framework. In this paper, we present the hierarchical architecture, the new GRU unit and the experiments conducted on users’ reviews which demonstrate classification performance on a par with the state-of-the-art methodologies for sentiment classification and topic coherence outperforming the current approaches for supervised topic extraction. In addition, our model is able to extract coherent aspect-sentiment clusters despite using no aspect-level annotations for training.

19.
Existing methods for text generation usually feed the overall sentiment polarity of a product as an input into the seq2seq model to generate a relatively fluent review. However, these methods cannot express more fine-grained sentiment polarity. Although some studies attempt to generate aspect-level sentiment controllable reviews, they ignore the personalized attributes of reviews. In this paper, a hierarchical template-transformer model is proposed for personalized fine-grained sentiment controllable generation, which aims to generate aspect-level sentiment controllable reviews with personalized information. The hierarchical structure can effectively learn sentiment information and lexical information separately. The template transformer uses a part of speech (POS) template to guide the generation process and generate a smoother review. To verify our model, we used the existing model to obtain a corpus named FSCG-80 from Yelp, which contains 800K samples, and conducted a series of experiments on this corpus. Experimental results show that our model can achieve up to 89.93% aspect-sentiment control accuracy and generate more fluent reviews.

20.
Aspect mining, which aims to extract ad hoc aspects from online reviews and predict the rating or opinion on each aspect, can satisfy personalized needs for evaluating specific aspects of product quality. Recently, with the increase of related research, how to effectively integrate rating and review information has become the key issue for addressing this problem. Considering that matrix factorization is an effective tool for rating prediction and topic modeling is widely used for review processing, it is a natural idea to combine matrix factorization and topic modeling for aspect mining (also called aspect rating prediction). However, this idea faces several challenges on how to address suitable sharing factors, scale mismatch, and the dependency relation of rating and review information. In this paper, we propose a novel model to effectively integrate Matrix factorization and Topic modeling for Aspect rating prediction (MaToAsp). To overcome the above challenges and ensure the performance, MaToAsp employs items as the sharing factors to combine matrix factorization and topic modeling, and introduces an interpretive preference probability to eliminate scale mismatch. In the hybrid model, we establish a dependency relation from ratings to sentiment terms in phrases. The experiments on two real datasets, Chinese Dianping and English Tripadvisor, show that MaToAsp not only obtains reasonable aspect identification but also achieves the best aspect rating prediction performance, compared to recent representative baselines.
