首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
We show some limitations of the ROUGE evaluation method for automatic summarization. We present a method for automatic summarization based on a Markov model of the source text. By a simple greedy word selection strategy, summaries with high ROUGE-scores are generated. These summaries would however not be considered good by human readers. The method can be adapted to trick different settings of the ROUGEeval package.  相似文献   

2.
In recent years, there has been increased interest in topic-focused multi-document summarization. In this task, automatic summaries are produced in response to a specific information request, or topic, stated by the user. The system we have designed to accomplish this task comprises four main components: a generic extractive summarization system, a topic-focusing component, sentence simplification, and lexical expansion of topic words. This paper details each of these components, together with experiments designed to quantify their individual contributions. We include an analysis of our results on two large datasets commonly used to evaluate task-focused summarization, the DUC2005 and DUC2006 datasets, using automatic metrics. Additionally, we include an analysis of our results on the DUC2006 task according to human evaluation metrics. In the human evaluation of system summaries compared to human summaries, i.e., the Pyramid method, our system ranked first out of 22 systems in terms of overall mean Pyramid score; and in the human evaluation of summary responsiveness to the topic, our system ranked third out of 35 systems.  相似文献   

3.
The use of domain-specific concepts in biomedical text summarization   总被引:3,自引:0,他引:3  
Text summarization is a method for data reduction. The use of text summarization enables users to reduce the amount of text that must be read while still assimilating the core information. The data reduction offered by text summarization is particularly useful in the biomedical domain, where physicians must continuously find clinical trial study information to incorporate into their patient treatment efforts. Such efforts are often hampered by the high-volume of publications. This paper presents two independent methods (BioChain and FreqDist) for identifying salient sentences in biomedical texts using concepts derived from domain-specific resources. Our semantic-based method (BioChain) is effective at identifying thematic sentences, while our frequency-distribution method (FreqDist) removes information redundancy. The two methods are then combined to form a hybrid method (ChainFreq). An evaluation of each method is performed using the ROUGE system to compare system-generated summaries against a set of manually-generated summaries. The BioChain and FreqDist methods outperform some common summarization systems, while the ChainFreq method improves upon the base approaches. Our work shows that the best performance is achieved when the two methods are combined. The paper also presents a brief physician’s evaluation of three randomly-selected papers from an evaluation corpus to show that the author’s abstract does not always reflect the entire contents of the full-text.  相似文献   

4.
The increasing volume of textual information on any topic requires its compression to allow humans to digest it. This implies detecting the most important information and condensing it. These challenges have led to new developments in the area of Natural Language Processing (NLP) and Information Retrieval (IR) such as narrative summarization and evaluation methodologies for narrative extraction. Despite some progress over recent years with several solutions for information extraction and text summarization, the problems of generating consistent narrative summaries and evaluating them are still unresolved. With regard to evaluation, manual assessment is expensive, subjective and not applicable in real time or to large collections. Moreover, it does not provide re-usable benchmarks. Nevertheless, commonly used metrics for summary evaluation still imply substantial human effort since they require a comparison of candidate summaries with a set of reference summaries. The contributions of this paper are three-fold. First, we provide a comprehensive overview of existing metrics for summary evaluation. We discuss several limitations of existing frameworks for summary evaluation. Second, we introduce an automatic framework for the evaluation of metrics that does not require any human annotation. Finally, we evaluate the existing assessment metrics on a Wikipedia data set and a collection of scientific articles using this framework. Our findings show that the majority of existing metrics based on vocabulary overlap are not suitable for assessment based on comparison with a full text and we discuss this outcome.  相似文献   

5.
Opinion summarization can facilitate user’s decision-making by mining the salient review information. However, due to the lack of sufficient annotated data, most of the early works are based on extractive methods, which restricts the performance of opinion summarization. In this work, we aim to improve the informativeness of opinion summarization to provide better guidance to users. We consider the setting with only reviews without corresponding summaries, and propose an aspect-augmented model for unsupervised abstractive opinion summarization, denoted as AsU-OSum. We first employ an aspect-based sentiment analysis system to extract opinion phrases from reviews. Then, we construct a heterogeneous graph consisting of reviews and opinion clusters as nodes, which is used to enhance the Transformer-based encoder–decoder framework. Furthermore, we design a novel cascaded attention mechanism to prompt the decoder to pay more attention to the aspects that are more likely to appear in summary. During training, we introduce a sentiment accuracy reward that further enhances the learning ability of our model. We conduct comprehensive experiments on the Yelp, Amazon, and Rotten Tomatoes datasets. Automatic evaluation results show that our model is competitive and performs better than the state-of-the-art (SOTA) models on some ROUGE metrics. Human evaluation results further verify that our model can generate more informative summaries and reduce redundancy.  相似文献   

6.
A well-known challenge for multi-document summarization (MDS) is that a single best or “gold standard” summary does not exist, i.e. it is often difficult to secure a consensus among reference summaries written by different authors. It therefore motivates us to study what the “important information” is in multiple input documents that will guide different authors in writing a summary. In this paper, we propose the notions of macro- and micro-level information. Macro-level information refers to the salient topics shared among different input documents, while micro-level information consists of different sentences that act as elaborating or provide complementary details for those salient topics. Experimental studies were conducted to examine the influence of macro- and micro-level information on summarization and its evaluation. Results showed that human subjects highly relied on macro-level information when writing a summary. The length allowed for summaries is the leading factor that affects the summary agreement. Meanwhile, our summarization evaluation approach based on the proposed macro- and micro-structure information also suggested that micro-level information offered complementary details for macro-level information. We believe that both levels of information form the “important information” which affects the modeling and evaluation of automatic summarization systems.  相似文献   

7.
Noise reduction through summarization for Web-page classification   总被引:1,自引:0,他引:1  
Due to a large variety of noisy information embedded in Web pages, Web-page classification is much more difficult than pure-text classification. In this paper, we propose to improve the Web-page classification performance by removing the noise through summarization techniques. We first give empirical evidence that ideal Web-page summaries generated by human editors can indeed improve the performance of Web-page classification algorithms. We then put forward a new Web-page summarization algorithm based on Web-page layout and evaluate it along with several other state-of-the-art text summarization algorithms on the LookSmart Web directory. Experimental results show that the classification algorithms (NB or SVM) augmented by any summarization approach can achieve an improvement by more than 5.0% as compared to pure-text-based classification algorithms. We further introduce an ensemble method to combine the different summarization algorithms. The ensemble summarization method achieves more than 12.0% improvement over pure-text based methods.  相似文献   

8.
Centroid-based summarization of multiple documents   总被引:2,自引:0,他引:2  
We present a multi-document summarizer, MEAD, which generates summaries using cluster centroids produced by a topic detection and tracking system. We describe two new techniques, a centroid-based summarizer, and an evaluation scheme based on sentence utility and subsumption. We have applied this evaluation to both single and multiple document summaries. Finally, we describe two user studies that test our models of multi-document summarization.  相似文献   

9.
Most existing research on applying machine learning techniques to document summarization explores either classification models or learning-to-rank models. This paper presents our recent study on how to apply a different kind of learning models, namely regression models, to query-focused multi-document summarization. We choose to use Support Vector Regression (SVR) to estimate the importance of a sentence in a document set to be summarized through a set of pre-defined features. In order to learn the regression models, we propose several methods to construct the “pseudo” training data by assigning each sentence with a “nearly true” importance score calculated with the human summaries that have been provided for the corresponding document set. A series of evaluations on the DUC data sets are conducted to examine the efficiency and the robustness of the proposed approaches. When compared with classification models and ranking models, regression models are consistently preferable.  相似文献   

10.
A bottom-up approach to sentence ordering for multi-document summarization   总被引:1,自引:0,他引:1  
Ordering information is a difficult but important task for applications generating natural language texts such as multi-document summarization, question answering, and concept-to-text generation. In multi-document summarization, information is selected from a set of source documents. However, improper ordering of information in a summary can confuse the reader and deteriorate the readability of the summary. Therefore, it is vital to properly order the information in multi-document summarization. We present a bottom-up approach to arrange sentences extracted for multi-document summarization. To capture the association and order of two textual segments (e.g. sentences), we define four criteria: chronology, topical-closeness, precedence, and succession. These criteria are integrated into a criterion by a supervised learning approach. We repeatedly concatenate two textual segments into one segment based on the criterion, until we obtain the overall segment with all sentences arranged. We evaluate the sentence orderings produced by the proposed method and numerous baselines using subjective gradings as well as automatic evaluation measures. We introduce the average continuity, an automatic evaluation measure of sentence ordering in a summary, and investigate its appropriateness for this task.  相似文献   

11.
Weighted consensus multi-document summarization   总被引:1,自引:0,他引:1  
Multi-document summarization is a fundamental tool for document understanding and has received much attention recently. Given a collection of documents, a variety of summarization methods based on different strategies have been proposed to extract the most important sentences from the original documents. However, very few studies have been reported on aggregating different summarization methods to possibly generate better summary results. In this paper, we propose a weighted consensus summarization method to combine the results from single summarization systems. We evaluate and compare our proposed weighted consensus method with various baseline combination methods. Experimental results on DUC2002 and DUC2004 data sets demonstrate the performance improvement by aggregating multiple summarization systems, and our proposed weighted consensus summarization method outperforms other combination methods.  相似文献   

12.
Text summarization is a process of generating a brief version of documents by preserving the fundamental information of documents as much as possible. Although most of the text summarization research has been focused on supervised learning solutions, there are a few datasets indeed generated for summarization tasks, and most of the existing summarization datasets do not have human-generated goal summaries which are vital for both summary generation and evaluation. Therefore, a new dataset was presented for abstractive and extractive summarization tasks in this study. This dataset contains academic publications, the abstracts written by the authors, and extracts in two sizes, which were generated by human readers in this research. Then, the resulting extracts were evaluated to ensure the validity of the human extract production process. Moreover, the extractive summarization problem was reinvestigated on the proposed summarization dataset. Here the main point taken into account was to analyze the feature vector to generate more informative summaries. To that end, a comprehensive syntactic feature space was generated for the proposed dataset, and the impact of these features on the informativeness of the resulting summary was investigated. Besides, the summarization capability of semantic features was experienced by using GloVe and word2vec embeddings. Finally, the use of ensembled feature space, which corresponds to the joint use of syntactic and semantic features, was proposed on a long short-term memory-based neural network model. ROUGE metrics evaluated the model summaries, and the results of these evaluations showed that the use of the proposed ensemble feature space remarkably improved the single-use of syntactic or semantic features. Additionally, the resulting summaries of the proposed approach on ensembled features prominently outperformed or provided comparable performance than summaries obtained by state-of-the-art models for extractive summarization.  相似文献   

13.
Most knowledge accumulated through scientific discoveries in genomics and related biomedical disciplines is buried in the vast amount of biomedical literature. Since understanding gene regulations is fundamental to biomedical research, summarizing all the existing knowledge about a gene based on literature is highly desirable to help biologists digest the literature. In this paper, we present a study of methods for automatically generating gene summaries from biomedical literature. Unlike most existing work on automatic text summarization, in which the generated summary is often a list of extracted sentences, we propose to generate a semi-structured summary which consists of sentences covering specific semantic aspects of a gene. Such a semi-structured summary is more appropriate for describing genes and poses special challenges for automatic text summarization. We propose a two-stage approach to generate such a summary for a given gene – first retrieving articles about a gene and then extracting sentences for each specified semantic aspect. We address the issue of gene name variation in the first stage and propose several different methods for sentence extraction in the second stage. We evaluate the proposed methods using a test set with 20 genes. Experiment results show that the proposed methods can generate useful semi-structured gene summaries automatically from biomedical literature, and our proposed methods outperform general purpose summarization methods. Among all the proposed methods for sentence extraction, a probabilistic language modeling approach that models gene context performs the best.  相似文献   

14.
We present two approaches to email thread summarization: collective message summarization (CMS) applies a multi-document summarization approach, while individual message summarization (IMS) treats the problem as a sequence of single-document summarization tasks. Both approaches are implemented in our general framework driven by sentence compression. Instead of a purely extractive approach, we employ linguistic and statistical methods to generate multiple compressions, and then select from those candidates to produce a final summary. We demonstrate these ideas on the Enron email collection – a very challenging corpus because of the highly technical language. Experimental results point to two findings: that CMS represents a better approach to email thread summarization, and that current sentence compression techniques do not improve summarization performance in this genre.  相似文献   

15.
In existing unsupervised methods, Latent Semantic Analysis (LSA) is used for sentence selection. However, the obtained results are less meaningful, because singular vectors are used as the bases for sentence selection from given documents, and singular vector components can have negative values. We propose a new unsupervised method using Non-negative Matrix Factorization (NMF) to select sentences for automatic generic document summarization. The proposed method uses non-negative constraints, which are more similar to the human cognition process. As a result, the method selects more meaningful sentences for generic document summarization than those selected using LSA.  相似文献   

16.
DUC in context   总被引:1,自引:0,他引:1  
Recent years have seen increased interest in text summarization with emphasis on evaluation of prototype systems. Many factors can affect the design of such evaluations, requiring choices among competing alternatives. This paper examines several major themes running through three evaluations: SUMMAC, NTCIR, and DUC, with a concentration on DUC. The themes are extrinsic and intrinsic evaluation, evaluation procedures and methods, generic versus focused summaries, single- and multi-document summaries, length and compression issues, extracts versus abstracts, and issues with genre.  相似文献   

17.
With the advent of Web 2.0, there exist many online platforms that results in massive textual data production such as social networks, online blogs, magazines etc. This textual data carries information that can be used for betterment of humanity. Hence, there is a dire need to extract potential information out of it. This study aims to present an overview of approaches that can be applied to extract and later present these valuable information nuggets residing within text in brief, clear and concise way. In this regard, two major tasks of automatic keyword extraction and text summarization are being reviewed. To compile the literature, scientific articles were collected using major digital computing research repositories. In the light of acquired literature, survey study covers early approaches up to all the way till recent advancements using machine learning solutions. Survey findings conclude that annotated benchmark datasets for various textual data-generators such as twitter and social forms are not available. This scarcity of dataset has resulted into relatively less progress in many domains. Also, applications of deep learning techniques for the task of automatic keyword extraction are relatively unaddressed. Hence, impact of various deep architectures stands as an open research direction. For text summarization task, deep learning techniques are applied after advent of word vectors, and are currently governing state-of-the-art for abstractive summarization. Currently, one of the major challenges in these tasks is semantic aware evaluation of generated results.  相似文献   

18.
Sentiment analysis concerns the study of opinions expressed in a text. This paper presents the QMOS method, which employs a combination of sentiment analysis and summarization approaches. It is a lexicon-based method to query-based multi-documents summarization of opinion expressed in reviews.QMOS combines multiple sentiment dictionaries to improve word coverage limit of the individual lexicon. A major problem for a dictionary-based approach is the semantic gap between the prior polarity of a word presented by a lexicon and the word polarity in a specific context. This is due to the fact that, the polarity of a word depends on the context in which it is being used. Furthermore, the type of a sentence can also affect the performance of a sentiment analysis approach. Therefore, to tackle the aforementioned challenges, QMOS integrates multiple strategies to adjust word prior sentiment orientation while also considers the type of sentence. QMOS also employs the Semantic Sentiment Approach to determine the sentiment score of a word if it is not included in a sentiment lexicon.On the other hand, the most of the existing methods fail to distinguish the meaning of a review sentence and user's query when both of them share the similar bag-of-words; hence there is often a conflict between the extracted opinionated sentences and users’ needs. However, the summarization phase of QMOS is able to avoid extracting a review sentence whose similarity with the user's query is high but whose meaning is different. The method also employs the greedy algorithm and query expansion approach to reduce redundancy and bridge the lexical gaps for similar contexts that are expressed using different wording, respectively. Our experiment shows that the QMOS method can significantly improve the performance and make QMOS comparable to other existing methods.  相似文献   

19.
Abstractive summarization aims to generate a concise summary covering salient content from single or multiple text documents. Many recent abstractive summarization methods are built on the transformer model to capture long-range dependencies in the input text and achieve parallelization. In the transformer encoder, calculating attention weights is a crucial step for encoding input documents. Input documents usually contain some key phrases conveying salient information, and it is important to encode these phrases completely. However, existing transformer-based summarization works did not consider key phrases in input when determining attention weights. Consequently, some of the tokens within key phrases only receive small attention weights, which is not conducive to encoding the semantic information of input documents. In this paper, we introduce some prior knowledge of key phrases into the transformer-based summarization model and guide the model to encode key phrases. For the contextual representation of each token in the key phrase, we assume the tokens within the same key phrase make larger contributions compared with other tokens in the input sequence. Based on this assumption, we propose the Key Phrase Aware Transformer (KPAT), a model with the highlighting mechanism in the encoder to assign greater attention weights for tokens within key phrases. Specifically, we first extract key phrases from the input document and score the phrases’ importance. Then we build the block diagonal highlighting matrix to indicate these phrases’ importance scores and positions. To combine self-attention weights with key phrases’ importance scores, we design two structures of highlighting attention for each head and the multi-head highlighting attention. Experimental results on two datasets (Multi-News and PubMed) from different summarization tasks and domains show that our KPAT model significantly outperforms advanced summarization baselines. We conduct more experiments to analyze the impact of each part of our model on the summarization performance and verify the effectiveness of our proposed highlighting mechanism.  相似文献   

20.
The evolution of the job market has resulted in traditional methods of recruitment becoming insufficient. As it is now necessary to handle volumes of information (mostly in the form of free text) that are impossible to process manually, an analysis and assisted categorization are essential to address this issue. In this paper, we present a combination of the E-Gen and Cortex systems. E-Gen aims to perform analysis and categorization of job offers together with the responses given by the candidates. E-Gen system strategy is based on vectorial and probabilistic models to solve the problem of profiling applications according to a specific job offer. Cortex is a statistical automatic summarization system. In this work, E-Gen uses Cortex as a powerful filter to eliminate irrelevant information contained in candidate answers. Our main objective is to develop a system to assist a recruitment consultant and the results obtained by the proposed combination surpass those of E-Gen in standalone mode on this task.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号