期刊界 All Journals 搜尽天下杂志传播学术成果专业期刊搜索期刊信息化学术搜索

1.

Aspect sentiment analysis with heterogeneous graph neural networks

《Information processing & management》2022,59(4):102953

Aspect-based sentiment analysis technologies may be a very practical methodology for securities trading, commodity sales, movie rating websites, etc. Most recent studies adopt the recurrent neural network or attention-based neural network methods to infer aspect sentiment using opinion context terms and sentence dependency trees. However, due to a sentence often having multiple aspects sentiment representation, these models are hard to achieve satisfactory classification results. In this paper, we discuss these problems by encoding sentence syntax tree, words relations and opinion dictionary information in a unified framework. We called this method heterogeneous graph neural networks (Hete_GNNs). Firstly, we adopt the interactive aspect words and contexts to encode the sentence sequence information for parameter sharing. Then, we utilized a novel heterogeneous graph neural network for encoding these sentences’ syntax dependency tree, prior sentiment dictionary, and some part-of-speech tagging information for sentiment prediction. We perform the Hete_GNNs sentiment judgment and report the experiments on five domain datasets, and the results confirm that the heterogeneous context information can be better captured with heterogeneous graph neural networks. The improvement of the proposed method is demonstrated by aspect sentiment classification task comparison. 相似文献

2.

Automatically building templates for entity summary construction

Peng Li Yinglin Wang Jing Jiang 《Information processing & management》2013

In this paper, we propose a novel approach to automatic generation of summary templates from given collections of summary articles. We first develop an entity-aspect LDA model to simultaneously cluster both sentences and words into aspects. We then apply frequent subtree pattern mining on the dependency parse trees of the clustered and labeled sentences to discover sentence patterns that well represent the aspects. Finally, we use the generated templates to construct summaries for new entities. Key features of our method include automatic grouping of semantically related sentence patterns and automatic identification of template slots that need to be filled in. Also, we implement a new sentence compression algorithm which use dependency tree instead of parser tree. We apply our method on five Wikipedia entity categories and compare our method with three baseline methods. Both quantitative evaluation based on human judgment and qualitative comparison demonstrate the effectiveness and advantages of our method. 相似文献

3.

Tree kernel-based semantic role labeling with enriched parse tree structure

GuoDong Zhou Junhui Li Jianxi Fan Qiaoming Zhu 《Information processing & management》2011

Shallow semantic parsing assigns a simple structure (such as WHO did WHAT to WHOM, WHEN, WHERE, WHY, and HOW) to each predicate in a sentence. It plays a critical role in event-based information extraction and thus is important for deep information processing and management. This paper proposes a tree kernel method for a particular shallow semantic parsing task, called semantic role labeling (SRL), with an enriched parse tree structure. First, a new tree kernel is presented to effectively capture the inherent structured knowledge in a parse tree by enabling the standard convolution tree kernel with context-sensitiveness via considering ancestral information of substructures and approximate matching via allowing insertion/deletion/substitution of tree nodes in the substructures. Second, an enriched parse tree structure is proposed to both well preserve the necessary structured information and effectively avoid noise by differentiating various portions of the parse tree structure. Evaluation on the CoNLL’2005 shared task shows that both the new tree kernel and the enriched parse tree structure contribute much in SRL and our tree kernel method significantly outperforms the state-of-the-art tree kernel methods. Moreover, our tree kernel method is proven rather complementary to the state-of-the-art feature-based methods in that it can better capture structural parse tree information. 相似文献

4.

Exploring syntactic structured features over parse trees for relation extraction using kernel methods

Min Zhang GuoDong Zhou Aiti Aw 《Information processing & management》2008

Extracting semantic relationships between entities from text documents is challenging in information extraction and important for deep information processing and management. This paper proposes to use the convolution kernel over parse trees together with support vector machines to model syntactic structured information for relation extraction. Compared with linear kernels, tree kernels can effectively explore implicitly huge syntactic structured features embedded in a parse tree. Our study reveals that the syntactic structured features embedded in a parse tree are very effective in relation extraction and can be well captured by the convolution tree kernel. Evaluation on the ACE benchmark corpora shows that using the convolution tree kernel only can achieve comparable performance with previous best-reported feature-based methods. It also shows that our method significantly outperforms previous two dependency tree kernels for relation extraction. Moreover, this paper proposes a composite kernel for relation extraction by combining the convolution tree kernel with a simple linear kernel. Our study reveals that the composite kernel can effectively capture both flat and structured features without extensive feature engineering, and easily scale to include more features. Evaluation on the ACE benchmark corpora shows that the composite kernel outperforms previous best-reported methods in relation extraction. 相似文献

5.

基于句法语义的网络舆论情感倾向性评价技术研究 总被引：2，自引：0，他引：2

段建勇谢宇超张梅《情报杂志》2012,(1):147-150

提出一个基于句法语义的情感倾向性评测算法。首先构建特定领域的情感语料库,然后提取情感知识库,为后续情感分析提供必要的基本数据。算法以句子为基本单位进行处理,运用基于扩展句法树的语言处理模型,从单句到篇章计算文本情感倾向。实验证实该方法是有效的。相似文献

6.

Dependency parsing with finite state transducers and compression rules

Pablo Gamallo Marcos Garcia 《Information processing & management》2018,54(6):1244-1261

This article proposes a syntactic parsing strategy based on a dependency grammar containing formal rules and a compression technique that reduces the complexity of those rules. Compression parsing is mainly driven by the ‘single-head’ constraint of Dependency Grammar, and can be seen as an alternative method to the well-known constructive strategy. The compression algorithm simplifies the input sentence by progressively removing from it the dependent tokens as soon as binary syntactic dependencies are recognized. This strategy is thus similar to that used in deterministic dependency parsing. A compression parser was implemented and released under General Public License, as well as a cross-lingual grammar with Universal Dependencies, containing only broad-coverage rules applied to Romance languages. The system is an almost delexicalized parser which does not need training data to analyze Romance languages. The rule-based cross-lingual parser was submitted to CoNLL 2017 Shared Task: Multilingual Parsing from Raw Text to Universal Dependencies. The performance of our system was compared to the other supervised systems participating in the competition, paying special attention to the parsing of different treebanks of the same language. We also trained a supervised delexicalized parser for Romance languages in order to compare it to our rule-based system. The results show that the performance of our cross-lingual method does not change across related languages and across different treebanks, while most supervised methods turn out to be very dependent on the text domain used to train the system. 相似文献

7.

Syntax-based dynamic latent graph for event relation extraction

《Information processing & management》2023,60(5):103469

This paper focuses on extracting temporal and parent–child relationships between news events in social news. Previous methods have proved that syntactic features are valid. However, most previous methods directly use the static outcomes parsed by syntactic parsing tools, but task-irrelevant or erroneous parses will inevitably degrade the performance of the model. In addition, many implicit higher-order connections that are directly related and critical to tasks are not explicitly exploited. In this paper, we propose a novel syntax-based dynamic latent graph model (SDLG) for this task. Specifically, we first apply a syntactic type-enhanced attention mechanism to assign different weights to different connections in the parsing results, which helps to filter out noisy connections and better fuse the information in the syntactic structures. Next, we introduce a dynamic event pair-aware induction graph to mine the task-related latent connections. It constructs a potential attention matrix to complement and correct the supervised syntactic features, using the semantics of the event pairs as a guide. Finally, the latent graph, together with the syntactic information, is fed into the graph convolutional network to obtain an improved representation of the event to complete relational reasoning. We have conducted extensive experiments on four public benchmarks, MATRES, TCR, HiEve and TB-Dense. The results show that our model outperforms the state-of-the-art model by 0.4%, 1.5%, 3.0% and 1.3% in F1 scores on the four datasets, respectively. Finally, we provide detailed analyses to show the effectiveness of each proposed component. 相似文献

8.

Self-training on refined clause patterns for relation extraction

Duc-Thuan Vo Ebrahim Bagheri 《Information processing & management》2018,54(4):686-706

Within the context of Information Extraction (IE), relation extraction is oriented towards identifying a variety of relation phrases and their arguments in arbitrary sentences. In this paper, we present a clause-based framework for information extraction in textual documents. Our framework focuses on two important challenges in information extraction: 1) Open Information Extraction and (OIE), and 2) Relation Extraction (RE). In the plethora of research that focus on the use of syntactic and dependency parsing for the purposes of detecting relations, there has been increasing evidence of incoherent and uninformative extractions. The extracted relations may even be erroneous at times and fail to provide a meaningful interpretation. In our work, we use the English clause structure and clause types in an effort to generate propositions that can be deemed as extractable relations. Moreover, we propose refinements to the grammatical structure of syntactic and dependency parsing that help reduce the number of incoherent and uninformative extractions from clauses. In our experiments both in the open information extraction and relation extraction domains, we carefully evaluate our system on various benchmark datasets and compare the performance of our work against existing state-of-the-art information extraction systems. Our work shows improved performance compared to the state-of-the-art techniques. 相似文献

9.

Extracting relation information from text documents by exploring various types of knowledge

GuoDong Zhou Min Zhang 《Information processing & management》2007

Extracting semantic relationships between entities from text documents is challenging in information extraction and important for deep information processing and management. This paper investigates the incorporation of diverse lexical, syntactic and semantic knowledge in feature-based relation extraction using support vector machines. Our study illustrates that the base phrase chunking information is very effective for relation extraction and contributes to most of the performance improvement from syntactic aspect while current commonly used features from full parsing give limited further enhancement. This suggests that most of useful information in full parse trees for relation extraction is shallow and can be captured by chunking. This indicates that a cheap and robust solution in relation extraction can be achieved without decreasing too much in performance. We also demonstrate how semantic information such as WordNet, can be used in feature-based relation extraction to further improve the performance. Evaluation on the ACE benchmark corpora shows that effective incorporation of diverse features enables our system outperform previously best-reported systems. It also shows that our feature-based system significantly outperforms tree kernel-based systems. This suggests that current tree kernels fail to effectively explore structured syntactic information in relation extraction. 相似文献

10.

Language processing and learning models for community question answering in Arabic

《Information processing & management》2019,56(2):274-290

In this paper we focus on the problem of question ranking in community question answering (cQA) forums in Arabic. We address the task with machine learning algorithms using advanced Arabic text representations. The latter are obtained by applying tree kernels to constituency parse trees combined with textual similarities, including word embeddings. Our two main contributions are: (i) an Arabic language processing pipeline based on UIMA—from segmentation to constituency parsing—built on top of Farasa, a state-of-the-art Arabic language processing toolkit; and (ii) the application of long short-term memory neural networks to identify the best text fragments in questions to be used in our tree-kernel-based ranker. Our thorough experimentation on a recently released cQA dataset shows that the Arabic linguistic processing provided by Farasa produces strong results and that neural networks combined with tree kernels further boost the performance in terms of both efficiency and accuracy. Our approach also enables an implicit comparison between different processing pipelines as our tests on Farasa and Stanford parsers demonstrate. 相似文献

11.

HGNN: Hyperedge-based graph neural network for MOOC Course Recommendation

《Information processing & management》2022,59(3):102938

Previous studies on Course Recommendation (CR) mainly focus on investigating the sequential relationships among courses (RNN is applied) and fail to learn the similarity relationships among learners. Moreover, existing RNN-based methods can only model courses’ short-term sequential patterns due to the inherent shortcomings of RNNs. In light of the above issues, we develop a hyperedge-based graph neural network, namely HGNN, for CR. Specifically, (1) to model the relationships among learners, we treat learners (i.e., hyperedges) as the sets of courses in a hypergraph, and convert the task of learning learners’ representations to induce the embeddings for hyperedges, where a hyperedge-based graph attention network is further proposed. (2) To simultaneously consider courses’ long-term and short-term sequential relationships, we first construct a course sequential graph across learners, and learn courses’ representations via a modified graph attention network. Then, we feed the learned representations into a GRU-based sequence encoder to infer their short-term patterns, and deem the last hidden state as the learned sequence-level learner embedding. After that, we obtain the learners’ final representations by a product pooling operation to retain features from different latent spaces, and optimize a cross-entropy loss to make recommendations. To evaluate our proposed solution HGNN, we conduct extensive experiments on two real-world datasets, XuetangX and MovieLens. We conduct experiments on MovieLens to prove the extensibility of our solution on other collections. From the experimental results, we can find that HGNN evidently outperforms other recent CR methods on both datasets, achieving 11.96% on P@20, 16.01% on NDCG@20, and 27.62% on MRR@20 on XuetangX, demonstrating the effectiveness of studying CR in a hypergraph, and the importance of considering both long-term and short-term sequential patterns of courses. 相似文献

12.

浅谈语义、句法对大学英语写作的推动作用

邴照宇《科教文汇》2012,(10):119-120

大学英语教学中的写作教学与语言学的语义和句法有着紧密的联系。对于语义的讲解可以使学生激活思维、展开联想,在脑海中形成有规模的英语知识体系,使写作思路拓宽;对于句法的研究可以逐步增强学生的英语语言基本功,尽量少犯句法的错误,在写作的时候可以避免提笔写错句,让学生逐步养成良好的写作习惯,从而促进写作水平的提高。相似文献

13.

Syntactic based approach for grammar question retrieval

Lanting Fang Luu Anh Tuan Siu Cheung Hui Lenan Wu 《Information processing & management》2018,54(2):184-202

With the popularity of online educational platforms, English learners can learn and practice no matter where they are and what they do. English grammar is one of the important components in learning English. To learn English grammar effectively, it requires students to practice questions containing focused grammar knowledge. In this paper, we study a novel problem of retrieving English grammar questions with similar grammatical focus. Since the grammatical focus similarity is different from textual similarity or sentence syntactic similarity, existing approaches cannot be applied directly to our problem. To address this problem, we propose a syntactic based approach for English grammar question retrieval which can retrieve related grammar questions with similar grammatical focus effectively. In the proposed syntactic based approach, we first propose a new syntactic tree, namely parse-key tree, to capture English grammar questions’ grammatical focus. Next, we propose two kernel functions, namely relaxed tree kernel and part-of-speech order kernel, to compute the similarity between two parse-key trees of the query and grammar questions in the collection. Then, the retrieved grammar questions are ranked according to the similarity between the parse-key trees. In addition, if a query is submitted together with answer choices, conceptual similarity and textual similarity are also incorporated to further improve the retrieval accuracy. The performance results have shown that our proposed approach outperforms the state-of-the-art methods based on statistical analysis and syntactic analysis. 相似文献

14.

Dependency structure language model for topic detection and tracking

Changki Lee Gary Geunbae Lee Myunggil Jang 《Information processing & management》2007

In this paper, we propose a new language model, namely, a dependency structure language model, for topic detection and tracking (TDT) to compensate for weakness of unigram and bigram language models. The dependency structure language model is based on the Chow expansion theory and the dependency parse tree generated by a linguistic parser. So, long-distance dependencies can be naturally captured by the dependency structure language model. We carried out extensive experiments to verify the proposed model on topic tracking and link detection in TDT. In both cases, the dependency structure language models perform better than strong baseline approaches. 相似文献

15.

Sentence modeling via multiple word embeddings and multi-level comparison for semantic textual similarity

《Information processing & management》2019,56(6):102090

Recently, using a pretrained word embedding to represent words achieves success in many natural language processing tasks. According to objective functions, different word embedding models capture different aspects of linguistic properties. However, the Semantic Textual Similarity task, which evaluates similarity/relation between two sentences, requires to take into account of these linguistic aspects. Therefore, this research aims to encode various characteristics from multiple sets of word embeddings into one embedding and then learn similarity/relation between sentences via this novel embedding. Representing each word by multiple word embeddings, the proposed MaxLSTM-CNN encoder generates a novel sentence embedding. We then learn the similarity/relation between our sentence embeddings via Multi-level comparison. Our method M-MaxLSTM-CNN consistently shows strong performances in several tasks (i.e., measure textual similarity, identify paraphrase, recognize textual entailment). Our model does not use hand-crafted features (e.g., alignment features, Ngram overlaps, dependency features) as well as does not require pre-trained word embeddings to have the same dimension. 相似文献

16.

A local tree alignment approach to relation extraction of multiple arguments

Seokhwan Kim Minwoo Jeong Gary Geunbae Lee 《Information processing & management》2011

In this paper, we address the problem of relation extraction of multiple arguments where the relation of entities is framed by multiple attributes. Such complex relations are successfully extracted using a syntactic tree-based pattern matching method. While induced subtree patterns are typically used to model the relations of multiple entities, we argue that hard pattern matching between a pattern database and instance trees cannot allow us to examine similar tree structures. Thus, we explore a tree alignment-based soft pattern matching approach to improve the coverage of induced patterns. Our pattern learning algorithm iteratively searches the most influential dependency tree patterns as well as a control parameter for each pattern. The resulting method outperforms two baselines, a pairwise approach with the tree-kernel support vector machine and a hard pattern matching method, on two standard datasets for a complex relation extraction task. 相似文献

17.

Parsing human image by fusing semantic and spatial features: A deep learning approach

《Information processing & management》2020,57(6):102306

How to parse the human image to obtain the text label corresponding to the human body is a critical task for human-computer interaction. Although previous methods have significantly improved the parsing performance, the problem of parsing confusion and tiny target missing remains unresolved, which leads to errors and incomplete inference accordingly. Targeting at these drawbacks, we fuse semantic and spatial features to mine the human body information based on the Dual Pyramid Unit convolutional neural network, named as DPUNet. DPUNet is composed of Context Pyramid Unit (CPU) and Spatial Pyramid Unit (SPU). Firstly, we design the CPU to aggregate the local to global semantic information, which exports the semantic feature for eliminating the semantic confusion. To capture the tiny targets for preventing the details from missing, the SPU is proposed to incorporate the multi-scale spatial information and output the spatial feature. Finally, the features of two complementary units are fused for accurate and complete human parsing results. Our approach achieves more excellent performance than the state-of-the-art methods on single human and multiple human parsing datasets. Meanwhile, the proposed framework is efficient with a fast speed of 41.2fps. 相似文献

18.

Extraction of complex index terms in non-English IR: A shallow parsing based approach

Jesús Vilares Miguel A. Alonso Manuel Vilares 《Information processing & management》2008

The performance of information retrieval systems is limited by the linguistic variation present in natural language texts. Word-level natural language processing techniques have been shown to be useful in reducing this variation. In this article, we summarize our work on the extension of these techniques for dealing with phrase-level variation in European languages, taking Spanish as a case in point. We propose the use of syntactic dependencies as complex index terms in an attempt to solve the problems deriving from both syntactic and morpho-syntactic variation and, in this way, to obtain more precise index terms. Such dependencies are obtained through a shallow parser based on cascades of finite-state transducers in order to reduce as far as possible the overhead due to this parsing process. The use of different sources of syntactic information, queries or documents, has been also studied, as has the restriction of the dependencies applied to those obtained from noun phrases. Our approaches have been tested using the CLEF corpus, obtaining consistent improvements with regard to classical word-level non-linguistic techniques. Results show, on the one hand, that syntactic information extracted from documents is more useful than that from queries. On the other hand, it has been demonstrated that by restricting dependencies to those corresponding to noun phrases, important reductions of storage and management costs can be achieved, albeit at the expense of a slight reduction in performance. 相似文献

19.

Improving unsupervised keyphrase extraction by modeling hierarchical multi-granularity features

《Information processing & management》2023,60(4):103356

Existing unsupervised keyphrase extraction methods typically emphasize the importance of the candidate keyphrase itself, ignoring other important factors such as the influence of uninformative sentences. We hypothesize that the salient sentences of a document are particularly important as they are most likely to contain keyphrases, especially for long documents. To our knowledge, our work is the first attempt to exploit sentence salience for unsupervised keyphrase extraction by modeling hierarchical multi-granularity features. Specifically, we propose a novel position-aware graph-based unsupervised keyphrase extraction model, which includes two model variants. The pipeline model first extracts salient sentences from the document, followed by keyphrase extraction from the extracted salient sentences. In contrast to the pipeline model which models multi-granularity features in a two-stage paradigm, the joint model accounts for both sentence and phrase representations of the source document simultaneously via hierarchical graphs. Concretely, the sentence nodes are introduced as an inductive bias, injecting sentence-level information for determining the importance of candidate keyphrases. We compare our model against strong baselines on three benchmark datasets including Inspec, DUC 2001, and SemEval 2010. Experimental results show that the simple pipeline-based approach achieves promising results, indicating that keyphrase extraction task benefits from the salient sentence extraction task. The joint model, which mitigates the potential accumulated error of the pipeline model, gives the best performance and achieves new state-of-the-art results while generalizing better on data from different domains and with different lengths. In particular, for the SemEval 2010 dataset consisting of long documents, our joint model outperforms the strongest baseline UKERank by 3.48%, 3.69% and 4.84% in terms of F1@5, F1@10 and F1@15, respectively. We also conduct qualitative experiments to validate the effectiveness of our model components. 相似文献

20.

Discriminative sentence compression with conditional random fields 总被引：2，自引：0，他引：2

Tadashi Nomoto 《Information processing & management》2007,43(6):1571

The paper focuses on a particular approach to automatic sentence compression which makes use of a discriminative sequence classifier known as Conditional Random Fields (CRF). We devise several features for CRF that allow it to incorporate information on nonlinear relations among words. Along with that, we address the issue of data paucity by collecting data from RSS feeds available on the Internet, and turning them into training data for use with CRF, drawing on techniques from biology and information retrieval. We also discuss a recursive application of CRF on the syntactic structure of a sentence as a way of improving the readability of the compression it generates. Experiments found that our approach works reasonably well compared to the state-of-the-art system [Knight, K., & Marcu, D. (2002). Summarization beyond sentence extraction: A probabilistic approach to sentence compression. Artificial Intelligence 139, 91–107.]. 相似文献