1.

Entity disambiguation to Wikipedia using collective ranking

《Information processing & management》2016,52(6):1247-1257

Entity disambiguation is a fundamental task of semantic Web annotation. Entity Linking (EL) is an essential procedure in entity disambiguation, which aims to link a mention appearing in a plain text to a structured or semi-structured knowledge base, such as Wikipedia. Existing research on EL usually annotates the mentions in a text one by one and treats entities as independent of each other. However, this assumption does not hold in many application scenarios. For example, if two mentions appear in one text, they are likely to have certain intrinsic relationships. In this paper, we first propose a novel query expansion method for candidate generation that utilizes co-occurrence information about mentions. We further propose a re-ranking model which can be iteratively adjusted based on the prediction in the previous round. Experiments on real-world data demonstrate the effectiveness of our proposed methods for entity disambiguation.

2.

Neural entity alignment with cross-modal supervision

《Information processing & management》2023,60(2):103174

The majority of currently available entity alignment (EA) solutions rely primarily on structural information to align entities, which is biased and disregards additional multi-source information. To compensate for inadequate structural details, this article proposes SKEA, a simple but flexible framework for Entity Alignment with cross-modal supervision of Supporting Knowledge. We employ a relational aggregation network to exploit the details of an entity and its neighbors. To overcome the limitations of relational features, two multi-modal encoding modules are used to extract visual and textual information. In each iteration, SKEA generates a new set of potentially aligned entity pairs using the knowledge of two reference modalities, which enhances the model's supervision. Notably, the supporting information used in our framework does not participate in the network's backpropagation, which considerably improves efficiency and differs markedly from earlier work. Experiments against existing baselines demonstrate that our proposed framework can incorporate multi-aspect information efficiently and enable supervisory signals from other modalities to be transmitted to entities. The maximum performance improvement of 5.24% indicates the framework's superiority, especially for sparse KGs.

3.

Knowledge representation learning with entity descriptions,hierarchical types,and textual relations

Xing Tang Ling Chen Jun Cui Baogang Wei 《Information processing & management》2019,56(3):809-822

4.

Feature-enriched matrix factorization for relation extraction

Duc-Thuan Vo Ebrahim Bagheri 《Information processing & management》2019,56(3):424-444

Relation extraction aims at finding meaningful relationships between two named entities within unstructured textual content. In this paper, we define the problem of information extraction as a matrix completion problem in which we employ the notion of universal schemas, formed as a collection of patterns derived from open information extraction systems, as well as additional features derived from grammatical clause patterns and statistical topic models. One of the challenges with earlier work that employs matrix completion methods is that such approaches require a sufficient number of observed relation instances to be able to make predictions. However, in practice there is often insufficient explicit evidence supporting each relation type that could be used within the matrix model. Hence, existing work suffers from low recall. In our work, we extend the state of the art by proposing novel ways of integrating two sets of features, i.e., topic models and grammatical clause structures, to alleviate the low recall problem. More specifically, we propose that it is possible to (1) employ grammatical clause information from textual sentences to serve as an implicit indication of relation type and argument similarity, on the basis that similar relation types and arguments are likely to be observed within similar grammatical structures, and (2) benefit from statistical topic models to determine similarity between relation types and arguments based on their co-occurrence within the same topics. We have performed extensive experiments on both gold standard and silver standard datasets. The experiments show that our approach addresses the low recall problem of existing methods, with an improvement of 21% in recall and 8% in F-measure over the state-of-the-art baseline.

5.

Relevance-based entity selection for ad hoc retrieval

Faezeh Ensan Feras Al-Obeidat 《Information processing & management》2019,56(5):1645-1666

Recent developments have shown that entity-based models that rely on information from the knowledge graph can improve document retrieval performance. However, given the non-transitive nature of relatedness between entities on the knowledge graph, the use of semantic relatedness measures can lead to topic drift. To address this issue, we propose a relevance-based model for entity selection based on pseudo-relevance feedback, which is then used to systematically expand the input query, leading to improved retrieval performance. We perform our experiments on the widely used TREC Web corpora and empirically show that our proposed approach to entity selection significantly improves ad hoc document retrieval compared to strong baselines. More concretely, the contributions of this work are as follows: (1) We introduce a graphical probability model that captures dependencies between entities within the query and documents. (2) We propose an unsupervised entity selection method based on the graphical model for query entity expansion and then for ad hoc retrieval. (3) We thoroughly evaluate our method and compare it with state-of-the-art keyword-based and entity-based retrieval methods. We demonstrate that the proposed retrieval model shows improved performance over all the other baselines on ClueWeb09B and ClueWeb12B, two widely used Web corpora, on the [email protected], and [email protected] metrics. We also show that the proposed method is most effective on difficult queries. In addition, we compare our proposed entity selection with a state-of-the-art entity selection technique within the context of ad hoc retrieval using a basic query expansion method, and illustrate that it provides more effective retrieval for all expansion weights and for different numbers of expansion entities.

6.

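Knowledge graph construction and knowledge service application analysis based on judicial judgment documents

黄茜茜 杨建林 《情报科学》 (Information Science) 2022,39(2):133-140

[Purpose/Significance] Building a case knowledge graph from judicial judgment documents makes effective use of judicial digital resources, helps raise the level of judicial intelligence, and responds to the national "Smart Court" development strategy. [Method/Process] Taking the "online fraud" domain as an example, the knowledge graph is built top-down. First, a case domain ontology is constructed by combining document content with expert opinion; next, entities, attributes, and relations are obtained through knowledge extraction, knowledge representation, and knowledge fusion; then Neo4j is used to generate the case knowledge graph. Finally, a knowledge-graph-based smart judicial knowledge service framework is proposed. [Results/Conclusions] Based on judicial judgment documents in the "online fraud" domain from 2015 to 2020, a case knowledge graph containing about 30,000 entities and 180,000 relations was built, and the design of a smart judicial knowledge service framework with a basic resource layer, a knowledge graph layer, and a service application layer is described in detail. [Innovation/Limitations] The entity types of the case knowledge graph are extended to enrich its application scenarios, and knowledge graph technology is integrated with the smart judicial knowledge service framework; the limitation is that only judgment documents from the online fraud domain were used for the empirical study.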

7.

Cascade embedding model for knowledge graph inference and retrieval

《Information processing & management》2019,56(6):102093

Knowledge graphs are widely used in retrieval systems, question answering (QA) systems, hypothesis generation systems, etc. Representation learning provides a way to mine knowledge graphs to detect missing relations, and translation-based embedding models are a popular form of representation model. Shortcomings of translation-based models, however, limit their practicability as knowledge completion algorithms. The proposed model helps to address some of these shortcomings. The similarity between the graph structural features of two entities was found to be correlated with the relations of those entities. This correlation can help to solve the problems caused by unbalanced relations and reciprocal relations. We used Node2vec, a graph embedding algorithm, to represent information related to an entity's graph structure, and we introduce a cascade model to incorporate graph embedding and knowledge embedding into a unified framework. The cascade model first refines the feature representation in the first two stages (Local Optimization Stage), and then uses backward propagation to optimize the parameters of all the stages (Global Optimization Stage). This enhances the knowledge representation of existing translation-based algorithms by taking into account both semantic features and graph features and fusing them to extract more useful information. In addition, different cascade structures are designed to find the optimal solution to the problem of knowledge inference and retrieval. The proposed model was verified using three mainstream knowledge graphs: WN18, FB15k and BioChem. Experimental results were validated using the hit@10 rate on the entity prediction task. The proposed model performed better than TransE, giving an average improvement of 2.7% on WN18, 2.3% on FB15k and 28% on BioChem. Improvements were particularly marked where there were problems with unbalanced relations and reciprocal relations. Furthermore, the stepwise-cascade structure proved to be more effective and significantly outperforms other baselines.
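
As a rough illustration of the translation-based baseline and the hit@10 measure mentioned in this abstract, the sketch below scores triples TransE-style (a triple is plausible when head + relation lies close to tail) and computes the fraction of test cases whose true tail is ranked in the top k. It is not the paper's cascade model; the function names, the L1 distance, and the toy two-dimensional embeddings are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence

Vector = Sequence[float]


def transe_distance(head: Vector, relation: Vector, tail: Vector) -> float:
    """L1 distance ||h + r - t||; a smaller value means a more plausible triple."""
    return sum(abs(h + r - t) for h, r, t in zip(head, relation, tail))


def rank_of_true_tail(
    head: Vector, relation: Vector, true_tail: str, entities: Dict[str, Vector]
) -> int:
    """1-based rank of the true tail when all entities are sorted by distance."""
    ranked = sorted(
        entities, key=lambda name: transe_distance(head, relation, entities[name])
    )
    return ranked.index(true_tail) + 1


def hits_at_k(ranks: List[int], k: int = 10) -> float:
    """Fraction of test triples whose true entity is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


# Toy, hand-made embeddings (hypothetical values, not learned).
ENTITIES: Dict[str, Vector] = {
    "paris": [1.0, 0.0],
    "berlin": [0.0, 1.0],
    "france": [1.0, 1.0],
    "germany": [0.0, 2.0],
}
CAPITAL_OF: Vector = [0.0, 1.0]

paris_rank = rank_of_true_tail(ENTITIES["paris"], CAPITAL_OF, "france", ENTITIES)
```

In a real evaluation the ranks would be collected over every test triple (and usually for head prediction as well) before calling `hits_at_k`.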

8.

Predicate constraints based question answering over knowledge graph

《Information processing & management》2019,56(3):445-462

Generally, QA systems suffer from a structural mismatch: a question is composed of unstructured data, while its answer is made up of structured data in a Knowledge Graph (KG). To bridge this gap, most approaches use lexicons to cover data that are represented differently. However, the existing lexicons merely deal with representations for entity and relation mentions rather than considering the comprehensive meaning of the question. To resolve this, we design a novel predicate constraints lexicon which restricts the subject and object types for a predicate. It facilitates a comprehensive validation of a subject, predicate and object simultaneously. In this paper, we propose Predicate Constraints based Question Answering (PCQA). Our method prunes inappropriate entity/relation matchings to reduce the search space, thus leading to an improvement in accuracy. Unlike the existing QA systems, we do not use any templates but generate query graphs to cover diverse types of questions. In query graph generation, we put more focus on matching relations than on linking entities. This is well suited to the use of predicate constraints. Our experimental results support the validity of our approach and demonstrate reasonable performance compared to other methods which target the WebQuestions and Free917 benchmarks.
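
The pruning idea described in this abstract can be sketched as follows: a lexicon maps each predicate to the subject and object types it accepts, and a candidate matching is kept only when both sides satisfy those types. The data structures, the function name `prune_candidates`, and the example lexicon are hypothetical and are not taken from PCQA itself.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical lexicon: predicate -> (allowed subject types, allowed object types).
CONSTRAINTS: Dict[str, Tuple[Set[str], Set[str]]] = {
    "directed_by": ({"Film"}, {"Person"}),
    "capital_of": ({"City"}, {"Country"}),
}

# Hypothetical type assignments for the entities in the candidate matchings.
ENTITY_TYPES: Dict[str, Set[str]] = {
    "Titanic": {"Film"},
    "James Cameron": {"Person"},
    "Paris": {"City"},
}


def prune_candidates(
    candidates: List[Triple],
    entity_types: Dict[str, Set[str]],
    constraints: Dict[str, Tuple[Set[str], Set[str]]],
) -> List[Triple]:
    """Keep only matchings whose subject and object types fit the predicate."""
    kept = []
    for subject, predicate, obj in candidates:
        if predicate not in constraints:
            continue  # unknown predicate: nothing to validate against, so drop it
        subject_types, object_types = constraints[predicate]
        subject_ok = bool(entity_types.get(subject, set()) & subject_types)
        object_ok = bool(entity_types.get(obj, set()) & object_types)
        if subject_ok and object_ok:
            kept.append((subject, predicate, obj))
    return kept


CANDIDATES: List[Triple] = [
    ("Titanic", "directed_by", "James Cameron"),
    ("Titanic", "capital_of", "Paris"),
    ("Paris", "directed_by", "James Cameron"),
]
pruned = prune_candidates(CANDIDATES, ENTITY_TYPES, CONSTRAINTS)
```

Dropping candidates with unknown predicates is a design choice of this sketch; a system could equally keep them unvalidated.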

9.

An entity-graph based reasoning method for fact verification

Chonghao Chen Fei Cai Xuejun Hu Jianming Zheng Yanxiang Ling Honghui Chen 《Information processing & management》2021,58(3):102472

Fact verification aims to retrieve relevant evidence from a knowledge base, e.g., Wikipedia, to verify the given claims. Existing methods only consider sentence-level semantics for evidence representations, and typically neglect the importance of fine-grained features in the evidence-related sentences. In addition, the interpretability of the reasoning process has not been well studied in the field of fact verification. To address these issues, we propose an entity-graph based reasoning method for fact verification, abbreviated as RoEG, which generates fine-grained features of evidence at the entity level and models human reasoning paths based on an entity graph. In detail, to capture the semantic relations of retrieved evidence, RoEG introduces the entities as nodes and constructs the edges in the graph based on three linking strategies. Then, RoEG utilizes a selection gate to constrain the information propagation in the sub-graph of relevant entities and applies a graph neural network to propagate the entity features for reasoning. Finally, RoEG employs an attention aggregator to gather the information of entities for label prediction. Experimental results on the large-scale benchmark dataset FEVER demonstrate the effectiveness of our proposal, which beats competitive baselines in terms of label accuracy and FEVER Score. In particular, for the task of multiple-evidence fact verification, RoEG produces 5.48% and 4.35% improvements in label accuracy and FEVER Score, respectively, over the state-of-the-art baseline. In addition, RoEG performs better when more entities are involved in fact verification.

10.

Learning entity-centric document representations using an entity facet topic model

《Information processing & management》2020,57(3):102216

Learning semantic representations of documents is essential for various downstream applications, including text classification and information retrieval. Entities, as important sources of information, have been playing a crucial role in assisting latent representations of documents. In this work, we hypothesize that entities are not monolithic concepts; instead they have multiple aspects, and different documents may be discussing different aspects of a given entity. Given that, we argue that from an entity-centric point of view, a document related to multiple entities shall be (a) represented differently for different entities (multiple entity-centric representations), and (b) each entity-centric representation should reflect the specific aspects of the entity discussed in the document. In this work, we pose the following research questions: (1) can we confirm that entities have multiple aspects, with different aspects reflected in different documents, (2) can we learn a representation of entity aspects from a collection of documents, and a representation of documents based on the multiple entities and their aspects as reflected in the documents, (3) does this novel representation improve algorithm performance in downstream applications, and (4) what is a reasonable number of aspects per entity? To answer these questions we model each entity using multiple aspects (entity facets), where each entity facet is represented as a mixture of latent topics. Then, given a document associated with multiple entities, we assume multiple entity-centric representations, where each entity-centric representation is a mixture of entity facets for each entity. Finally, a novel graphical model, the Entity Facet Topic Model (EFTM), is proposed in order to learn entity-centric document representations, entity facets, and latent topics. Through experimentation we confirm that (1) entities are multi-faceted concepts which we can model and learn, (2) a multi-faceted entity-centric modeling of documents can lead to effective representations, which (3) can have an impact in downstream applications, and (4) considering a small number of facets is effective enough. In particular, we visualize entity facets within a set of documents, and demonstrate that different sets of documents indeed reflect different facets of entities. Further, we demonstrate that the proposed entity facet topic model generates better document representations in terms of perplexity, compared to state-of-the-art document representation methods. Moreover, we show that the proposed model outperforms baseline methods in the application of multi-label classification. Finally, we study the impact of EFTM's parameters and find that a small number of facets better captures entity-specific topics, which confirms the intuition that on average an entity has a small number of facets reflected in documents.

11.

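Research on distantly supervised entity relation extraction

柯佳 《情报科学》 (Information Science) 2021,39(10):165-169

[Purpose/Significance] Entity relation extraction is foundational work for building domain ontologies and knowledge graphs and for developing question answering systems. Distant supervision aligns large-scale unstructured text with entities in an existing knowledge base and labels training samples automatically, which removes the time-consuming manual annotation of training corpora required by supervised machine learning but also introduces noisy data. [Method/Process] This paper reviews in detail recent methods that combine distant supervision with deep learning to reduce noise in training samples and improve entity relation extraction performance. [Results/Conclusions] Convolutional neural networks better capture the local and key features of sentences; long short-term memory networks better handle long-distance dependencies between entity pairs in a sentence; the models extract lexical and syntactic features of sentences automatically; attention mechanisms give greater weight to key context and words in a sentence; and incorporating prior knowledge into neural network models enriches the semantic information of entity pairs, significantly improving relation extraction performance. [Innovation/Limitations] Future research should consider methods for handling overlapping relations and long-tail semantic relations between entity pairs, so as to address noise in entity-pair relations more comprehensively.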

12.

Assessing the quality of textual features in social media

Flavio Figueiredo Henrique Pinto Fabiano Belém Jussara Almeida Marcos Gonçalves David Fernandes Edleno Moura 《Information processing & management》2013

13.

Exploiting context-awareness and multi-criteria decision making to improve items recommendation using a tripartite graph-based model

《Information processing & management》2022,59(2):102861

Integrating useful input information is essential to provide efficient recommendations to users. In this work, we focus on improving item rating prediction by merging the multiple-context and multiple-criteria research directions, which were addressed separately in most of the existing literature. Throughout this article, Criteria refer to the attributes of items, while Context denotes the circumstances in which the user uses an item. Our goal is to capture more fine-grained preferences to improve item recommendation quality using users' multiple-criteria ratings under specific contextual situations. Therefore, we examine the recommender's data from a graph-theoretic perspective by representing three types of entities (users, contextual situations and criteria) as well as their relationships as a tripartite graph. On the assumption that contextually similar users tend to have similar interests in similar item criteria, we perform a high-order co-clustering on the tripartite graph to simultaneously partition the graph entities representing users in similar contextual situations and their evaluated item criteria. To predict cluster-based multi-criteria ratings, we introduce an improved rating prediction method that considers the dependency between users and their contextual situations, and also takes into account the correlation between criteria in the prediction process. The predicted multi-criteria ratings are finally aggregated into a single representative output corresponding to an overall item rating. To guide our investigation, we formulate a research hypothesis to provide insights into the tripartite graph partitioning and design clear and justified preliminary experiments, including quantitative and qualitative analyses, to validate it. Further thorough experiments on the two available context-aware multi-criteria datasets, TripAdvisor and Educational, demonstrate that our proposal exhibits substantial improvements over alternative recommendation approaches.

14.

Preferences in Wikipedia abstracts: Empirical findings and implications for automatic entity summarization

Danyun Xu Gong Cheng Yuzhong Qu 《Information processing & management》2014

15.

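Constructing a knowledge graph of online reviews based on multi-source heterogeneous data mining

李叶叶 李贺 沈旺 曹阳 涂敏 《情报科学》 (Information Science) 2022,39(2):65-73

[Purpose/Significance] With the spread of online shopping, online reviews have become important data influencing the decisions of consumers, sellers, and producers. In the big data era, online reviews are multi-source, heterogeneous, and growing explosively, which makes it hard for them to provide strong intelligence support for users' purchase decisions and merchants' competition. [Method/Process] This paper builds a knowledge graph from multi-source heterogeneous online review data and proposes a framework for doing so. The schema layer is built around the source, content, and form of online reviews, yielding the conceptual framework of the knowledge graph; word2vec is used to obtain entities, relations, and attributes from multi-source heterogeneous text, followed by data fusion and knowledge graph analysis. [Results/Conclusions] Using online reviews of mobile phones as an example, the experiments verify the effectiveness of the constructed knowledge graph for research on and mining of online reviews. The results reveal the characteristics of multi-source heterogeneous online review data and offer a new research perspective on organizing, presenting, and mining online review information in a big data environment. [Innovation/Limitations] Describing online reviews with a knowledge graph effectively addresses problems such as information overload and the fusion of multi-source heterogeneous information. The knowledge graph is built semi-automatically; future work will consider introducing unsupervised methods to improve construction efficiency.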

16.

Boundaries and edges rethinking: An end-to-end neural model for overlapping entity relation extraction

《Information processing & management》2020,57(6):102311

Overlapping entity relation extraction has received extensive research attention in recent years. However, existing methods suffer from the limitation of long-distance dependencies between entities, and fail to extract the relations when the overlapping situation is relatively complex. This issue limits the performance of the task. In this paper, we propose an end-to-end neural model for overlapping relation extraction by treating the task as a quintuple prediction problem. The proposed method first constructs the entity graphs by enumerating possible candidate spans, then models the relational graphs between entities via a graph attention model. Experimental results on five benchmark datasets show that the proposed model achieves the current best performance, outperforming previous methods and baseline systems by a large margin. Further analysis shows that our model can effectively capture the long-distance dependencies between entities in a long sentence.

17.

Crime base: Towards building a knowledge base for crime entities and their relationships from online news papers

《Information processing & management》2019,56(6):102059

In the current internet era, information related to crime is scattered across many sources, such as news media, social networks, blogs, and video repositories. Crime reports published in online newspapers are often considered more reliable than crowdsourced data such as social media, and contain crime information not only as unstructured text but also as images. Given the volume and availability of crime-related information in online newspapers, gathering and integrating crime entities from multiple modalities and representing them as a machine-readable knowledge base would help law enforcement agencies analyze and prevent criminal activities. Existing research on generating crime knowledge bases does not address the extraction of all non-redundant entities from the text and image data present in multiple newspapers. Hence, this work proposes Crime Base, an entity-relationship based system to extract and integrate crime-related text and image data from online newspapers, with a focus on reducing duplication and loss of information in the knowledge base. The proposed system uses a rule-based approach to extract entities from text and image captions. The entities extracted from text data are correlated using contextual as well as semantic similarity measures, and image entities are correlated using low-level and high-level image features. The proposed system also presents an integrated view of these entities and their relations in the form of a knowledge base using OWL. The system is tested on a collection of crime-related articles from popular Indian online newspapers.

18.

AHAB: Aligning heterogeneous knowledge bases via iterative blocking

Ling Chen Weidong Gu Xiaoxue Tian Gencai Chen 《Information processing & management》2019,56(1):1-13

With the development of information extraction, an increasing number of large-scale knowledge bases have become available in different domains. In recent years, many approaches have been proposed for large-scale knowledge base alignment. Most of them are based on iterative matching: if a pair of entities has been aligned, their compatible neighbors are selected as candidate entity pairs. The limitation of these methods is that they discover candidate entity pairs depending on aligned relations, which cannot be used for aligning heterogeneous knowledge bases. Only a few existing methods focus on aligning heterogeneous knowledge bases, and they discover candidate entity pairs only once, using traditional blocking methods. However, the performance of these methods depends heavily on blocking keys, which are hard to select. In this paper, we present an approach for aligning heterogeneous knowledge bases via iterative blocking (AHAB) to improve the discovery and refinement of candidate entity pairs. AHAB iteratively utilizes different relations for blocking, and then matches block pairs based on matched entity pairs. The Cartesian product of unmatched entities in matched block pairs forms the candidate entity pairs. By filtering out dissimilar candidate entity pairs, matched entity pairs are found. The number of matched entity pairs grows with the iterations, which in turn helps match block pairs in each iteration. Experiments on real-world heterogeneous knowledge bases demonstrate that AHAB yields competitive performance.
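
One step of the candidate generation described in this abstract might look like the sketch below: within a matched block pair, the Cartesian product of the still-unmatched entities gives the candidate pairs, and a similarity filter keeps the likely matches. The token-level Jaccard similarity, the threshold, and all names here are assumptions for illustration; the abstract does not say which similarity measure AHAB uses.

```python
from itertools import product
from typing import Callable, List, Set, Tuple

Pair = Tuple[str, str]


def candidate_pairs(block_a: List[str], block_b: List[str], matched: Set[Pair]) -> List[Pair]:
    """Cartesian product of the entities in each block that are not yet matched."""
    matched_a = {a for a, _ in matched}
    matched_b = {b for _, b in matched}
    unmatched_a = [entity for entity in block_a if entity not in matched_a]
    unmatched_b = [entity for entity in block_b if entity not in matched_b]
    return list(product(unmatched_a, unmatched_b))


def token_jaccard(left: str, right: str) -> float:
    """Stand-in similarity: Jaccard overlap of lowercased whitespace tokens."""
    left_tokens, right_tokens = set(left.lower().split()), set(right.lower().split())
    union = left_tokens | right_tokens
    return len(left_tokens & right_tokens) / len(union) if union else 0.0


def filter_pairs(
    pairs: List[Pair], similarity: Callable[[str, str], float], threshold: float
) -> List[Pair]:
    """Drop candidate pairs whose similarity falls below the threshold."""
    return [(a, b) for a, b in pairs if similarity(a, b) >= threshold]


# Toy matched block pair from two hypothetical knowledge bases.
BLOCK_A = ["Berlin", "Munich", "Hamburg"]
BLOCK_B = ["berlin", "munich city", "cologne"]
MATCHED: Set[Pair] = {("Berlin", "berlin")}

candidates = candidate_pairs(BLOCK_A, BLOCK_B, MATCHED)
new_matches = filter_pairs(candidates, token_jaccard, threshold=0.5)
```

In an iterative scheme, `new_matches` would be added to `MATCHED` before the next round of blocking on a different relation.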

19.

Exploring temporal representations by leveraging attention-based bidirectional LSTM-RNNs for multi-modal emotion recognition

《Information processing & management》2020,57(3):102185

Emotion recognition helps to automatically perceive the user's emotional response to multimedia content through implicit annotation, which in turn benefits the establishment of effective user-centric services. Physiological-based approaches have increasingly attracted researchers' attention because of their objectivity in representing emotion. Conventional approaches to emotion recognition have mostly focused on the extraction of different kinds of hand-crafted features. However, hand-crafted features always require domain knowledge for the specific task, and designing the proper features may be time-consuming. Therefore, exploring the most effective physiological-based temporal feature representation for emotion recognition has become the core problem of most works. In this paper, we propose a multimodal attention-based BLSTM network framework for efficient emotion recognition. Firstly, raw physiological signals from each channel are transformed into spectrogram images to capture their time and frequency information. Secondly, attention-based Bidirectional Long Short-Term Memory Recurrent Neural Networks (LSTM-RNNs) are utilized to automatically learn the best temporal features. The learned deep features are then fed into a deep neural network (DNN) to predict the probability of emotional output for each channel. Finally, a decision-level fusion strategy is utilized to predict the final emotion. The experimental results on the AMIGOS dataset show that our method outperforms other state-of-the-art methods.
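
The final step in this abstract, decision-level fusion of per-channel predictions, can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the unweighted average used here is an assumption, as are the function names and the example probabilities.

```python
from typing import Dict, List


def fuse_probabilities(channel_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the class probabilities predicted independently for each channel."""
    labels = channel_probs[0].keys()
    count = len(channel_probs)
    return {label: sum(probs[label] for probs in channel_probs) / count for label in labels}


def predict_emotion(channel_probs: List[Dict[str, float]]) -> str:
    """Return the label with the highest fused probability."""
    fused = fuse_probabilities(channel_probs)
    return max(fused, key=fused.get)


# Hypothetical per-channel outputs for a binary arousal label.
CHANNEL_OUTPUTS = [
    {"high": 0.6, "low": 0.4},
    {"high": 0.3, "low": 0.7},
    {"high": 0.7, "low": 0.3},
]
final_label = predict_emotion(CHANNEL_OUTPUTS)
```

Other fusion rules, such as majority voting or weighted averaging, would fit the same interface.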

20.

Knowledge based collection selection for distributed information retrieval

Baoli Han Ling Chen Xiaoxue Tian 《Information processing & management》2018,54(1):116-128
