1.

Integrating expert profile, reputation and link analysis for expert finding in question-answering websites

Duen-Ren Liu Yu-Hsuan Chen Wei-Chen Kao Hsiu-Wen Wang 《Information processing & management》2013

Question answering websites are becoming an ever more popular knowledge sharing platform. On such websites, people may ask any type of question and then wait for someone else to answer the question. However, in this manner, askers may not obtain correct answers from appropriate experts. Recently, various approaches have been proposed to automatically find experts in question answering websites. In this paper, we propose a novel hybrid approach to effectively find experts for the category of the target question in question answering websites. Our approach considers user subject relevance, user reputation and authority of a category in finding experts. A user’s subject relevance denotes the relevance of a user’s domain knowledge to the target question. A user’s reputation is derived from the user’s historical question-answering records, while user authority is derived from link analysis. Moreover, our proposed approach has been extended to develop a question dependent approach that considers the relevance of historical questions to the target question in deriving user domain knowledge, reputation and authority. We used a dataset obtained from Yahoo! Answer Taiwan to evaluate our approach. Our experimental results show that our proposed methods outperform other conventional methods.
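The abstract names three signals (subject relevance, reputation, authority) but does not say how they are combined. As a rough illustration only, the sketch below assumes each signal is already normalised to [0, 1] and merges them with a weighted sum; the function names, the equal default weights, and the linear combination itself are assumptions, not the authors' method.

```python
def hybrid_expert_score(relevance, reputation, authority,
                        weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine three per-user signals into one score.

    Assumption: a plain weighted sum of scores already scaled to [0, 1].
    The paper's actual combination rule is not given in the abstract.
    """
    w_rel, w_rep, w_auth = weights
    return w_rel * relevance + w_rep * reputation + w_auth * authority


def rank_experts(candidates, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Return user ids sorted from best to worst hybrid score.

    `candidates` maps a user id to a (relevance, reputation, authority) tuple.
    """
    return sorted(
        candidates,
        key=lambda user: hybrid_expert_score(*candidates[user], weights),
        reverse=True,
    )


# Hypothetical candidates for one question category.
candidates = {
    "alice": (0.9, 0.8, 0.7),
    "bob": (0.2, 0.9, 0.1),
    "carol": (0.5, 0.5, 0.5),
}
ranking = rank_experts(candidates)
```

In practice the weights would presumably be tuned on held-out questions rather than fixed in advance.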

2.

Unsupervised Latent Dirichlet Allocation for supervised question classification

Saeedeh Momtazi 《Information processing & management》2018,54(3):380-393

Question answering systems assist users in satisfying their information needs more precisely by providing focused responses to their questions. Among the various systems developed for such a purpose, community-based question answering has recently received researchers’ attention due to the large amount of user-generated questions and answers in social question-and-answer platforms. Reusing such data sources requires an accurate information retrieval component enhanced by a question classifier. Question classification gives the system information about question categories, so that it can focus on questions and answers from categories relevant to the input question. In this paper, we propose a new method based on unsupervised Latent Dirichlet Allocation for classifying questions in community-based question answering. Our method first uses unsupervised topic modeling to extract topics from a large amount of unlabeled data. The learned topics are then used in the training phase to find their association with the available category labels in the training data. The category mixture of topics is finally used to predict the label of unseen data.
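The last two steps described above (associating learned topics with category labels, then predicting from the category mixture) can be sketched as follows. This is a simplified illustration, not the paper's estimator: it assumes the per-document topic distributions have already been produced by some LDA implementation, and it estimates P(category | topic) by simply accumulating topic mass per label. All names are hypothetical.

```python
from collections import defaultdict


def fit_topic_category(doc_topics, labels):
    """Estimate P(category | topic) from labelled training documents.

    doc_topics: list of topic distributions, one list of floats per document
                (assumed to come from a previously trained LDA model).
    labels:     category label of each training document.
    """
    num_topics = len(doc_topics[0])
    mass = [defaultdict(float) for _ in range(num_topics)]
    for theta, label in zip(doc_topics, labels):
        for k, p in enumerate(theta):
            mass[k][label] += p  # topic k's weight counts towards this label
    table = []
    for k in range(num_topics):
        total = sum(mass[k].values())
        table.append({c: m / total for c, m in mass[k].items()} if total > 0 else {})
    return table


def predict_category(theta, table):
    """Score each category as sum_k P(topic k | doc) * P(category | topic k)."""
    scores = defaultdict(float)
    for p, category_dist in zip(theta, table):
        for category, p_cat in category_dist.items():
            scores[category] += p * p_cat
    return max(scores, key=scores.get)


# Toy topic distributions over two topics for four labelled questions.
train_topics = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
train_labels = ["sports", "sports", "health", "health"]
table = fit_topic_category(train_topics, train_labels)
```

An unseen question whose inferred topic distribution is mostly topic 0 would then be assigned the category most associated with that topic in the training data.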

3.

Semantic matching in machine reading comprehension: An empirical study

《Information processing & management》2023,60(2):103145

Machine reading comprehension (MRC) is a challenging task in the field of artificial intelligence. Most existing MRC works contain a semantic matching module, either explicitly or intrinsically, to determine whether a piece of context answers a question. However, there is scant work which systematically evaluates different paradigms using semantic matching in MRC. In this paper, we conduct a systematic empirical study on semantic matching. We formulate a two-stage framework which consists of a semantic matching model and a reading model, based on pre-trained language models. We compare and analyze the effectiveness and efficiency of using semantic matching modules with different setups on four types of MRC datasets. We verify that using semantic matching before a reading model improves both the effectiveness and efficiency of MRC. Compared with answering questions by extracting information from concise context, we observe that semantic matching yields more improvements for answering questions with noisy and adversarial context. Matching coarse-grained context to questions, e.g., paragraphs, is more effective than matching fine-grained context, e.g., sentences and spans. We also find that semantic matching is helpful for answering who/where/when/what/how/which questions, whereas it decreases the MRC performance on why questions. This may imply that semantic matching helps to answer a question whose necessary information can be retrieved from a single sentence. The above observations demonstrate the advantages and disadvantages of using semantic matching in different scenarios.

4.

Biased LexRank: Passage retrieval using random walks with question-based priors

Jahna Otterbacher Gunes Erkan Dragomir R. Radev 《Information processing & management》2009

We present Biased LexRank, a method for semi-supervised passage retrieval in the context of question answering. We represent a text as a graph of passages linked based on their pairwise lexical similarity. We use traditional passage retrieval techniques to identify passages that are likely to be relevant to a user’s natural language question. We then perform a random walk on the lexical similarity graph in order to recursively retrieve additional passages that are similar to other relevant passages. We present results on several benchmarks that show the applicability of our work to question answering and topic-focused text summarization.
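The idea of a random walk biased by a question-based prior can be illustrated with a small power-iteration sketch in the style of personalised PageRank. It assumes the pairwise similarities and the per-passage question relevance scores are already computed; the parameter names, the default damping value, and the handling of isolated passages are assumptions and may differ from the authors' exact formulation.

```python
def biased_random_walk(similarity, relevance, damping=0.85,
                       iterations=200, tol=1e-10):
    """Score passages by a random walk with a question-based restart prior.

    similarity: n x n list of non-negative pairwise passage similarities.
    relevance:  n non-negative question relevance scores (the prior).
    damping:    probability of following a similarity edge instead of
                jumping back to the prior (assumed value, not from the paper).
    """
    n = len(relevance)
    total = sum(relevance)
    # Fall back to a uniform prior if no passage matches the question.
    prior = [r / total for r in relevance] if total > 0 else [1.0 / n] * n

    # Row-normalise similarities into transition probabilities.
    transition = []
    for row in similarity:
        row_sum = sum(row)
        transition.append([v / row_sum for v in row] if row_sum > 0 else prior[:])

    scores = prior[:]
    for _ in range(iterations):
        updated = [
            (1 - damping) * prior[j]
            + damping * sum(scores[i] * transition[i][j] for i in range(n))
            for j in range(n)
        ]
        change = sum(abs(a - b) for a, b in zip(updated, scores))
        scores = updated
        if change < tol:
            break
    return scores


# Three passages in a chain; only the first matches the question directly.
similarity = [
    [1.0, 0.5, 0.0],
    [0.5, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
scores = biased_random_walk(similarity, relevance=[1.0, 0.0, 0.0])
```

In this toy chain the second passage receives a non-zero score purely through its similarity to the first, which is the "recursive retrieval" effect the abstract describes.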

5.

Linguistic kernels for answer re-ranking in question answering systems

Alessandro Moschitti Silvia Quarteroni 《Information processing & management》2011

Answer selection is the most complex phase of a question answering (QA) system. To solve this task, typical approaches use unsupervised methods such as computing the similarity between query and answer, optionally exploiting advanced syntactic, semantic or logic representations.

6.

User-interactive innovation knowledge acquisition model based on social media

《Information processing & management》2022,59(3):102923

Mainstream social media, such as Facebook, Twitter, and Weibo, provide enterprises an opportunity to innovate and develop. User-generated content on social media platforms can help determine the needs of the user and identify a target market, providing a basis for enterprise innovation. In this study, we propose a user-interactive innovation knowledge acquisition model. Accordingly, the comments data on a selected forum were first crawled using network crawler software. Subsequently, we pre-processed the data to obtain a semi-structured user corpus. We then used the Latent Dirichlet Allocation model to cluster topics and obtain the subject words that were hidden from each comment text. A user demand ontology was built based on the subject words, and with an expert's reference, the product function ontology was established. Through semantic similarity matching, we integrated two ontologies to obtain the user-interactive innovation knowledge acquisition model. Finally, the model was validated using the Volvo XC60 automobile as an example. The empirical results showed that the proposed model could assist enterprises by providing ideas for follow-up innovation and product development.

7.

Combining evidence with a probabilistic framework for answer ranking and answer merging in question answering

Jeongwoo Ko Luo Si Eric Nyberg 《Information processing & management》2010

Question answering (QA) aims at finding exact answers to a user’s question from a large collection of documents. Most QA systems combine information retrieval with extraction techniques to identify a set of likely candidates and then utilize some ranking strategy to generate the final answers. This ranking process can be challenging, as it entails identifying the relevant answers amongst many irrelevant ones. This is more challenging in multi-strategy QA, in which multiple answering agents are used to extract answer candidates. As answer candidates come from different agents with different score distributions, how to merge answer candidates plays an important role in answer ranking. In this paper, we propose a unified probabilistic framework which combines multiple evidence to address challenges in answer ranking and answer merging. The hypotheses of the paper are that: (1) the framework effectively combines multiple evidence for identifying answer relevance and their correlation in answer ranking, (2) the framework supports answer merging on answer candidates returned by multiple extraction techniques, (3) the framework can support list questions as well as factoid questions, (4) the framework can be easily applied to a different QA system, and (5) the framework significantly improves performance of a QA system. An extensive set of experiments was done to support our hypotheses and demonstrate the effectiveness of the framework. All of the work substantially extends the preliminary research in Ko et al. (2007a). A probabilistic framework for answer selection in question answering. In: Proceedings of NAACL/HLT.

8.

A CIM-based comprehensive similarity evaluation algorithm

宋欣申安来郭凤媛钟杰胡艳君王建林《现代情报》2013,33(3):129-131

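Similarity computation is an important part of automatic question answering. To ensure that the answers in the candidate answer set are ranked reasonably, and to address the inability of traditional automatic question answering systems to evaluate similarity comprehensively and efficiently, this paper proposes using the comprehensive index method to jointly evaluate keyword similarity, semantic similarity, and other measures, yielding a comprehensive similarity. In addition, because some candidate answers contain too much redundant information, which hinders answer extraction, a decay similarity parameter is designed to reduce the effect of redundant sentence information on answer extraction. Experimental results show that the similarity algorithm based on the comprehensive index method can effectively improve question answering accuracy.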

9.

Design and implementation of an automatic question answering system

王正华韩永国《人天科学研究》2014,(9):111-113

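Automatic question answering systems build on search engines by incorporating natural language knowledge and applications; compared with traditional search engines that rely on keyword matching, they can better satisfy users' retrieval needs. This paper introduces a model of an automatic question answering system for computer operating systems, describes the development process in detail, and designs and implements an automatic question answering system for the computer operating systems domain. Practice shows that the system can answer users' questions fairly accurately.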

10.

Improving graph-based random walks for complex question answering using syntactic, shallow semantic and extended string subsequence kernels

Yllias Chali Sadid A. Hasan Shafiq R. Joty 《Information processing & management》2011

The task of answering complex questions requires inferencing and synthesizing information from multiple documents that can be seen as a kind of topic-oriented, informative multi-document summarization. In generic summarization the stochastic, graph-based random walk method to compute the relative importance of textual units (i.e. sentences) has proved to be very successful. However, the major limitation of the TF*IDF approach is that it only retains the frequency of the words and does not take into account the sequence, syntactic and semantic information. This paper presents the impact of syntactic and semantic information in the graph-based random walk method for answering complex questions. Initially, we apply tree kernel functions to perform the similarity measures between sentences in the random walk framework. Then, we extend our work further to incorporate the Extended String Subsequence Kernel (ESSK) to perform the task in a similar manner. Experimental results show the effectiveness of the use of kernels to include the syntactic and semantic information for this task.

11.

Compositional question answering: A divide and conquer approach

Hyo-Jung Oh Ki-Youn Sung Myung-Gil Jang Sung Hyon Myaeng 《Information processing & management》2011

This paper describes how questions can be characterized for question answering (QA) along different facets and focuses on questions that cannot be answered directly but can be divided into simpler ones so that they can be answered directly using existing QA capabilities. Since individual answers are composed to generate the final answer, we call this process compositional QA. The goal of the proposed QA method is to answer a composite question by dividing it into atomic ones, instead of developing an entirely new method tailored for the new question type. A question is analyzed automatically to determine its class, and its sub-questions are sent to the relevant QA modules. Answers returned from the individual QA modules are composed based on the predetermined plan corresponding to the question type. The experimental results based on 615 questions show that the compositional QA approach outperforms the simple routing method by about 17%. Considering 115 composite questions only, the F-score was almost tripled from the baseline.

12.

Improving relational similarity measurement using symmetries in proportional word analogies

Danushka Bollegala Tomokazu Goto Nguyen Tuan Duc Mitsuru Ishizuka 《Information processing & management》2013

Measuring the similarity between the semantic relations that exist between words is an important step in numerous tasks in natural language processing such as answering word analogy questions, classifying compound nouns, and word sense disambiguation. Given two word pairs (A, B) and (C, D), we propose a method to measure the relational similarity between the semantic relations that exist between the two words in each word pair. Typically, a high degree of relational similarity can be observed between proportional analogies (i.e. analogies that exist among the four words, A is to B as C is to D). We describe eight different types of relational symmetries that are frequently observed in proportional analogies and use those symmetries to robustly and accurately estimate the relational similarity between two given word pairs. We use automatically extracted lexical-syntactic patterns to represent the semantic relations that exist between two words and then match those patterns in Web search engine snippets to find candidate words that form proportional analogies with the original word pair. We define eight types of relational symmetries for proportional analogies and use those as features in a supervised learning approach. We evaluate the proposed method using the Scholastic Aptitude Test (SAT) word analogy benchmark dataset. Our experimental results show that the proposed method can accurately measure relational similarity between word pairs by exploiting the symmetries that exist in proportional analogies. The proposed method achieves an SAT score of 49.2% on the benchmark dataset, which is comparable to the best results reported on this dataset.

13.

Combining semantic information in question answering systems

Paloma Moreda Hector Llorens Estela Saquete Manuel Palomar 《Information processing & management》2011

14.

User simulations for evaluating answers to question series

Jimmy Lin 《Information processing & management》2007

Recently, question series have become one focus of research in question answering. These series are comprised of individual factoid, list, and “other” questions organized around a central topic, and represent abstractions of user–system dialogs. Existing evaluation methodologies have yet to catch up with this richer task model, as they fail to take into account contextual dependencies and different user behaviors. This paper presents a novel simulation-based methodology for evaluating answers to question series that addresses some of these shortcomings. Using this methodology, we examine two different behavior models: a “QA-styled” user and an “IR-styled” user. Results suggest that an off-the-shelf document retrieval system is competitive with state-of-the-art QA systems in this task. Advantages and limitations of evaluations based on user simulations are also discussed.

15.

Be flexible! learn to debias by sampling and prompting for robust visual question answering

《Information processing & management》2023,60(3):103296

Recent studies point out that VQA models tend to rely on the language prior in the training data to answer the questions, which prevents the VQA model from generalizing to out-of-distribution test data. To address this problem, approaches are designed to reduce the language distribution prior effect by constructing negative image–question pairs, while they cannot provide the proper visual reason for answering the question. In this paper, we present a new debiasing framework for VQA by Learning to Sample paired image–question and Prompt for given question (LSP). Specifically, we construct the negative image–question pairs with a certain sampling rate to prevent the model from overly relying on the visual shortcut content. Notably, question types provide a strong hint for answering the questions. We utilize question type to constrain the sampling process for negative question–image pairs, and further learn the question type-guided prompt for better question comprehension. Extensive experiments on two public benchmarks, VQA-CP v2 and VQA v2, demonstrate that our model achieves new state-of-the-art results in overall accuracy, i.e., 61.95% and 65.26%.

16.

Information gain and divergence-based feature selection for machine learning-based text categorization

Changki Lee Gary Geunbae Lee 《Information processing & management》2006

Most previous works of feature selection emphasized only the reduction of high dimensionality of the feature space. But in cases where many features are highly redundant with each other, we must utilize other means, for example, more complex dependence models such as Bayesian network classifiers. In this paper, we introduce a new information gain and divergence-based feature selection method for statistical machine learning-based text categorization without relying on more complex dependence models. Our feature selection method strives to reduce redundancy between features while maintaining information gain in selecting appropriate features for text categorization. Empirical results are given on a number of datasets, showing that our feature selection method is more effective than Koller and Sahami’s method [Koller, D., & Sahami, M. (1996). Toward optimal feature selection. In Proceedings of ICML-96, 13th international conference on machine learning], which is one of the greedy feature selection methods, and conventional information gain which is commonly used in feature selection for text categorization. Moreover, our feature selection method sometimes produces more improvements of conventional machine learning algorithms over support vector machines which are known to give the best classification accuracy.
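For reference, the conventional information gain that the abstract uses as a baseline can be computed as below. This sketch covers only the standard term-presence information gain, not the authors' divergence-based redundancy reduction; the document representation (a set of terms per document) and function names are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(docs, labels, term):
    """Information gain of a term's presence/absence about the class label.

    docs:   list of sets of terms, one set per document.
    labels: class label of each document.
    """
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    # Conditional entropy H(class | term present or absent).
    conditional = sum(
        len(part) / n * entropy(part)
        for part in (with_term, without_term)
        if part
    )
    return entropy(labels) - conditional


# Toy corpus: "goal" separates the classes perfectly, "match" not at all.
docs = [
    {"goal", "match"},
    {"goal", "team"},
    {"vote", "party"},
    {"vote", "match"},
]
labels = ["sport", "sport", "politics", "politics"]
gain_goal = information_gain(docs, labels, "goal")
gain_match = information_gain(docs, labels, "match")
```

Ranking terms by this score alone would keep both "goal" and "vote" even though they carry the same information here, which is the redundancy problem the abstract sets out to address.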

17.

Why users keep answering questions in online question answering communities: A theoretical and empirical investigation

Xiao-Ling Jin Zhongyun Zhou Matthew K.O. Lee Christy M.K. Cheung 《International Journal of Information Management》2013

This study theorized and validated a model of knowledge sharing continuance in a special type of online community, the online question answering (Q&A) community, in which knowledge exchange is reflected mainly by asking and answering specific questions. We created a model that integrated knowledge sharing factors and knowledge self-efficacy into the expectation confirmation theory. The hypotheses derived from this model were empirically validated using an online survey conducted among users of a famous online Q&A community in China, “Yahoo! Answers China”. The results suggested that users’ intention to continue sharing knowledge (i.e., answering questions) was directly influenced by users’ ex-post feelings as consisting of two dimensions: satisfaction, and knowledge self-efficacy. Based on the obtained results, we also found that knowledge self-efficacy and confirmation mediated the relationship between benefits and satisfaction.

18.

Biomedical extractive question answering based on dynamic routing and answer voting

《Information processing & management》2023,60(4):103367

Many existing biomedical extractive question answering methods are based on pre-trained models, which do not take full advantage of the hidden layer knowledge of pre-trained models and do not consider span overlap between answers when predicting. To address these issues, we propose a new question answering model, called ALBERT with Dynamic Routing and Answer Voting (ADRAV). The ADRAV can reasonably utilize hidden layer knowledge through dynamic routing, and consider span similarity between answers through answer voting. To improve the performance of the model, we also carry out pre-fine-tuning, and add a dynamic parameter adjustment mechanism in the process of pre-fine-tuning. Experimental results show that our model achieves significant performance improvement with fewer parameters on BioASQ 4b, 5b, 6b, 9b, and outperforms SOTA baselines on BioASQ 4b, 6b.

19.

Open domain question answering using Wikipedia-based knowledge model

Pum-Mo Ryu Myung-Gil Jang Hyun-Ki Kim 《Information processing & management》2014

20.

Research on a Chinese short-text classification algorithm for community question answering

赵辉刘怀亮《现代情报》2013,33(10):70-74

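To address the problem that short question texts in community question answering systems have few feature words and weak descriptive information, this paper uses Wikipedia for feature expansion to assist the classification of Chinese short question texts. First, sets of related concepts for words are extracted using Wikipedia concepts, links, and other information, and the relatedness between concepts is computed by jointly using the link structure and the category hierarchy. Then, feature expansion is performed on the basis of the related concept sets to supplement the semantic information of text features. Experimental results show that the proposed feature-expansion-based short-text classification algorithm effectively improves the classification of short question texts.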