Similar literature
 20 similar documents found (search time: 31 ms)
1.
郭海红  李姣  代涛 《情报工程》2016,2(6):039-049
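This paper aims to construct a classification method for Chinese health questions and, by manually annotating hypertension-related health questions, to analyze the public's hypertension-related health information needs and to provide a corpus for developing intelligent Chinese question answering systems on hypertension. Building on clinical question classification and a hierarchical model of consumer health information-seeking scenarios, the study constructs a four-level topic classification scheme for Chinese health questions. Five annotators independently labeled 2,000 samples drawn at random from nearly 100,000 hypertension-related questions collected from a Chinese health website, in order to refine the scheme and test its reliability, build an annotated corpus, and analyze the public's hypertension-related health information needs. Inter-rater reliability among the five annotators at the fourth level of the scheme was kappa = 0.63, indicating that the classification is reliable; the first-level categories reached high agreement (kappa = 0.82), slightly better than comparable international studies. Questions in the six first-level categories (treatment, diagnosis, healthy lifestyle, clinical findings/condition management, epidemiology, and choice of care provider) accounted for 48.1%, 23.8%, 11.9%, 5.2%, 9.0%, and 1.9% of the sample, respectively. The classification method can be used to organize large collections of health questions and improve retrieval efficiency; the annotated sample questions can serve as a corpus for research on automatic classification of hypertension-related health questions; and the resulting topic distribution can guide the development of knowledge resources on health websites. In addition, the way the classification scheme was constructed, the annotation workflow, and the inter-rater reliability measurement can serve as a reference for question classification and corpus construction in the open domain and in other restricted domains.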
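The abstract reports kappa values for five annotators but does not say which kappa statistic was used. As a hedged illustration only, the sketch below computes Fleiss' kappa, one common choice when more than two raters label every item; the function name `fleiss_kappa` and the toy labels are assumptions, not taken from the paper.

```python
from collections import Counter


def fleiss_kappa(ratings):
    """Fleiss' kappa for a list of items, each a list of labels (one per rater).

    Assumes every item was labeled by the same number of raters (at least 2).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    if n_raters < 2 or any(len(item) != n_raters for item in ratings):
        raise ValueError("every item needs the same number (>= 2) of ratings")

    counts = [Counter(item) for item in ratings]
    categories = set().union(*counts)

    # Observed agreement: share of agreeing rater pairs, averaged over items.
    per_item = [
        (sum(c * c for c in cnt.values()) - n_raters) / (n_raters * (n_raters - 1))
        for cnt in counts
    ]
    p_observed = sum(per_item) / n_items

    # Chance agreement from the overall proportion of each category.
    proportions = [
        sum(cnt[cat] for cnt in counts) / (n_items * n_raters) for cat in categories
    ]
    p_expected = sum(p * p for p in proportions)

    if p_expected == 1.0:  # only one category was ever used
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical labels from five annotators for three questions.
example = [
    ["treatment"] * 5,
    ["diagnosis", "diagnosis", "diagnosis", "treatment", "diagnosis"],
    ["lifestyle"] * 5,
]
print(round(fleiss_kappa(example), 3))
```

Values near 1 indicate agreement well above chance, which is how the reported 0.63 and 0.82 are read in the abstract.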

2.
Many questions submitted to Collaborative Question Answering (CQA) sites are similar to questions that have been answered before. We propose a precise approach to automatically finding an answer to such questions by identifying “equivalent” questions submitted and answered in the past. Our method is based on automatically generating equivalent question patterns by grouping together questions that have previously obtained the same answers. The generated patterns are used as seed patterns to match more questions and extract a large number of equivalent patterns with a new bootstrapping-based learning method. The resulting patterns can be applied to match a new question to an equivalent one that has already been answered, and thus suggest potential answers automatically. We experimented with this approach over a large collection of more than 200,000 real questions drawn from the Yahoo! Answers archive, automatically acquiring over 16,991 groups of equivalent question patterns. These patterns allow our method to obtain over 57% recall and over 54% precision in automatically suggesting an answer to new questions, significantly improving over baseline methods.
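The seeding step described above, grouping questions that previously received the same answer, can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `normalize` and `seed_groups` are hypothetical, answers are compared by normalized exact match, and the paper's pattern generation and bootstrapping stages are not reproduced.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different strings match."""
    return re.sub(r"\s+", " ", text.strip().lower())


def seed_groups(qa_pairs):
    """Group questions by shared answer; keep groups with at least two questions."""
    groups = defaultdict(set)
    for question, answer in qa_pairs:
        groups[normalize(answer)].add(normalize(question))
    return [sorted(questions) for questions in groups.values() if len(questions) > 1]


# Hypothetical question/answer pairs.
pairs = [
    ("How old is the Earth?", "About 4.5 billion years."),
    ("What is the age of  Earth?", "about 4.5 billion years."),
    ("Who wrote Hamlet?", "Shakespeare"),
]
print(seed_groups(pairs))
```

Each surviving group is a candidate set of equivalent questions from which seed patterns could then be derived.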

3.
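This paper analyzes in depth the distributed two-tier architecture of the Collaborative Virtual Reference System (CVRS) and its workflow for handling reference questions. It proposes six intelligent optimizations for CVRS: intelligent answering of form-submitted questions, an automatic answering robot, automatic duplicate detection in the knowledge base, automatic conversion of real-time questions into form-based questions, batch extraction of FAQ entries from the knowledge base, and automatic classification of the knowledge base. It also designs a technical roadmap, centered on Chinese word segmentation, for full-text retrieval and automatic classification of the knowledge base and for text summarization of real-time chat records and knowledge-base content.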

4.
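Online reference services answer users' questions over the network, chiefly through BBS, FAQ, telephone, e-mail, web-form, and real-time reference. Among university libraries in Sichuan Province, BBS is the most widely offered channel, followed by FAQ, and the number of libraries offering real-time reference has grown rapidly over the past year. The main problems and countermeasures for online reference in Sichuan university libraries are: strengthen category browsing and search in FAQs; have BBS services borrow from Wikipedia and tighten their conventions; use embedded technology to make real-time reference more convenient; use RSS to deliver information proactively; and increase cooperation to build a collaborative virtual reference network.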

5.
Analysis of Statistical Question Classification for Fact-Based Questions   Total citations: 1, self-citations: 0, citations by others: 1
Question classification systems play an important role in question answering systems and can be used in a wide range of other domains. The goal of question classification is to accurately assign labels to questions based on expected answer type. Most past approaches have relied on matching questions against hand-crafted rules. However, rules require laborious effort to create and often suffer from being too specific. Statistical question classification methods overcome these issues by employing machine learning techniques. We empirically show that a statistical approach is robust and achieves good performance on three diverse data sets with little or no hand tuning. Furthermore, we examine the effect that different syntactic and semantic features have on performance. We find that semantic features tend to increase performance more than purely syntactic features. Finally, we analyze common causes of misclassification error and provide insight into ways they may be overcome.

6.
张宁  朱礼军 《情报工程》2016,2(1):032-042
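Automatic question answering has become a research focus in natural language processing in recent years, and question analysis, as the first stage of a question answering system, plays a key role in it. This paper briefly introduces the basics of Chinese question analysis, chiefly developments in word segmentation, part-of-speech tagging, and syntactic parsing; it then reviews in more detail research on question classification and semantic analysis of questions within Chinese question analysis; finally, it identifies some open difficulties in Chinese question analysis and offers a preliminary outlook on possible future research directions.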

7.
The need to cluster small text corpora composed of a few hundred short texts arises in various applications; e.g., clustering top-retrieved documents based on their snippets. This clustering task is challenging due to the vocabulary mismatch between short texts and the insufficient corpus-based statistics (e.g., term co-occurrence statistics) due to the corpus size. We address this clustering challenge using a framework that utilizes a set of external knowledge resources that provide information about term relations. Specifically, we use information induced from the resources to estimate similarity between terms and produce term clusters. We also utilize the resources to expand the vocabulary used in the given corpus and thus enhance term clustering. We then project the texts in the corpus onto the term clusters to cluster the texts. We evaluate various instantiations of the proposed framework by varying the term clustering method used, the approach of projecting the texts onto the term clusters, and the way of applying external knowledge resources. Extensive empirical evaluation demonstrates the merits of our approach with respect to applying clustering algorithms directly on the text corpus, and using state-of-the-art co-clustering and topic modeling methods.

8.
Given a user question, the goal of a Question Answering (QA) system is to retrieve answers rather than full documents or even best-matching passages, as most Information Retrieval systems currently do. In this paper, we present BRUJA, a QA system for the management of multilingual collections. BRUJA works with three languages (English, Spanish and French). The BRUJA architecture is not built from three monolingual QA systems; instead, it uses English as an interlingua to perform the usual QA tasks such as question classification and answer extraction. In addition, BRUJA uses Cross Language Information Retrieval (CLIR) techniques to retrieve relevant documents from a multilingual collection. On the one hand, we have more documents to find answers from; on the other hand, we introduce noise into the system because of translations to the interlingua (English) and the CLIR module. The question is whether the difficulty of managing three languages is worth it or whether a monolingual QA system delivers better results. We report on in-depth experimentation and demonstrate that our multilingual QA system gets better results than its monolingual counterpart whenever it uses good translation resources and, especially, state-of-the-art CLIR techniques.

9.
SUMMARY

An analysis of 96 question and answer pairs from the Bayside Library Ask a Librarian Service found that 54 percent of the queries were received from Bayside residents. Forty-seven percent of the e-mail reference questions were classed as research queries. Although only 25.1 percent of the queries were submitted for formal education purposes, all of these were research questions, and took longer than any other category to answer. In 2001, only 6 of the 54 questions submitted were tertiary level questions, but it took a median time of 95 minutes to answer each one. The 24 general interest category questions took a median time of 47.5 minutes to answer, which is almost half the time it took to answer a tertiary level query.

Librarians from three other public libraries in Victoria offering e-mail reference were interviewed, and their services were compared and contrasted with the Bayside Library Service.

Issues of disproportionate labour, the appearance of the passive role of the e-mail reference user, and the wisdom of public libraries devoting significant resources to answer questions for formal education were raised.

10.
Background: Question‐answering systems (or QA Systems) stand as a new alternative for Information Retrieval Systems. Most users frequently need to retrieve specific information about a factual question rather than to obtain a whole document. Objectives: The study evaluates the efficiency of QA systems as terminological sources for physicians, specialised translators and users in general. It assesses the performance of one open‐domain QA system, START, and one restricted‐domain QA system, MedQA. Method: The study collected two hundred definitional questions (What is…?), either general or specialised, from the health website WebMD. Sources used by the open‐domain QA system, START, and the restricted‐domain QA system, MedQA, were studied to retrieve answers, and later a range of evaluation measures (precision, Mean Reciprocal Rank, Total Reciprocal Rank, First Hit Success) were applied to mark the quality of answers. Results: It was established that both systems are useful in the retrieval of valid definitional healthcare information, with an acceptable degree of coherent and precise responses from both. The answers supplied by MedQA were more reliable than those of START in the sense that they came from specialised clinical or academic sources, most of them showing links to further research articles. Conclusions: Results obtained show the potential of this type of tool in the more general realm of information access, and the retrieval of health information. They may be considered a good, reliable and reasonably precise alternative in alleviating the information overload. Both QA systems can help professionals and users obtain healthcare information.
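The four evaluation measures named in the Method section can be sketched as below. The definitions follow common usage in QA evaluation and are an assumption about how the study applied them; each `judgments` list is a hypothetical ranked list of booleans marking whether the answer at that rank was judged correct.

```python
def reciprocal_rank(judgments):
    """1/rank of the first correct answer, or 0.0 if none is correct."""
    for rank, correct in enumerate(judgments, start=1):
        if correct:
            return 1.0 / rank
    return 0.0


def total_reciprocal_rank(judgments):
    """Sum of 1/rank over every correct answer in the ranked list."""
    return sum(1.0 / rank for rank, correct in enumerate(judgments, start=1) if correct)


def first_hit_success(judgments):
    """1.0 if the top-ranked answer is correct, otherwise 0.0."""
    return 1.0 if judgments and judgments[0] else 0.0


def precision(judgments):
    """Share of returned answers that are correct (0.0 if nothing was returned)."""
    return sum(1 for correct in judgments if correct) / len(judgments) if judgments else 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Hypothetical judgments for three questions (one ranked answer list each).
runs = [[False, True, True], [True, False], []]
mrr = mean(reciprocal_rank(r) for r in runs)  # Mean Reciprocal Rank
fhs = mean(first_hit_success(r) for r in runs)
print(mrr, fhs)
```

Averaging each per-question measure over the question set gives the system-level scores that would be compared between two systems.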

11.
《The Reference Librarian》2013,54(33):77-102
Limitations on both time and human memory make it impossible for the reference librarian or staff member to become aware of even a fraction of all the reference sources that have been published. There is, however, a small number of basic, fundamental or "key" sources that are widely used or widely recommended. In all likelihood these sources will answer a high proportion of all the questions that may appropriately be answered by published reference materials. This paper explores a number of ways that these "key" reference sources may be identified. The author concludes that a knowledge of the types or categories of reference materials that exist and what each type will do best, along with a knowledge of a corpus of basic, fundamental or "key" reference titles, will contribute to a firm foundation for effective and efficient reference service.

12.
《The Reference Librarian》2013,54(34):141-166
Limitations on both time and human memory make it impossible for the reference librarian or staff member to become aware of even a fraction of all the reference sources that have been published. There is, however, a small number of basic, fundamental or "key" sources that are widely used or widely recommended. In all likelihood these sources will answer a high proportion of all the questions that may appropriately be answered by published reference materials. This paper explores a number of the ways that these "key" reference sources may be identified. The author concludes that a knowledge of the types or categories of reference materials that exist and what each type will do best, along with a knowledge of a corpus of basic, fundamental or "key" reference titles, will contribute to a firm foundation for effective and efficient reference service.

13.
Focused web crawling in the acquisition of comparable corpora   Total citations: 2, self-citations: 0, citations by others: 2
Cross-Language Information Retrieval (CLIR) resources, such as dictionaries and parallel corpora, are scarce for special domains. Obtaining comparable corpora automatically for such domains could be an answer to this problem. The Web, with its vast volumes of data, offers a natural source for this. We experimented with focused crawling as a means to acquire comparable corpora in the genomics domain. The acquired corpora were used to statistically translate domain-specific words. The same words were also translated using a high-quality, but non-genomics-related parallel corpus, which fared considerably worse. We also evaluated our system with standard information retrieval (IR) experiments, combining statistical translation using the Web corpora with dictionary-based translation. The results showed improvement over pure dictionary-based translation. Therefore, mining the Web for comparable corpora seems promising.

14.
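A study of "local" questions in digital reference: a case analysis of the Wenzhou University Library   Total citations: 3, self-citations: 0, citations by others: 3
In digital reference, "local" questions are those that can only be answered by supplying local knowledge or searching local resources. Through a case analysis of digital reference questions received by the Wenzhou University Library, the article determines the share of local questions in the total and the reasons they arise. It argues that, although answering by reference librarians who draw on local knowledge cannot be neglected, with well-designed web pages that provide the necessary local information, clear site navigation, and a professional search system, many local questions could be answered remotely by member libraries of a cooperative reference consortium.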

15.
Product reviews have become an important resource for customers before they make purchase decisions. However, the abundance of reviews makes it difficult for customers to digest them and make informed choices. In our study, we aim to help customers who want to quickly capture the main idea of a lengthy product review before they read the details. In contrast with existing work on review analysis and document summarization, we aim to retrieve a set of real-world user questions to summarize a review. In this way, users would know what questions a given review can address and they may further read the review only if they have similar questions about the product. Specifically, we design a two-stage approach which consists of question selection and question diversification. In the question selection phase, we first employ probabilistic retrieval models to locate candidate questions that are relevant to a given review. A Recurrent Neural Network Encoder–Decoder is utilized to measure the “answerability” of questions to a review. We then design a set function to re-rank the questions with the goal of rewarding diversity in the final question set. The set function satisfies submodularity and monotonicity, which results in an efficient greedy algorithm for submodular optimization. Evaluation on product reviews from two categories shows that the proposed approach is effective for discovering meaningful questions that are representative of individual reviews.
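The abstract does not give the set function itself, so the sketch below substitutes a simple, well-known monotone submodular objective (the sum over topics of the square root of accumulated relevance) purely to illustrate how greedy selection rewards diversity. The names `diversity_score` and `greedy_select`, the topic labels, and the relevance scores are all hypothetical.

```python
import math


def diversity_score(selected, relevance, topic):
    """Monotone submodular score: sum over topics of sqrt(total relevance).

    The concave square root gives diminishing returns for piling more
    questions onto a topic that is already covered.
    """
    totals = {}
    for question in selected:
        totals[topic[question]] = totals.get(topic[question], 0.0) + relevance[question]
    return sum(math.sqrt(value) for value in totals.values())


def greedy_select(candidates, relevance, topic, k):
    """Repeatedly add the candidate with the largest marginal gain."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        base = diversity_score(selected, relevance, topic)
        best = max(
            remaining,
            key=lambda q: diversity_score(selected + [q], relevance, topic) - base,
        )
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical candidate questions for one review.
relevance = {"q1": 0.9, "q2": 0.8, "q3": 0.5}
topic = {"q1": "battery", "q2": "battery", "q3": "screen"}
print(greedy_select(["q1", "q2", "q3"], relevance, topic, k=2))
```

In this toy case the second pick is the less relevant `q3`, because a second battery question adds little once `q1` is selected. For monotone submodular objectives under a cardinality constraint, this greedy procedure is known to achieve at least a (1 - 1/e) fraction of the optimal value.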

16.
When people are connected together over ad hoc social networks, it is possible to ask questions and retrieve answers using the wisdom of the crowd. However, locating a suitable candidate for answering a specific unique question within larger ad hoc groups is non-trivial, especially if we wish to respect the privacy of users by providing deniability. All members of the network wish to source the best possible answers from the network, while at the same time controlling the levels of attention required to generate them by the collective group of individuals and/or the time taken to read all the answers. Conventional expert retrieval approaches rank users for a given query in a centralised indexing process, associating users with material they have previously published. Such an approach is antithetical to privacy, so we have looked to distribute the routing of questions and answers, converting the indexing process into one of building a forwarding table. Starting from the simple operation of flooding the question to everyone, we compare a number of different routing options, where decisions must be made based on past performance and exploitation of the knowledge of our immediate neighbours. We focus on fully decentralised protocols using ant-inspired tactics to route questions towards members of the network who may be able to answer them well. Simultaneously, privacy concerns are acknowledged by allowing both question asking and answering to be plausibly deniable. We have found that via our routing method, it is possible to improve answer quality and also reduce the total amount of user attention required to generate those answers.

17.
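[Purpose/Significance] This study aims to build an evaluation index system for the quality of user-generated answers in social Q&A communities, enabling automated, user-need-oriented evaluation and filtering of answers and improving the quality of knowledge services in such communities. [Method/Process] Social-emotional features and user features are introduced, and factor analysis and structural equation modeling are used to construct the index system empirically. An automated answer quality evaluation method is designed on the basis of a GA-BP neural network model. Finally, data from the Zhihu website are used to apply and test the index system and the automated method. [Result/Conclusion] The resulting index system covers five dimensions: answer text features, answerer features, timeliness features, user features, and social-emotional features. Experiments show that the GA-BP-based automated evaluation method achieves higher accuracy and lower mean error than the other methods compared, is feasible and effective, and can be further applied and promoted in practice.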

18.
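[Purpose/Significance] Online Q&A communities have become an important way for Internet users to obtain high-quality knowledge, and studying answer quality in Chinese Q&A communities matters for knowledge dissemination. [Method/Process] Taking Zhihu, one of the largest Chinese Q&A communities, as the object of study, the authors apply data mining and machine learning, selecting three classification models (logistic regression, support vector machine, and random forest) and training and testing them in three progressive stages. A feature system for answer quality is built along three dimensions: structural features, text features, and users' social attributes. [Result/Conclusion] The experiments show that the performance of all three classifiers improves steadily as the feature system is enriched, and that random forest, an ensemble classifier, achieves excellent performance with the full feature set. Analysis of feature combinations finds that a random forest that includes users' social attributes always outperforms the other models at the same stage, indicating the importance of the social network in answer quality evaluation. The study concludes that answer quality can be evaluated from two angles, the answer itself and its author, and that the feature system and models built here can predict answer quality fairly comprehensively.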

19.
This paper is concerned with Markov processes for computing page importance. Page importance is a key factor in Web search. Many algorithms such as PageRank and its variations have been proposed for computing the quantity in different scenarios, using different data sources, and with different assumptions. The question then arises whether these algorithms can be explained in a unified way, and whether there is a general guideline for designing new algorithms for new scenarios. In order to answer these questions, we introduce a General Markov Framework in this paper. Under the framework, a Web Markov Skeleton Process is used to model the random walk conducted by the web surfer on a given graph. Page importance is then defined as the product of two factors: page reachability, the average probability that the surfer arrives at the page, and page utility, the average value that the page gives to the surfer in a single visit. These two factors can be computed as the stationary probability distribution of the corresponding embedded Markov chain and the mean staying time on each page of the Web Markov Skeleton Process, respectively. We show that this general framework can cover many existing algorithms including PageRank, TrustRank, and BrowseRank as its special cases. We also show that the framework can help us design new algorithms to handle more complex problems, by constructing graphs from new data sources, employing new family members of the Web Markov Skeleton Process, and using new methods to estimate these two factors. In particular, we demonstrate the use of the framework with the exploitation of a new process, named Mirror Semi-Markov Process. In the new process, the staying time on a page, as a random variable, is assumed to be dependent on both the current page and its inlink pages. Our experimental results on both the user browsing graph and the mobile web graph validate that the Mirror Semi-Markov Process is more effective than previous models in several tasks, even when there is web spam and when the assumption on preferential attachment does not hold.
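The definition of page importance as reachability times utility can be illustrated with a small sketch. It assumes a row-stochastic transition matrix for an irreducible, aperiodic embedded chain (so that power iteration converges), uses made-up transition probabilities and staying times, and normalizes the product so the scores sum to 1; the normalization and the function names are illustrative choices, not taken from the paper.

```python
def stationary_distribution(transition, iterations=1000, tol=1e-12):
    """Power iteration for pi = pi * P on a row-stochastic matrix P."""
    n = len(transition)
    pi = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [sum(pi[i] * transition[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi


def page_importance(transition, mean_stay):
    """Reachability (stationary probability) times utility (mean staying time)."""
    reach = stationary_distribution(transition)
    raw = [r * t for r, t in zip(reach, mean_stay)]
    total = sum(raw)
    return [value / total for value in raw]  # normalized so the scores sum to 1


# Hypothetical two-page web: the surfer mostly stays on page 0's side of the
# graph, but visits to page 1 last three times as long on average.
P = [[0.9, 0.1],
     [0.5, 0.5]]
stay = [1.0, 3.0]
print(page_importance(P, stay))
```

With all staying times equal, the score reduces to the stationary distribution alone, which is the PageRank-style special case the abstract mentions.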

20.
Background: Finding evidence to answer clinical questions is essential to the practice of evidence‐based medicine (EBM). However, practising EBM in primary care is thought to be problematic because of concerns about whether evidence exists to answer specific questions. Objectives: To determine the highest level of evidence per question; to ascertain the number of questions unanswered because of a lack of evidence; to establish the frequency with which guidelines answered questions; and to determine the domain of websites used to answer questions. Methods: Clinical questions were identified from two primary care answering services: ATTRACT and National Library for Health (NLH) Primary Care Answering Service. The types of evidence used to answer the question were noted, including whether this was from systematic reviews or meta‐analyses (level one evidence) or from randomised controlled trials (level two). The data were collected from March to June 2008. Results: Level 1 or level 2 evidence answered 11% of questions. Sixteen per cent were unanswered because of a lack of evidence. Over 40% of questions were answered using guidelines. Forty‐three per cent of questions were answered with one type of evidence and 24% with two. Conclusion: Guidelines are useful resources for primary care clinicians, answering two‐fifths of questions.
