Similar Documents
 20 similar documents found (search time: 156 ms)
1.
Document indexing is the central task of document and information work in libraries, information agencies, and archives, and the foundation for compiling bibliographies and indexes and for building databases and computerized retrieval systems. Research on automatic indexing of Chinese documents began in China in the 1980s. Owing to the particular characteristics of Chinese, however, no mature computing technology yet solves automatic Chinese word segmentation, document subject analysis, or concept extraction, so automatic indexing of Chinese remains at the experimental stage. With the arrival of the electronic information age, it is therefore highly significant to explore, under current technical conditions, the design of a computer-assisted indexing system that can serve practical Chinese indexing work well.

2.
萧莉明  于宽  蔡珣 《现代情报》2007,27(4):146-147,150
This paper designs an effective automatic classification system for Chinese journals based on a Bayesian classifier. First, the system takes the journal title as the sole indexing content and applies automatic word segmentation to split titles into the sample sets to be classified. Second, with a classification library built by training on the library's sample data, a Bayesian classifier is used to classify Chinese journals automatically. Experimental results show that the classifier is both efficient and accurate for Chinese journal classification.
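Stripped to its core, the Bayesian classification described in this abstract can be sketched as follows. The toy journal-title tokens, the category labels, and the whitespace tokenization (standing in for Chinese word segmentation) are illustrative assumptions, not the paper's data.

```python
import math
from collections import Counter, defaultdict

def train_nb(samples):
    """Count class and per-class token frequencies from (tokens, label) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab

def classify_nb(tokens, class_counts, token_counts, vocab):
    """Pick the class maximizing log P(c) + sum log P(t|c), with add-one smoothing."""
    total = sum(class_counts.values())
    best, best_score = None, float("-inf")
    for label, n in class_counts.items():
        score = math.log(n / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for t in tokens:
            score += math.log((token_counts[label][t] + 1) / denom)
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical pre-segmented journal titles with their classification codes.
train = [
    (["computer", "engineering"], "TP"),
    (["software", "engineering", "journal"], "TP"),
    (["library", "science", "review"], "G2"),
    (["information", "library", "quarterly"], "G2"),
]
model = train_nb(train)
print(classify_nb(["library", "information", "journal"], *model))  # prints G2
```

With these toy counts, the "library/information" evidence outweighs the shared "journal" token, so the title lands in the library-science class.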

3.
[Purpose/Significance] To use text-mining techniques to automatically discover more representative subject terms from document content, locate those terms within specific chapters, and index subjects with visualization technology, helping readers discover latent relationships among document subjects intuitively and efficiently. [Method/Process] Subject terms are mined at the level of document content with text-mining techniques, and the resulting information is presented intuitively with visualization tools; on this basis a visual automatic subject-indexing system is built, and its indexing performance is validated on multiple subjects in the Gesar domain. [Result/Conclusion] The results show that the method achieves content-level visual automatic subject indexing in the Gesar domain, locating chapters, paragraphs, and sentences quickly and precisely. The indexing information is obtained in a visual, interactive way that improves user experience and engagement. The system is validated with Hero Gesar as an example, but the method itself has no domain restriction and can be applied to documents in other fields.

4.
[Purpose/Significance] In multi-document automatic summarization, researchers have focused mainly on capturing the important subject content of a document collection; many proposed methods improve the representativeness of summaries while overlooking the latent topics in the documents. [Method/Process] To address the high redundancy of multi-document summaries and their failure to reflect topic content comprehensively, this paper proposes a multi-document summarization method based on sentence-level topic discovery. The method converts multiple documents into a sentence collection, applies an LDA topic model for sentence clustering and topic discovery, and computes sentence similarity with word vectors trained by word2vec; finally, within each topic, sentence importance is computed with the TextRank algorithm and combined with statistical sentence features to generate the summary of the document collection. [Result/Conclusion] Manual evaluation shows that the proposed method performs well in topic coverage, conciseness, and grammaticality.
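The TextRank step in the pipeline above can be sketched as a PageRank-style walk over a sentence-similarity graph. The word-overlap similarity and the toy sentences below are simplifying assumptions; the paper itself combines LDA clusters and word2vec similarities.

```python
import math

def textrank(sentences, d=0.85, iters=50):
    """Rank sentences with a PageRank-style walk over a word-overlap graph."""
    tokens = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    # Similarity: shared words, length-normalized (Mihalcea & Tarau style).
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and len(tokens[i]) > 1 and len(tokens[j]) > 1:
                overlap = len(tokens[i] & tokens[j])
                sim[i][j] = overlap / (math.log(len(tokens[i])) + math.log(len(tokens[j])))
    scores = [1.0] * n
    for _ in range(iters):
        scores = [(1 - d) + d * sum(sim[j][i] / sum(sim[j]) * scores[j]
                                    for j in range(n) if sim[j][i] > 0)
                  for i in range(n)]
    return scores

sents = [
    "topic models discover latent topics in documents",
    "latent topics in documents guide summary sentences",
    "weather was pleasant that day",
]
scores = textrank(sents)
```

The off-topic third sentence shares no edges with the other two, so its score stays at the damping floor while the connected pair reinforce each other.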

5.
曹锦丹  刘鑫 《情报科学》2000,18(3):253-255
This paper discusses knowledge representation and indexing in document databases, and attempts to introduce the OAV (object-attribute-value) triple method from knowledge engineering into novelty-search consultation for science and technology projects, in order to solve the problem of evaluating the novelty of research subjects and achievements under review.

6.
黄伟强 《情报科学》1991,12(3):72-75
On the basis of an analysis of the history and current state of subject retrieval, this paper argues that subject retrieval languages are developing toward natural language and toward the integration of classification and subject approaches, and that indexing methods will change accordingly: (1) the practice of descriptors first and keywords second will give way to keywords first and descriptors second; (2) traditional binary indexing will develop into relevance-weighted indexing; (3) indexing methods will become simpler, and cumbersome auxiliary devices such as links and roles will gradually be abandoned.

7.
[Purpose/Significance] Addressing the main problems and weak points in constructing technology-effect maps, this paper proposes a construction method based on SAO (subject-action-object) structures and word vectors. [Method/Process] SAO structures are extracted from patent abstracts with a Python program, and technology terms and effect terms are identified from them; combining a domain dictionary with a patent-domain corpus, semantic similarity between terms is computed with Word2Vec and WordNet; topics are indexed automatically with a network-based topic clustering algorithm; and the technology-effect matrix is built from the co-occurrence relations of SAO structures. [Result/Conclusion] The method achieves automatic construction of technology-effect maps based on SAO structures and word vectors, improves the soundness of the identified technology-effect topics and the accuracy of patent classification labeling, and offers a new approach to automated construction of technology-effect maps.
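The Word2Vec-based similarity used above ultimately reduces to cosine similarity between term vectors. A minimal sketch, with toy three-dimensional vectors standing in for trained embeddings:

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

# Toy embeddings standing in for trained Word2Vec vectors;
# the terms are hypothetical technology/effect words.
vectors = {
    "sealing": [0.9, 0.1, 0.0],
    "waterproof": [0.8, 0.2, 0.1],
    "cost": [0.0, 0.1, 0.9],
}
print(round(cosine(vectors["sealing"], vectors["waterproof"]), 3))  # prints 0.984
```

Terms that behave alike in the corpus end up with nearby vectors, so "sealing" scores far higher against "waterproof" than against "cost".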

8.
This article reviews the current state of automatic indexing technology and applies it to the indexing of government information disclosure. Addressing the problems and shortcomings in that work, it uses a statistical weighting algorithm that combines word-frequency statistics, position weighting, and word co-occurrence statistics, and designs and implements keyword-based automatic indexing for government information disclosure.
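A minimal sketch of the statistical weighting idea, combining term frequency with a position boost for title terms. The co-occurrence component and the actual weight values are omitted, and whitespace tokenization stands in for Chinese segmentation; the sample texts are invented.

```python
from collections import Counter

def keyword_scores(title, body_sentences, position_weight=2.0):
    """Score candidate terms by frequency, boosted when the term also appears
    in a high-value position (here: the title)."""
    counts = Counter()
    for sent in body_sentences:
        counts.update(sent.lower().split())
    title_terms = set(title.lower().split())
    return {t: c * (position_weight if t in title_terms else 1.0)
            for t, c in counts.items()}

scores = keyword_scores(
    "government information disclosure",
    ["disclosure of government information is required",
     "departments publish information on a platform"],
)
print(max(scores, key=scores.get))  # prints information
```

"information" wins here because it is both frequent in the body and present in the title, which is exactly the effect position weighting is meant to capture.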

9.
The Feasibility of Automatic Indexing Seen from the Correlation between Keywords and High-Frequency Words    Total citations: 1 (self-citations: 0, citations by others: 1)
Using content analysis based on word-frequency statistics, this paper matches high-frequency words extracted from texts against author keywords; based on the results of sampling experiments, it analyzes the correlation between the two and on that basis argues for the feasibility of automatic subject indexing. The results show that taking the top 3 high-frequency words already matches more than half of the manually assigned keywords, and taking the top 7 can substitute for manual indexing to a degree of 85%.
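The matching experiment can be sketched as computing, for a given n, the fraction of author keywords covered by the n most frequent words. The tokens and keywords below are illustrative, not the paper's sample.

```python
from collections import Counter

def topn_match_rate(text_tokens, keywords, n):
    """Fraction of author keywords covered by the n most frequent tokens."""
    top = [t for t, _ in Counter(text_tokens).most_common(n)]
    kw = set(keywords)
    return len(kw & set(top)) / len(kw)

# Hypothetical token stream from a paper's full text.
tokens = (["indexing"] * 5 + ["keyword"] * 4 + ["automatic"] * 3
          + ["library"] * 2 + ["retrieval"])
keywords = ["indexing", "automatic", "retrieval"]
rate = topn_match_rate(tokens, keywords, 3)
```

With n = 3 the top words cover two of the three author keywords, a match rate of about 0.67; raising n lets the rate climb toward full coverage, mirroring the paper's top-3 versus top-7 comparison.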

10.
This article proposes a strategic-diagram method for analyzing the keywords of scientific papers. Keywords are selected from among author keywords, machine-indexed keywords, and keywords extracted from titles and abstracts, in order to eliminate indexing effects; clustering partitions the keywords into research-topic clusters; centrality and density indicators are computed for each cluster and plotted on a strategic diagram that divides the clusters into four classes, from which the current state of the problem domain is analyzed. The data are then divided into several periods and a strategic diagram is drawn for each; computing similarity, origin, and impact indicators between the topic clusters of adjacent periods reveals how research topics evolve and relate to one another. Experiments demonstrate the effectiveness of the strategic-diagram method.

11.
栗久珍 《现代情报》2007,27(12):136-137,140
This paper proposes an architecture for an intelligence-gathering system based on generating FAQs (Frequently Asked Questions) from web forums, and discusses the key technologies involved in developing the system. On this basis, an experiment is designed to evaluate system performance. The experimental results show that the web-forum intelligence-gathering system can carry out intelligence gathering effectively.

12.
Thoughts on Providing FAQ Information Services    Total citations: 11 (self-citations: 0, citations by others: 11)
方宝花 《情报科学》2005,23(1):83-85
FAQ information services, with their functions of knowledge navigation, dissemination, integration, value-adding, and radiation, are a new model and a new path for shifting library reference work from passive to active service. Under the network environment, reference work should focus on strengthening basic FAQ resources, building FAQ knowledge bases, linking FAQs closely with real-time answering, providing direct and convenient retrieval paths, deepening the content and levels of service, and building users' personal FAQs.

13.
Traditional Information Retrieval (IR) models assume that the index terms of queries and documents are statistically independent of each other, which is intuitively wrong. This paper proposes the incorporation of the lexical and syntactic knowledge generated by a POS-tagger and a syntactic Chunker into traditional IR similarity measures, in order to include this dependency information between terms. Our proposal is based on theories of discourse structure, by means of the segmentation of documents and queries into sentences and entities. Therefore, we measure dependencies between entities instead of between terms. Moreover, we handle discourse references for each entity. The approach has been evaluated on Spanish and English corpora, as well as on Question Answering tasks, obtaining significant improvements.

14.
To obtain high performances, previous works on FAQ retrieval used high-level knowledge bases or handcrafted rules. However, it is a time and effort consuming job to construct these knowledge bases and rules whenever application domains are changed. To overcome this problem, we propose a high-performance FAQ retrieval system only using users’ query logs as knowledge sources. During indexing time, the proposed system efficiently clusters users’ query logs using classification techniques based on latent semantic analysis. During retrieval time, the proposed system smoothes FAQs using the query log clusters. In the experiment, the proposed system outperformed the conventional information retrieval systems in FAQ retrieval. Based on various experiments, we found that the proposed system could alleviate critical lexical disagreement problems in short document retrieval. In addition, we believe that the proposed system is more practical and reliable than the previous FAQ retrieval systems because it uses only data-driven methods without high-level knowledge sources.
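As a rough sketch of the query-log clustering idea, the following replaces latent semantic analysis with greedy cosine clustering over raw term-frequency vectors, a deliberate simplification of the paper's method; the queries and threshold are invented.

```python
import math
from collections import Counter

def cos(c1, c2):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(c1[t] * c2.get(t, 0) for t in c1)
    n1 = math.sqrt(sum(v * v for v in c1.values()))
    n2 = math.sqrt(sum(v * v for v in c2.values()))
    return dot / (n1 * n2) if n1 and n2 else 0.0

def cluster_logs(queries, threshold=0.3):
    """Greedy single-pass clustering: join the first cluster whose centroid
    is similar enough, otherwise start a new cluster."""
    clusters = []  # each entry: [centroid_counter, [member queries]]
    for q in queries:
        vec = Counter(q.lower().split())
        for centroid, members in clusters:
            if cos(vec, centroid) >= threshold:
                centroid.update(vec)
                members.append(q)
                break
        else:
            clusters.append([Counter(vec), [q]])
    return [members for _, members in clusters]

groups = cluster_logs([
    "how to reset my password",
    "reset password steps",
    "track my shipping order",
])
```

The two password queries fall into one cluster and the shipping query starts its own, illustrating how log clusters can bridge lexical gaps between a user query and a FAQ.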

15.
王群芳 《科教文汇》2011,(23):65-66
This paper applies stylistic methods to analyze the wedding language used by professional hosts at modern Han Chinese weddings. Through corpus-based study and analysis of wedding language, it aims to identify the grammatical features of Chinese Han wedding speech. The main features found are: full sentences and loose sentences carry equal weight, and short sentences greatly outnumber long ones.

16.
The authors of this paper investigate consumers’ diabetes-related terms based on a log from the Yahoo!Answers social question and answers (Q&A) forum, ascertain characteristics and relationships among terms related to diabetes from the consumers’ perspective, and reveal users’ diabetes information seeking patterns. In this study, the log analysis method, data coding method, and visualization multiple-dimensional scaling analysis method were used for analysis. The visual analyses were conducted at two levels: term analysis within a category, and category analysis among the categories in the schema. The findings show that the average number of words per question was 128.63, the average number of sentences per question was 8.23, the average number of words per response was 254.83, and the average number of sentences per response was 16.01. There were 12 categories (Cause & Pathophysiology, Sign & Symptom, Diagnosis & Test, Organ & Body Part, Complication & Related Disease, Medication, Treatment, Education & Info Resource, Affect, Social & Culture, Lifestyle, and Nutrient) in the diabetes-related schema which emerged from the data coding analysis. The analyses at the two levels show that terms and categories were clustered and patterns were revealed. Future research directions are also included.

17.
Much of the valuable knowledge contained in academic documents is often not reflected in the abstract. This paper proposes a position-weighted method for mining core knowledge, which treats the sentence as the unit of knowledge processing and extracts core sentences from the body text as independent knowledge units. By quantifying the associations between sentences, the method represents the body text as a textual network with sentences as nodes and inter-sentence associations as edges; it then proposes a section-based position-weighting algorithm and, combined with social network analysis, mines the sentences that form the core knowledge units of the text. Experimental results show that the method can extract the important sentences from an article's core sections, meeting initial expectations.
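The sentence-network idea can be sketched as weighted degree in a word-overlap graph, scaled by a per-section weight. The section weights and sentences below are illustrative, and plain degree stands in for the paper's fuller social-network-analysis measures.

```python
def core_sentences(sentences, sections, section_weight, k=1):
    """Score each sentence by its weighted degree in a word-overlap network,
    multiplied by the weight of the section it belongs to; return the top k."""
    toks = [set(s.lower().split()) for s in sentences]
    n = len(sentences)
    degree = [0.0] * n
    for i in range(n):
        for j in range(n):
            if i != j:
                degree[i] += len(toks[i] & toks[j])  # edge weight = shared words
    scores = [degree[i] * section_weight.get(sections[i], 1.0) for i in range(n)]
    ranked = sorted(range(n), key=scores.__getitem__, reverse=True)
    return [sentences[i] for i in ranked[:k]]

sents = [
    "graph model of sentence relations",
    "sentence relations form a graph",
    "we thank the reviewers",
]
# Hypothetical section labels and weights favoring methodological sections.
top = core_sentences(sents, ["method", "intro", "ack"],
                     {"method": 2.0, "intro": 1.5})
```

The two method/intro sentences share three words and so dominate the acknowledgment; the section weight then breaks the tie in favor of the method sentence.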

18.
A challenge for sentence categorization and novelty mining is to detect not only when text is relevant to the user’s information need, but also when it contains something new which the user has not seen before. It involves two tasks that need to be solved. The first is identifying relevant sentences (categorization) and the second is identifying new information from those relevant sentences (novelty mining). Many previous studies of relevant sentence retrieval and novelty mining have been conducted on the English language, but few papers have addressed the problem of multilingual sentence categorization and novelty mining. This is an important issue in global business environments, where mining knowledge from text in a single language is not sufficient. In this paper, we perform the first task by categorizing Malay and Chinese sentences, then comparing their performances with that of English. Thereafter, we conduct novelty mining to identify the sentences with new information. Experimental results on TREC 2004 Novelty Track data show similar categorization performance on Malay and English sentences, which greatly outperform Chinese. In the second task, it is observed that we can achieve similar novelty mining results for all three languages, which indicates that our algorithm is suitable for novelty mining of multilingual sentences. In addition, after benchmarking our results with novelty mining without categorization, it is learnt that categorization is necessary for the successful performance of novelty mining.
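The novelty-mining step (flagging which relevant sentences carry new information) can be sketched with a maximum-similarity threshold against previously seen sentences. Jaccard similarity, the threshold value, and the sample sentences are simplifying assumptions, not the paper's configuration.

```python
def novel_sentences(sentences, threshold=0.5):
    """Flag a sentence as novel when its maximum Jaccard similarity to all
    previously seen sentences is below the threshold."""
    seen = []
    novel = []
    for s in sentences:
        toks = set(s.lower().split())
        sim = max((len(toks & p) / len(toks | p) for p in seen), default=0.0)
        if sim < threshold:
            novel.append(s)
        seen.append(toks)
    return novel

stream = [
    "the drug was approved in 2001",
    "the drug was approved in 2001",
    "side effects were reported later",
]
result = novel_sentences(stream)
```

The exact repeat is suppressed while the first occurrence and the genuinely new sentence survive, which is the behavior novelty mining is after.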

19.
Aspect-based sentiment analysis aims to determine sentiment polarities toward specific aspect terms within the same sentence or document. Most recent studies adopted attention-based neural network models to implicitly connect aspect terms with context words. However, these studies were limited by insufficient interaction between aspect terms and opinion words, leading to poor performance on robustness test sets. In addition, we have found that robustness test sets create new sentences that interfere with the original information of a sentence, which often makes the text too long and leads to the problem of long-distance dependence. Simultaneously, these new sentences produce more non-target aspect terms, misleading the model because of the lack of relevant knowledge guidance. This study proposes a knowledge guided multi-granularity graph convolutional neural network (KMGCN) to solve these problems. The multi-granularity attention mechanism is designed to enhance the interaction between aspect terms and opinion words. To address the long-distance dependence, KMGCN uses a graph convolutional network that relies on a semantic map based on fine-tuning pre-trained models. In particular, KMGCN uses a mask mechanism guided by conceptual knowledge to encounter more aspect terms (including target and non-target aspect terms). Experiments are conducted on 12 SemEval-2014 variant benchmarking datasets, and the results demonstrated the effectiveness of the proposed framework.
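The core operation a graph convolutional network like KMGCN builds on is the propagation step H' = ReLU(D^-1/2 (A+I) D^-1/2 · H · W). A minimal dense implementation with toy adjacency, feature, and weight matrices follows; the knowledge-guided masking and multi-granularity attention of the paper are omitted.

```python
import math

def gcn_layer(adj, feats, weight):
    """One graph-convolution step: symmetric normalization with self-loops,
    neighbor aggregation, then a linear transform and ReLU."""
    n = len(adj)
    # Add self-loops: A + I.
    a = [[adj[i][j] + (1 if i == j else 0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a]
    # Symmetric normalization: D^-1/2 (A+I) D^-1/2.
    norm = [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]
    # Aggregate neighbor features: norm @ H.
    agg = [[sum(norm[i][k] * feats[k][j] for k in range(n))
            for j in range(len(feats[0]))] for i in range(n)]
    # Linear transform + ReLU: max(0, agg @ W).
    return [[max(0.0, sum(agg[i][k] * weight[k][j] for k in range(len(weight))))
             for j in range(len(weight[0]))] for i in range(n)]

# Toy two-node graph with one edge; identity weights keep the example traceable.
out = gcn_layer([[0, 1], [1, 0]],
                [[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 0.0], [0.0, 1.0]])
```

With a single edge and identity weights, each node's output blends its own feature with its neighbor's, which is exactly the smoothing-over-the-graph effect the convolution provides.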

20.
Because of its comprehensive, fast, and precise nature, the FAQ is favored in many application domains, and its use in libraries is receiving growing attention. This article describes the FAQ application workflow, analyzes it, and proposes improvements, offering practical suggestions concerning the database, the candidate question set, algorithms, and user orientation, laying a foundation for the development of FAQs. The prospects for FAQs are broad but still in need of deepening; the article looks ahead to FAQ development roughly from the angles of virtual reference, My FAQ, comprehensive versus special-topic FAQs, personalization, and latest news, and then sketches the improved workflow. FAQ development can be expected to diversify further.
