1.
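[Purpose/Significance] In the era of digitized resources, literature services are shifting toward knowledge services, and high-quality automatic indexing of documents is the foundation of, and key to, improving knowledge-service capability. To address the currently low accuracy of automatic indexing for English scientific and technical literature, this paper proposes a concept selection optimization method based on semantic awareness. [Method/Process] Building on automatic subject indexing with knowledge organization systems, the method uses neural-network word-vector techniques from natural language processing to represent the semantics of both concepts and English document content, then performs semantic awareness and evaluation so that concept indexing results are selected at the semantic level. By combining knowledge organization systems with natural language processing, the method compensates for shortcomings at the semantic level, further reducing the influence of irrelevant concepts and improving the accuracy of concept indexing results. [Results/Conclusions] Experimental results show that the method has good semantic-awareness performance, effectively reduces irrelevant concepts during concept selection, and greatly improves the relevance of indexing results to the documents, providing a useful reference for building knowledge-oriented services over scientific and technical literature resources and for related research.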
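The semantic selection step described in this abstract can be illustrated with a minimal sketch. It assumes that each candidate concept and each document token already has a word vector, that the document is represented by the mean of its token vectors, and that a concept is kept when its cosine similarity to the document reaches a threshold. The toy embeddings, the averaging step, and the threshold value are all assumptions made for illustration; the abstract does not specify how the authors score or filter concepts.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mean_vector(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def select_concepts(candidates, doc_tokens, embeddings, threshold=0.5):
    """Keep candidate concepts whose vector is close to the document vector.

    The threshold is a hypothetical tuning parameter, not a value from the paper.
    """
    doc_vecs = [embeddings[t] for t in doc_tokens if t in embeddings]
    if not doc_vecs:
        return []
    doc_vec = mean_vector(doc_vecs)
    return [
        concept
        for concept in candidates
        if concept in embeddings
        and cosine(embeddings[concept], doc_vec) >= threshold
    ]


# Toy 3-dimensional embeddings, invented for this example only.
TOY_EMBEDDINGS = {
    "neural": [0.9, 0.1, 0.0],
    "network": [0.8, 0.2, 0.1],
    "training": [0.7, 0.3, 0.0],
    "machine learning": [0.85, 0.2, 0.05],
    "agriculture": [0.0, 0.1, 0.95],
}

doc = ["neural", "network", "training"]
selected = select_concepts(["machine learning", "agriculture"], doc, TOY_EMBEDDINGS)
print(selected)
```

In practice the vectors would come from a trained word-embedding model, and the threshold would need to be tuned on labeled indexing data.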
2.
3.
4.
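[Purpose/Significance] This study uses text mining to automatically discover more representative subject terms in document content, locates each subject term at its specific position within chapters, and performs subject indexing with visualization techniques, helping readers discover latent relationships among document topics intuitively and efficiently. [Method/Process] Text mining is applied at the level of document content to extract subject terms, and visualization tools present the extracted information intuitively. On this basis, a visual automatic subject indexing system is built, and its automatic indexing performance is validated on multiple topics in the Gesar domain. [Results/Conclusions] The results show that the indexing method achieves content-level visual automatic subject indexing in the Gesar domain, locating chapters, paragraphs, and sentences quickly and precisely. The process of obtaining indexing information is visual and interactive, which can improve user experience and engagement. The system is validated using "Hero Gesar" (《英雄格萨尔》) as an example, but the indexing technique itself is not domain-specific and can be applied to literature in other fields.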
5.
6.
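Using data mining techniques and methods from information linguistics, this paper builds a system that extracts information from the Internet and performs automatic indexing and automatic classification, and presents a new way to create a knowledge base for automatic classification. It proposes a position-weighted algorithm for subject extraction and develops a new method that improves the recognition of Chinese synonyms; this semantic similarity algorithm is then applied during automatic classification. Finally, the performance of the system is tested.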
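A position-weighted scheme for subject extraction can be sketched as follows. This is only an illustration of the general idea that a term counts for more when it appears in a prominent position; the position names and the weight values are hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical weights: terms in the title count more than terms in the body.
POSITION_WEIGHTS = {"title": 3.0, "abstract": 2.0, "body": 1.0}


def position_weighted_scores(sections):
    """Sum, per term, the weight of every position in which the term occurs.

    `sections` maps a position name to the list of tokens found there.
    Unknown positions fall back to a weight of 1.0.
    """
    scores = Counter()
    for position, tokens in sections.items():
        weight = POSITION_WEIGHTS.get(position, 1.0)
        for token in tokens:
            scores[token] += weight
    return scores


def top_terms(sections, k=3):
    """Return the k highest-scoring terms, breaking ties alphabetically."""
    scores = position_weighted_scores(sections)
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


example = {
    "title": ["automatic", "indexing"],
    "abstract": ["indexing", "classification"],
    "body": ["web", "indexing", "classification", "web"],
}
print(top_terms(example, k=2))
```

Here "indexing" ranks first because it occurs in all three positions, including the heavily weighted title.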
7.
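An ontology-based scheme for automatic full-text indexing  Total citations: 5, self-citations: 1, citations by others: 5
To help improve the precision of full-text retrieval in digital libraries, this paper proposes an ontology-based scheme for automatic full-text indexing. The scheme uses ontological methods and emphasizes the inherent conceptual links between words. It targets several problems of traditional manual indexing: index terms cannot fully summarize the full text, the terms lack conceptual connections to one another and so struggle to reflect the complete subject content of a document, and polysemy and synonymy cause either missed results or so many returned results that retrieval loses its value. The scheme also enables digital libraries to automate subject indexing.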
8.
Chun-Yan Liang, Li Guo, Zhao-Jie Xia, Feng-Guang Nie, Xiao-Xia Li, Liang Su, Zhang-Yuan Yang 《Information processing & management》2006
A new dictionary-based text categorization approach is proposed to classify chemical web pages efficiently. Using a chemistry dictionary, the approach can extract chemistry-related information more exactly from web pages. After automatic segmentation of the documents to find dictionary terms for document expansion, the approach adopts latent semantic indexing (LSI) to produce the final document vectors, and the relevant categories are finally assigned to the test document by using the k-NN text categorization algorithm. The effects of the characteristics of the chemistry dictionary and the test collection on categorization efficiency are discussed in this paper, and a new voting method is also introduced to further improve categorization performance based on the collection characteristics. The experimental results show that the proposed approach outperforms the traditional categorization method and is applicable to the classification of chemical web pages.
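The final k-NN assignment step can be sketched in a few lines. The sketch assumes the document vectors have already been produced (for example by an LSI projection, which is not shown here) and uses similarity-weighted voting among the k nearest neighbors. That voting rule is a common textbook variant chosen for illustration, not the new voting method the abstract refers to, and the vectors and labels are invented.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def knn_classify(query_vec, labeled_vecs, k=3):
    """Assign the label with the largest summed similarity among the k nearest.

    `labeled_vecs` is a list of (vector, label) pairs.
    """
    ranked = sorted(
        labeled_vecs,
        key=lambda item: cosine(query_vec, item[0]),
        reverse=True,
    )
    votes = defaultdict(float)
    for vec, label in ranked[:k]:
        votes[label] += cosine(query_vec, vec)
    return max(votes, key=votes.get)


# Toy 2-dimensional document vectors, invented for this example only.
training = [
    ([0.9, 0.1], "chemistry"),
    ([0.8, 0.3], "chemistry"),
    ([0.1, 0.9], "other"),
    ([0.2, 0.8], "other"),
]

print(knn_classify([0.7, 0.2], training, k=3))
```

Weighting votes by similarity keeps a distant third neighbor from outvoting two close ones, which matters when k is small.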
9.
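[Purpose/Significance] To address the main problems and weak points in constructing technology-effect maps, this paper proposes a method for building patent technology-effect maps based on SAO structures and word vectors. [Method/Process] A Python program extracts SAO structures from patent abstracts and identifies technology terms and effect terms from them. Combining a domain dictionary with a patent-domain corpus, Word2Vec and WordNet are used to compute semantic similarity between terms. A topic clustering algorithm based on network relationships performs automatic topic indexing, and the technology-effect matrix is built from co-occurrence relationships derived from the SAO structures. [Results/Conclusions] The method achieves automatic construction of technology-effect maps based on SAO structures and word vectors. It improves the soundness of the resulting technology and effect topics and the accuracy of patent classification labeling, offering a new approach to the automated construction of technology-effect maps.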
10.
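An intelligent information retrieval method: latent semantic indexing  Total citations: 3, self-citations: 0, citations by others: 3
This paper introduces a new method of automatic indexing and retrieval, latent semantic indexing. Latent semantic indexing is a fully automatic, intelligent indexing method that improves retrieval efficiency by mining the latent relationships between texts and terms.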
11.
A comparative evaluation has been carried out on the Philips “DIRECT” and the British “INSPEC” retrieval systems. DIRECT is based on automatic indexing whereas INSPEC uses manual subject indexing. Two queries were submitted to both systems, using the same data base. The results are expressed in terms of recall and precision. Both recall and precision of INSPEC were found to be higher than those of DIRECT by 20%. It is concluded that this is mainly a result of the query formulation. The effectiveness obtained with automatic indexing of documents is equivalent to that of the manual procedure.
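For readers unfamiliar with the two measures used in this evaluation, the following minimal sketch computes precision (the share of retrieved documents that are relevant) and recall (the share of relevant documents that were retrieved). The document identifiers are invented and do not correspond to any data in the study.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    Either value is 0.0 when its denominator is empty.
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query result: four documents retrieved, three known to be relevant.
precision, recall = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d5"],
)
print(precision, recall)
```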
12.
《Information processing & management》2001,37(2):255-277
Does human intellectual indexing have a continuing role to play in the face of increasingly sophisticated automatic indexing techniques? In this two-part essay, a computer scientist and long-time TREC participant (Pérez-Carballo) and a practitioner and teacher of human cataloging and indexing (Anderson) pursue this question by reviewing the opinions and research of leading experts on both sides of this divide. We conclude that human analysis should be used on a much more selective basis, and we offer suggestions on how these two types of indexing might be allocated to best advantage. Part I of the essay critiques the comparative research, then explores the nature of human analysis of messages or texts and efforts to formulate rules to make human practice more rigorous and predictable. We find that research comparing human versus automatic approaches has done little to change strongly held beliefs, in large part because many associated variables have not been isolated or controlled. Part II focuses on current methods in automatic indexing, its gradual adoption by major indexing and abstracting services, and ways for allocating human and machine approaches. Overall, we conclude that both approaches to indexing have been found to be effective by researchers and searchers, each with particular advantages and disadvantages. However, automatic indexing has the over-arching advantage of decreasing cost, as human indexing becomes ever more expensive.
13.
《Information processing & management》2001,37(2):231-254
Does human intellectual indexing have a continuing role to play in the face of increasingly sophisticated automatic indexing techniques? In this two-part essay, a computer scientist and long-time TREC participant (Pérez-Carballo) and a practitioner and teacher of human cataloging and indexing (Anderson) pursue this question by reviewing the opinions and research of leading experts on both sides of this divide. We conclude that human analysis should be used on a much more selective basis, and we offer suggestions on how these two types of indexing might be allocated to best advantage. Part I of the essay critiques the comparative research, then explores the nature of human analysis of messages or texts and efforts to formulate rules to make human practice more rigorous and predictable. We find that research comparing human versus automatic approaches has done little to change strongly held beliefs, in large part because many associated variables have not been isolated or controlled. Part II focuses on current methods in automatic indexing, its gradual adoption by major indexing and abstracting services, and ways for allocating human and machine approaches. Overall, we conclude that both approaches to indexing have been found to be effective by researchers and searchers, each with particular advantages and disadvantages. However, automatic indexing has the over-arching advantage of decreasing cost, as human indexing becomes ever more expensive.
14.
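Automatic indexing techniques: retrospect and prospect  Total citations: 4, self-citations: 0, citations by others: 4
This paper discusses the importance of automatic indexing given the current widespread use of full-text retrieval. It groups the automatic indexing techniques developed over the past fifty years, according to their theoretical basis, into statistical analysis methods, linguistic analysis methods, artificial intelligence methods, and hybrid methods, and describes the characteristics, strengths, and weaknesses of each group. Finally, it summarizes the shortcomings of existing automatic indexing techniques and offers an outlook on their future development.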
15.
Determining requirements when searching for and retrieving relevant information suited to a user’s needs has become increasingly important and difficult, partly due to the explosive growth of electronic documents. The vector space model (VSM) is a popular method in retrieval procedures. However, a weakness of the traditional VSM is that the indexing vocabulary changes whenever the document set, the indexing vocabulary selection algorithm, or the parameters of that algorithm change, or whenever wording evolves. The major objective of this research is to design a method that solves the aforementioned problems for patent retrieval. The proposed method utilizes a special characteristic of patent documents, the International Patent Classification (IPC) codes, to generate the indexing vocabulary for representing all the patent documents. The advantage of the generated indexing vocabulary is that it remains unchanged even if the document sets, selection algorithms, and parameters are changed, or if wording evolution occurs. A comparison of the proposed method with two traditional methods (entropy and chi-square) in manual and automatic evaluations is presented to verify its feasibility and validity. The results also indicate that the IPC-based indexing vocabulary selection method achieves higher accuracy and is more satisfactory.
16.
17.
《Information processing & management》2005,41(5):1065-1080
Traditional index weighting approaches for information retrieval from texts depend on the term frequency based analysis of the text contents. A shortcoming of these indexing schemes, which consider only the occurrences of the terms in a document, is that they have some limitations in extracting semantically exact indexes that represent the semantic content of a document. To address this issue, we developed a new indexing formalism that considers not only the terms in a document, but also the concepts. In this approach, concept clusters are defined and a concept vector space model is proposed to represent the semantic importance degrees of lexical items and concepts within a document. Through an experiment on the TREC collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially in regard to the few highest-ranked documents. Moreover, the index term dimension was 80% lower for the proposed method than for the TF-based method, which is expected to significantly reduce the document search time in a real environment.
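The contrast between a term-frequency index and a concept-level index can be illustrated with a small sketch. It assumes a fixed, hand-written mapping from terms to concept clusters and simply sums term frequencies per concept; both the mapping and the summation are assumptions for illustration and do not reproduce the indexing formalism or the importance-degree computation of the paper.

```python
from collections import Counter

# Hypothetical term-to-concept mapping, invented for this example only.
TERM_TO_CONCEPT = {
    "stock": "finance",
    "share": "finance",
    "bond": "finance",
    "court": "law",
    "judge": "law",
}


def tf_vector(tokens):
    """Plain term-frequency index: one dimension per distinct term."""
    return Counter(tokens)


def concept_vector(tokens, term_to_concept):
    """Concept-level index: term frequencies summed per concept cluster.

    Terms without a known concept are ignored in this sketch.
    """
    concepts = Counter()
    for term, freq in Counter(tokens).items():
        concept = term_to_concept.get(term)
        if concept is not None:
            concepts[concept] += freq
    return concepts


doc = ["stock", "share", "stock", "bond", "judge"]
print(len(tf_vector(doc)), dict(concept_vector(doc, TERM_TO_CONCEPT)))
```

Grouping terms into concepts shrinks the index from four dimensions to two in this toy case, which mirrors in spirit the lower index dimension the abstract reports.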
18.
19.