Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
The Defense Documentation Center (DDC), a field activity of the Defense Supply Agency, implemented an automated indexing procedure in October 1973. This Machine-Aided Indexing (MAI) System [1] had been under development since 1969. The following is a report of several comparisons designed to measure the retrieval effectiveness of MAI and manual indexing procedures under normal operational conditions. Several definitions are required in order to clarify the MAI process as it pertains to these investigations. The MAI routines scan unedited text in the form of titles and abstracts. The output of these routines is called Candidate Index Terms. These word strings are matched by computer against an internal file of manually screened and cross-referenced terms called a Natural Language Data Base (NLDB). The NLDB differs from a standard thesaurus in that there is no related term category. Word strings which match the NLDB are accepted as valid MAI output. The mismatches are manually screened for suitability, and those accepted are added to the NLDB. If the original set of Candidate Index Terms is now matched against the updated NLDB, the matched output is called unedited MAI. If both the unedited matches and mismatches are further structured in accession order and sent to technical analysts for review, the output of that process is called edited MAI. The tests were designed to (a) compare unedited MAI with manual indexing, holding the indexing language and the retrieval technique constant; (b) compare edited MAI with unedited MAI, holding both the indexing and the retrieval technique constant; and (c) compare two different retrieval techniques, called simple and complex, while holding the indexing constant.
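
The match, screen, and re-match cycle described in this abstract can be illustrated with a minimal Python sketch. It is not the DDC implementation: the function names, the sample terms, and the `accept` callback (standing in for the manual screening step) are all assumptions made for illustration, and the NLDB is reduced to a plain set of lowercase strings without cross-references.

```python
def match_candidates(candidates, nldb):
    """Split Candidate Index Terms into NLDB matches and mismatches."""
    matched = [t for t in candidates if t.lower() in nldb]
    mismatched = [t for t in candidates if t.lower() not in nldb]
    return matched, mismatched


def update_nldb(nldb, mismatched, accept):
    """Return a new NLDB that also holds the mismatches a screener accepts.

    `accept` is a hypothetical stand-in for the manual screening step.
    """
    return nldb | {t.lower() for t in mismatched if accept(t)}


# Hypothetical data: a tiny NLDB and the candidates from one abstract.
nldb = {"radar", "signal processing"}
candidates = ["Radar", "signal processing", "phased array", "the following"]

matched, mismatched = match_candidates(candidates, nldb)

# Manual screening accepts one mismatch and rejects the other.
nldb = update_nldb(nldb, mismatched, accept=lambda t: t == "phased array")

# Re-matching the original candidates against the updated NLDB
# gives what the abstract calls unedited MAI.
unedited_mai, _ = match_candidates(candidates, nldb)
```

After the update, the previously mismatched term "phased array" appears in `unedited_mai`, while the rejected string stays out.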

2.
3.
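High-dimensional indexing is a key technology in content-based image retrieval. This paper analyzes the current state of research on indexing techniques for image retrieval, classifies, compares, and evaluates existing indexing methods, and finally discusses open problems and future directions.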

4.
This article presents the human evaluation of ILIAD, a program for machine-aided indexing (MAI). It consists of two language engineering modules and is designed to assist expert librarians in computer-aided indexing and document analysis. Our aim is the expert evaluation of automatic multi-word term indexing. Evaluation is performed by documentary engineers. Cataloging and indexing are their principal tasks. They also have a good scientific knowledge of the domain to which the indexed documents belong. We first present the ILIAD program and the two systems submitted to this evaluation, the methodology (protocol) adopted, the differences between the protocol and the implementation, and the results of these evaluations. Human evaluation is divided into three parts: firstly the evaluation of controlled indexing, then free indexing and finally term variant extraction performed during controlled indexing. Finally, we analyze the relevance of this evaluation by calculating the agreement frequency and the Kappa coefficient and propose some future developments.

5.
冷伏海 《情报科学》2002,20(3):285-289
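This paper reviews the domain and scope of image indexing, related work, image systems and how they operate, approaches to indexing images, image attributes, concept-based indexing, content-based indexing and its systems, and browsing in image retrieval.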

6.
徐震 《现代情报》2006,26(10):149-150,175
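This paper first analyzes the shortcomings of traditional subject retrieval systems and then proposes optimization techniques that address them, including full-text subject-term indexing, concept indexing, semantic analysis of search expressions, and pattern matching. These techniques combine the respective strengths of subject retrieval languages and natural-language retrieval, turning a traditional subject retrieval system into an intelligent, higher-level one.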

7.
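The proliferation of image data has made content-based image retrieval a research hotspot. Because color features are stable and simple to compute, this paper first introduces the basic idea of image retrieval using global color histograms, then analyzes its limitations and gives an improved method: feature extraction using color coherence vectors, which incorporate spatial information. For feature measurement, the two methods are compared in a purpose-designed evaluation experiment, and experimental results and an assessment of retrieval performance are given. The experiments show that the described image retrieval method achieves good recall and precision.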
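
As a rough illustration of the global color histogram baseline mentioned in this abstract, the Python sketch below quantizes RGB pixels into bins and compares two images by histogram intersection. The bin count, the choice of histogram intersection as the similarity measure, and the toy pixel lists are assumptions; the abstract does not say which settings the authors used, and the color coherence vector refinement is not implemented here.

```python
def color_histogram(pixels, bins_per_channel=4):
    """Normalized global RGB histogram of a non-empty list of (r, g, b) pixels."""
    n = bins_per_channel
    step = 256 / n
    hist = [0] * (n ** 3)
    for r, g, b in pixels:
        # Quantize each 0..255 channel into one of n bins.
        ri, gi, bi = (min(int(c / step), n - 1) for c in (r, g, b))
        hist[ri * n * n + gi * n + bi] += 1
    total = len(pixels)
    return [h / total for h in hist]


def histogram_intersection(h1, h2):
    """Similarity in [0, 1]; 1.0 means identical color distributions."""
    return sum(min(a, b) for a, b in zip(h1, h2))


# Toy "images" given as flat pixel lists (hypothetical data).
red = [(255, 0, 0)] * 4
half_red_half_blue = [(255, 0, 0)] * 2 + [(0, 0, 255)] * 2
rearranged = list(reversed(half_red_half_blue))

same = histogram_intersection(color_histogram(red), color_histogram(red))
partial = histogram_intersection(color_histogram(red),
                                 color_histogram(half_red_half_blue))
```

The limitation the abstract alludes to shows up directly: `half_red_half_blue` and `rearranged` have identical histograms even though their pixels are laid out differently, which is why methods that add spatial information are proposed.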

8.
靖培栋  宋雯斐 《情报科学》2006,24(6):884-887
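This paper explores methods for implementing various kinds of truncation search in a Chinese full-text retrieval system based on a keyword index, and builds a hash index over the keyword index. This approach both saves memory and improves retrieval efficiency.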

9.
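Research on ontology-based text information retrieval   Total citations: 5; self-citations: 0; citations by others: 5
This paper discusses how to build an ontology-based text information retrieval system. It argues that the performance of such a system can be further improved by using a domain ontology that captures relations between concepts to guide subject indexing, using a domain ontology that captures relations between entities to guide entity-relation indexing, and representing document surrogates and query expressions as ontologies.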

10.
This paper describes a technique for automatic book indexing. The technique requires a dictionary of terms that are to appear in the index, along with all text strings that count as instances of the term. It also requires that the text be in a form suitable for processing by a text formatter. A program searches the text for each occurrence of a term or its associated strings and creates an index entry when either is found. The results of the experimental application to a portion of a book text are presented, including measures of precision and recall, with precision giving the ratio of terms correctly assigned in the automatic process to the total assigned, and recall giving the ratio of correct terms automatically assigned to the total number of term assignments according to a human standard. Results indicate that the technique can be applied successfully, especially for texts that employ a technical vocabulary and where there is a premium on indexing exhaustivity.
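
The precision and recall definitions given in this abstract translate directly into a few lines of Python. The sketch below assumes that an index assignment can be represented as a (term, page) pair; that representation and the sample data are hypothetical and are not taken from the paper.

```python
def precision_recall(assigned, reference):
    """Precision and recall of automatic assignments against a human standard.

    precision = correct automatic assignments / all automatic assignments
    recall    = correct automatic assignments / all human assignments
    """
    assigned, reference = set(assigned), set(reference)
    correct = assigned & reference
    precision = len(correct) / len(assigned) if assigned else 0.0
    recall = len(correct) / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical (term, page) assignments.
automatic = {("radar", 12), ("radar", 40), ("antenna", 12), ("noise", 7)}
human = {("radar", 12), ("antenna", 12), ("antenna", 30),
         ("noise", 7), ("gain", 9)}

p, r = precision_recall(automatic, human)
```

Here three of the four automatic assignments agree with the human index (precision 0.75), and they cover three of the five human assignments (recall 0.6).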

11.
A variety of abstract automatic indexing models have been developed in recent times in an effort to produce indexing methods that are both effective and usable in practice. Among these are the term discrimination model and the term precision system. These two indexing systems are briefly described and experimental evidence is cited showing that a combination of both theories produces better retrieval performance than either one alone. Appropriate conclusions are reached concerning viable automatic indexing procedures usable in practice.

12.
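An analysis of subject indexing in CNKI   Total citations: 2; self-citations: 0; citations by others: 2
The volume of documents in online databases keeps growing, usage keeps expanding, and user needs are becoming more urgent. How to deliver exactly the documents users need has become a major concern. For journal articles, high-quality subject indexing is the prerequisite for, and the key to, doing so. This paper selects six subject terms from the discipline of information management, runs subject searches for them in CNKI, evaluates the quality of CNKI's subject indexing by analyzing the search results, examines the causes of the problems found, and proposes improvements.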

13.
Whereas in language words of high frequency are generally associated with low content [Bookstein, A., & Swanson, D. (1974). Probabilistic models for automatic indexing. Journal of the American Society for Information Science, 25(5), 312–318; Damerau, F. J. (1965). An experiment in automatic indexing. American Documentation, 16, 283–289; Harter, S. P. (1974). A probabilistic approach to automatic keyword indexing. PhD thesis, University of Chicago; Sparck-Jones, K. (1972). A statistical interpretation of term specificity and its application in retrieval. Journal of Documentation, 28, 11–21; Yu, C., & Salton, G. (1976). Precision weighting – an effective automatic indexing method. Journal of the Association for Computing Machinery (ACM), 23(1), 76–88], shallow syntactic fragments of high frequency generally correspond to lexical fragments of high content [Lioma, C., & Ounis, I. (2006). Examining the content load of part of speech blocks for information retrieval. In Proceedings of the international committee on computational linguistics and the association for computational linguistics (COLING/ACL 2006), Sydney, Australia]. We apply this finding to Information Retrieval as follows. We present a novel automatic query reformulation technique, which is based on shallow syntactic evidence induced from various language samples and is used to enhance the performance of an Information Retrieval system. Firstly, we draw shallow syntactic evidence from language samples of varying size, and compare the effect of language sample size upon retrieval performance when using our syntactically-based query reformulation (SQR) technique. Secondly, we compare SQR to a state-of-the-art probabilistic pseudo-relevance feedback technique. Additionally, we combine both techniques and evaluate their compatibility. We evaluate our proposed technique across two standard Text REtrieval Conference (TREC) English test collections, and three statistically different weighting models. Experimental results suggest that SQR markedly enhances retrieval performance, and is at least comparable to pseudo-relevance feedback. Notably, the combination of SQR and pseudo-relevance feedback further enhances retrieval performance considerably. These collective experimental results confirm the tenet that high frequency shallow syntactic fragments correspond to content-bearing lexical fragments.

14.
In a typical inverted-file full-text document retrieval system, the user submits queries consisting of strings of characters combined by various operators. The strings are looked up in a text-dictionary which lists, for each string, all the places in the database at which it occurs. It is desirable to allow the user to include in his query truncated terms such as X*, *X, *X*, or X*Y, where X and Y are specified strings and * is a variable-length-don't-care character, that is, * represents an arbitrary, possibly empty, string. Processing these terms involves finding the set of all words in the dictionary that match these patterns. How to do this efficiently is a long-standing open problem in this domain. In this paper we present a uniform and efficient approach for processing all such query terms. The approach, based on a “permuted dictionary” and a corresponding set of access routines, requires essentially one disk access to obtain from the dictionary all the strings represented by a truncated term, with negligible computing time. It is thus well suited for on-line applications. Implementation is simple, and storage overhead is low: it can be made almost negligible by using some specially adapted compression techniques described in the paper. The basic approach is easily adaptable for slight variants, such as fixed (or bounded) length don't-care characters, or more complex pattern matching templates.
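
One common way to realize a permuted dictionary is to store every rotation of each word (with an end marker appended) in sorted order, so that each of the four truncation patterns becomes a single prefix lookup. The Python sketch below follows that idea. It is an assumption about the general technique, not a reconstruction of the paper's access routines or compression scheme: the names `build_permuted`, `prefix_lookup`, and `truncated_search`, the `$` end marker, and the in-memory sorted list (standing in for the on-disk dictionary) are all illustrative.

```python
import bisect

END = "$"  # end-of-word marker; assumed not to occur inside any word


def build_permuted(words):
    """Sorted list of (rotation, word) for every rotation of word + END."""
    entries = []
    for w in words:
        s = w + END
        for i in range(len(s)):
            entries.append((s[i:] + s[:i], w))
    entries.sort()
    return entries


def prefix_lookup(entries, prefix):
    """All words having a rotation that starts with `prefix`."""
    start = bisect.bisect_left(entries, (prefix,))
    found = set()
    for rotation, word in entries[start:]:
        if not rotation.startswith(prefix):
            break  # sorted order: no later entry can match
        found.add(word)
    return found


def truncated_search(entries, pattern):
    """Resolve X*, *X, *X* or X*Y with one contiguous scan of the dictionary."""
    if len(pattern) > 2 and pattern.startswith("*") and pattern.endswith("*"):
        return prefix_lookup(entries, pattern[1:-1])      # *X*
    head, star, tail = pattern.partition("*")
    if not star or "*" in tail:
        raise ValueError("unsupported pattern: " + pattern)
    return prefix_lookup(entries, tail + END + head)      # X*, *X, X*Y


# Hypothetical dictionary.
entries = build_permuted(["index", "indexing", "reindex", "retrieval", "indent"])
starts = truncated_search(entries, "index*")
ends = truncated_search(entries, "*index")
contains = truncated_search(entries, "*dex*")
both = truncated_search(entries, "in*g")
```

Because all matching rotations are adjacent in the sorted list, a lookup touches one contiguous region, which is the property that makes a single disk access plausible in an on-disk layout. The cost is storage: a word of length n contributes n + 1 entries before any compression.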

15.
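Research on relevance in information retrieval based on relevance theory: information production and indexing   Total citations: 1; self-citations: 0; citations by others: 1
Abstract: Building on the work of Saracevic and Harter, this paper proposes the relevance theory of linguistics as the theoretical basis for relevance research, and uses it to explain two tasks in the interactive model of information retrieval: information production and information indexing.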

16.
PERMDEX is a microcomputer program to assist in the creation of a permuted printed index which preserves the context of indexing paraphrases. Although much simpler than PRECIS, the microcomputer program was inspired by it, and uses role operators to permute terms through lead, qualifier, and display positions. Following a discussion of derivative vs assignment indexing, the use of roles, and the concept behind PRECIS, features of the program are described including indexer input and prompts, the shunting algorithm, and sorting and printing routines.

17.
曾洪京 《情报杂志》1993,12(4):58-61
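This paper reviews several of the better-known citation indexing theories developed since the 1960s and, on that basis, proposes an "index-free citation retrieval" method. The author ran a small-scale trial based on the method's basic principles, with good results, and argues that combining citation-index retrieval with subject-index retrieval could further improve the efficiency of information retrieval.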

18.
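An ontology-based semantic indexing method   Total citations: 4; self-citations: 0; citations by others: 4
Traditional document indexing with subject headings and keywords cannot support semantic reasoning and is therefore increasingly ill-suited to today's networked environment. Because ontologies offer a well-structured concept hierarchy and support logical reasoning, they have great potential in information retrieval. This paper first introduces the basic concepts of ontologies and the components of a domain ontology, then proposes a semantic indexing method based on a domain ontology. The method indexes documents at the semantic level using concepts from the ontology, providing a basis for intelligent inference in retrieval.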

19.
Determining requirements when searching for and retrieving relevant information suited to a user’s needs has become increasingly important and difficult, partly due to the explosive growth of electronic documents. The vector space model (VSM) is a popular method in retrieval procedures. However, the weakness of the traditional VSM is that the indexing vocabulary changes whenever changes occur in the document set, the indexing vocabulary selection algorithms, or the parameters of those algorithms, or if wording evolution occurs. The major objective of this research is to design a method to solve the aforementioned problems for patent retrieval. The proposed method utilizes a special characteristic of patent documents, the International Patent Classification (IPC) codes, to generate the indexing vocabulary for representing all the patent documents. The advantage of the generated indexing vocabulary is that it remains unchanged, even if the document sets, selection algorithms, and parameters are changed, or if wording evolution occurs. A comparison of the proposed method with two traditional methods (entropy and chi-square) in manual and automatic evaluations is presented to verify its feasibility and validity. The results also indicate that the IPC-based indexing vocabulary selection method achieves a higher accuracy and is more satisfactory.

20.
An integrated information retrieval system generally contains multiple databases that are inconsistent in terms of their content and indexing. This paper proposes a rough set-based transfer (RST) model for integration of the concepts of document databases using various indexing languages, so that users can search through the multiple databases using any of the current indexing languages. The RST model aims to effectively create meaningful transfer relations between the terms of two indexing languages, provided a number of documents are indexed with them in parallel. In our experiment, the indexing concepts of two databases respectively using the Thesaurus of Social Science (IZ) and the Schlagwortnormdatei (SWD) are integrated by means of the RST model. Finally, this paper compares the results achieved with a cross-concordance method, a conditional probability based method and the RST model.
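
To make the idea of transfer relations learned from parallel-indexed documents concrete, the Python sketch below implements a simple conditional-probability approach of the kind the abstract names as a comparison method. It is not the RST model, whose details the abstract does not give. The function name, the 0.5 threshold, and the sample English and German terms are assumptions for illustration only and are not taken from the IZ thesaurus or the SWD.

```python
from collections import Counter, defaultdict


def transfer_relations(parallel_docs, threshold=0.5):
    """Estimate P(target term | source term) from parallel-indexed documents.

    `parallel_docs` is an iterable of (source_terms, target_terms) pairs,
    one pair per document indexed in both languages. Only relations with
    probability >= threshold are kept.
    """
    source_count = Counter()
    pair_count = defaultdict(Counter)
    for source_terms, target_terms in parallel_docs:
        for s in set(source_terms):
            source_count[s] += 1
            for t in set(target_terms):
                pair_count[s][t] += 1

    relations = {}
    for s, targets in pair_count.items():
        scored = {t: c / source_count[s] for t, c in targets.items()}
        relations[s] = {t: p for t, p in scored.items() if p >= threshold}
    return relations


# Hypothetical parallel indexing of four documents.
docs = [
    ({"unemployment"}, {"Arbeitslosigkeit"}),
    ({"unemployment", "youth"}, {"Arbeitslosigkeit", "Jugend"}),
    ({"youth"}, {"Jugend"}),
    ({"unemployment"}, {"Arbeitsmarkt"}),
]
relations = transfer_relations(docs)
```

With this data, "unemployment" maps only to "Arbeitslosigkeit" (2 of 3 documents), while "youth" maps to "Jugend" with probability 1.0 and, as a side effect of co-occurrence, to "Arbeitslosigkeit" with probability 0.5. Spurious relations of that kind are one reason a threshold or a more careful model is needed.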
