Similar literature
 19 similar documents found (search time: 140 ms)
1.
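Semantic retrieval overcomes the shortcomings of traditional keyword-matching retrieval and represents the direction in which information retrieval is developing. This paper examines two indexes for implementing semantic retrieval: latent semantic indexing (LSI) and a modified form of it. It first introduces the basic idea and retrieval process of LSI and then, after analyzing the shortcomings of LSI, introduces the modified form, residual iterative transformation.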
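As a rough illustration of the basic idea and retrieval process of LSI referred to in this abstract, the usual textbook formulation is sketched here. The symbols are notation assumed for this sketch and are not taken from the paper: A is the m x n term-document matrix, k is the number of retained dimensions, q is a query vector in term space, and \hat{d}_j is the j-th row of V_k. The modified form discussed in the paper is not shown.

```latex
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad
\hat{q} = \Sigma_k^{-1} U_k^{\top} q, \qquad
\operatorname{sim}(\hat{q}, \hat{d}_j) =
  \frac{\hat{q}^{\top} \hat{d}_j}{\lVert \hat{q} \rVert \, \lVert \hat{d}_j \rVert}
```

Documents are then ranked by their cosine similarity to the folded-in query in the k-dimensional space.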

2.
夏立新  庄青青  陈卓群 《情报科学》2007,25(9):1378-1383
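The markup semantics and structured nature of XML documents make retrieval easier to implement and can improve retrieval precision. This paper uses a binary search tree to build index files for XML documents, gives the data structures and algorithm for building the index, and analyzes the advantages of the binary search tree index in improving data updates, retrieval speed, and precision for XML documents.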
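The idea of indexing XML content with a binary search tree might be sketched as follows. This is a minimal illustration, not the data structure from the paper: the `Node` fields, the use of element names as keys, and the path strings stored as postings are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    key: str                                            # indexed term (assumed: an element name)
    postings: List[str] = field(default_factory=list)   # locations where the key occurs
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root: Optional[Node], key: str, posting: str) -> Node:
    """Insert a (key, posting) pair and return the root of the tree."""
    if root is None:
        return Node(key, [posting])
    node = root
    while True:
        if key == node.key:
            node.postings.append(posting)    # existing key: extend its posting list
            return root
        if key < node.key:
            if node.left is None:
                node.left = Node(key, [posting])
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key, [posting])
                return root
            node = node.right


def search(root: Optional[Node], key: str) -> List[str]:
    """Return the postings for key, or an empty list if the key is absent."""
    node = root
    while node is not None and node.key != key:
        node = node.left if key < node.key else node.right
    return node.postings if node else []


# Hypothetical element names and paths, for illustration only.
root = None
for name, path in [("title", "/book[1]/title"),
                   ("author", "/book[1]/author"),
                   ("title", "/book[2]/title")]:
    root = insert(root, name, path)
```

An unbalanced tree like this one degrades to linear search time on sorted input; a production index would normally use a balanced variant.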

3.
王宇 《情报探索》2013,(8):105-107
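Explains the significance of distributed representations, arguing that they can extract semantically similar words and capture context. Introduces the mechanisms behind distributed representations, including semantic relations, random indexing, and evaluation methods. Discusses how distributed representations are used, namely how random indexing can effectively retrieve relevant text documents.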
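Random indexing, as it is commonly described, might be sketched like this: each word receives a sparse random index vector, and the context vector of a word is the sum of the index vectors of its neighbours. The dimension, the number of non-zero entries, the window size, and the function names are all assumptions chosen for the example rather than values from the paper.

```python
import math
import random
from collections import defaultdict

DIM, NONZERO = 64, 4   # illustrative sizes, chosen arbitrarily


def index_vector(word, dim=DIM, nonzero=NONZERO):
    """Sparse ternary vector (+1/-1/0), seeded by the word so it is reproducible."""
    rng = random.Random(word)
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice([-1, 1])
    return vec


def context_vectors(sentences, window=2):
    """Sum the index vectors of the words seen within `window` positions."""
    ctx = defaultdict(lambda: [0] * DIM)
    for sent in sentences:
        for i, word in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    iv = index_vector(sent[j])
                    ctx[word] = [a + b for a, b in zip(ctx[word], iv)]
    return dict(ctx)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Words that occur in the same contexts accumulate the same index vectors, so their context vectors end up close under `cosine`.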

4.
A review of citation indexing   Total citations: 1, self-citations: 0, citations by others: 1
王冰 《情报科学》1999,17(2):200-201,216
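A citation index is a new type of retrieval tool built on the citation relationships among documents. After reviewing the structural principles of citation indexes, this paper outlines the characteristics of the Science Citation Index (《科学引文索引》), one of the authoritative international retrieval tools, and briefly introduces the Chinese Science Citation Index (《中国科学引文索引》), which fills the gap in citation index systems for Chinese literature. It points out that science citation index systems are important retrieval tools for counting, analyzing, and evaluating scientific papers, and an authoritative data resource on which the quantitative analysis of scientific papers depends.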

5.
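The heterogeneity, dynamism, and sheer volume of the information environment pose great challenges to traditional information retrieval methods and techniques based on natural text. Integrating concept space theory with latent semantic indexing offers some solutions to this predicament. On the basis of an analysis of the meaning and characteristics of concept space, the paper uses the principles of latent semantic indexing to discuss methods for concept extraction, the handling of synonyms and near-synonyms, and the generation of reference vectors. It then analyzes the basic mechanisms of concept-space-based text classification and clustering retrieval in a networked setting, and finally presents a self-learning mechanism for refining the concept space.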

6.
Key technologies in image retrieval systems   Total citations: 2, self-citations: 0, citations by others: 2
刘俊熙 《情报杂志》2004,23(7):93-94
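Image retrieval systems fall into two main types: text-based and content-based. Text states its content directly, so text retrieval is comparatively easy. Images, by contrast, carry both visual and semantic features, and the key technologies involved include storage, indexing, retrieval, and the processing of video information.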

7.
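Optimized selection of the database index path is the key to deep and secure access to Deep Web databases. Traditional methods select paths in Deep Web databases by keyword search, listing all data that may match the given keywords; when ambiguous features appear, indexing accuracy is low. This paper proposes a database index path selection method based on semantic Gaussian marginalization. A feature model of the Deep Web database is constructed and the degree of match between nodes and keywords is computed to obtain an objective function for Gaussian-marginalized path control. Semantic similarity is decomposed into a relevance-pointing function of the user's query intent, which achieves Gaussian-marginalized path control. The input sequence of predictive control instructions for the database is weighted by variable coupling and balanced against adjacent-order cross-layer links, and a complex activation function for semantic Gaussian-marginalized indexing is set, which improves index performance and achieves path optimization. Simulation results show that the algorithm improves database precision, reduces query time, and achieves efficient and secure access to Deep Web databases.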

8.
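Application of texture spectrum histograms and latent semantic indexing to image retrieval   Total citations: 2, self-citations: 0, citations by others: 2
Proposes a new texture spectrum method that describes local variation in pixel gray levels, and uses the texture spectrum histogram features it extracts for image retrieval. Experiments show that the feature is simple to compute and gives good retrieval results, making it a low-level feature well suited to image retrieval. The paper also applies latent semantic indexing, a method from text retrieval, to image retrieval, and proposes a method for computing the rank of the approximating matrix. Experimental results show that the method can analyze the latent semantic structure present in images and improve retrieval efficiency.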

9.
靖培栋  宋雯斐 《情报科学》2006,24(6):884-887
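This paper discusses methods for implementing various kinds of truncation search in a Chinese full-text retrieval system based on a keyword index, and builds a hash index over the keyword index. The approach both saves memory and improves retrieval efficiency.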
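One way a hash index over a keyword index could support truncation search is sketched below. The paper's actual structure is not described in the abstract, so everything here is an assumption: the class name `TruncationIndex`, hashing keywords by their first and last character, and the use of Python dictionaries as the hash tables.

```python
from collections import defaultdict


class TruncationIndex:
    """Hypothetical hash index supporting right and left truncation search."""

    def __init__(self):
        self.postings = defaultdict(set)   # keyword -> ids of documents containing it
        self.by_first = defaultdict(set)   # first character -> keywords
        self.by_last = defaultdict(set)    # last character -> keywords

    def add(self, doc_id, keywords):
        for kw in keywords:
            if not kw:
                continue
            self.postings[kw].add(doc_id)
            self.by_first[kw[0]].add(kw)
            self.by_last[kw[-1]].add(kw)

    def search_prefix(self, stem):
        """Right truncation ("stem*"): keywords that start with stem."""
        if not stem:
            return set()
        hits = set()
        for kw in self.by_first.get(stem[0], ()):
            if kw.startswith(stem):
                hits |= self.postings[kw]
        return hits

    def search_suffix(self, stem):
        """Left truncation ("*stem"): keywords that end with stem."""
        if not stem:
            return set()
        hits = set()
        for kw in self.by_last.get(stem[-1], ()):
            if kw.endswith(stem):
                hits |= self.postings[kw]
        return hits


index = TruncationIndex()
index.add(1, ["信息检索", "信息系统"])
index.add(2, ["全文检索"])
```

Hashing on a single character narrows the candidate keywords cheaply; only that bucket is scanned for the full prefix or suffix match.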

10.
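Starting from common database indexing mechanisms, the paper discusses several common index types, such as B*-tree indexes, composite indexes, and function-based indexes. It then considers the retrieval methods commonly used in managing personnel files and discusses which indexing mechanism to choose under given circumstances to achieve fast querying and maintenance of personnel files, thereby accelerating the digitization of personnel files in universities.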

11.
Rocchio relevance feedback and latent semantic indexing (LSI) are well-known extensions of the vector space model for information retrieval (IR). This paper analyzes the statistical relationship between these extensions. The analysis focuses on each method’s basis in least-squares optimization. Noting that LSI and Rocchio relevance feedback both alter the vector space model in a way that is in some sense least-squares optimal, we ask: what is the relationship between LSI’s and Rocchio’s notions of optimality? What does this relationship imply for IR? Using an analytical approach, we argue that Rocchio relevance feedback is optimal if we understand retrieval as a simplified classification problem. On the other hand, LSI’s motivation comes to the fore if we understand it as a biased regression technique, where projection onto a low-dimensional orthogonal subspace of the documents reduces model variance.
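For readers unfamiliar with Rocchio relevance feedback, the standard query update can be sketched as follows. This is the textbook formula, not the analysis from the paper; the default weights `alpha`, `beta`, and `gamma` are illustrative assumptions.

```python
def rocchio(query, relevant, nonrelevant, alpha=1.0, beta=0.75, gamma=0.15):
    """Move the query toward the centroid of relevant documents and away
    from the centroid of non-relevant ones. All vectors share one length."""

    def centroid(docs):
        if not docs:
            return [0.0] * len(query)      # an empty set contributes nothing
        return [sum(col) / len(docs) for col in zip(*docs)]

    rel_c, non_c = centroid(relevant), centroid(nonrelevant)
    return [alpha * q + beta * r - gamma * n
            for q, r, n in zip(query, rel_c, non_c)]


new_query = rocchio([1, 0, 0],
                    relevant=[[0, 1, 0], [0, 1, 0]],
                    nonrelevant=[[0, 0, 1]],
                    alpha=1.0, beta=0.5, gamma=0.25)
# new_query == [1.0, 0.5, -0.25]
```

Some implementations clip negative weights to zero after the update; that step is left out here.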

12.
Determining requirements when searching for and retrieving relevant information suited to a user’s needs has become increasingly important and difficult, partly due to the explosive growth of electronic documents. The vector space model (VSM) is a popular method in retrieval procedures. However, the weakness of the traditional VSM is that the indexing vocabulary changes whenever the document set, the indexing vocabulary selection algorithms, or the parameters of those algorithms change, or when wording evolves. The major objective of this research is to design a method that solves the aforementioned problems for patent retrieval. The proposed method utilizes a special characteristic of patent documents, the International Patent Classification (IPC) codes, to generate the indexing vocabulary for representing all the patent documents. The advantage of the generated indexing vocabulary is that it remains unchanged even if the document sets, selection algorithms, and parameters are changed, or if wording evolves. The proposed method is compared with two traditional methods (entropy and chi-square) in manual and automatic evaluations to verify its feasibility and validity. The results also indicate that the IPC-based indexing vocabulary selection method achieves higher accuracy and is more satisfactory.

13.
A new dictionary-based text categorization approach is proposed to classify chemical web pages efficiently. Using a chemistry dictionary, the approach can extract chemistry-related information from web pages more precisely. After automatic segmentation of the documents to find dictionary terms for document expansion, the approach adopts latent semantic indexing (LSI) to produce the final document vectors, and the relevant categories are finally assigned to the test document by the k-NN text categorization algorithm. The effects of the characteristics of the chemistry dictionary and the test collection on categorization efficiency are discussed, and a new voting method based on the collection characteristics is introduced to further improve categorization performance. The experimental results show that the proposed approach outperforms the traditional categorization method and is applicable to the classification of chemical web pages.
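The final k-NN categorization step could look roughly like the sketch below. It uses plain similarity-weighted voting over cosine similarity, which is a common baseline and is not the new voting method from the paper; the function names, the toy vectors, and the labels are assumptions for illustration.

```python
import math
from collections import Counter


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def knn_classify(test_vec, training, k=3):
    """training is a list of (vector, label) pairs. Returns the label with
    the largest similarity-weighted vote among the k nearest neighbours."""
    if not training:
        raise ValueError("training set is empty")
    scored = [(cosine(test_vec, vec), label) for vec, label in training]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter()
    for sim, label in scored[:k]:
        votes[label] += sim
    return votes.most_common(1)[0][0]


# Toy document vectors with hypothetical category labels.
training = [([1.0, 0.0], "organic"),
            ([0.9, 0.1], "organic"),
            ([0.0, 1.0], "inorganic")]
label = knn_classify([1.0, 0.05], training, k=3)
```

In the setting of the abstract, the vectors would be the LSI document vectors rather than raw term counts.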

14.
In image retrieval, most systems lack user-centred evaluation since they are assessed by some chosen ground truth dataset. The results reported through precision and recall assessed against the ground truth are thought of as being an acceptable surrogate for the judgment of real users. Much current research focuses on automatically assigning keywords to images for enhancing retrieval effectiveness. However, evaluation methods are usually based on system-level assessment, e.g. classification accuracy based on some chosen ground truth dataset. In this paper, we present a qualitative evaluation methodology for automatic image indexing systems. The automatic indexing task is formulated as one of image annotation, or automatic metadata generation for images. The evaluation is composed of two individual methods. First, the automatic indexing annotation results are assessed by human subjects. Second, the subjects are asked to annotate some chosen images as the test set, whose annotations are used as ground truth. Then, the system is tested on the test set, and its annotation results are judged against the ground truth. Only one of these methods is reported for most systems on which user-centred evaluation is conducted. We believe that both methods need to be considered for full evaluation. We also provide an example evaluation of our system based on this methodology. According to this study, our proposed evaluation methodology is able to provide a deeper understanding of the system’s performance.
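Judging system annotations against a human-provided ground truth, as in the second method of this abstract, usually comes down to precision and recall. A minimal sketch, with hypothetical keyword sets and an assumed function name:

```python
def precision_recall(predicted, truth):
    """Precision and recall of predicted keywords against ground-truth keywords."""
    predicted, truth = set(predicted), set(truth)
    hits = len(predicted & truth)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(truth) if truth else 0.0
    return precision, recall


# Hypothetical annotations for one image.
system_keywords = ["sky", "sea", "boat"]
human_keywords = ["sea", "boat", "beach", "sand"]
p, r = precision_recall(system_keywords, human_keywords)
# p == 2/3 (two of three predictions are correct), r == 0.5 (two of four found)
```

Per-image scores like these are typically averaged over the whole test set.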

15.
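Single-character indexing is a principal method among Chinese full-text retrieval indexing techniques, but it falls short in both index space and retrieval efficiency. This paper introduces word-unit indexing and analyzes experimental data, which show that with word-unit indexing both the space efficiency of the index and the time efficiency of retrieval improve.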
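The contrast between the two index types can be illustrated with a small sketch. It is not the implementation from the paper: the forward maximum matching segmenter, the toy vocabulary, and all function names are assumptions. The single-character index must store every character position and verify adjacency at query time, while the word-unit index stores one entry per word.

```python
from collections import defaultdict


def char_index(docs):
    """Single-character index: character -> set of (doc_id, position)."""
    idx = defaultdict(set)
    for doc_id, text in docs.items():
        for pos, ch in enumerate(text):
            idx[ch].add((doc_id, pos))
    return idx


def search_chars(idx, query):
    """Documents in which the characters of query occur consecutively."""
    if not query:
        return set()
    starts = set(idx.get(query[0], ()))
    for offset, ch in enumerate(query[1:], start=1):
        positions = idx.get(ch, set())
        starts = {(d, p) for (d, p) in starts if (d, p + offset) in positions}
    return {d for d, _ in starts}


def segment(text, vocab, max_len=4):
    """Forward maximum matching; unknown characters become one-character tokens."""
    tokens, i = [], 0
    while i < len(text):
        for n in range(min(max_len, len(text) - i), 0, -1):
            if n == 1 or text[i:i + n] in vocab:
                tokens.append(text[i:i + n])
                i += n
                break
    return tokens


def word_index(docs, vocab):
    """Word-unit index: word -> set of doc_ids."""
    idx = defaultdict(set)
    for doc_id, text in docs.items():
        for token in segment(text, vocab):
            idx[token].add(doc_id)
    return idx


docs = {1: "全文检索系统", 2: "检索全文"}
vocab = {"全文", "检索", "系统"}
```

With the word-unit index a query for 检索 is a single dictionary lookup, whereas the character index has to intersect two position lists.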

16.
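Selecting the best automatic web page indexing scheme and evaluating indexing performance   Total citations: 2, self-citations: 0, citations by others: 2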
仲云云  侯汉清  薛鹏军 《情报科学》2002,20(10):1108-1110
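This paper introduces three schemes for automatic web page indexing. By comparing the manual and automatic indexing results for 50 web pages from 中国经济网 (China Economic Net), it selects the best scheme: weighting the different parts of a page's full text and applying weighted term-frequency statistics. Finally, the scheme's automatic subject indexing and automatic classification indexing are each evaluated in terms of the human-machine agreement rate.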
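Weighted term-frequency statistics over different parts of a page might be sketched as follows. The section names and the weights in `SECTION_WEIGHTS` are assumptions made for the example; the abstract does not give the weights the authors used.

```python
from collections import Counter

# Assumed weights: terms in the title count more than terms in the body.
SECTION_WEIGHTS = {"title": 3.0, "heading": 2.0, "body": 1.0}


def weighted_term_scores(sections, weights=SECTION_WEIGHTS):
    """sections maps a section name to its list of tokens.
    Each occurrence of a term adds the weight of the section it appears in."""
    scores = Counter()
    for name, tokens in sections.items():
        w = weights.get(name, 1.0)      # unknown sections default to weight 1
        for token in tokens:
            scores[token] += w
    return scores


def top_terms(sections, n=3):
    """The n highest-scoring terms, as candidate index terms."""
    return [term for term, _ in weighted_term_scores(sections).most_common(n)]


page = {"title": ["economy", "growth"],
        "body": ["growth", "growth", "market"]}
terms = top_terms(page, n=2)
# terms == ["growth", "economy"]   (scores: growth 5.0, economy 3.0, market 1.0)
```

The agreement rate mentioned in the abstract would then compare such automatically chosen terms with the terms assigned by human indexers.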

17.
18.
This paper proposes a method to improve the retrieval performance of the vector space model (VSM), in part by utilizing user-supplied information about which documents are relevant to the query in question. In addition to the user's relevance feedback information, information such as original document similarities is incorporated into the retrieval model, which is built by using a sequence of linear transformations. High-dimensional and sparse vectors are then reduced by singular value decomposition (SVD) and transformed into a low-dimensional vector space, namely the space representing the latent semantic meanings of words. The method has been tested with two test collections, the Medline collection and the Cranfield collection. In order to train the model, multiple partitions are created for each collection. Averaged over all partitions, the improvements in average precision over the latent semantic indexing (LSI) model are 20.57% (Medline) and 22.23% (Cranfield) for the two training data sets, and 0.47% (Medline) and 4.78% (Cranfield) for the test data, respectively. The proposed method provides an approach that makes it possible to preserve user-supplied relevance information in the system for the long term in order to use it later.

19.
陈立华 《现代情报》2010,30(3):26-28,31
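Latent semantic analysis is the theoretical basis for using natural language in information retrieval systems, and the vector space model built on this theory is a knowledge tool for judging whether a retrieval system performs well. The paper describes the basic content of latent semantic indexing (LSI), the factors that affect natural-language retrieval precision under LSI, and the operating mechanism of vector space model retrieval software. This review offers some reference for the development of networked information retrieval technology.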
