首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 46 毫秒
1.
In this paper we evaluate the application of data fusion or meta-search methods, combining different algorithms and XML elements, to content-oriented retrieval of XML structured data. The primary approach is the combination of a probabilistic methods using Logistic regression and the Okapi BM-25 algorithm for estimation of document relevance or XML element relevance, in conjunction with Boolean approaches for some query elements. In the evaluation we use the INEX XML test collection to examine the relative performance of individual algorithms and elements and compare these to the performance of the data fusion approaches.  相似文献   

2.
Content-only queries in hierarchically structured documents should retrieve the most specific document nodes which are exhaustive to the information need. For this problem, we investigate two methods of augmentation, which both yield high retrieval quality. As retrieval effectiveness, we consider the ratio of retrieval quality and response time; thus, fast approximations to the 'correct' retrieval result may yield higher effectiveness. We present a classification scheme for algorithms addressing this issue, and adopt known algorithms from standard document retrieval for XML retrieval. As a new strategy, we propose incremental-interruptible retrieval, which allows for instant presentation of the top ranking documents. We develop a new algorithm implementing this strategy and evaluate the different methods with the INEX collection.  相似文献   

3.
Most recent document standards like XML rely on structured representations. On the other hand, current information retrieval systems have been developed for flat document representations and cannot be easily extended to cope with more complex document types. The design of such systems is still an open problem. We present a new model for structured document retrieval which allows computing scores of document parts. This model is based on Bayesian networks whose conditional probabilities are learnt from a labelled collection of structured documents—which is composed of documents, queries and their associated assessments. Training these models is a complex machine learning task and is not standard. This is the focus of the paper: we propose here to train the structured Bayesian Network model using a cross-entropy training criterion. Results are presented on the INEX corpus of XML documents.  相似文献   

4.
基于域加权词频法的XML文档级检索实现与评价   总被引:1,自引:0,他引:1  
利用BM25F模型,通过实验,在INEX 04数据集的基础上,实现了对多个域(元素)词频进行加权的XML文档级检索。XML文档结构的确蕴含了一定的语义信息。利用这些语义信息,可以提高检索性能。表2。图1。参考文献16。  相似文献   

5.
XML信息检索探究   总被引:4,自引:0,他引:4  
廖述梅  万常选  徐升华 《情报学报》2007,381(2):229-234
XML文档是具有层次结构和文本内容的半结构化数据。现有的Web信息检索是基于HTML文档的关键词全文检索,无法胜任XML元素粒度的检索;同时,XML数据库检索实现的是精确查找,检索结果无排序支持。因此,融合信息检索和数据库技术研究XML检索问题成为必然。本文从XML检索的问题域出发,阐述了XML信息检索(XML IR)的国内外研究现状与特点,并分析了目前XML IR的热点和难点问题。  相似文献   

6.
一种基于Native XML的全文检索引擎   总被引:5,自引:0,他引:5  
王弘蔚  肖诗斌 《情报学报》2003,22(5):550-556
随着XML的日益流行 ,基于XML的全文检索应用需求也迅速扩大。在这些应用中 ,native XML数据库是发展方向。虽然商业化的native XML数据库已经出现 ,但其全文检索的性能还不尽人意。本文提出一种方法 :在传统的倒排索引的框架下 ,对XML的标记建立索引 ,使得一个全文数据库能够以Native的方式存储、索引、检索和输出XML文档 ,成为一个真正意义上的native XML全文数据库 ,既有传统全文数据库的优越性能 ,又能满足基于na tive XML的应用需求  相似文献   

7.
XML检索系统及其比较研究*   总被引:2,自引:0,他引:2  
探讨XML检索与传统信息检索的区别、XML检索的目标与任务以及XML检索系统研究的核心问题,并对现有的几个XML检索系统进行介绍和比较研究。  相似文献   

8.
Evaluating the effectiveness of content-oriented XML retrieval methods   总被引:1,自引:0,他引:1  
Content-oriented XML retrieval approaches aim at a more focused retrieval strategy: Instead of retrieving whole documents, document components that are exhaustive to the information need while at the same time being as specific as possible should be retrieved. In this article, we show that the evaluation methods developed for standard retrieval must be modified in order to deal with the structure of XML documents. More precisely, the size and overlap of document components must be taken into account. For this purpose, we propose a new effectiveness metric based on the definition of a concept space defined upon the notions of exhaustiveness and specificity of a search result. We compare the results of this new metric by the results obtained with the official metric used in INEX, the evaluation initiative for content-oriented XML retrieval.
Gabriella KazaiEmail:
  相似文献   

9.
研究利用XML文本片段和图像的内容特征(颜色)实现图像的检索。基于XML多媒体数字图书馆检索系统平台WHU-XML,对XML文本和图像构建索引,并在此基础上,采用线性归并法,实现基于XML文本片段的图像检索和基于图像内容特征(颜色)检索的结合。研究结果表明,当文本检索权重大于图像内容检索的权重时,检索效果比只采用单一检索方式时好。  相似文献   

10.
基于XML的分布式信息检索   总被引:1,自引:0,他引:1  
提出了一种对互联网信息进行分布式信息检索的方法:利用代理程序和XML技术向多个相同类型的网站同时发送检索请求文档并接收它们返回的检索结果文档,经过统一处理后将检索结果显示给读者  相似文献   

11.
Documents formatted in eXtensible Markup Language (XML) are available in collections of various document types. In this paper, we present an approach for the summarisation of XML documents. The novelty of this approach lies in that it is based on features not only from the content of documents, but also from their logical structure. We follow a machine learning, sentence extraction-based summarisation technique. To find which features are more effective for producing summaries, this approach views sentence extraction as an ordering task. We evaluated our summarisation model using the INEX and SUMMAC datasets. The results demonstrate that the inclusion of features from the logical structure of documents increases the effectiveness of the summariser, and that the learnable system is also effective and well-suited to the task of summarisation in the context of XML documents. Our approach is generic, and is therefore applicable, apart from entire documents, to elements of varying granularity within the XML tree. We view these results as a step towards the intelligent summarisation of XML documents.
Mounia LalmasEmail:
  相似文献   

12.
TIJAH: Embracing IR Methods in XML Databases   总被引:1,自引:0,他引:1  
This paper discusses our participation in INEX (the Initiative for the Evaluation of XML Retrieval) using the TIJAH XML-IR system. TIJAHs system design follows a standard layered database architecture, carefully separating the conceptual, logical and physical levels. At the conceptual level, we classify the INEX XPath-based query expressions into three different query patterns. For each pattern, we present its mapping into a query execution strategy. The logical layer exploits score region algebra (SRA) as the basis for query processing. We discuss the region operators used to select and manipulate XML document components. The logical algebra expressions are mapped into efficient relational algebra expressions over a physical representation of the XML document collection using the pre-post numbering scheme. The paper concludes with an analysis of experiments performed with the INEX test collection.  相似文献   

13.
Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do, however, not consider the additional information and rich annotations provided by the structure of XML documents and their element names.This article presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java classes and servlets. Experiments in the context of the INEX benchmark demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.  相似文献   

14.
INEX是当今信息检索领域最重要的国际评测会议之一.文章通过对INEX 2004年至2010年检索评价项目数量、项目类型、项目任务、测试集的变化,以及对IST 2007年至2010年所关注项目,参与INEX评测的机构的分析,了解XML检索领域的发展方向与趋势,以促进我国科研团队在XML语言检索评价领域更加深入的探索和发展.  相似文献   

15.
INEX 2005 Dagstuhl XML信息检索国际会议述评   总被引:3,自引:0,他引:3  
INEX是致力于XML信息检索与评价研究的国际性论坛,每年都有来自全球各地的专家学者参加其组织的研究活动和学术会议。本文介绍了INEX 2005年年终会议的概况、会议关注的焦点及未来的发展等。  相似文献   

16.
XML retrieval is a departure from standard document retrieval in which each individual XML element, ranging from italicized words or phrases to full blown articles, is a retrievable unit. The distribution of XML element lengths is unlike what we usually observe in standard document collections, prompting us to revisit the issue of document length normalization. We perform a comparative analysis of arbitrary elements versus relevant elements, and show the importance of element length as a parameter for XML retrieval. Within the language modeling framework, we investigate a range of techniques that deal with length either directly or indirectly. We observe a length-bias introduced by the amount of smoothing, and show the importance of extreme length bias for XML retrieval. We also show that simply removing shorter elements from the index (by introducing a cut-off value) does not create an appropriate element length normalization. Even after restricting the minimal size of XML elements occurring in the index, the importance of an extreme explicit length bias remains.  相似文献   

17.
图像对象特征值的抽取、存储、转换、显现的实现有多种方法,SIMIIRS系统主要采用了数据库方法和XML方法。文章主要讨论了图像资源的XML描述方法、建立图像信息的XML索引文档,检索XML文档以实现图像信息查询与提供。  相似文献   

18.
XML及其在图书馆和情报检索中的应用   总被引:40,自引:4,他引:36  
与HTML 和SGML 相比, XML 更适合运用于Web 环境, 用以表达信息的语义和结构。XML 将对Web 产生重大影响, 并影响图书馆参与Web 信息资源组织和整理的方式。对HTML﹑SGML 和XML 进行了比较, 阐述了XML 影响图书馆的诸多因素, 并对用于检索XML 文档的情报检索技术和XML 对情报检索的帮助进行探讨。  相似文献   

19.
INEX与TREC是检索领域的两大检索系统评价平台,在检索技术发展迅速的今天依然保持强大生命力,在当今检索技术评价领域起着十分重要的作用。本篇文章通过对INEX与TREC的研究目标以及平台的构成要素包括三个方面:测试集、检索问题的构造、相关性评估的比较,找出INEX相对于TREC评测平台的创新及不同点,以便更加深入和全面地了解INEX的评测方法。  相似文献   

20.
彭哲 《图书情报工作》2008,52(6):110-110
全文检索系统由三大功能模块组成:索引模块、检索模块和存储模块。本文着重分析系统组成和XML数据库的设计、建立倒排索引文件、中文分词等技术难点。同时在此基础之上建立基于Lucene/XML的期刊文献全文检索系统。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号