Similar Literature
1.
This paper addresses the difficulty of retrieving precise information from large full-text repositories of magazine articles, and proposes an Extensible Markup Language (XML) vocabulary for improving retrieval rates. The hypothesis tested was as follows: magazine articles marked up with an XML vocabulary and indexed only by selected parts give more precise search results than the same search run against a full-text index.

The study was exploratory, with the following characteristics: 29 magazine articles were tested, and 8 scholars were interviewed to define 23 search strategies and evaluate the results. The data showed that precision improved from 40.72% with full-text search to 62.84% when using XML markup and searching only in specific labels.

The library and information science community needs to revise the vocabulary and test it further in order to obtain a valid vocabulary and provide more research results. The cultural characteristics and politics of the community of librarians and information managers are as important as technical issues if any technical proposal is to be implemented successfully and achieve interoperability.

2.
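A Full-Text Retrieval Engine Based on Native XML   Total citations: 5 (self-citations: 0; citations by others: 5)
王弘蔚, 肖诗斌. 《情报学报》, 2003, 22(5): 550-556
As XML grows in popularity, demand for XML-based full-text retrieval applications is expanding rapidly. For these applications, native XML databases are the direction of development. Although commercial native XML databases have appeared, their full-text retrieval performance remains unsatisfactory. This paper proposes an approach that indexes XML markup within the framework of a traditional inverted index, so that a full-text database can store, index, retrieve, and output XML documents natively. The result is a native XML full-text database in the true sense, one that keeps the strong performance of a traditional full-text database while meeting the needs of native XML applications.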
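The idea of indexing XML markup inside an inverted index can be sketched roughly as follows. This is only an illustrative sketch, not the engine described in the paper: the function names `build_index` and `search`, the `(tag, term)` key layout, and the two sample documents are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def build_index(docs):
    """Map (tag, term) -> set of document ids.

    The pseudo-tag "*" (an assumption of this sketch) stands for the
    whole document, so plain full-text search still works.
    """
    index = defaultdict(set)
    for doc_id, xml_text in docs.items():
        root = ET.fromstring(xml_text)
        for elem in root.iter():
            # Only the element's own direct text is attributed to its tag.
            for term in (elem.text or "").lower().split():
                index[(elem.tag, term)].add(doc_id)
                index[("*", term)].add(doc_id)
    return index


def search(index, term, tag="*"):
    """Return sorted ids of documents containing `term` inside `tag`."""
    return sorted(index.get((tag, term.lower()), set()))


# Hypothetical sample documents.
docs = {
    "d1": "<article><title>XML retrieval</title><body>inverted index design</body></article>",
    "d2": "<article><title>Index structures</title><body>XML storage</body></article>",
}
index = build_index(docs)
full_text_hits = search(index, "xml")            # matches anywhere
title_hits = search(index, "xml", tag="title")   # matches in <title> only
```

Restricting the lookup to one tag narrows the result set without a second index structure, which is the property a tag-aware inverted index is meant to provide.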

3.
This paper investigates the impact of three approaches to XML retrieval: using Zettair, a full-text information retrieval system; using eXist, a native XML database; and using a hybrid system that takes full article answers from Zettair and uses eXist to extract elements from those articles. For the content-only topics, we undertake a preliminary analysis of the INEX 2003 relevance assessments in order to identify the types of highly relevant document components. Further analysis identifies two complementary sub-cases of relevance assessments (General and Specific) and two categories of topics (Broad and Narrow). We develop a novel retrieval module that for a content-only topic utilises the information from the resulting answer list of a native XML database and dynamically determines the preferable units of retrieval, which we call Coherent Retrieval Elements. The results of our experiments show that, when each of the three systems is evaluated against different retrieval scenarios (such as different cases of relevance assessments, different topic categories and different choices of evaluation metrics), the XML retrieval systems exhibit varying behaviour and the best performance can be reached for different values of the retrieval parameters. In the case of INEX 2003 relevance assessments for the content-only topics, our newly developed hybrid XML retrieval system is substantially more effective than either Zettair or eXist, and yields robust and very effective XML retrieval.

4.
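A Study of XML Information Retrieval   Total citations: 4 (self-citations: 0; citations by others: 4)
廖述梅, 万常选, 徐升华. 《情报学报》, 2007, 381(2): 229-234
XML documents are semi-structured data with a hierarchical structure and textual content. Existing Web information retrieval is keyword-based full-text retrieval over HTML documents and cannot handle retrieval at the granularity of XML elements; XML database retrieval, meanwhile, performs exact matching and offers no ranking of results. Combining information retrieval and database techniques to study XML retrieval has therefore become inevitable. Starting from the problem domain of XML retrieval, this paper describes the state and characteristics of research on XML information retrieval (XML IR) in China and abroad, and analyzes the current hot topics and difficult problems in XML IR.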

5.
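Design and Implementation of an XML-Based Full-Text Retrieval Prototype System*   Total citations: 1 (self-citations: 0; citations by others: 1)
To address the slow indexing, delayed updates, and low retrieval efficiency of current organizational website search engines, and building on an in-depth analysis of the strengths of technologies such as Lucene and XML for building search engines, the authors construct an XML-based full-text retrieval prototype system. The system uses XML as a common data interface and Lucene as the implementation platform, with the aim of fast, timely indexing and improved retrieval efficiency.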

6.
This study aimed to examine whether a graphical abstract (GA) on a publisher's official website affected an article's usage and citations. Articles published in Molecules during 2016 (n = 1389) and 2017 (n = 1804) were selected as the data sets. Propensity score matching analysis was conducted to examine the data sets. The results showed that articles with GAs had significantly greater text abstract usage than those without GAs. There were no significant differences between the two groups (articles with/without GAs) in full-text usage or citations, in either the 2016 or the 2017 data set. Our study concluded that a GA played a role in attracting attention through ‘first impressions’, gaining more clicks on an article's text abstract; however, it conferred no advantage in full-text usage or citations of the article.

7.
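Research on XML Search Engines   Total citations: 1 (self-citations: 0; citations by others: 1)
The paper first analyzes why traditional search engines have low precision, then introduces XML and the state of research on XML search engines, and discusses in detail the key technologies involved, such as document storage, indexing, and querying. On this basis, it designs an XML search engine model for the current network environment. The authors argue that the model can make full use of the DTD schema information of XML documents and can greatly improve query precision.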

8.
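The authors design and implement a prototype system for acquiring and analyzing patent information. Concept-based retrieval is used to expand patent search terms in a given domain and improve retrieval performance; XML parsing is used to extract patent text accurately and efficiently from search result pages; and social network analysis is applied to patent citation analysis.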

9.
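Features, Analysis, Testing, and Evaluation of Two Major Full-Text Journal Databases   Total citations: 3 (self-citations: 0; citations by others: 3)
The paper compares and analyzes 重庆维普's 《中文科技期刊数据库》 (Chinese Science and Technology Journal Database) and 清华同方's 《中国期刊全文数据库(CJFD)》 (China Journal Full-Text Database) in terms of search methods, speed, ways of obtaining full text, and coverage, to give libraries and readers a reference for choosing and using full-text databases sensibly.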

10.
Minnesota’s Foundations Project is a multiagency collaboration to improve access to environmental and natural resources information. The Project chose the Dublin Core metadata standard for web resources. Three studies were conducted: needs assessment, Bridges web site user interface, and usability of controlled vocabulary in Dublin Core metadata. Based on these findings and information architecture, the Project published best practice guidelines. Controlled vocabulary is important to facilitate access. This is relevant to the third study on Dublin Core metadata, which tested keyword searches of web pages to determine the effectiveness of controlled vocabulary in the Dublin Core subject tag. Central to the Best Practice Guidelines is the User Guide to Dublin Core, which offers an element-by-element understanding of the metadata schema. Current bibliographies and reports show further background work that informed the decision-making process for such important choices as metadata schema, thesaurus and thesaurus management software, search engine, and RDF/XML standards.

11.
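The paper surveys recent Chinese publications on "network information resource management" to determine the share of research devoted to this area within China's information management community and its distribution across core journals and institutions. It highlights research progress and points of focus in each subfield of network information resource management and recommends outstanding papers in each related area.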

12.
Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do not, however, consider the additional information and rich annotations provided by the structure of XML documents and their element names. This article presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match as well as semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java classes and servlets. Experiments in the context of the INEX benchmark demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.

13.
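The paper describes the overall spherical structure of the network information dissemination model under the OAI mechanism and its different forms at each stage, and further explores the OAI dissemination system within that spherical structure and the relationships among its elements. It then proposes establishing ADP/ADS on the basis of DP/DS. To allow full text to be downloaded accurately and quickly, it proposes standardizing the hyperlinks in DC metadata expressed in XML within the OAI network environment, and illustrates the proposal with an XML DC example.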

14.
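The study uses XML text fragments and image content features (color) to retrieve images. On WHU-XML, an XML-based multimedia digital library retrieval platform, indexes are built for XML text and images; on this basis, a linear merging method combines image retrieval based on XML text fragments with retrieval based on image content features (color). The results show that when the text retrieval weight is greater than the image content retrieval weight, retrieval performs better than either method used alone.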
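A linear merge of two ranked lists can be sketched as a weighted sum of per-image scores. The abstract does not give the exact formula or weights, so everything here is an assumption for illustration: the function name `merge_scores`, the convention that the two weights sum to 1, the treatment of a missing score as 0, and the sample scores.

```python
def merge_scores(text_scores, color_scores, text_weight=0.7):
    """Rank image ids by a weighted sum of text and color scores.

    Assumes both score sets are already normalized to [0, 1]; an image
    missing from one list contributes 0 for that component.
    """
    image_weight = 1.0 - text_weight
    ids = set(text_scores) | set(color_scores)
    merged = {
        image_id: text_weight * text_scores.get(image_id, 0.0)
        + image_weight * color_scores.get(image_id, 0.0)
        for image_id in ids
    }
    # Highest combined score first; ties broken by id for a stable order.
    return sorted(merged, key=lambda image_id: (-merged[image_id], image_id))


# Hypothetical scores from the two retrieval channels.
text_scores = {"img1": 0.9, "img2": 0.2}
color_scores = {"img2": 0.8, "img3": 0.5}

text_heavy = merge_scores(text_scores, color_scores, text_weight=0.7)
color_heavy = merge_scores(text_scores, color_scores, text_weight=0.3)
```

Changing `text_weight` reorders the results, which is how the relative weighting of the two channels can be explored in such a setup.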

15.
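Research and Reflection on Library Services for Disadvantaged Groups   Total citations: 26 (self-citations: 0; citations by others: 26)
"Disadvantaged groups" is a concept that originated in sociology. In library services, the disadvantaged groups that libraries serve are "all groups that, for whatever reason, cannot use or have difficulty using traditional and modern digital library services." The entry of disadvantaged groups into the field of library services has special significance, but before the term appeared in the library research literature, libraries were already serving some of the groups it denotes, such as older people and people with disabilities. Researchers should therefore not treat "disadvantaged groups" as a fashionable concept. They should instead study each category in depth and avoid sweeping generalizations; build collections and service environments that are genuinely suitable; carry out needs surveys and assessments; and adopt service strategies suited to local conditions.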

16.
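A Review of 《现代图书情报技术》   Total citations: 4 (self-citations: 3; citations by others: 1)
From 1980 to 1998, 《现代图书情报技术》 published 85 issues and 1,026 articles, averaging 12.1 articles per issue and 5.1 pages per article. Of the 1,026 articles, 59.5% had citations, averaging 2.9 citations per article, and the journal self-citation rate was 9.4%. The co-authorship rate was 35.2%. Ten authors published 8 or more articles. After years of effort, the journal has developed distinct characteristics and is recognized as an outstanding journal in Chinese library science and a core journal in information science.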

17.
《Public Library Quarterly》2013,32(3-4):139-168
Abstract

Presently, eating disorders affect millions of people. Today, the media and the Internet are major publishing channels for consumer health information. Much research has found that the media may offer insufficient information on eating disorders. This research examines the quantity of adult consumer health publications on eating disorders in magazine articles and on the Internet in 1998 and investigates the effectiveness of Internet search engines. The results indicate that the resources found in popular magazines and on the Internet are not adequate, and that the Internet search engines are not effective in searching for information.

18.
Controlled vocabulary and subject headings in OPAC records have proven to be useful in improving search results. The authors used a survey to gather information about librarian opinions and professional use of controlled vocabulary. Data from a range of backgrounds and expertise were examined, including academic and public libraries, and technical services as well as public services professionals. Responses overall demonstrated positive opinions of the value of controlled vocabulary, including in reference interactions as well as during bibliographic instruction sessions. Results are also examined based upon factors such as age and type of librarian.

19.
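Design of a Lucene-Based FTP Search Engine   Total citations: 2 (self-citations: 0; citations by others: 2)
Current database-backed FTP search engines lack a standard resource document and do not support Chinese word segmentation or dynamic data updates. To address these shortcomings, the paper proposes a design for an FTP search engine based on Lucene, a powerful full-text indexing toolkit. The engine automatically generates XML resource documents in a standard format and uses dictionary-based forward maximum matching for Chinese word segmentation, updating the full-text index in Lucene dynamically. The design also supports mixed Chinese-English analysis and retrieval of search keywords.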
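Dictionary-based forward maximum matching can be sketched in a few lines. This is a generic illustration of the technique, not the paper's implementation and not Lucene code: the function name `forward_max_match`, the `max_len` limit of 4 characters, the fallback to single characters, and the tiny dictionary are all assumptions.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment `text` left to right, always taking the longest dictionary word.

    If no dictionary word starts at the current position, the single
    character there is emitted as its own token.
    """
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            word = text[i:i + size]
            if size == 1 or word in dictionary:
                tokens.append(word)
                i += size
                break
    return tokens


# Hypothetical dictionary for the example.
dictionary = {"全文", "检索", "全文检索", "搜索", "引擎", "搜索引擎"}
tokens = forward_max_match("全文检索搜索引擎", dictionary)
```

Because the longest match wins, "全文检索" is kept as one token rather than split into "全文" and "检索"; a different dictionary or `max_len` would change the segmentation.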

20.