Similar literature
19 similar documents found (search time: 750 ms)
1.
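Research on a Vector Space Retrieval Model Based on Document Structure   Total citations: 9, self-citations: 0, citations by others: 9
Han Yi (韩毅). 《情报学报》, 2004, 23(2): 158-162
The paper analyzes the shortcomings of the traditional vector space retrieval model for Web information retrieval and presents a vector space retrieval model based on document structure. The model divides a document logically into N segments. Because feature terms differ in how well they represent document content, it selects a limited number of the terms that best represent each logical segment and builds a feature-term vector and a weight vector for that segment. On this basis it computes the matching similarity between document and query, which determines which matching documents are retrieved and in what order. The time complexity of the two models' algorithms is compared, and the possible applications and open problems of the improved model are discussed.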
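The segment-based matching described in this abstract can be pictured with a minimal sketch. The abstract does not give the paper's formulas, so everything below is an assumption made for illustration: raw term frequency as the term weight, cosine similarity per segment, hand-picked segment weights (0.7 for the title, 0.3 for the body), and the hypothetical function names `segment_vector` and `structured_similarity`.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def segment_vector(text, top_k):
    """Keep only the top_k most frequent terms of one logical segment."""
    counts = Counter(text.lower().split())
    return {t: float(c) for t, c in counts.most_common(top_k)}

def structured_similarity(segments, seg_weights, query, top_k=5):
    """Weighted sum of per-segment cosine similarities to the query."""
    q = segment_vector(query, top_k=len(query.split()))
    return sum(w * cosine(segment_vector(s, top_k), q)
               for s, w in zip(segments, seg_weights))

# A document with two logical segments: title and body (assumed split).
doc = ["vector space retrieval model",
       "documents are split into logical segments"]
score = structured_similarity(doc, [0.7, 0.3], "vector space model")
```

Because each segment keeps only a few terms, the per-document vectors stay short, which is the kind of saving the abstract's time-complexity comparison points to.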

2.
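The Theoretical Relationship Between the Work Area and the Structure of the Query Logical Expression   Total citations: 1, self-citations: 0, citations by others: 1
Using binary tree theory, this paper analyzes the structural characteristics of the binary tree of the reverse Polish form of a query logical expression, and establishes the relationship between the work area needed to evaluate the expression and the structure of that binary tree. The theory reveals the essential reason why the Fukushima method (福岛法) cannot control the work area, and gives the theoretical range of the work area required during evaluation, providing an exact theoretical basis for sizing the work area in inverted-file retrieval software.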
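As a rough illustration of why the shape of the expression tree matters, the sketch below evaluates a Boolean query in reverse Polish notation over posting sets and records the peak stack depth. Treating the peak stack depth as the number of work areas needed at once is an assumption made here; the paper's own definitions, and the Fukushima method itself, are not reproduced. The name `evaluate_rpn` and the sample postings are hypothetical.

```python
def evaluate_rpn(tokens, postings, universe):
    """Evaluate a Boolean query given in reverse Polish notation.

    Returns (result_set, max_stack_depth). The depth stands in for the
    number of intermediate result sets held at the same time.
    """
    stack, max_depth = [], 0
    for tok in tokens:
        if tok == "NOT":
            stack.append(universe - stack.pop())
        elif tok in ("AND", "OR"):
            right, left = stack.pop(), stack.pop()
            stack.append(left & right if tok == "AND" else left | right)
        else:
            # Operand: look up the posting set for this search term.
            stack.append(postings.get(tok, set()))
        max_depth = max(max_depth, len(stack))
    return stack.pop(), max_depth

postings = {"a": {1, 2, 3}, "b": {2, 3, 4}, "c": {3, 5}, "d": {1, 5}}
universe = {1, 2, 3, 4, 5}

# (a AND b) OR (c AND d): a balanced tree over four terms.
balanced = evaluate_rpn(["a", "b", "AND", "c", "d", "AND", "OR"],
                        postings, universe)
# a AND (b AND (c AND d)): the same four terms in a right-nested tree.
right_nested = evaluate_rpn(["a", "b", "c", "d", "AND", "AND", "AND"],
                            postings, universe)
```

With the same four terms, the balanced tree peaks at three stacked sets and the right-nested tree at four, so the work area depends on the tree structure and not only on the number of terms.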

3.
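To address two major shortcomings of common information retrieval models, namely the loss of semantics in expressing queries and content and the single-document limitation in how results are returned, this paper proposes corresponding solutions. On this basis it further proposes an ontology-based retrieval model with family-style result return (族式返回), and discusses some key issues in the model, such as family-style return, query and document representation, and semantic matching.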

4.
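After pointing out that most implementations of cross-language information retrieval separate query translation from retrieval, the paper introduces the idea of unifying query translation with the retrieval process and explores methods for doing so.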

5.
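Research on Algorithms for Automatically Constructing Boolean Search Queries   Total citations: 6, self-citations: 0, citations by others: 6
This paper analyzes and evaluates two typical algorithms for automatically constructing Boolean search queries and, on that basis, proposes a new one: constructing the Boolean query from a sample-document query. The algorithm computes search-term weights from the sample-document query and builds the Boolean query according to the distribution of those weights. Its main purpose is to simplify the user's interaction with the information retrieval system and thereby improve retrieval efficiency. The author validated the algorithm on the AUBO retrieval system. The results show that, at the same recall level, its precision is generally higher than that of manually composed queries.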

6.
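Chen Nenghua (陈能华). 《图书馆》, 1996, (2): 29-32
Based on the essential characteristics of weighted retrieval, this paper uses mathematical methods to standardize the assignment of weights and the determination of thresholds, and derives seven formulas for computing them. This addresses the lack of a principled, standardized procedure in weighted retrieval for describing user queries and running searches. The results were verified through extensive calculation and computer tests.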
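For readers unfamiliar with weighted retrieval, the basic mechanism can be sketched as follows: each query term carries a weight, a document scores the sum of the weights of the terms it contains, and it is returned only if the score reaches a threshold. The seven formulas derived in the paper are not given in the abstract, so the weights and threshold below are arbitrary values chosen for illustration, and `weighted_search` is a hypothetical name.

```python
def weighted_search(docs, term_weights, threshold):
    """Return (doc_id, score) pairs whose score meets the threshold.

    docs maps a document id to its set of index terms; term_weights
    maps each query term to its weight (illustrative values only).
    """
    hits = []
    for doc_id, terms in docs.items():
        # A document's score is the total weight of query terms it contains.
        score = sum(w for t, w in term_weights.items() if t in terms)
        if score >= threshold:
            hits.append((doc_id, score))
    # Highest score first; ties broken by document id.
    return sorted(hits, key=lambda h: (-h[1], h[0]))

docs = {
    "d1": {"retrieval", "weighting", "threshold"},
    "d2": {"retrieval"},
    "d3": {"weighting", "threshold"},
    "d4": {"threshold"},
}
weights = {"retrieval": 5, "weighting": 3, "threshold": 2}
hits = weighted_search(docs, weights, threshold=5)
```

With these values, a threshold of 5 behaves like "retrieval OR (weighting AND threshold)", which shows how the choice of weights and threshold together determines the logic of the query.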

7.
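Computer-based retrieval is a profound revolution in information retrieval and offers capabilities that traditional manual retrieval cannot match. Yet it should be clearly recognized that no unbridgeable gulf separates machine retrieval from manual retrieval: the two come from the same lineage and are necessarily and essentially connected, differing only in form of operation, means, and method. Their differences and connections can be discussed from many angles; this paper briefly covers three: the compilation of secondary literature, the handling of query logic, and the way information is delivered.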

8.
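To make full use of the vast body of document and information resources, one must rely on a variety of retrieval tools. At the same time, the rapid growth, heterogeneity, and dynamic nature of Internet information resources keep posing new challenges for information retrieval. Information retrieval has become key to the informatization of modern society and to many applications. The following discusses methods of information retrieval. 1. Boolean retrieval. Combining search terms or codes logically with Boolean operators is the most commonly used method in modern information retrieval systems. There are three common Boolean operators: logical "OR", logical "AND", and logical "NOT". These operators combine search terms into a search query; the computer matches the query against the records in the system, registers a hit when the two agree, and automatically outputs…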

9.
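Effective conversion of transactional queries in Chinese search engines builds on a classification of Web searches. Taking users' transactional search queries as the object of study, the paper analyzes their formal characteristics, need types, modes of expression, and component elements along multiple dimensions. It proposes using vocabulary control and similar means, namely compiling an element vocabulary and weighting query elements, to convert queries that express the need imprecisely or do not suit the matching strategies of existing search engines into more effective ones.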

10.
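Natural language processing based on sentence-pattern analysis can identify the core search terms in natural-language search sentences aimed at search engines. On this basis, by defining production rules and using a reduction algorithm, this paper identifies and processes the logical relations among the core search terms implied in common natural-language questions, identifies the logical relations that may hold between the concepts in such questions, converts them into the necessary logical operations, and determines operator precedence. Use and performance comparison on an education-information search engine and a news search engine developed by the authors show that the algorithm improves the understanding of natural-language questions and makes the search engine more intelligent. The paper also describes the algorithm's limitations and points out directions for further research.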

11.
Latent Semantic Indexing (LSI) is a popular information retrieval model for concept-based searching. As with many vector space IR models, LSI requires an existing term-document association structure such as a term-by-document matrix. The term-by-document matrix, constructed during document parsing, can only capture weighted vocabulary occurrence patterns in the documents. However, for many knowledge domains there are pre-existing semantic structures that could be used to organize and categorize information. The goals of this study are (i) to demonstrate how such semantic structures can be automatically incorporated into the LSI vector space model, and (ii) to measure the effect of these structures on query matching performance. The new approach, referred to as Knowledge-Enhanced LSI, is applied to documents in the OHSUMED medical abstracts collection using the semantic structures provided by the UMLS Semantic Network and MeSH. Results based on precision-recall data (11-point average precision values) indicate that a MeSH-enhanced search index is capable of delivering noticeable incremental performance gain (as much as 35%) over the original LSI for modest constraints on precision. This performance gain is achieved by replacing the original query with the MeSH heading extracted from the query text via regular expression matches.

12.
Query suggestion, which enables the user to revise a query with a single click, has become one of the most fundamental features of Web search engines. However, it has not been clear what circumstances cause the user to turn to query suggestion. In order to investigate when and how the user uses query suggestion, we analyzed three kinds of data sets obtained from a major commercial Web search engine, comprising approximately 126 million unique queries, 876 million query suggestions and 306 million action patterns of users. Our analysis shows that query suggestions are often used (1) when the original query is a rare query, (2) when the original query is a single-term query, (3) when query suggestions are unambiguous, (4) when query suggestions are generalizations or error corrections of the original query, and (5) after the user has clicked on several URLs in the first search result page. Our results suggest that search engines should provide better assistance especially when rare or single-term queries are input, and that they should dynamically provide query suggestions according to the searcher’s current state.

13.
Enterprise search is important, and the search quality has a direct impact on the productivity of an enterprise. Enterprise data contain both structured and unstructured information. Since these two types of information are complementary and the structured information such as relational databases is designed based on ER (entity-relationship) models, there is a rich body of information about entities in enterprise data. As a result, many information needs of enterprise search center around entities. For example, a user may formulate a query describing a problem that she encounters with an entity, e.g., the web browser, and want to retrieve relevant documents to solve the problem. Intuitively, information related to the entities mentioned in the query, such as related entities and their relations, would be useful to reformulate the query and improve the retrieval performance. However, most existing studies on query expansion are term-centric. In this paper, we propose a novel entity-centric query expansion framework for enterprise search. Specifically, given a query containing entities, we first utilize both unstructured and structured information to find entities that are related to the ones in the query. We then discuss how to adapt existing feedback methods to use the related entities and their relations to improve search quality. Experimental results over two real-world enterprise collections show that the proposed entity-centric query expansion strategies are more effective and robust in improving search performance than the state-of-the-art pseudo feedback methods for long natural language-like queries with entities. Moreover, results over a TREC ad hoc retrieval collection show that the proposed methods can also work well for short keyword queries in the general search domain.

14.
Query languages for XML such as XPath or XQuery support Boolean retrieval: a query result is a (possibly restructured) subset of XML elements or entire documents that satisfy the search conditions of the query. This search paradigm works for highly schematic XML data collections such as electronic catalogs. However, for searching information in open environments such as the Web or intranets of large corporations, ranked retrieval is more appropriate: a query result is a ranked list of XML elements in descending order of (estimated) relevance. Web search engines, which are based on the ranked retrieval paradigm, do not, however, consider the additional information and rich annotations provided by the structure of XML documents and their element names. This article presents the XXL search engine that supports relevance ranking on XML data. XXL is particularly geared for path queries with wildcards that can span multiple XML collections and contain both exact-match and semantic-similarity search conditions. In addition, ontological information and suitable index structures are used to improve the search efficiency and effectiveness. XXL is fully implemented as a suite of Java classes and servlets. Experiments in the context of the INEX benchmark demonstrate the efficiency of the XXL search engine and underline its effectiveness for ranked retrieval.

15.
Query recommendation has long been considered a key feature of search engines, which can improve users’ search experience by providing useful query suggestions for their search tasks. Most existing approaches to query recommendation aim to recommend relevant queries, i.e., alternative queries similar to a user’s initial query. However, the ultimate goal of query recommendation is to assist users to reformulate queries so that they can accomplish their search task successfully and quickly. Considering only relevance in query recommendation does not directly serve this goal. In this paper, we argue that it is more important to directly recommend queries with high utility, i.e., queries that can better satisfy users’ information needs. For this purpose, we attempt to infer query utility from users’ sequential search behaviors recorded in their search sessions. Specifically, we propose a dynamic Bayesian network, referred to as the Query Utility Model (QUM), to capture query utility by simultaneously modeling users’ reformulation and click behaviors. We then recommend queries with high utility to help users better accomplish their search tasks. We empirically evaluated the performance of our approach on a publicly released query log by comparing with the state-of-the-art methods. The experimental results show that, by recommending high utility queries, our approach is far more effective in helping users find relevant search results and thus satisfying their information needs.

16.
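[Purpose/Significance] To reveal the characteristics of mobile library users' query formulation behavior and to offer suggestions for improving the search function of mobile libraries. [Method/Process] Using system log mining on one month of user logs from a university mobile library, the study applies statistical analysis with indicators such as mutual information, query diversity, query richness, subject distribution, and duration to examine query relatedness, query reformulation patterns, and query topics. [Results/Conclusions] Mutual information values of users' queries are generally low, meaning queries are weakly related in content. The repetition pattern and the linear pattern are the most common reformulation patterns, meaning users repeatedly search the same query. Users' search interests concentrate in the humanities and social sciences, and search behavior on queries of the same topic persists over time. The study recommends adding query recommendation, automatic error correction, and advanced search to improve the recall and precision of mobile library search services.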

17.
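Sample queries were selected from the Sogou query log and manually annotated. By analyzing the annotated news queries, the study proposes new features for identifying news intent: query expression features, features of the query's distribution over time, and click-result features. Using these three features, a decision tree classifier automatically identifies news intent in queries. The findings are: (1) the search goals of news queries concentrate on topic-specific information and entertainment information, and their topics are mostly entertainment, politics, sports, and the economy; (2) compared with non-news queries, news queries are more likely to contain entities, fluctuate more in their distribution over time, and have more similar clicked results; (3) the method identifies news intent well, with macro-averaged precision, recall, and F-measure of 0.76, 0.73, and 0.74, respectively.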
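The macro-averaged figures reported above are computed per class and then averaged with equal weight per class. The sketch below shows one common convention, in which macro F is the mean of the per-class F values; whether the study used this convention or computed F from the macro-averaged precision and recall is not stated, so that choice is an assumption. The labels and predictions are made-up toy data, not the study's results.

```python
def macro_scores(gold, predicted):
    """Macro-averaged precision, recall and F over all observed labels."""
    labels = sorted(set(gold) | set(predicted))
    precisions, recalls, f_values = [], [], []
    for label in labels:
        pairs = list(zip(gold, predicted))
        tp = sum(1 for g, p in pairs if g == label and p == label)
        fp = sum(1 for g, p in pairs if g != label and p == label)
        fn = sum(1 for g, p in pairs if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f = (2 * precision * recall / (precision + recall)
             if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f_values.append(f)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_values) / n

# Toy annotation: two news queries and two non-news queries.
gold = ["news", "news", "other", "other"]
predicted = ["news", "other", "other", "other"]
macro_p, macro_r, macro_f = macro_scores(gold, predicted)
```

Macro averaging gives the smaller class the same influence as the larger one, which matters when news queries are a minority of the log.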

18.
The objective of this study was to evaluate the HealthInsite topic query technique, which uses a dynamic database search to assign resources to a topic. It is an alternative to the explicit classification technique, which relies on the classification of each resource using a predefined classification scheme. We performed a recall-precision analysis on all topics within the broad topic area of Child Health. Recall and precision errors were checked to determine which part of the information retrieval process was at fault. We then compared the topic query technique with the explicit classification technique. The results show errors or problems at every stage of the information retrieval process. This has initiated a review of all the tools used in the process, from indexing guidelines to the search engine. While many errors could be corrected, there were still features of the explicit classification technique that could not be achieved by the topic query technique. In conclusion, the topic query technique has the advantage of flexibility, but close co-operation between the different information retrieval specialists is needed to get the best results. The HealthInsite topic navigation structure should be regarded as an organized set of predefined searches rather than a full classified listing.

19.
The goal of this article is to understand the reasons why known-item search queries entered in a discovery system return zero hits. We analyze a sample of 708 known-item queries and classify them into four categories of zero hits with regard to whether the item is held by the library and whether the query is formulated correctly: (1) item in stock, but query incorrect, (2) item not in stock, (3) item in stock, but incomplete or erroneous metadata, (4) query is ambiguous or not understandable. The main causes of zero hits are gaps in acquisition and erroneous search queries. We discuss possible solutions for known-item queries resulting in zero hits from the side of the system and show that 30% of zero hits could easily be avoided by applying automatic spelling correction. We argue that libraries can improve their discovery systems or online catalogs by applying strategies to avoid or cope with zero hits inspired by web search engines and commercial search web sites.
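One way to picture the spelling-correction fallback mentioned in this abstract is a search that retries with corrected words only when the original query returns nothing. The sketch below is not the article's method: it uses Python's standard `difflib.get_close_matches` against the words in the catalog titles, and the substring matcher, the 0.8 cutoff, the two-title catalog, and the function names are all assumptions chosen for illustration.

```python
import difflib

def search(query, catalog):
    """Naive known-item search: case-insensitive substring match on titles."""
    q = query.lower()
    return [title for title in catalog if q in title.lower()]

def search_with_correction(query, catalog, cutoff=0.8):
    """Retry a zero-hit query after correcting each word against the catalog."""
    hits = search(query, catalog)
    if hits:
        return query, hits
    # Vocabulary of all words that occur in catalog titles.
    vocabulary = sorted({w for title in catalog for w in title.lower().split()})
    corrected_words = []
    for word in query.lower().split():
        match = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
        corrected_words.append(match[0] if match else word)
    corrected = " ".join(corrected_words)
    return corrected, search(corrected, catalog)

catalog = ["Introduction to Information Retrieval",
           "Modern Information Retrieval"]
corrected, hits = search_with_correction("informaton retreival", catalog)
```

Correction of this kind only helps with the first category (item in stock, query incorrect); queries for items the library does not hold still return nothing.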
