首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 31 毫秒
1.
The principle of polyrepresentation offers a theoretical framework for handling multiple contexts in information retrieval (IR). This paper presents an empirical laboratory study of polyrepresentation in restricted mode of the information space with focus on inter and intra-document features. The Cystic Fibrosis test collection indexed in the best match system InQuery constitutes the experimental setting. Overlaps between five functionally and/or cognitively different document representations are identified. Supporting the principle of polyrepresentation, results show that in general overlaps generated by three or four representations of different nature have higher precision than those generated from two representations or the single fields. This result pertains to both structured and unstructured query mode in best match retrieval, however, with the latter query mode demonstrating higher performance. The retrieval overlaps containing search keys from the bibliographic references provide the best retrieval performance and minor MeSH terms the worst. It is concluded that a highly structured query language is necessary when implementing the principle of polyrepresentation in a best match IR system because the principle is inherently Boolean. Finally a re-ranking test shows promising results when search results are re-ranked according to precision obtained in the overlaps whilst re-ranking by citations seems less useful when integrated into polyrepresentative applications.  相似文献   

2.
Structured document retrieval makes use of document components as the basis of the retrieval process, rather than complete documents. The inherent relationships between these components make it vital to support users’ natural browsing behaviour in order to offer effective and efficient access to structured documents. This paper examines the concept of best entry points, which are document components from which the user can browse to obtain optimal access to relevant document components. It investigates at the types of best entry points in structured document retrieval, and their usage and effectiveness in real information search tasks.  相似文献   

3.
4.
Structured document retrieval makes use of document components as the basis of the retrieval process, rather than complete documents. The inherent relationships between these components make it vital to support users’ natural browsing behaviour in order to offer effective and efficient access to structured documents. This paper examines the concept of best entry points, which are document components from which the user can browse to obtain optimal access to relevant document components. In particular this paper investigates the basic characteristics of best entry points.  相似文献   

5.
如何在海量的非结构文档内容中准确、快捷找到自己所需要的信息,是信息检索技术的研究重点。全文检索是现代信息检索技术一个非常重要的分支,是解决非结构化数据检索需求的重要技术手段。以已发布的各类通信业务管理规范的全文检索需求为切入点,设计并实现了适用于国家级气象信息化业务管理的非结构化文档全文检索系统。该系统基于Java技术,并采用Lucene技术框架,对业务规范信息进行了分析和重新数据组织,确保良好的检索时效与准确率。系统应用后能快速应对业务变化,在已有的大量的规定、规范、标准和公文函件中迅速、准确、全面地查找有关资料信息,帮助用户准确把握气象信息化发展脉络。  相似文献   

6.
This paper investigates the stability robustness of linear output feedback systems with both time-varying structured (elemental) and unstructured (norm-bounded) parameter uncertainties as well as delayed perturbations by directly considering the mixed quadratically coupled uncertainties in the problem formulation. Based on the Lyapunov approach and some essential properties of matrix measures, two new sufficient conditions are proposed for ensuring that the linear output feedback systems with delayed perturbations as well as both time-varying structured and unstructured parameter uncertainties are asymptotically stable. The corresponding stable region, that is obtained by using the proposed sufficient conditions, in the parameter space is not necessarily symmetric with respect to the origin of the parameter space. Two numerical examples are given to illustrate the application of the presented sufficient conditions, and for the case of only considering both the delayed perturbations and time-varying structured parameter uncertainties, it can be shown that the results proposed in this paper are better than the existing one reported in the literature.  相似文献   

7.
The paper reports on experiments carried out in transitive translation, a branch of cross-language information retrieval (CLIR). By transitive translation we mean translation of search queries into the language of the document collection through an intermediate (or pivot) language. In our experiments, queries constructed from CLEF 2000 and 2001 Swedish, Finnish and German topics were translated into English through Finnish and Swedish by an automated translation process using morphological analyzers, stopword lists, electronic dictionaries, n-gramming of untranslatable words, and structured and unstructured queries. The results of the transitive runs were compared to the results of the bilingual runs, i.e. runs translating the same queries directly into English. The transitive runs using structured target queries performed well. The differences ranged from −6.6% to +2.9% units (or −25.5% to +7.8%) between the approaches. Thus transitive translation challenges direct translation and considerably simplifies global CLIR efforts.  相似文献   

8.
As an information medium, video offers many possible retrieval and browsing modalities, far more than text, image or audio. Some of these, like searching the text of the spoken dialogue, are well developed, others like keyframe browsing tools are in their infancy, and others not yet technically achievable. For those modalities for browsing and retrieval which we cannot yet achieve we can only speculate as to how useful they will actually be, but we do not know for sure. In our work we have created a system to support multiple modalities for video browsing and retrieval including text search through the spoken dialogue, image matching against shot keyframes and object matching against segmented video objects. For the last of these, automatic segmentation and tracking of video objects is a computationally demanding problem which is not yet solved for generic natural video material, and when it is then it is expected to open up possibilities for user interaction with objects in video, including searching and browsing. In this paper we achieve object segmentation by working in a closed domain of animated cartoons. We describe an interactive user experiment on a medium-sized corpus of video where we were able to measure users’ use of video objects versus other modes of retrieval during multiple-iteration searching. Results of this experiment show that although object searching is used far less than text searching in the first iteration of a user’s search it is a popular and useful search type once an initial set of relevant shots have been found.  相似文献   

9.
The data fusion technique has been investigated by many researchers and has been used in implementing several information retrieval systems. However, the results from data fusion vary in different situations. To find out under which condition data fusion may lead to performance improvement is an important issue. In this paper, we present an analysis of the behaviour of several well-known methods such as CombSum and CombMNZ for fusion of multiple information retrieval results. Based on this analysis, we predict the performance of the data fusion methods. Experiments are conducted with three groups of results submitted to TREC 6, TREC 2001, and TREC 2004. The experiments show that the prediction of the performance of data fusion is quite accurate, and it can be used in situations very different from the training examples. Compared with previous work, our result is more accurate and in a better position for applications since various number of component systems can be supported while only two was used previously.  相似文献   

10.
Decisions in thesaurus construction and use   总被引:1,自引:0,他引:1  
A thesaurus and an ontology provide a set of structured terms, phrases, and metadata, often in a hierarchical arrangement, that may be used to index, search, and mine documents. We describe the decisions that should be made when including a term, deciding whether a term should be subdivided into its subclasses, or determining which of more than one set of possible subclasses should be used. Based on retrospective measurements or estimates of future performance when using thesaurus terms in document ordering, decisions are made so as to maximize performance. These decisions may be used in the automatic construction of a thesaurus. The evaluation of an existing thesaurus is described, consistent with the decision criteria developed here. These kinds of user-focused decision-theoretic techniques may be applied to other hierarchical applications, such as faceted classification systems used in information architecture or the use of hierarchical terms in “breadcrumb navigation”.  相似文献   

11.
How to merge and organise query results retrieved from different resources is one of the key issues in distributed information retrieval. Some previous research and experiments suggest that cluster-based document browsing is more effective than a single merged list. Cluster-based retrieval results presentation is based on the cluster hypothesis, which states that documents that cluster together have a similar relevance to a given query. However, while this hypothesis has been demonstrated to hold in classical information retrieval environments, it has never been fully tested in heterogeneous distributed information retrieval environments. Heterogeneous document representations, the presence of document duplicates, and disparate qualities of retrieval results, are major features of an heterogeneous distributed information retrieval environment that might disrupt the effectiveness of the cluster hypothesis. In this paper we report on an experimental investigation into the validity and effectiveness of the cluster hypothesis in highly heterogeneous distributed information retrieval environments. The results show that although clustering is affected by different retrieval results representations and quality, the cluster hypothesis still holds and that generating hierarchical clusters in highly heterogeneous distributed information retrieval environments is still a very effective way of presenting retrieval results to users.  相似文献   

12.
用户当前正在浏览的网页内容有助于说明用户的即时信息需求.在现有相关研究的基础上提出了一种基于上下文的Web即时信息检索方法,该方法允许用户从正在浏览的网页中选择一段文本作为原始检索条件,由检索系统从其上下文中提取一级扩展词和二级扩展词来形成新的检索条件进行检索,并将检索结果按相似度从大到小的顺序呈现给用户.  相似文献   

13.
Multitasking is the human ability to handle the demands of multiple tasks. Multitasking behavior involves the ordering of multiple tasks and switching between tasks. People often multitask when using information retrieval (IR) technologies as they seek information on more than one information problem over single or multiple search episodes. However, limited studies have examined how people order their information problems, especially during their Web search engine interaction. The aim of our exploratory study was to investigate assigned information problem ordering by forty (40) study participants engaged in Web search. Findings suggest that assigned information problem ordering was influenced by the following factors, including personal interest, problem knowledge, perceived level of information available on the Web, ease of finding information, level of importance and seeking information on information problems in order from general to specific. Personal interest and problem knowledge were the major factors during assigned information problem ordering. Implications of the findings and further research are discussed. The relationship between information problem ordering and gratification theory is an important area for further exploration.  相似文献   

14.
We will explore various ways to apply query structuring in cross-language information retrieval. In the first test, English queries were translated into Finnish using an electronic dictionary, and were run in a Finnish newspaper database of 55,000 articles. Queries were structured by combining the Finnish translation equivalents of the same English query key using the syn-operator of the InQuery retrieval system. Structured queries performed markedly better than unstructured queries. Second, the effects of compound-based structuring using a proximity operator for the translation equivalents of query language compound components were tested. The method was not useful in syn-based queries but resulted in decrease in retrieval effectiveness. Proper names are often non-identical spelling variants in different languages. This allows n-gram based translation of names not included in a dictionary. In the third test, a query structuring method where the Boolean and-operator was used to assign more weight to keys translated through n-gram matching gave good results.  相似文献   

15.
16.
Through the recent NTCIR workshops, patent retrieval casts many challenging issues to information retrieval community. Unlike newspaper articles, patent documents are very long and well structured. These characteristics raise the necessity to reassess existing retrieval techniques that have been mainly developed for structure-less and short documents such as newspapers. This study investigates cluster-based retrieval in the context of invalidity search task of patent retrieval. Cluster-based retrieval assumes that clusters would provide additional evidence to match user’s information need. Thus far, cluster-based retrieval approaches have relied on automatically-created clusters. Fortunately, all patents have manually-assigned cluster information, international patent classification codes. International patent classification is a standard taxonomy for classifying patents, and has currently about 69,000 nodes which are organized into a five-level hierarchical system. Thus, patent documents could provide the best test bed to develop and evaluate cluster-based retrieval techniques. Experiments using the NTCIR-4 patent collection showed that the cluster-based language model could be helpful to improving the cluster-less baseline language model.  相似文献   

17.
18.
Among the problems associated with modern information retrieval systems is the lack of any systematic approach to the design of query language interfaces. In this paper we attempt to show how a relationally organised data base is well suited to bibliographic data management, and how, given such a relational organisation it is possible to construct an interface which separates the query language from the physical representation of the data base. It is also shown how such a query language organisation may be usefully interfaced to existing retrieval systems. Finally a query language for retrieval applications is proposed.  相似文献   

19.
Data availability and access to various platforms, is changing the nature of Information Systems (IS) studies. Such studies often use large datasets, which may incorporate structured and unstructured data, from various platforms. The questions that such papers address, in turn, may attempt to use methods from computational science like sentiment mining, text mining, network science and image analytics to derive insights. However, there is often a weak theoretical contribution in many of these studies. We point out the need for such studies to contribute back to the IS discipline, whereby findings can explain more about the phenomenon surrounding the interaction of people with technology artefacts and the ecosystem within which these contextual usage is situated. Our opinion paper attempts to address this gap and provide insights on the methodological adaptations required in “big data studies” to be converted into “IS research” and contribute to theory building in information systems.  相似文献   

20.
Focusing on the context of XML retrieval, in this paper we propose a general methodology for managing structured queries (involving both content and structure) within any given structured probabilistic information retrieval system which is able to compute posterior probabilities of relevance for structural components given a non-structured query (involving only query terms but not structural restrictions). We have tested our proposal using two specific information retrieval systems (Garnata and PF/Tijah), and the structured document collections from the last six editions of the INitiative for the Evaluation of XML Retrieval (INEX).  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号