Similar literature
20 similar documents found
1.
TIJAH: Embracing IR Methods in XML Databases   Total citations: 1, self-citations: 0, citations by others: 1
This paper discusses our participation in INEX (the Initiative for the Evaluation of XML Retrieval) using the TIJAH XML-IR system. TIJAH's system design follows a standard layered database architecture, carefully separating the conceptual, logical and physical levels. At the conceptual level, we classify the INEX XPath-based query expressions into three different query patterns. For each pattern, we present its mapping into a query execution strategy. The logical layer exploits score region algebra (SRA) as the basis for query processing. We discuss the region operators used to select and manipulate XML document components. The logical algebra expressions are mapped into efficient relational algebra expressions over a physical representation of the XML document collection using the pre-post numbering scheme. The paper concludes with an analysis of experiments performed with the INEX test collection.
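The pre-post numbering scheme named in the abstract above can be illustrated with a short sketch. This is not TIJAH's implementation: the `Node` class, the function names and the toy document are all hypothetical, and only the textbook idea is shown. Each node receives a pre-order and a post-order rank in one depth-first walk, and a node d is a descendant of a node a exactly when pre(a) < pre(d) and post(d) < post(a).

```python
class Node:
    """Hypothetical XML element: a tag name plus child elements."""

    def __init__(self, tag, children=None):
        self.tag = tag
        self.children = children or []
        self.pre = -1   # pre-order rank, filled in by number_tree
        self.post = -1  # post-order rank, filled in by number_tree


def number_tree(root):
    """Assign pre-order and post-order ranks in a single depth-first walk."""
    counters = {"pre": 0, "post": 0}

    def visit(node):
        node.pre = counters["pre"]
        counters["pre"] += 1
        for child in node.children:
            visit(child)
        node.post = counters["post"]
        counters["post"] += 1

    visit(root)


def is_descendant(d, a):
    """True when d lies strictly below a in the numbered tree."""
    return a.pre < d.pre and d.post < a.post


# Toy document: <article><title/><section><para/></section></article>
para = Node("para")
section = Node("section", [para])
title = Node("title")
article = Node("article", [title, section])
number_tree(article)

print(is_descendant(para, article))  # para is inside article
print(is_descendant(para, title))    # para is not inside title
```

Because the containment test only compares two integers per node, it can be expressed as a range condition over a relational table of (pre, post) pairs, which is what makes the scheme attractive for a relational back end.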

2.
Abstract. As bibliographic records and publications are increasingly offered in electronic and networked form, the number and size of the databases offered by academic libraries have grown considerably. In the widespread metasearches across several databases, searches with natural-language search terms are today the lowest common denominator. Because of the well-known shortcomings of Boolean retrieval, however, they often produce result sets that are either too narrow or too long and too unspecific. The Faculty of Technology of Bielefeld University and the Bielefeld University Library have developed a search assistant based on fuzzy search logic that decomposes users' queries into sub-queries to the external databases and merges the partial results into a single list sorted by relevance. Search terms can be weighted and combined through fuzzy aggregation operators, which are represented in the user interface by natural-language fuzzy quantifiers such as "as many as possible" and "some". In the intuitive simple search, the search parameters are determined automatically by heuristic rules, but they can also be set explicitly in an advanced search. The search options are complemented by searches for similar documents and suggestion lists for further search terms. We describe the initial situation, the theoretical approach and the user interface, and report on an evaluation of usage and a comparative test of the efficiency of the retrieval method. CR Subject Classification: H.3.3, H.3.5. Received 3 March 2004 / Accepted 19 August 2004 / Published online 18 October 2004.

3.
For the definition of electronic records, the use of new terms, like literary warrant, is not necessary, and from the European perspective not even understandable. If this expression simply means best practice and professional culture in recordkeeping, we only need to know what creators did for centuries, still do today and will probably do in the future, by referring to archival science, diplomatics and archival practice to clarify definitions in the recordkeeping environment. A multi-disciplinary approach is still required for the electronic recordkeeping system, as it was in the past for traditional records, but the theory and the terminology should be consistent and based on a deep understanding of the essential characteristics of records and the essential requirements of good recordkeeping, so as to produce in the first place and then maintain reliable and authentic records. Of course, a record is more than recorded information created in the course of business activity: a record is the recorded representation of an act produced in a specific form (the form prescribed by the legal system) by a creator in the course of its activity.

4.
In this paper the problem of indexing heterogeneous structured documents and of retrieving semi-structured documents is considered. We propose a flexible paradigm for both indexing such documents and formulating user queries that specify soft constraints on both document structure and content. At the indexing level we propose a model that achieves flexibility by constructing personalised document representations based on users' views of the documents. This is obtained by allowing users to specify their preferences on the document sections that they estimate to bear the most interesting information, as well as to linguistically quantify the number of sections which determine the global potential interest of the documents. At the query language level, a flexible query language for expressing soft selection conditions on both document structure and content is proposed.

5.
Abstract. The system for interactive, automatic timetabling was developed as part of the research of the Planning Technology and Declarative Programming department at Fraunhofer FIRST on extending constraint-based programming. The system has been used for timetabling at the Charité medical faculty since the summer semester of 1998 and has been developed continuously since then. Its successful use has shown that the chosen methods and techniques are well suited to problems of this kind. The advantages of combining interactive and automatic timetable generation were clearly demonstrated. CR Subject Classification: I.2.8, I.2.3, J.1, K.3.2, D.3.3, D.1.6. Received 15 March 2003 / Accepted 9 March 2004 / Published online 1 July 2004.

6.
The Museum is a perspicuous site for analysing the complex interplay between social, organisational, cultural and political factors which have relevance to the design and use of virtual technologies. Specifically, the introduction of virtual technologies in museums runs up against the issue of the situated character of information use. Across a number of disciplines (anthropology, sociology, psychology, cognitive science) there is growing recognition of the situatedness of knowledge and its importance for the design and use of technology. This awareness is fostered by the fact that technological developments are often associated with disappointing gains for users. The effective use of technology relies on the degree to which it can be embedded in or congruent with the local practices of museum users. Drawing upon field research in two museums of science and technology, both of which are in the process of introducing virtual technologies and exploring the possibilities of on-line access, findings are presented which suggest that the success of such developments will depend on the extent to which they are informed by a detailed understanding of practice: practices that are essentially socially constituted in the activities of museum visitors and the daily work of museum professionals.

7.
Detection As Multi-Topic Tracking   Total citations: 1, self-citations: 0, citations by others: 1
The topic tracking task from TDT is a variant of information filtering tasks that focuses on event-based topics in streams of broadcast news. In this study, we compare tracking to another TDT task, detection, which has the goal of partitioning all arriving news into topics, regardless of whether the topics are of interest to anyone, and even when a new topic appears that had not been previously anticipated. There are clear relationships between the two tasks (under some assumptions, a perfect tracking system could solve the detection problem), but they are evaluated quite differently. We describe the two tasks and discuss their similarities. We show how viewing detection as a form of multi-topic parallel tracking can illuminate the performance tradeoffs of detection over tracking.

8.
New Mexico State University's Computing Research Lab has participated in research in all three phases of the US Government's Tipster program. Our work on information retrieval has focused on research and development of multilingual and cross-language approaches to automatic retrieval. The work on automatic systems has been supplemented by additional research into the role of the IR system user in interactive retrieval scenarios: monolingual, multilingual and cross-language. The combined efforts suggest that universal text retrieval, in which a user can find, access and use documents in the face of language differences and information overload, may be possible.

9.
Information Retrieval systems typically sort the result with respect to document retrieval status values (RSV). According to the Probability Ranking Principle, this ranking ensures optimum retrieval quality if the RSVs are monotonically increasing with the probabilities of relevance (as, e.g., for probabilistic IR models). However, advanced applications like filtering or distributed retrieval require estimates of the actual probability of relevance. The relationship between the RSV of a document and its probability of relevance can be described by a normalisation function (mapping function) which maps the retrieval status value onto the probability of relevance. In this paper, we explore the use of linear and logistic mapping functions for different retrieval methods. In a series of upper-bound experiments, we compare the approximation quality of the different mapping functions. We also investigate the effect on the resulting retrieval quality in distributed retrieval (only merging, without resource selection). These experiments show that good estimates of the actual probability of relevance can be achieved, and that the logistic model outperforms the linear one. Retrieval quality for distributed retrieval is only slightly improved by using the logistic function.
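The two families of mapping functions named in the abstract above can be sketched as follows. The parameterisation is an assumption, since the abstract does not give one: `b0` and `b1` are hypothetical coefficients that would have to be fitted to relevance judgements, and clipping the linear form to [0, 1] is a choice made here so that the result is always a valid probability.

```python
import math


def linear_mapping(rsv, b0, b1):
    """Linear map from an RSV to a probability estimate, clipped to [0, 1].

    The clipping is an assumption of this sketch, not taken from the paper.
    """
    return min(1.0, max(0.0, b0 + b1 * rsv))


def logistic_mapping(rsv, b0, b1):
    """Logistic map from an RSV to a probability estimate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * rsv)))


# Hypothetical coefficients and scores, for illustration only.
scores = [0.1, 0.5, 0.9]
linear_estimates = [linear_mapping(s, 0.0, 1.0) for s in scores]
logistic_estimates = [logistic_mapping(s, -2.0, 4.0) for s in scores]
print(linear_estimates)
print(logistic_estimates)
```

With a positive `b1` both functions are monotonically increasing in the RSV, so they leave the ranking of a single system unchanged. They matter when scores from different systems have to be put on a common probability scale, as in result merging for distributed retrieval.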

10.
Generalized Hamming Distance   Total citations: 4, self-citations: 0, citations by others: 4
Many problems in information retrieval and related fields depend on a reliable measure of the distance or similarity between objects that, most frequently, are represented as vectors. This paper considers vectors of bits. Such data structures implement entities as diverse as bitmaps that indicate the occurrences of terms and bitstrings indicating the presence of edges in images. For such applications, a popular distance measure is the Hamming distance. The value of the Hamming distance for information retrieval applications is limited by the fact that it counts only exact matches, whereas in information retrieval, corresponding bits that are close by can still be considered to be almost identical. We define a Generalized Hamming distance that extends the Hamming concept to give partial credit for near misses, and suggest a dynamic programming algorithm that permits it to be computed efficiently. We envision many uses for such a measure. In this paper we define and prove some basic properties of the Generalized Hamming distance, and illustrate its use in the area of object recognition. We evaluate our implementation in a series of experiments, using autonomous robots to test the measure's effectiveness in relating similar bitstrings.
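The limitation described in the abstract above is easy to see with the classic Hamming distance. The sketch below implements only that classic measure, not the Generalized Hamming distance, whose definition the abstract does not give; the bitstrings are made-up examples.

```python
def hamming_distance(a, b):
    """Classic Hamming distance: number of positions where a and b differ."""
    if len(a) != len(b):
        raise ValueError("bitstrings must have the same length")
    return sum(x != y for x, y in zip(a, b))


reference = "0010000"
near_miss = "0001000"  # the single 1-bit moved by one position
far_miss = "0000001"   # the single 1-bit moved by four positions

near_distance = hamming_distance(reference, near_miss)
far_distance = hamming_distance(reference, far_miss)
print(near_distance, far_distance)
```

Both comparisons yield the same distance, because each shifted bit is counted as two mismatches regardless of how far it moved. A measure that gives partial credit for near misses would be expected to rank `near_miss` closer to `reference` than `far_miss`.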

11.
Exploiting the Similarity of Non-Matching Terms at Retrieval Time   Total citations: 2, self-citations: 0, citations by others: 2
In classic Information Retrieval systems a relevant document will not be retrieved in response to a query if the document and query representations do not share at least one term. This problem, known as term mismatch, has been recognised for a long time by the Information Retrieval community and a number of possible solutions have been proposed. Here I present a preliminary investigation into a new class of retrieval models that attempt to solve the term mismatch problem by exploiting complete or partial knowledge of term similarity in the term space. The use of term similarity makes it possible to enhance classic retrieval models by taking into account non-matching terms. The theoretical advantages and drawbacks of these models are presented and compared with other models tackling the same problem. A preliminary experimental investigation into the performance gain achieved by exploiting term similarity with the proposed models is presented and discussed.

12.
The Archival Bond   Total citations: 1, self-citations: 0, citations by others: 1
This paper presents the concept of archival bond as formulated by archival science and used in a research project carried out at the University of British Columbia, entitled The Preservation of Electronic Records. Being one of the essential components of the record, the concept of archival bond is discussed in the context of the traditional diplomatic and archival definitions of records, and its function in demonstrating the reliability and authenticity of records is shown. The most serious challenge with which we are confronted is to make explicit and preserve intact over the long term the archival bond between electronic and non-electronic records belonging to the same aggregations.

13.
Variability is a central concept in software product family development. Variability empowers constructive reuse and facilitates the derivation of different, customer-specific products from the product family. If many customer-specific requirements can be realised by exploiting the product family variability, the reuse achieved is obviously high. If not, the reuse is low. It is thus important that the variability of the product family is adequately considered when eliciting requirements from the customer. In this paper we sketch the challenges for requirements engineering for product family applications. More precisely, we elaborate on the need to communicate the variability of the product family to the customer. We differentiate between variability aspects which are essential for the customer and aspects which are more related to the technical realisation and thus need not be communicated to the customer. Motivated by the successful usage of use cases in single product development, we propose use cases as a communication medium for the product family variability. We discuss and illustrate which customer-relevant variability aspects can be represented with use cases, and for which aspects use cases are not suitable. Moreover, we propose extensions to use case diagrams to support an intuitive representation of customer-relevant variability aspects. Received: 14 October 2002, Accepted: 8 January 2003. This work was partially funded by the CAFÉ project From Concept to Application in System Family Engineering; Eureka ! 2023 Programme, ITEA Project ip00004 (BMBF, Förderkennzeichen 01 IS 002 C) and the state of North Rhine-Westphalia. This paper is a significant extension of the paper Modellierung der Variabilität einer Produktfamilie, [15].

14.
Kleinberg's HITS algorithm (Kleinberg 1999), which was originally developed in a Web context, tries to infer the authoritativeness of a Web page in relation to a specific query using the structure of a subgraph of the Web graph, which is obtained considering this specific query. Recent applications of this algorithm in contexts far removed from that of Web searching (Bacchin, Ferro and Melucci 2002, Ng et al. 2001) inspired us to study the algorithm in the abstract, independently of its particular applications, trying to mathematically illuminate its behaviour. In the present paper we detail this theoretical analysis. The original work starts from the definition of a revised and more general version of the algorithm, which includes the classic one as a particular case. We perform an analysis of the structure of two particular matrices, essential to studying the behaviour of the algorithm, and we prove the convergence of the algorithm in the most general case, finding the analytic expression of the vectors to which it converges. Then we study the symmetry of the algorithm and prove the equivalence between the existence of symmetry and the independence from the order of execution of some basic operations on initial vectors. Finally, we expound some interesting consequences of our theoretical results. Supported in part by a grant from the Italian National Research Council (CNR) research project Technologies and Services for Enhanced Content Delivery.
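For orientation, the classic form of HITS that the abstract above takes as its starting point can be sketched as a simple iteration. This is the textbook version only, not the revised and more general algorithm the paper analyses; the function names, the fixed iteration count and the toy graph are assumptions made for illustration.

```python
import math


def _normalise(scores):
    """Scale a score dictionary to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in scores.values()))
    if norm == 0.0:
        return dict(scores)
    return {k: v / norm for k, v in scores.items()}


def hits(edges, iterations=50):
    """Classic HITS on a directed graph given as (source, target) pairs."""
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority update: sum the hub scores of the pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        # Hub update: sum the fresh authority scores of the pages linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = _normalise(new_auth)
        hub = _normalise(new_hub)
    return hub, auth


# Toy graph: a and b both link to c, and a also links to d.
edges = [("a", "c"), ("b", "c"), ("a", "d")]
hub, auth = hits(edges)
print(max(auth, key=auth.get))  # page with the highest authority score
print(max(hub, key=hub.get))    # page with the highest hub score
```

The two update steps correspond to multiplying the score vectors by the adjacency matrix and its transpose, which is why the behaviour of the iteration is governed by the structure of the associated matrices.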

15.
This article examines the claim that, through its overt symbolic messaging, the Gatineau Preservation Centre, opened by the National Archives of Canada in 1997, embodies a perfect transparency between function and form, with the shape of the place being derived seamlessly from the needs of the archival work done there, and the proof being in the exposure of all the elements to view. It reveals the undercurrents of contending oppositions to this claim, both in the subversive, Mannerist, or impure architectural eccentricities designed into the structure, and in the embodiment of archival narratives whose symbolism is challenged by unacknowledged resistances. While the building is clearly inspired by Modernist and Enlightenment orientations, such as the ambition to preserve unchanged a universal, transcendent historical authenticity, these diverse resistances buried in it are manifested, for example, in the contest of male versus female structural elements, and in the authority of the monumental and exposed set against the seduction of the varied and secret. Most importantly, the absorption of the body both metaphorically and physically into the many disciplines of the place unconsciously calls into question the building's self-image as the epitome of a liberal-humanist and objective-scientific activity; it reflects instead the destabilizing plays and displays of power which are increasingly seen to form the indeterminate field of the archival pursuit.

16.
As bibliographic records and publications are increasingly offered in electronic and networked form, the number and size of the databases offered by academic libraries have grown considerably. In the widespread metasearches across several databases, searches with natural-language search terms are today the lowest common denominator. Because of the well-known shortcomings of Boolean retrieval, however, they often produce result sets that are either too narrow or too long and too unspecific. The Faculty of Technology of Bielefeld University and the Bielefeld University Library have developed a search assistant based on fuzzy search logic that decomposes users' queries into sub-queries to the external databases and merges the partial results into a single list sorted by relevance. Search terms can be weighted and combined through fuzzy aggregation operators, which are represented in the user interface by natural-language fuzzy quantifiers such as "as many as possible" and "some". In the intuitive simple search, the search parameters are determined automatically by heuristic rules, but they can also be set explicitly in an advanced search. The search options are complemented by searches for similar documents and suggestion lists for further search terms. We describe the initial situation, the theoretical approach and the user interface, and report on an evaluation of usage and a comparative test of the efficiency of the retrieval method.

17.
Modeling users in information filtering systems is a difficult challenge due to dimensions such as nature, scope, and variability of interests. Numerous machine-learning approaches have been proposed for user modeling in filtering systems. The focus has been primarily on techniques for user model capture and representation, with relatively simple assumptions made about the type of users' interests. Although many studies claim to deal with adaptive techniques, and thus acknowledge that different types of interests must be modeled or that changes in interests have to be captured, few studies have actually focused on the dynamic nature and the variability of user interests and their impact on the modeling process. A simulation-based information filtering environment called SIMSIFTER was developed to overcome some of the barriers associated with conducting studies on user-oriented factors that can impact interests. SIMSIFTER implemented a user modeling approach known as reinforcement learning that has proven to be effective in previous filtering studies involving humans. This paper reports on several studies conducted using SIMSIFTER that examined the impact of key dimensions such as type of interests, rate of change of interests and level of user involvement on modeling accuracy and ultimately on filtering effectiveness.

18.
With a central focus on the cultural contexts of Pacific island societies, this essay examines the entanglement of colonial power relations in local recordkeeping practices. These cultural contexts include the on-going exchange between oral and literate cultures, the aftermath of colonial disempowerment and reassertion of indigenous rights and identities, the difficulty of maintaining full archival systems in isolated, resource-poor micro-states, and the driving influence of development theory. The essay opens with a discussion of concepts of exploration and evangelism in cross-cultural analysis as metaphors for archival endeavour. It then explores the cultural exchanges between oral memory and written records, orality, and literacy, as means of keeping evidence and remembering. After discussing the relation of records to processes of political and economic disempowerment, and the reclaiming of rights and identities, it returns to the patterns of archival development in the Pacific region to consider how archives can better integrate into their cultural and political contexts, with the aim of becoming more valued parts of their communities.

19.
Images and signals may be represented by forms invariant to time shifts, spatial shifts, frequency shifts, and scale changes. Advances in time-frequency analysis and scale transform techniques have made this possible. However, factors such as noise contamination and style differences complicate this. An example is found in text, where letters and words may vary in size and position. Examples of complicating variations include the font used, corruption during facsimile (fax) transmission, and printer characteristics. The solution advanced in this paper is to cast the desired invariants into separate subspaces for each extraneous factor or group of factors. The first goal is to have minimal overlap between these subspaces and the second goal is to be able to identify each subspace accurately. Concepts borrowed from high-resolution spectral analysis, but adapted uniquely to this problem have been found to be useful in this context. Once the pertinent subspace is identified, the recognition of a particular invariant form within this subspace is relatively simple using well-known singular value decomposition (SVD) techniques. The basic elements of the approach can be applied to a variety of pattern recognition problems. The specific application covered in this paper is word spotting in bitmapped fax documents.

20.
The application of relevance feedback techniques has been shown to improve retrieval performance for a number of information retrieval tasks. This paper explores incremental relevance feedback for ad hoc Japanese text retrieval, examining, separately and in combination, the utility of term reweighting and query expansion using a probabilistic retrieval model. Retrieval performance is evaluated in terms of standard precision-recall measures, and also using number-to-view graphs. Experimental results on the standard BMIR-J2 Japanese language retrieval collection show that both term reweighting and query expansion improve retrieval performance. This is reflected in improvements in both precision and recall, and also in a reduction in the average number of documents which must be viewed to find a selected number of relevant items. In particular, using a simple simulation of user searching, incremental application of relevance information is shown to lead to progressively improved retrieval performance and an overall reduction in the number of documents that a user must view to find relevant ones.
