Similar Documents
Found 20 similar documents (search time: 46 ms)
1.
With a central focus on the cultural contexts of Pacific island societies, this essay examines the entanglement of colonial power relations in local recordkeeping practices. These cultural contexts include the on-going exchange between oral and literate cultures, the aftermath of colonial disempowerment and reassertion of indigenous rights and identities, the difficulty of maintaining full archival systems in isolated, resource-poor micro-states, and the driving influence of development theory. The essay opens with a discussion of concepts of exploration and evangelism in cross-cultural analysis as metaphors for archival endeavour. It then explores the cultural exchanges between oral memory and written records, orality, and literacy, as means of keeping evidence and remembering. After discussing the relation of records to processes of political and economic disempowerment, and the reclaiming of rights and identities, it returns to the patterns of archival development in the Pacific region to consider how archives can better integrate into their cultural and political contexts, with the aim of becoming more valued parts of their communities.

2.
Generalized Hamming Distance   (total citations: 4; self: 0; others: 4)
Many problems in information retrieval and related fields depend on a reliable measure of the distance or similarity between objects that, most frequently, are represented as vectors. This paper considers vectors of bits. Such data structures implement entities as diverse as bitmaps that indicate the occurrences of terms and bitstrings indicating the presence of edges in images. For such applications, a popular distance measure is the Hamming distance. The value of the Hamming distance for information retrieval applications is limited by the fact that it counts only exact matches, whereas in information retrieval, corresponding bits that are close by can still be considered to be almost identical. We define a Generalized Hamming distance that extends the Hamming concept to give partial credit for near misses, and suggest a dynamic programming algorithm that permits it to be computed efficiently. We envision many uses for such a measure. In this paper we define and prove some basic properties of the Generalized Hamming distance, and illustrate its use in the area of object recognition. We evaluate our implementation in a series of experiments, using autonomous robots to test the measure's effectiveness in relating similar bitstrings.
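The abstract does not give the paper's exact cost model, but the idea of "partial credit for near misses" can be sketched as a dynamic program over the positions of the 1-bits: a 1 in one string may be matched to a nearby 1 in the other at a cost proportional to the positional shift, rather than counting a full deletion plus a full insertion. The cost parameters `ins`, `dele`, and `shift` below are illustrative assumptions, not the paper's values.

```python
def ghd(x, y, ins=1.0, dele=1.0, shift=0.5):
    """Generalized-Hamming-style distance between two bit sequences.

    Works on the positions of the 1-bits: matching a 1 in x to a 1 in y
    costs shift * |position difference|, so near misses are cheaper than
    an unmatched pair (which costs a deletion plus an insertion).
    """
    a = [i for i, b in enumerate(x) if b]  # positions of 1s in x
    b = [j for j, c in enumerate(y) if c]  # positions of 1s in y
    m, n = len(a), len(b)
    # dp[i][j] = cost of turning the first i ones of x into the first j ones of y
    dp = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + dele
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j] + dele,                                  # drop a 1 of x
                dp[i][j - 1] + ins,                                   # insert a 1 of y
                dp[i - 1][j - 1] + shift * abs(a[i - 1] - b[j - 1]),  # shifted match
            )
    return dp[m][n]
```

With `shift=0.5`, two bitstrings that differ only by a one-position slide of a single 1 are at distance 0.5, whereas the plain Hamming distance would report 2.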

3.
TIJAH: Embracing IR Methods in XML Databases   (total citations: 1; self: 0; others: 1)
This paper discusses our participation in INEX (the Initiative for the Evaluation of XML Retrieval) using the TIJAH XML-IR system. TIJAH's system design follows a standard layered database architecture, carefully separating the conceptual, logical and physical levels. At the conceptual level, we classify the INEX XPath-based query expressions into three different query patterns. For each pattern, we present its mapping into a query execution strategy. The logical layer exploits score region algebra (SRA) as the basis for query processing. We discuss the region operators used to select and manipulate XML document components. The logical algebra expressions are mapped into efficient relational algebra expressions over a physical representation of the XML document collection using the pre-post numbering scheme. The paper concludes with an analysis of experiments performed with the INEX test collection.

4.
For the definition of electronic records, the use of new terms, such as literary warrant, is not necessary, and from the European perspective it is not even understandable. If this expression simply means best practice and professional culture in recordkeeping, we need only know what creators did for centuries, still do today, and will probably continue to do in the future, by referring to archival science, diplomatics and archival practice for clarifying definitions in the recordkeeping environment. A multi-disciplinary approach is still required for electronic recordkeeping systems, as it was in the past for traditional records, but the theory and terminology should be consistent and based on a deep understanding of the essential characteristics of records and the essential requirements of good recordkeeping: to produce, in the first place, and then maintain reliable and authentic records. Of course, a record is more than recorded information created in the course of business activity: a record is the recorded representation of an act, produced in a specific form – the form prescribed by the legal system – by a creator in the course of its activity.

5.
The Archival Bond   (total citations: 1; self: 0; others: 1)
This paper presents the concept of the archival bond as formulated by archival science and used in a research project carried out at the University of British Columbia, entitled The Preservation of Electronic Records. Being one of the essential components of the record, the concept of the archival bond is discussed in the context of the traditional diplomatic and archival definitions of records, and its function in demonstrating the reliability and authenticity of records is shown. The most serious challenge with which we are confronted is to make explicit, and preserve intact over the long term, the archival bond between electronic and non-electronic records belonging in the same aggregations.

6.
Abstract. The system for interactive, automatic timetable planning was developed as part of the research of the Planning Techniques and Declarative Programming group at Fraunhofer FIRST, as an extension of constraint-based programming. The system has been used for timetabling at the Medical Faculty of the Charité since the summer semester of 1998, and has been continuously developed since then. Its successful deployment showed that the chosen methods and techniques are well suited to problems of this kind. The advantages of combined interactive and automatic timetable generation were clearly demonstrated.
CR Subject Classification: I.2.8, I.2.3, J.1, K.3.2, D.3.3, D.1.6
Received: 15 March 2003 / Accepted: 9 March 2004, Published online: 1 July 2004

7.
Information Retrieval systems typically sort the result with respect to document retrieval status values (RSV). According to the Probability Ranking Principle, this ranking ensures optimum retrieval quality if the RSVs are monotonically increasing with the probabilities of relevance (as e.g. for probabilistic IR models). However, advanced applications like filtering or distributed retrieval require estimates of the actual probability of relevance. The relationship between the RSV of a document and its probability of relevance can be described by a normalisation function which maps the retrieval status value onto the probability of relevance (mapping functions). In this paper, we explore the use of linear and logistic mapping functions for different retrieval methods. In a series of upper-bound experiments, we compare the approximation quality of the different mapping functions. We also investigate the effect on the resulting retrieval quality in distributed retrieval (only merging, without resource selection). These experiments show that good estimates of the actual probability of relevance can be achieved, and that the logistic model outperforms the linear one. Retrieval quality for distributed retrieval is only slightly improved by using the logistic function.
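A logistic mapping of the kind described above can be sketched as fitting P(relevant | RSV) = sigmoid(a·rsv + b) to binary relevance judgements; the gradient-ascent fitting routine below is an illustrative stand-in for whatever estimation procedure the paper uses, and the parameter names `a`, `b`, `lr`, and `epochs` are assumptions.

```python
import math

def fit_logistic(rsvs, rels, lr=0.1, epochs=2000):
    """Fit P(relevant | RSV) = sigmoid(a * rsv + b) by gradient ascent
    on the log-likelihood of binary relevance judgements (rels in {0, 1})."""
    a, b = 0.0, 0.0
    n = len(rsvs)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for s, r in zip(rsvs, rels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (r - p) * s   # d log-likelihood / d a
            grad_b += (r - p)       # d log-likelihood / d b
        a += lr * grad_a / n
        b += lr * grad_b / n
    return a, b

def prob_of_relevance(rsv, a, b):
    """Map a raw retrieval status value onto an estimated probability."""
    return 1.0 / (1.0 + math.exp(-(a * rsv + b)))
```

Once fitted on a training sample of judged documents, the same `(a, b)` pair can normalise RSVs from that retrieval method, which is what makes scores from different collections comparable when merging result lists.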

8.
Exploiting the Similarity of Non-Matching Terms at Retrieval Time   (total citations: 2; self: 0; others: 2)
In classic Information Retrieval systems a relevant document will not be retrieved in response to a query if the document and query representations do not share at least one term. This problem, known as term mismatch, has long been recognised by the Information Retrieval community and a number of possible solutions have been proposed. Here I present a preliminary investigation into a new class of retrieval models that attempt to solve the term mismatch problem by exploiting complete or partial knowledge of term similarity in the term space. The use of term similarity makes it possible to enhance classic retrieval models by taking into account non-matching terms. The theoretical advantages and drawbacks of these models are presented and compared with other models tackling the same problem. A preliminary experimental investigation into the performance gain achieved by exploiting term similarity with the proposed models is presented and discussed.
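One simple way to score with non-matching terms, sketched below, is to let each query term contribute through its most similar document term, with an exact match counting as similarity 1.0. This is a hypothetical scoring rule illustrating the idea, not the paper's actual model; `sim` stands in for any term-similarity function with values in [0, 1].

```python
def sim_score(query, doc, sim):
    """Score a document against a query, crediting non-matching terms.

    query, doc: dicts mapping term -> weight.
    sim: symmetric term-similarity function returning a value in [0, 1].
    Each query term contributes via its best-matching document term.
    """
    total = 0.0
    for q_term, q_weight in query.items():
        best = 0.0
        for d_term, d_weight in doc.items():
            s = 1.0 if q_term == d_term else sim(q_term, d_term)
            best = max(best, s * d_weight)
        total += q_weight * best
    return total
```

If `sim` returns 0 for every pair of distinct terms, this collapses to an ordinary exact-match weighted overlap, which is the classic behaviour the paper generalises.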

9.
Abstract. As bibliographic records and publications are increasingly offered in electronic and networked form, the number and size of the databases offered by academic libraries have grown considerably. In the widespread meta-searches across multiple databases, searching with natural-language terms is today the lowest common denominator. Because of the well-known shortcomings of Boolean retrieval, however, such searches often produce result sets that are either too narrow, or too long and too unspecific. The Faculty of Technology and the University Library of Bielefeld University have developed a search assistant based on fuzzy search logic, which decomposes the user's query into sub-queries to the external databases and accumulates the partial results in a relevance-ranked list. Search terms can be weighted and combined with fuzzy aggregation operators, represented on the user interface by natural-language fuzzy quantifiers such as "as many as possible" or "some". In the intuitive simple-search mode, the search parameters are determined automatically by heuristic rules; in an advanced search they can also be set explicitly. The search facilities are complemented by similar-document search and suggestion lists for further search terms. We describe the initial situation, the theoretical approach and the user interface, and report on a usage evaluation and a comparative test of the effectiveness of the retrieval method.
CR Subject Classification: H.3.3, H.3.5
Received: 3 March 2004 / Accepted: 19 August 2004, Published online: 18 October 2004

10.
Text Categorization (TC) is the automated assignment of text documents to predefined categories based on document contents. TC has been an application area for many learning approaches, which have proven effective. Nevertheless, TC provides many challenges to machine learning. In this paper, we suggest, for text categorization, the integration of external WordNet lexical information to supplement training data for a semi-supervised clustering algorithm which can learn from both training and test documents to classify new unseen documents. This algorithm is the Semi-Supervised Fuzzy c-Means (ssFCM). Our experiments use the Reuters-21578 collection and consist of binary classifications for categories selected from the 115 TOPICS classes of the Reuters collection. Using the Vector Space Model, each document is represented by its original feature vector augmented with an external feature vector generated using WordNet. We verify experimentally that the integration of WordNet helps ssFCM improve its performance, effectively addresses the classification of documents into categories with few training documents, and does not interfere with the use of training data.

11.
This paper presents an experimental evaluation of several text-based methods for detecting duplication in scanned document databases using uncorrected OCR output. This task is made challenging both by the wide range of degradations printed documents can suffer, and by conflicting interpretations of what it means to be a duplicate. We report results for four sets of experiments exploring various aspects of the problem space. While the techniques studied are generally robust in the face of most types of OCR errors, there are nonetheless important differences which we identify and discuss in detail.
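A common baseline for this kind of OCR-tolerant duplicate detection (not necessarily one of the methods the paper evaluates) is to compare character k-gram sets under Jaccard similarity: isolated character-level OCR errors corrupt only the few shingles that overlap them, so near-duplicates still score high. The threshold and shingle length below are illustrative choices.

```python
def shingles(text, k=3):
    """Set of character k-grams, ignoring case and whitespace.

    Character-level shingles tolerate isolated OCR substitutions better
    than whole-word comparison: one bad character spoils at most k shingles.
    """
    t = "".join(text.lower().split())
    return {t[i:i + k] for i in range(len(t) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 for two empty sets)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

For example, "The quick brown fox" and its OCR-degraded variant "The qu1ck brown fox" still share most of their trigrams, so their Jaccard score stays well above that of unrelated texts.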

12.
In this paper the problem of indexing heterogeneous structured documents and of retrieving semi-structured documents is considered. We propose a flexible paradigm both for indexing such documents and for formulating user queries specifying soft constraints on document structure and content. At the indexing level we propose a model that achieves flexibility by constructing personalised document representations based on users' views of the documents. This is obtained by allowing users to specify their preferences for the document sections that they estimate to bear the most interesting information, as well as to linguistically quantify the number of sections which determine the global potential interest of the documents. At the query language level, a flexible query language for expressing soft selection conditions on both document structure and content is proposed.

13.
Modeling users in information filtering systems is a difficult challenge due to dimensions such as the nature, scope, and variability of interests. Numerous machine-learning approaches have been proposed for user modeling in filtering systems. The focus has been primarily on techniques for user model capture and representation, with relatively simple assumptions made about the type of users' interests. Although many studies claim to deal with adaptive techniques, and thus acknowledge that different types of interests must be modeled or even that changes in interests have to be captured, few studies have actually focused on the dynamic nature and variability of user interests and their impact on the modeling process. A simulation-based information filtering environment called SIMSIFTER was developed to overcome some of the barriers associated with conducting studies on user-oriented factors that can impact interests. SIMSIFTER implemented a user modeling approach known as reinforcement learning that has proven effective in previous filtering studies involving humans. This paper reports on several studies conducted using SIMSIFTER that examined the impact of key dimensions such as type of interests, rate of change of interests and level of user involvement on modeling accuracy and ultimately on filtering effectiveness.

14.
New Mexico State University's Computing Research Lab has participated in research in all three phases of the US Government's Tipster program. Our work on information retrieval has focused on research and development of multilingual and cross-language approaches to automatic retrieval. The work on automatic systems has been supplemented by additional research into the role of the IR system user in interactive retrieval scenarios: monolingual, multilingual and cross-language. The combined efforts suggest that universal text retrieval, in which a user can find, access and use documents in the face of language differences and information overload, may be possible.

15.
The application of relevance feedback techniques has been shown to improve retrieval performance for a number of information retrieval tasks. This paper explores incremental relevance feedback for ad hoc Japanese text retrieval; examining, separately and in combination, the utility of term reweighting and query expansion using a probabilistic retrieval model. Retrieval performance is evaluated in terms of standard precision-recall measures, and also using number-to-view graphs. Experimental results, on the standard BMIR-J2 Japanese language retrieval collection, show that both term reweighting and query expansion improve retrieval performance. This is reflected in improvements in both precision and recall, but also a reduction in the average number of documents which must be viewed to find a selected number of relevant items. In particular, using a simple simulation of user searching, incremental application of relevance information is shown to lead to progressively improved retrieval performance and an overall reduction in the number of documents that a user must view to find relevant ones.
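The two mechanisms studied above, term reweighting and query expansion, can both be illustrated with a Rocchio-style feedback step; this is a classic vector-space stand-in for the paper's probabilistic model, and the coefficients `alpha`, `beta`, `gamma` and the expansion count are conventional illustrative values, not the paper's.

```python
def rocchio(query, relevant, nonrelevant,
            alpha=1.0, beta=0.75, gamma=0.15, expand=2):
    """One incremental feedback step (Rocchio-style sketch).

    query: dict term -> weight; relevant/nonrelevant: lists of such dicts.
    Reweights existing query terms toward the relevant centroid and away
    from the non-relevant one, then expands the query with the top-weighted
    new terms drawn from the feedback documents.
    """
    new = {t: alpha * w for t, w in query.items()}
    for docs, coeff in ((relevant, beta), (nonrelevant, -gamma)):
        if not docs:
            continue
        for d in docs:
            for t, w in d.items():
                new[t] = new.get(t, 0.0) + coeff * w / len(docs)
    # keep positively weighted terms; expansion terms are the best
    # newcomers not already in the query
    new = {t: w for t, w in new.items() if w > 0}
    extra = sorted((t for t in new if t not in query),
                   key=lambda t: new[t], reverse=True)[:expand]
    return {t: w for t, w in new.items() if t in query or t in extra}
```

Applying this step after each batch of judgements, as in the paper's simulated user, yields the incremental behaviour described: each round both sharpens the weights of existing query terms and adds a few expansion terms.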

16.
Variability is a central concept in software product family development. Variability empowers constructive reuse and facilitates the derivation of different, customer-specific products from the product family. If many customer-specific requirements can be realised by exploiting the product family variability, the reuse achieved is obviously high. If not, the reuse is low. It is thus important that the variability of the product family is adequately considered when eliciting requirements from the customer. In this paper we sketch the challenges for requirements engineering for product family applications. More precisely, we elaborate on the need to communicate the variability of the product family to the customer. We differentiate between variability aspects which are essential for the customer and aspects which are more related to the technical realisation and thus need not be communicated to the customer. Motivated by the successful usage of use cases in single-product development, we propose use cases as the communication medium for product family variability. We discuss and illustrate which customer-relevant variability aspects can be represented with use cases, and for which aspects use cases are not suitable. Moreover, we propose extensions to use case diagrams to support an intuitive representation of customer-relevant variability aspects.
Received: 14 October 2002, Accepted: 8 January 2003. This work was partially funded by the CAFÉ project From Concept to Application in System Family Engineering; Eureka 2023 Programme, ITEA Project ip00004 (BMBF, Förderkennzeichen 01 IS 002 C) and the state of North Rhine-Westphalia. This paper is a significant extension of the paper Modellierung der Variabilität einer Produktfamilie [15].

17.
Detection As Multi-Topic Tracking   (total citations: 1; self: 0; others: 1)
The topic tracking task from TDT is a variant of information filtering tasks that focuses on event-based topics in streams of broadcast news. In this study, we compare tracking to another TDT task, detection, which has the goal of partitioning all arriving news into topics, regardless of whether the topics are of interest to anyone, and even when a new topic appears that had not been previously anticipated. There are clear relationships between the two tasks (under some assumptions, a perfect tracking system could solve the detection problem), but they are evaluated quite differently. We describe the two tasks and discuss their similarities. We show how viewing detection as a form of multi-topic parallel tracking can illuminate the performance tradeoffs of detection over tracking.

18.
As bibliographic records and publications are increasingly offered in electronic and networked form, the number and size of the databases offered by academic libraries have grown considerably. In the widespread meta-searches across multiple databases, searching with natural-language terms is today the lowest common denominator. Because of the well-known shortcomings of Boolean retrieval, however, such searches often produce result sets that are either too narrow, or too long and too unspecific. The Faculty of Technology and the University Library of Bielefeld University have developed a search assistant based on fuzzy search logic, which decomposes the user's query into sub-queries to the external databases and accumulates the partial results in a relevance-ranked list. Search terms can be weighted and combined with fuzzy aggregation operators, represented on the user interface by natural-language fuzzy quantifiers such as "as many as possible" or "some". In the intuitive simple-search mode, the search parameters are determined automatically by heuristic rules; in an advanced search they can also be set explicitly. The search facilities are complemented by similar-document search and suggestion lists for further search terms. We describe the initial situation, the theoretical approach and the user interface, and report on a usage evaluation and a comparative test of the effectiveness of the retrieval method.

19.
Kleinberg's HITS algorithm (Kleinberg 1999), which was originally developed in a Web context, tries to infer the authoritativeness of a Web page in relation to a specific query using the structure of a subgraph of the Web graph, which is obtained by considering this specific query. Recent applications of this algorithm in contexts far removed from that of Web searching (Bacchin, Ferro and Melucci 2002; Ng et al. 2001) inspired us to study the algorithm in the abstract, independently of its particular applications, trying to mathematically illuminate its behaviour. In the present paper we detail this theoretical analysis. The original work starts from the definition of a revised and more general version of the algorithm, which includes the classic one as a particular case. We perform an analysis of the structure of two particular matrices, essential to studying the behaviour of the algorithm, and we prove the convergence of the algorithm in the most general case, finding the analytic expression of the vectors to which it converges. Then we study the symmetry of the algorithm and prove the equivalence between the existence of symmetry and independence from the order of execution of some basic operations on the initial vectors. Finally, we expound some interesting consequences of our theoretical results.
Supported in part by a grant from the Italian National Research Council (CNR) research project Technologies and Services for Enhanced Content Delivery.
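The classic HITS iteration that the paper generalises is simple to state: a page's authority score sums the hub scores of pages linking to it, a page's hub score sums the authority scores of pages it links to, and both vectors are normalised each round. A minimal power-iteration sketch (dense dict-based, for small query subgraphs only):

```python
def hits(adj, iters=50):
    """Classic HITS by power iteration.

    adj: directed graph as {node: [successor, ...]}.
    Returns (hub, auth) dicts of L2-normalised scores.
    """
    nodes = set(adj) | {v for succs in adj.values() for v in succs}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iters):
        # authority update: sum hub scores of in-neighbours
        auth = {n: sum(hub[u] for u in adj if n in adj.get(u, []))
                for n in nodes}
        norm = sum(v * v for v in auth.values()) ** 0.5 or 1.0
        auth = {n: v / norm for n, v in auth.items()}
        # hub update: sum authority scores of out-neighbours
        hub = {n: sum(auth[v] for v in adj.get(n, [])) for n in nodes}
        norm = sum(v * v for v in hub.values()) ** 0.5 or 1.0
        hub = {n: v / norm for n, v in hub.items()}
    return hub, auth
```

The vectors converge to the principal eigenvectors of A·Aᵀ (hubs) and Aᵀ·A (authorities), which is exactly the object of the paper's convergence analysis.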

20.
Exploiting Hierarchy in Text Categorization   (total citations: 4; self: 3; others: 1)
With the recent dramatic increase in electronic access to documents, text categorization—the task of assigning topics to a given document—has moved to the center of the information sciences and knowledge management. This article uses the structure that is present in the semantic space of topics in order to improve performance in text categorization: according to their meaning, topics can be grouped together into meta-topics, e.g., gold, silver, and copper are all metals. The proposed architecture matches the hierarchical structure of the topic space, as opposed to a flat model that ignores the structure. It accommodates both single and multiple topic assignments for each document. Its probabilistic interpretation allows its predictions to be combined in a principled way with information from other sources. The first level of the architecture predicts the probabilities of the meta-topic groups. This allows the individual models for each topic on the second level to focus on finer discriminations within the group. Evaluating the performance of a two-level implementation on the Reuters-22173 testbed of newswire articles shows the most significant improvement for rare classes.
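The probabilistic combination of the two levels can be sketched as the chain rule P(topic | doc) = P(group | doc) · P(topic | doc, group). The function below only illustrates that composition; the classifier interfaces (`meta_model`, `topic_models`) are hypothetical placeholders, not the article's implementation.

```python
def hierarchical_probs(doc, meta_model, topic_models):
    """Two-level topic probabilities via the chain rule.

    meta_model(doc)          -> {group: P(group | doc)}
    topic_models[group](doc) -> {topic: P(topic | doc, group)}
    (Both interfaces are hypothetical stand-ins for trained classifiers.)
    """
    out = {}
    for group, p_group in meta_model(doc).items():
        for topic, p_topic in topic_models[group](doc).items():
            out[topic] = p_group * p_topic
    return out
```

Because each second-level model only discriminates within its own group, it sees a much easier problem than a flat classifier over all topics, which is the effect credited for the gains on rare classes.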


Copyright © Beijing Qinyun Technology Development Co., Ltd.  京ICP备09084417号