Similar literature
1.
This work discusses the applicability of magnetic bubble memories to the processing of inverted files. Four novel models of magnetic bubble memories are presented to demonstrate the storage structures and the data processing. The first model employs a major-minor loop organization. On the basis of this organization a uniform ladder is formed so that the data can be rearranged using four operations (global shift, detached shift, exchange, and delta exchange). The second model makes use of the on-chip decoder (also known as a self-contained magnetic bubble-domain memory chip). This model relies on a hashing scheme to perform the required data operations. The third and fourth models are different combinations of the first two; they may provide relatively high-speed performance together with reasonable system complexity. For each model the algorithms for data retrieval, sorting, deletion, insertion, and updating are given. The four models are also compared in order to determine the most convenient magnetic bubble memory structure for the processing of inverted files.

2.
Signature files and inverted files are well-known index structures. In this paper we undertake a direct comparison of the two for evaluating partially specified queries against a large lexicon stored in main memory. Using n-grams to index lexicon terms, a bit-sliced signature file can be compressed to a smaller size than an inverted file if each n-gram sets only one bit in the term signature. With a signature width less than half the number of unique n-grams in the lexicon, the signature file method is about as fast as the inverted file method, and significantly smaller. Greater flexibility in memory usage and faster index generation time make signature files appropriate for searching large lexicons or other collections in an environment where memory is at a premium.
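The idea of n-gram signatures in which each n-gram sets a single bit can be illustrated with a minimal sketch. It is not the paper's implementation: the toy hash in `bit_position`, the `$` boundary padding, the signature width of 64, and all function names are assumptions made for illustration. Each bit slice is stored as an integer bitmap over the lexicon; a wildcard query ANDs the slices for its known n-grams and then verifies the surviving candidates to discard false matches.

```python
import fnmatch

def ngrams(term, n=2):
    """Character n-grams of a term, with '$' marking the word boundaries."""
    padded = "$" + term + "$"
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}

def query_grams(pattern, n=2):
    """n-grams of the known fragments of a '*' wildcard pattern."""
    grams = set()
    for piece in ("$" + pattern + "$").split("*"):
        grams |= {piece[i:i + n] for i in range(len(piece) - n + 1)}
    return grams

def bit_position(gram, width):
    """Map an n-gram to exactly one signature bit (toy hash, an assumption)."""
    h = 0
    for ch in gram:
        h = (h * 131 + ord(ch)) % width
    return h

def build_bit_slices(lexicon, width=64, n=2):
    """Slice b is an int bitmap: bit t is set if term t sets signature bit b."""
    slices = [0] * width
    for t, term in enumerate(lexicon):
        for g in ngrams(term, n):
            slices[bit_position(g, width)] |= 1 << t
    return slices

def search(pattern, lexicon, slices, width=64, n=2):
    """AND the slices of the query n-grams, then verify each candidate."""
    mask = (1 << len(lexicon)) - 1
    for g in query_grams(pattern, n):
        mask &= slices[bit_position(g, width)]
    candidates = [lexicon[t] for t in range(len(lexicon)) if (mask >> t) & 1]
    # Signatures can produce false matches but never miss a true match.
    return [term for term in candidates if fnmatch.fnmatchcase(term, pattern)]

lexicon = ["labor", "labour", "label", "harbor"]
slices = build_bit_slices(lexicon)
matches = search("lab*r", lexicon, slices)
```

Storing one bitmap per signature bit rather than one signature per term is what makes the structure "bit-sliced": a query touches only the slices for its own n-grams.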

3.
A Zipfian model of an automatic bibliographic system is developed using parameters describing the contents of its database and its inverted file. The underlying structure of the Zipf distribution is derived, with particular emphasis on its application to word frequencies, especially with regard to the inverted files of an automatic bibliographic system. Andrew Booth developed a form of Zipf's law which estimates the number of words of a particular frequency for a given author and text. His formulation has been adopted as the basis of a model of term dispersion in an inverted file system. The model is also distinctive in its consideration of the proliferation of spelling errors in free text, and in its inclusion of all searchable elements from the system's inverted file. This model is applied to the National Library of Medicine's MEDLINE. The model carries implications for the determination of database storage requirements, search response time, and search exhaustiveness.
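As a hedged illustration of how a count of words at a given frequency can follow from Zipf's rank-frequency relation, here is one common idealized derivation. The abstract does not state the exact formula the model uses, so the symbols $k$, $r$, $f$, and $I_n$, and the choice of rank boundaries, are assumptions introduced here. Assume rank times frequency is a constant $k$, and that the words occurring exactly $n$ times occupy the ranks between $k/(n+1)$ and $k/n$.

```latex
\begin{align}
  r \cdot f &= k
    && \text{(Zipf: rank times frequency is roughly constant)} \\
  I_n &= \frac{k}{n} - \frac{k}{n+1} = \frac{k}{n(n+1)}
    && \text{(number of words occurring exactly } n \text{ times)} \\
  \frac{I_n}{I_1} &= \frac{2}{n(n+1)}
    && \text{(since } I_1 = k/2\text{)}
\end{align}
```

Under this idealization the counts telescope, $\sum_{n \ge 1} I_n = k$, so about half of the distinct terms occur only once. For an inverted file this would mean that many index entries carry a single posting.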

4.
5.
A variety of data structures, such as the inverted file, multi-lists, quad tree, k-d tree, range tree, polygon tree, quintary tree, multidimensional tries, segment tree, doubly chained tree, the grid file, d-fold tree, super B-tree, and Multiple Attribute Tree (MAT), have been studied for multidimensional searching and related problems. Physical database organization, which is an important application of multidimensional searching, has traditionally been handled mostly with inverted files. This study proposes the MAT data structure for bibliographic file systems by illustrating its superiority over the inverted file. The two methods are compared in terms of preprocessing, storage, and query costs. A worst-case complexity analysis of both methods for a partial match query is carried out in two cases: (a) when the directory resides in main memory, and (b) when the directory resides in secondary memory. In both cases, the MAT data structure is shown to be more efficient than the inverted file method. Arguments are also given to illustrate the superiority of the MAT data structure in the average case. An efficient adaptation of the MAT data structure, which exploits the special features of the MAT structure and of bibliographic files, is proposed for bibliographic file systems. In this adaptation, suitable techniques for fixing and ranking the attributes of the MAT data structure are proposed. Conclusions and proposals for future research are presented.

6.
Many information retrieval systems use the inverted file as their indexing structure. The inverted file, however, requires inefficient reorganization when new documents are added to an existing collection. Most studies suggest dealing with this problem by reserving free space in the inverted file for incremental updates. In this paper, we propose a run-time, statistics-based approach to allocating the spare space. This approach estimates the space requirements of an inverted file using only a small amount of the most recent statistical data on space usage and the document update request rate. For the best indexing speed and space efficiency, the amount of spare space to be allocated is determined by adaptively balancing the trade-off between reducing reorganization and space utilization. Experimental results show that the proposed space-sparing approach significantly reduces reorganization when updating an inverted file, while unused free space is kept under control so that file access speed is not affected.

7.
Web-Based Image Search Engines   Total citations: 1, self-citations: 0, citations by others: 1
蔡颖 《情报科学》2002,20(10):1075-1077
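With the rapid spread of the Internet and the full-scale rollout of broadband, image information on the network is expanding sharply and multimedia files are multiplying; at the same time, users' demand for searching images online keeps growing. Against this background, traditional text search can no longer meet users' special needs. How can the required images or multimedia files be found on the network more conveniently and quickly? Web-based image search engines have emerged in response. Each works in its own way, and together they make searching for image information online very simple. This paper gives an introduction from three aspects: the working principles of image search engines, search methods, and the major image search engines in China and abroad.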

8.
The inverted file is the most popular indexing mechanism for document search in an information retrieval system. Compressing an inverted file can greatly improve the document search rate. Traditionally, the d-gap technique is used in inverted file compression: document identifiers are replaced with gap values that are usually much smaller. However, fluctuating gap values cannot be compressed efficiently by some well-known prefix-free codes. To smooth and reduce the gap values, we propose a document-identifier reassignment algorithm. The reassignment is based on a similarity factor between documents: we generate a reassignment order for all documents according to their similarity, so that documents with closer relationships receive closer identifiers. Simulation results show that the average gap values of sample inverted files can be reduced by 30%, and the compression rate of a d-gapped inverted file with prefix-free codes can be improved by 15%.
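The d-gap technique and the reason that smaller, smoother gaps compress better can be sketched as follows. This is a minimal illustration, not the paper's algorithm: Elias gamma is used here as one example of a prefix-free code, the function names are hypothetical, and the "reassigned" posting list is invented simply to show the effect of placing similar documents at nearby identifiers.

```python
def to_gaps(doc_ids):
    """Replace a sorted list of document identifiers (>= 1) with d-gaps."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps

def from_gaps(gaps):
    """Recover the original identifiers by summing the gaps."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids

def gamma_encode(x):
    """Elias gamma code of an integer x >= 1, as a string of '0'/'1'."""
    binary = bin(x)[2:]
    return "0" * (len(binary) - 1) + binary

def gamma_cost(doc_ids):
    """Total bits needed to gamma-code the d-gaps of a posting list."""
    return sum(len(gamma_encode(g)) for g in to_gaps(doc_ids))

# One posting list before and after a hypothetical identifier reassignment
# that gives the five documents containing this term adjacent identifiers.
scattered = [3, 7, 8, 150, 152]
clustered = [3, 4, 5, 6, 7]
bits_before = gamma_cost(scattered)
bits_after = gamma_cost(clustered)
```

Gamma codes spend about twice the bit length of the value, so a single large gap (such as 142 above) dominates the cost of the list; clustering related documents removes such outliers.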

9.
Analysis of arithmetic coding for data compression   Total citations: 1, self-citations: 0, citations by others: 1
Arithmetic coding, in conjunction with a suitable probabilistic model, can provide nearly optimal data compression. In this article we analyze the effect that the model and the particular implementation of arithmetic coding have on the code length obtained. Periodic scaling is often used in arithmetic coding implementations to reduce time and storage requirements; it also introduces a recency effect which can further affect compression. Our main contribution is introducing the concept of weighted entropy and using it to characterize in an elegant way the effect that periodic scaling has on the code length. We explain why and by how much scaling increases the code length for files with a homogeneous distribution of symbols, and we characterize the reduction in code length due to scaling for files exhibiting locality of reference. We also give a rigorous proof that the coding effects of rounding scaled weights, using integer arithmetic, and encoding end-of-file are negligible.
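The recency effect of periodic scaling can be observed with a small experiment. The sketch below does not implement an arithmetic coder or the paper's weighted-entropy analysis. It only computes the ideal code length, the sum of -log2 p over the symbols, under an adaptive frequency-count model, which an arithmetic coder approaches closely. The halving rule (round up so no count reaches zero), the `limit` parameter, and the two test inputs are assumptions chosen for illustration.

```python
import math

def adaptive_code_length(symbols, alphabet, limit=None):
    """Ideal code length in bits under an adaptive count model.

    Every symbol starts with count 1. If `limit` is given, all counts are
    halved (rounding up) whenever their total exceeds `limit`, which is
    the periodic scaling step.
    """
    counts = {s: 1 for s in alphabet}
    total = len(counts)
    bits = 0.0
    for s in symbols:
        bits += -math.log2(counts[s] / total)
        counts[s] += 1
        total += 1
        if limit is not None and total > limit:
            counts = {k: (v + 1) // 2 for k, v in counts.items()}
            total = sum(counts.values())
    return bits

# A file with locality of reference: one symbol dominates each half.
local = "a" * 200 + "b" * 200
# A file with a homogeneous distribution of symbols.
homogeneous = "ab" * 200

local_plain = adaptive_code_length(local, "ab")
local_scaled = adaptive_code_length(local, "ab", limit=16)
homog_plain = adaptive_code_length(homogeneous, "ab")
homog_scaled = adaptive_code_length(homogeneous, "ab", limit=16)
```

With scaling, old counts decay, so the model adapts quickly when the input switches from `a` to `b` and the code length for the local file drops. For the homogeneous file the small counts keep the probability estimates noisy, so scaling costs extra bits.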

10.
11.
The Ergonomics Abstracts Retrieval System (EARS) is an online bibliographic information search and retrieval system that uses the hierarchical subject classification of the Ergonomics Abstracts. EARS is designed around an inverted file organization and is implemented on CDC-Cyber. The database of abstracts is organized using a fixed-length record format, where each logical record corresponds to a variable number of fixed-length physical records. Accordingly, an index is used to identify the physical records belonging to each logical record. The database is inverted on a three-level hierarchical classification scheme, and postings files are used for the inversion. The database is accessed after selectively traversing a four-layer structure of indexes and postings files. EARS provides facilities to perform combinations of searches, limited searches, and certain editing functions. The system is currently used extensively by the Western New York Human Factors research community. The logical and physical designs of EARS, its interactive operational features, and its current expansions are described in this paper.

12.
覃静 《大众科技》2014,(3):153-154,164
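Addressing the large volume of student archives that colleges and universities store and manage, this paper analyzes the main methods currently used with respect to management efficiency, storage security, and update efficiency, and compares the differences, strengths, and weaknesses of different archive management methods. It summarizes the main problems in student archive management, proposes strategies and principles for reform, and, based on the characteristics of work at colleges, lays out a detailed implementation plan for student archive management.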

13.
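This paper analyzes the current mainstream storage technologies, including SAN, NAS, DAS, and iSCSI. In different periods, different storage and backup requirements have been met by different technologies; by comparing several of these technologies, the paper identifies trends in the future development of storage. With the future growth of application systems in mind, servers can later be added flexibly and the system extended into a SAN (storage area network) to improve storage efficiency.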

14.
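This paper proposes a steganalysis method for palette images based on run-length statistics and the Walsh spectrum energy distribution. The parity values (P values) of the palette colors are used to construct a P-value image, from which 46-dimensional features are extracted through run-length statistics and the Walsh transform; a calibration technique is used to weaken the interference of cover-image differences with the analysis statistics. Experiments show that the method is applicable to palette-image steganographic algorithms such as EzStego, component-sum steganography, and optimal parity steganography, and that it is more accurate than singular-color analysis.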

15.
Bug reports are an essential part of a software project's life cycle since resolving them improves the project's quality. When a new bug report is received, developers usually need to reproduce the bug and perform a code review to locate the bug and assign it to be fixed. However, the huge number of bug reports and the increasing size of software projects make this process tedious and time-consuming. To solve this issue, bug localization techniques try to rank all the source files of a project with respect to how likely they are to contain a bug. This process reduces the search space of source files and helps developers find relevant source files more quickly. In this paper, we propose a multi-component bug localization approach that leverages different textual properties of bug reports and source files as well as the relations between previously fixed bug reports and a newly received one. Our approach uses information retrieval, textual matching, stack trace analysis, and multi-label classification to improve the performance of bug localization. We evaluate the performance of the proposed approach on three open source software projects (i.e., AspectJ, SWT, and ZXing), and the results show that it can rank appropriate source files for more than 52% of bugs by recommending only one source file and for 78% by recommending ten files. It also improves the MRR and MAP values compared to several existing state-of-the-art bug localization approaches.
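The evaluation measures named in the abstract (hit rate when recommending one or ten files, MRR, and MAP) can be made concrete with a short sketch. The definitions below are the standard information retrieval ones; the function names, the file names, and the two example rankings are hypothetical and are not taken from the paper's data.

```python
def reciprocal_rank(ranked_files, buggy_files):
    """1 / rank of the first truly buggy file, or 0 if none is ranked."""
    for rank, name in enumerate(ranked_files, start=1):
        if name in buggy_files:
            return 1.0 / rank
    return 0.0

def average_precision(ranked_files, buggy_files):
    """Mean of precision@rank over the ranks where a buggy file appears."""
    hits, score = 0, 0.0
    for rank, name in enumerate(ranked_files, start=1):
        if name in buggy_files:
            hits += 1
            score += hits / rank
    return score / len(buggy_files) if buggy_files else 0.0

def top_k_accuracy(results, k):
    """Fraction of bug reports with at least one buggy file in the top k."""
    hits = sum(1 for ranked, buggy in results
               if any(name in buggy for name in ranked[:k]))
    return hits / len(results)

def mean_over_reports(metric, results):
    """Average a per-report metric (giving MRR or MAP) over all reports."""
    return sum(metric(ranked, buggy) for ranked, buggy in results) / len(results)

# Each entry: (ranking produced for one bug report, files actually fixed).
results = [
    (["A.java", "B.java", "C.java"], {"A.java", "C.java"}),
    (["A.java", "B.java", "C.java"], {"C.java"}),
]
mrr = mean_over_reports(reciprocal_rank, results)
map_score = mean_over_reports(average_precision, results)
top1 = top_k_accuracy(results, 1)
```

Top-k accuracy only asks whether any relevant file reaches the top k, whereas MRR rewards placing the first relevant file early and MAP also accounts for the positions of all remaining relevant files.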

16.
张陈俊  章恒全  陈其勇  龚雅云 《资源科学》2015,37(11):2228-2239
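Using a panel data model of Chinese provinces for 1998-2012, this paper tests, for the whole sample and by group, the relationship between different categories of water use and economic growth. The analysis finds that this relationship takes several forms that depend closely on region and on water-use category: total and industrial water use in the national group, industrial water use in the eastern group, industrial water use in the central group, and total water use in the western group each show an inverted-"U" relationship with economic growth. Provinces differ considerably in where their current water use stands, and the "rebound effect", in which water use rises again after declining, needs to be guarded against. In addition, regression tests on the time series of 31 provinces show that the inverted-"U" form is widespread and that the turning-point value is larger in developed regions than in less developed ones; nonparametric estimation on cross-sectional data for selected years yields results that differ from the parametric estimates. Regions should therefore adopt differentiated policies for different water-use categories so that water use stays stable or declines and the "rebound effect" is avoided, and policy should be tilted further toward less developed regions to help them pass the turning point and reduce their water use.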

17.
A bidirectional delta file is a novel concept, introduced in this paper, for a two-way delta file. Previous work focuses on one-way differential compression, producing what are called forwards and backwards delta files. Here we suggest combining them efficiently into a single file that is smaller than the two individual files together. Given the bidirectional delta file of two files S and T and the original file S, one can decode it to produce T. The same bidirectional delta file is used together with the file T to reconstruct S. This paper presents two main strategies for producing a bidirectional delta file that is efficient in terms of the storage it requires: a quadratic-time, optimal, dynamic programming algorithm, and a linear-time, greedy algorithm. Although the dynamic programming algorithm often produces better results than the greedy algorithm, it is impractical for large files and is used only for theoretical comparisons. Experiments comparing the implemented algorithms with the traditional way of using both forwards and backwards delta files are presented, covering processing time and compression performance. These experiments show storage savings of about 25% for the bidirectional delta approach compared with the compressed delta files constructed in the traditional way, while preserving approximately the same processing time for decoding.

18.
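Scientific and technological archives are the scientific and technological documents and materials that are formed by state agencies, enterprises and public institutions, social organizations, or individuals in the course of research, production, capital construction, and management activities, that have preservation value for the state and society, and that should be filed and kept. They are the record of scientific and technological activity and the carrier of its results, they serve as basis and evidence, and they are a valuable national asset. Exploring the modernized management of such archives is therefore of great significance. Starting from the meaning and current state of scientific and technological archives, this article goes on to discuss their existing problems and the ways to implement modernized management, with the aim of making a modest contribution to the country's management of scientific and technological archives.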

19.
郭清华 《科教文汇》2014,(8):203-204
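With the rapid development of information technology, an enormous number of electronic archives have been produced. Electronic archives here are archives stored in coded form on specific media, also known as machine-readable or digital archives. With their incomparable advantages, electronic files have had a great impact on paper files. Computerized accounting is already widespread in institutions of higher education, which stand at the frontier of knowledge. The application of new technology has brought great convenience to archive management in schools and in society. In the archive management of secondary vocational schools, how to move from paper archives to electronic archive management in line with the schools' development needs, and so keep pace with the times, has become an important question in urgent need of discussion. This article therefore mainly discusses strategies for electronic archive management in secondary vocational schools.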

20.
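After a computer system has been intruded upon or attacked, its disk data have often been maliciously modified, for example tampered with or deleted, and these data are usually very valuable, so data recovery technology is particularly important. Existing data recovery techniques fall short in many respects because mainstream file systems are not designed in a way that favors data recovery. To address this, the paper finds that the design of log-structured file systems favors efficient data recovery algorithms, analyzes theoretically why this is so, designs a data recovery algorithm model based on a log-structured file system, and tests the algorithm both in a purpose-built log-structured file system simulator and in a real environment. The experiments show that a log-structured file system makes it easy to implement data recovery algorithms that recover quickly, have little impact on system performance, and need no additional storage mechanism.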
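One reason an append-only log can make recovery easier is that earlier versions of the data stay in the log after they are overwritten or deleted. The following toy in-memory model illustrates this. It is not the paper's algorithm or simulator: the class `LogStructuredStore`, its methods, and the use of a Python list as the "disk" are all assumptions made for illustration, and a real log-structured file system also has to deal with segments, checkpoints, and cleaning.

```python
class LogStructuredStore:
    """Toy model: every write or delete is appended, nothing is overwritten."""

    def __init__(self):
        # Each record is (path, data); data is None for a deletion.
        self.log = []

    def write(self, path, data):
        """Append a new version of a file; return the log position after it."""
        self.log.append((path, data))
        return len(self.log)

    def delete(self, path):
        """Append a deletion marker; the old data stay in the log."""
        self.log.append((path, None))
        return len(self.log)

    def snapshot(self, upto=None):
        """Rebuild the file state by replaying the first `upto` log records.

        With upto=None the whole log is replayed (the current state).
        """
        state = {}
        for path, data in self.log[:upto]:
            if data is None:
                state.pop(path, None)
            else:
                state[path] = data
        return state

store = LogStructuredStore()
store.write("/etc/passwd", "root:x:0:0")
known_good = store.write("/home/a/notes.txt", "draft")
# Simulated attack: one file is tampered with and another is deleted.
store.write("/etc/passwd", "evil:x:0:0")
store.delete("/home/a/notes.txt")

current = store.snapshot()
recovered = store.snapshot(known_good)
```

Recovery here amounts to replaying the log up to a position recorded before the attack, so no separate backup store is needed as long as the older log records have not yet been reclaimed.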
