
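An Improvement to the d-gaps Technique for Inverted File Compression: Document Identifier Reassignment
Cite this article: Zhang Aihong (张爱红). An Improvement to the d-gaps Technique for Inverted File Compression: Document Identifier Reassignment [J]. 现代图书情报技术, 2004, 20(8): 61-65.
Author: Zhang Aihong (张爱红)
Affiliation: Department of Information Management, Sichuan University, Chengdu 610064
Abstract: The inverted file is the most widely used indexing mechanism in information retrieval systems, and compressing the index file can greatly improve retrieval speed and save disk space. The traditional approach to inverted file compression is the document (identifier) gap method (d-gaps). However, sharply fluctuating gap values cannot be encoded effectively by the well-known prefix-free codes. To make the gap values compress well, this paper designs a document identifier reassignment method. Simulation experiments show that it compresses d-gaps inverted files more effectively.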

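Keywords: Inverted file; Document (identifier) gap method (d-gaps); Document identifier reassignment; Traveling salesman problem; Greedy algorithm; Maximum spanning tree
Received: 2004-03-08
Revised: 2004-03-08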

Improve the d-gaps Technique for Inverted File Compression: Document Identifier Reassignment
Zhang Aihong. Improve the d-gaps Technique for Inverted File Compression: Document Identifier Reassignment [J]. New Technology of Library and Information Service, 2004, 20(8): 61-65.
Authors: Zhang Aihong
Institution: Department of Information Management, Sichuan University, Chengdu 610064, China
Abstract: The inverted file is the most popular indexing mechanism in information retrieval systems. Compressing an inverted file can greatly improve search speed and save disk space. Traditionally, inverted file compression uses the d-gaps technique, which replaces document identifiers with the usually much smaller gaps between consecutive identifiers. However, widely fluctuating gap values cannot be compressed efficiently by some well-known prefix-free codes. In this paper, a document identifier reassignment algorithm is proposed to reduce the gap values. Simulation results show that the average gap values of the inverted files can be reduced effectively.
Keywords: Inverted file; d-gaps; Document identifier reassignment; Traveling salesman problem (TSP); Greedy algorithm; Maximum spanning tree (MaxST)
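
To make the idea in the abstract concrete, the following is a minimal Python sketch of d-gaps and of why renumbering documents can help. It is not the paper's algorithm: the Elias gamma code is used here only as one example of a prefix-free code, and the toy `postings` index, the hand-picked `mapping`, and all function names are assumptions made for illustration. The sketch also assumes every posting list is non-empty.

```python
def d_gaps(doc_ids):
    """Return the first document ID followed by the differences between
    consecutive IDs of the sorted posting list."""
    ids = sorted(doc_ids)
    return [ids[0]] + [b - a for a, b in zip(ids, ids[1:])]


def gamma_bits(n):
    """Length in bits of the Elias gamma code of a positive integer n."""
    return 2 * (n.bit_length() - 1) + 1


def index_bits(postings):
    """Total gamma-coded size, in bits, of every gap in the index."""
    return sum(gamma_bits(g) for ids in postings.values() for g in d_gaps(ids))


def reassign(postings, mapping):
    """Renumber every document ID through `mapping` (old ID -> new ID)."""
    return {term: [mapping[d] for d in ids] for term, ids in postings.items()}


# Toy index (hypothetical data): each term occurs in three documents
# whose IDs are spread far apart.
postings = {"a": [1, 5, 9], "b": [2, 6, 10], "c": [3, 7, 11]}

# Hand-picked renumbering (an assumption, not the paper's method) that
# gives documents sharing a term consecutive IDs.
mapping = {1: 1, 5: 2, 9: 3, 2: 4, 6: 5, 10: 6, 3: 7, 7: 8, 11: 9}

before = index_bits(postings)
after = index_bits(reassign(postings, mapping))
print(before, after)
```

In this toy case the renumbered index needs fewer bits because most gaps shrink to 1. How to find a good renumbering for a real collection is the problem the paper addresses, which its keywords relate to the traveling salesman problem, a greedy algorithm, and maximum spanning trees.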
This article is indexed in CNKI, VIP (维普), Wanfang Data (万方数据), and other databases.