首页 | 本学科首页   官方微博 | 高级检索  
     检索      


Hybrid index maintenance for contiguous inverted lists
Authors:Stefan Büttcher  Charles L A Clarke
Institution:(1) Google Inc., Mountain View, CA, USA;(2) University of Waterloo, Waterloo, ON, Canada
Abstract:Index maintenance strategies employed by dynamic text retrieval systems based on inverted files can be divided into two categories: merge-based and in-place update strategies. Within each category, individual update policies can be distinguished based on whether they store their on-disk posting lists in a contiguous or in a discontiguous fashion. Contiguous inverted lists, in general, lead to higher query performance, by minimizing the disk seek overhead at query time, while discontiguous inverted lists lead to higher update performance, requiring less effort during index maintenance operations. In this paper, we focus on retrieval systems with high query load, where the on-disk posting lists have to be stored in a contiguous fashion at all times. We discuss a combination of re-merge and in-place index update, called Hybrid Immediate Merge. The method performs strictly better than the re-merge baseline policy used in our experiments, as it leads to the same query performance, but substantially better update performance. The actual time savings achievable depend on the size of the text collection being indexed; a larger collection results in greater savings. In our experiments, variations of Hybrid Immediate Merge were able to reduce the total index update overhead by up to 73% compared to the re-merge baseline.
Contact Information Stefan BüttcherEmail:
Keywords:Text retrieval  Search engines  Index maintenance
本文献已被 SpringerLink 等数据库收录!
设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号