Hybrid index maintenance for contiguous inverted lists

Authors: Stefan Büttcher, Charles L. A. Clarke

Institution: (1) Google Inc., Mountain View, CA, USA; (2) University of Waterloo, Waterloo, ON, Canada

Abstract: Index maintenance strategies employed by dynamic text retrieval systems based on inverted files can be divided into two categories: merge-based and in-place update strategies. Within each category, individual update policies can be distinguished by whether they store their on-disk posting lists contiguously or discontiguously. In general, contiguous inverted lists lead to higher query performance because they minimize disk seek overhead at query time, whereas discontiguous inverted lists lead to higher update performance because they require less effort during index maintenance operations. In this paper, we focus on retrieval systems with a high query load, where the on-disk posting lists must be stored contiguously at all times. We discuss a combination of re-merge and in-place index update, called Hybrid Immediate Merge. The method performs strictly better than the re-merge baseline policy used in our experiments: it achieves the same query performance but substantially better update performance. The actual time savings depend on the size of the text collection being indexed; the larger the collection, the greater the savings. In our experiments, variations of Hybrid Immediate Merge reduced the total index update overhead by up to 73% compared to the re-merge baseline.
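To make the idea of combining re-merge and in-place update more concrete, here is a minimal toy cost model in Python. It is a sketch under stated assumptions, not the authors' implementation: it assumes terms are split into short lists, kept in one merged index that is rewritten on every flush (re-merge), and long lists, which are moved to in-place storage once they exceed a hypothetical `long_threshold`. It also assumes that in-place lists stay contiguous by reserving free space behind each list (`growth_factor`, also hypothetical) and relocating a list when that space runs out. All class, method, and parameter names are illustrative, and cost is approximated only by the number of postings written, ignoring disk seeks and reads.

```python
from collections import defaultdict


class HybridImmediateMergeSketch:
    """Toy cost model of a hybrid re-merge / in-place policy (hypothetical).

    Short lists live in a merged index that is rewritten on every flush.
    A list longer than `long_threshold` postings migrates to in-place
    storage, where it stays contiguous via over-allocation: when a list
    outgrows its reserved space, it is relocated (copied in full).
    """

    def __init__(self, long_threshold, growth_factor=2):
        self.long_threshold = long_threshold
        self.growth_factor = growth_factor
        self.buffer = defaultdict(list)  # in-memory postings: term -> [doc_id]
        self.merged = {}                 # short on-disk lists (re-merged)
        self.in_place = {}               # long on-disk lists (updated in place)
        self.capacity = {}               # reserved slots per in-place list
        self.postings_written = 0        # crude proxy for update cost

    def add_document(self, doc_id, terms):
        # Doc IDs are assumed to arrive in increasing order.
        for term in set(terms):
            self.buffer[term].append(doc_id)

    def _write_in_place(self, term, new_postings):
        postings = self.in_place.get(term, []) + new_postings
        if len(postings) > self.capacity.get(term, 0):
            # No free space behind the list: relocate, copying every posting.
            self.capacity[term] = len(postings) * self.growth_factor
            self.postings_written += len(postings)
        else:
            # Reserved space suffices: only the new postings are written.
            self.postings_written += len(new_postings)
        self.in_place[term] = postings

    def flush(self):
        """Write the in-memory buffer to disk (one index update cycle)."""
        # Long lists: append buffered postings in place, no re-merge.
        for term in list(self.buffer):
            if term in self.in_place:
                self._write_in_place(term, self.buffer.pop(term))
        # Short lists: re-merge old short lists with the remaining buffer.
        # Every short list is rewritten, even if it received no new postings.
        new_merged = {}
        for term in set(self.merged) | set(self.buffer):
            postings = self.merged.get(term, []) + self.buffer.get(term, [])
            if len(postings) > self.long_threshold:
                self._write_in_place(term, postings)  # migrate to in-place
            else:
                new_merged[term] = postings
                self.postings_written += len(postings)
        self.merged = new_merged
        self.buffer.clear()
```

In this model, a pure re-merge policy corresponds to an effectively infinite `long_threshold`, in which case every flush rewrites every list; a finite threshold keeps the rewritten merged index small, while each long list remains contiguous on disk.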

Keywords: Text retrieval; Search engines; Index maintenance