Learning to rank with (a lot of) word features
Authors: | Bing Bai, Jason Weston, David Grangier, Ronan Collobert, Kunihiko Sadamasa, Yanjun Qi, Olivier Chapelle, Kilian Weinberger |
Institution: | (1) NEC Labs America, Princeton, NJ, USA; (2) Yahoo! Research, Santa Clara, CA, USA |
Abstract: | In this article we present Supervised Semantic Indexing (SSI), which defines a class of nonlinear (quadratic) models that are discriminatively trained to map directly from the word content in a query-document or document-document pair to a ranking score. Like Latent Semantic Indexing (LSI), our models account for correlations between words (synonymy, polysemy). However, unlike LSI, our models are trained with a supervised signal directly on the ranking task of interest, which we argue is the reason for our superior results. As the query and target texts are modeled separately, our approach generalizes easily to different retrieval tasks, such as cross-language retrieval or online advertising placement. Dealing with models over all pairs of word features is computationally challenging. We propose several improvements to our basic model to address this issue, including low-rank (but diagonal-preserving) representations, correlated feature hashing, and sparsification. We provide an empirical study of all these methods on retrieval tasks based on Wikipedia documents as well as an Internet advertisement task. We obtain state-of-the-art performance while providing realistically scalable methods. |
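The quadratic model the abstract refers to scores a query-document pair as f(q, d) = qᵀWd over word-feature vectors, and the "low rank (but diagonal preserving)" improvement parameterizes W = UᵀV + I so the full vocabulary-by-vocabulary matrix never needs to be stored. The sketch below illustrates that idea under stated assumptions: the dimensions, the random dense vectors, and the `score` function are illustrative choices, not the authors' implementation.

```python
import numpy as np

# Illustrative sketch (not the paper's code) of a low-rank plus identity
# scoring model f(q, d) = q^T (U^T V + I) d, as described in the abstract.
rng = np.random.default_rng(0)
vocab_size = 1000   # number of word features (assumed, for illustration)
rank = 50           # low-rank dimension, much smaller than vocab_size

# U and V are the trainable low-rank factors; here they are random stand-ins.
U = rng.normal(scale=0.01, size=(rank, vocab_size))
V = rng.normal(scale=0.01, size=(rank, vocab_size))

def score(q, d):
    """Ranking score f(q, d) = q^T (U^T V + I) d, computed without ever
    materializing the vocab_size x vocab_size matrix W = U^T V + I."""
    # Low-rank term via two small matrix-vector products, plus the identity
    # term q . d, which preserves the classical bag-of-words match signal.
    return (U @ q) @ (V @ d) + q @ d

# Example usage with tf-idf-style feature vectors for a query and a document.
q = rng.random(vocab_size)
d = rng.random(vocab_size)
print(score(q, d))
```

Keeping the identity term means the model can fall back on exact word overlap, while the learned UᵀV part captures cross-word correlations such as synonymy.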
This article is indexed in SpringerLink and other databases.
|