20 similar documents found (search time: 515 ms)
1.
Modeling context through domain ontologies  Total citations: 1, self-citations: 0, citations by others: 1
Nathalie Hernandez Josiane Mothe Claude Chrisment Daniel Egret 《Information Retrieval》2007,10(2):143-172
Traditional information retrieval systems aim at satisfying most users for most of their searches, leaving aside the context
in which the search takes place. We propose to model two main aspects of context: The themes of the user's information need
and the specific data the user is looking for to achieve the task that has motivated his search. Both aspects are modeled
by means of ontologies. Documents are semantically indexed according to the context representation and the user accesses information
by browsing the ontologies. The model has been applied to a case study that has shown the added value of such a semantic representation
of context.
Daniel Egret
2.
Smoothing of document language models is critical in language modeling approaches to information retrieval. In this paper,
we present a novel way of smoothing document language models based on propagating term counts probabilistically in a graph
of documents. A key difference between our approach and previous approaches is that our smoothing algorithm can iteratively
propagate counts and achieve smoothing with remotely related documents. Evaluation results on several TREC data sets show that the proposed method significantly outperforms the
simple collection-based smoothing method. Compared with those other smoothing methods that also exploit local corpus structures,
our method is especially effective in improving precision in top-ranked documents through “filling in” missing query terms
in relevant documents, which is attractive since most users only pay attention to the top-ranked documents in search engine
applications.
ChengXiang Zhai
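The count-propagation idea in this abstract can be illustrated with a small sketch. Assumptions, not taken from the paper: documents form a weighted graph given as adjacency dicts; each iteration keeps a fraction `1 - alpha` of a document's own counts and pulls `alpha` of its mass from the weighted average of its neighbours' current counts; `propagate_counts`, `smoothed_lm` and `alpha` are hypothetical names. The paper's actual propagation rule, and how it is finally combined with collection-based smoothing, may differ.

```python
from collections import Counter


def propagate_counts(counts, neighbors, alpha=0.5, iterations=10):
    """Iteratively propagate term counts over a document graph (sketch).

    counts:     doc id -> Counter of original term counts
    neighbors:  doc id -> {neighbor doc id: non-negative edge weight}
    alpha:      hypothetical share of mass pulled from neighbours per iteration
    """
    current = {d: Counter(c) for d, c in counts.items()}
    for _ in range(iterations):
        updated = {}
        for d, original in counts.items():
            nbrs = neighbors.get(d, {})
            total_weight = sum(nbrs.values())
            if total_weight == 0:
                # Isolated document: nothing to propagate, keep original counts.
                updated[d] = Counter({w: float(n) for w, n in original.items()})
                continue
            # Keep (1 - alpha) of the document's own original counts ...
            new = Counter({w: (1 - alpha) * n for w, n in original.items()})
            # ... and pull alpha of the mass from the neighbours' *current*
            # counts, so after k iterations terms can arrive from k hops away.
            for nd, weight in nbrs.items():
                share = alpha * weight / total_weight
                for w, n in current[nd].items():
                    new[w] += share * n
            updated[d] = new
        current = updated
    return current


def smoothed_lm(smoothed_counts):
    """Normalise one document's smoothed counts into a unigram distribution."""
    total = sum(smoothed_counts.values())
    return {w: n / total for w, n in smoothed_counts.items()}
```

On a chain A - B - C where the term `x` occurs only in C, `x` reaches A after two iterations but not after one, which is the "smoothing with remotely related documents" effect the abstract describes.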
3.
Fotis Lazarinis Jesús Vilares John Tait Efthimis N. Efthimiadis 《Information Retrieval》2009,12(3):230-250
With growing numbers of non-English-speaking Web searchers, the efficient handling of non-English Web
documents and user queries is becoming a major issue for search engines. The main aim of this review paper is to make researchers
aware of the existing problems in monolingual non-English Web retrieval by providing an overview of open issues. A significant
number of papers are reviewed and the research issues investigated in these studies are categorized in order to identify the
research questions and solutions proposed in these papers. Further research is proposed at the end of each section.
Efthimis N. Efthimiadis
4.
We consider the following autocompletion search scenario: imagine a user of a search engine typing a query; then with every
keystroke display those completions of the last query word that would lead to the best hits, and also display the best such
hits. The following problem is at the core of this feature: for a fixed document collection, given a set D of documents and an alphabetical range W of words, compute the set of all word-in-document pairs (w, d) from the collection such that w ∈ W and d ∈ D. We present a new data structure with which such autocompletion queries can be processed, on average, in
time linear in the input plus output size, independent of the size of the underlying document collection. At the same time,
our data structure uses no more space than an inverted index. Actual query processing times on a large test collection correlate
almost perfectly with our theoretical bound.
Ingmar Weber
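The query at the core of this feature can be stated directly over an ordinary inverted index. The sketch below is that straightforward baseline, not the paper's new data structure: it locates the words of W in a sorted vocabulary by binary search and filters their postings against D, so its cost grows with the total length of those inverted lists rather than with the input plus output size. Function and parameter names are hypothetical.

```python
from bisect import bisect_left, bisect_right


def autocomplete_pairs(index, vocabulary, doc_set, lo, hi):
    """Return all (w, d) with lo <= w <= hi and d in doc_set (baseline sketch).

    index:      word -> list of doc ids containing it (an inverted index)
    vocabulary: all indexed words, sorted alphabetically
    doc_set:    the set D of documents matching the query so far
    lo, hi:     inclusive bounds of the alphabetical word range W
    """
    start = bisect_left(vocabulary, lo)  # first word >= lo
    end = bisect_right(vocabulary, hi)   # one past the last word <= hi
    pairs = []
    for w in vocabulary[start:end]:
        # Scanning every posting of every word in W is the cost that a
        # specialised structure aims to avoid.
        for d in index[w]:
            if d in doc_set:
                pairs.append((w, d))
    return pairs
```

For a user who has typed `alg`, the range `lo="alg"`, `hi="alh"` covers every indexed word starting with `alg` (plus the word `alh` itself, if indexed).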
5.
Norbert Fuhr 《Information Retrieval》2008,11(3):251-265
The classical Probability Ranking Principle (PRP) forms the theoretical basis for probabilistic Information Retrieval (IR)
models, which have dominated IR theory for about 20 years. However, the assumptions underlying the PRP often do not hold,
and its view is too narrow for interactive information retrieval (IIR). In this article, a new theoretical framework for interactive
retrieval is proposed: The basic idea is that during IIR, a user moves between situations. In each situation, the system presents
to the user a list of choices, about which s/he has to decide, and the first positive decision moves the user to a new situation.
Each choice is associated with a number of cost and probability parameters. Based on these parameters, an optimum ordering
of the choices can be derived: the PRP for IIR. The relationship of this rule to the classical PRP is described, and issues
of further research are pointed out.
Norbert Fuhr
6.
On knowledge-poor methods for person name matching and lemmatization for highly inflectional languages  Total citations: 1, self-citations: 1, citations by others: 0
Web person search is one of the most common activities of Internet users. Recently, a vast amount of work on applying various
NLP techniques for person name disambiguation in large web document collections has been reported, where the main focus was
on English and few other major languages. This article reports on knowledge-poor methods for tackling person name matching
and lemmatization in Polish, a highly inflectional language with a complex person-name declension paradigm. These methods mainly apply
well-established string distance metrics, some new variants thereof, automatically acquired simple suffix-based lemmatization
patterns and some combinations of the aforementioned techniques. Furthermore, we carried out some initial experiments
on deploying techniques that utilize the context in which person names appear. Results of numerous experiments are presented.
The evaluation carried out on a data set extracted from a corpus of on-line news articles revealed that achieving lemmatization
accuracy figures greater than 90% seems to be difficult, whereas combining string distance metrics with suffix-based patterns
results in 97.6–99% accuracy for the name matching task. Interestingly, no significant additional gain could be achieved through
integrating some basic techniques, which try to exploit the local context the names appear in. Although our explorations were
focused on Polish, we believe that the work presented in this article constitutes practical guidelines for tackling the same
problem for other highly inflectional languages with similar phenomena.
Marcin Sydow
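A minimal sketch of combining a string distance metric with suffix-based patterns for name matching. Assumptions: the metric is plain normalised Levenshtein similarity, and `SUFFIXES` is a tiny hand-written, hypothetical list of Polish endings. The paper evaluates several metrics and acquires its suffix patterns automatically, so this illustrates the combination only, not the authors' method.

```python
# Hypothetical suffix list for illustration only; the paper acquires its
# suffix-based patterns automatically. Ordered longest first.
SUFFIXES = ("ego", "emu", "owi", "em", "a", "u")
MIN_STEM = 3  # hypothetical: never strip a name below this length


def strip_suffix(name):
    """Lower-case a name and strip the first matching suffix, if any."""
    name = name.lower()
    for suffix in SUFFIXES:
        if name.endswith(suffix) and len(name) - len(suffix) >= MIN_STEM:
            return name[: -len(suffix)]
    return name


def levenshtein(a, b):
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Normalised similarity in [0, 1]: 1 - distance / longer length."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def name_similarity(x, y):
    """Best of the raw and the suffix-stripped comparison of two name forms."""
    raw = similarity(x.lower(), y.lower())
    stemmed = similarity(strip_suffix(x), strip_suffix(y))
    return max(raw, stemmed)
```

With these rules, inflected forms such as "Kowalskiego" and "Nowakowi" match their base forms "Kowalski" and "Nowaka" exactly, while unrelated surnames stay far apart.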
7.
Participatory archive: towards decentralised curation, radical user orientation, and broader contextualisation of records management  Total citations: 2, self-citations: 2, citations by others: 0
Isto Huvila 《Archival Science》2008,8(1):15-36
User perspective and user studies have received noticeably little practical attention in archives and archival science. The
purpose of this article is to address the issues of communication and user participation in archival contexts. Two digital archives
based on action research projects are discussed. The insights gained during the research and development work are used
to formulate a new approach to a participatory archive. In spite of the historical nature of the archives discussed, the suggested ways of interacting with an archive are not specific
to historical records. The fundamental characteristics of the proposed approach are decentralised curation, radical user orientation,
and contextualisation of both records and the entire archival process.
Isto Huvila
8.
Andrew MacFarlane 《Information Retrieval》2009,12(2):162-178
Understanding of mathematics is needed to underpin the process of search, either explicitly with Exact Match (Boolean logic,
adjacency) or implicitly with Best Match natural language search. In this paper we outline some pedagogical challenges in
teaching mathematics for information retrieval (IR) to postgraduate information science students. The aim is to take these
challenges, found either through experience or in the literature, and to identify both theoretical and practical ideas in order to improve
the delivery of the material and positively affect the learning of the target audience by using a tutorial style of teaching.
Results show that there is evidence to support the notion that a more pro-active style of teaching using tutorials yields benefits
both in terms of assessment results and student satisfaction.
Andrew MacFarlane
9.
Content-oriented XML retrieval approaches aim at a more focused retrieval strategy: Instead of retrieving whole documents, document components that are exhaustive with respect to the information need while at the same time being as specific as possible should be retrieved. In this article, we show that the evaluation methods developed for standard retrieval must be modified in order to deal with the structure of XML documents. More precisely, the size and overlap of document components must be taken into account. For this purpose, we propose a new effectiveness metric based on the definition of a concept space defined upon the notions of exhaustiveness and specificity of a search result. We compare the results of this new metric with the results obtained with the official metric used in INEX, the evaluation initiative for content-oriented XML retrieval.
Gabriella Kazai
10.
To put an end to the large copyright trade deficit, both Chinese government agencies and publishing houses have been striving
to enter the international publication market. The article analyzes the background of the going-global strategy, and sums
up the performance of both Chinese administrations and publishers.
Qing Fang (Corresponding author)
11.
Jacob Soll 《Archival Science》2007,7(4):331-342
This article examines the archival methods developed by Colbert to train his son in state administration. Based on Colbert’s
correspondence with his son, it reveals the practices Colbert thought necessary to collect and manage information in his state
encyclopedic archive during the last half of the 17th century.
Jacob Soll
12.
A summary overview of the children’s and young adult publishing industry in China with a focus on the size of the market,
ten major publishing houses, copyright and trends. Special emphasis has been placed on specific transactions for the sale of
translation rights from German-language publishers to China and on the minimal activity of German rights sold to Chinese publishers.
Jing Bartz
13.
On rank-based effectiveness measures and optimization  Total citations: 1, self-citations: 0, citations by others: 1
Many current retrieval models and scoring functions contain free parameters which need to be set—ideally, optimized. The process
of optimization normally involves some training corpus of the usual document-query-relevance judgement type, and some choice
of measure that is to be optimized. The paper proposes a way to think about the process of exploring the space of parameter
values, and how moving around in this space might be expected to affect different measures. One result, concerning local optima,
is demonstrated for a range of rank-based evaluation measures.
Hugo Zaragoza
14.
Multilingual information retrieval is generally understood to mean the retrieval of relevant information in multiple target
languages in response to a user query in a single source language. In a multilingual federated search environment, different
information sources contain documents in different languages. A general search strategy in multilingual federated search environments
is to translate the user query to each language of the information sources and run a monolingual search in each information
source. It is then necessary to obtain a single ranked document list by merging the individual ranked lists from the information
sources that are in different languages. This is known as the results merging problem for multilingual information retrieval.
Previous research has shown that the simple approach of normalizing source-specific document scores is not effective. On the
other hand, a more effective merging method has been proposed that downloads and translates all retrieved documents into the source
language and generates the final ranked list by running a monolingual search in the search client. The latter method is more
effective but is associated with a large amount of online communication and computation costs. This paper proposes an effective
and efficient approach for the results merging task of multilingual ranked lists. Particularly, it downloads only a small
number of documents from the individual ranked lists of each user query to calculate comparable document scores by utilizing
both the query-based translation method and the document-based translation method. Then, query-specific and source-specific
transformation models can be trained for individual ranked lists by using the information of these downloaded documents. These
transformation models are used to estimate comparable document scores for all retrieved documents and thus the documents can
be sorted into a final ranked list. This merging approach is efficient as only a subset of the retrieved documents is downloaded
and translated online. Furthermore, an extensive set of experiments on the Cross-Language Evaluation Forum (CLEF) data has demonstrated the effectiveness of the query-specific and source-specific results merging algorithm against other
alternatives. The new research in this paper proposes different variants of the query-specific and source-specific results
merging algorithm with different transformation models. This paper also provides thorough experimental results as well as
detailed analysis. All of the work substantially extends the preliminary research in (Si and Callan, in: Peters (ed.) Results
of the cross-language evaluation forum-CLEF 2005, 2005).
Hao Yuan
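The merging step can be sketched as follows, under loudly stated assumptions: each source's transformation model is a simple linear map from source-specific score to comparable score, fitted by least squares on the few downloaded documents; the comparable scores of those documents are assumed to be already computed (translation and centralized scoring are omitted); and all names are hypothetical. The paper studies several transformation models, so treat this as one possible instance, not its algorithm.

```python
def fit_linear(xs, ys):
    """Least-squares fit ys ~ slope * xs + intercept."""
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one downloaded document per source")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # One sample (or identical scores): fall back to a constant estimate.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def merge_results(ranked_lists, comparable):
    """Merge per-source ranked lists into one ranked list of doc ids.

    ranked_lists: source -> list of (doc id, source-specific score)
    comparable:   source -> {doc id: comparable score} for the small sample
                  of downloaded documents from that source (assumed given)
    """
    merged = []
    for source, results in ranked_lists.items():
        sample = comparable[source]
        pairs = [(s, sample[d]) for d, s in results if d in sample]
        # Query- and source-specific model: fitted separately per list.
        slope, intercept = fit_linear([p[0] for p in pairs], [p[1] for p in pairs])
        for doc, score in results:
            # Use the computed score where available, the estimate otherwise.
            est = sample[doc] if doc in sample else slope * score + intercept
            merged.append((est, doc))
    merged.sort(key=lambda item: item[0], reverse=True)
    return [doc for _, doc in merged]
```

Because a separate model is fitted for each list, scores from sources on very different scales (for example, raw scores around 10 versus around 0.5) become comparable before the final sort.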
15.
Andy Weissberg 《Publishing Research Quarterly》2008,24(4):255-260
This article analyzes current industry practices toward the identification of digital book content. It highlights key technology
trends, workflow considerations and supply chain behaviors, and examines the implications of these trends and behaviors on
the production, discoverability, purchasing and consumption of digital book products.
Andy Weissberg
16.
Sandeep Chaufla 《Publishing Research Quarterly》2008,24(3):187-201
A review and analysis of the rules and regulations including the tax aspects of making an investment in India is presented.
The full range from Foreign Direct Investment to different forms of doing business with specific examples from the publishing
industry is explored to help understand current policies and regulations.
Sandeep Chaufla
17.
18.
Jun Wang Stephen Robertson Arjen P. de Vries Marcel J. T. Reinders 《Information Retrieval》2008,11(6):477-497
Collaborative filtering is concerned with making recommendations about items to users. Most formulations of the problem are
specifically designed for predicting user ratings, assuming past data of explicit user ratings is available. However, in practice
we may only have implicit evidence of user preference; and furthermore, a better view of the task is of generating a top-N
list of items that the user is most likely to like. In this regard, we argue that collaborative filtering can be directly
cast as a relevance ranking problem. We begin with the classic Probability Ranking Principle of information retrieval, proposing a probabilistic
item ranking framework. In the framework, we derive two different ranking models, showing that despite their common origin,
different factorizations reflect two distinctive ways to approach item ranking. For the model estimations, we limit our discussions
to implicit user preference data, and adopt an approximation method introduced in the classic text retrieval model (i.e. the
Okapi BM25 formula) to effectively decouple frequency counts and presence/absence counts in the preference data. Furthermore,
we extend the basic formula by proposing Bayesian inference to estimate the probability of relevance (and non-relevance),
which largely alleviates the data sparsity problem. Apart from a theoretical contribution, our experiments on real data sets
demonstrate that the proposed methods perform significantly better than other strong baselines.
Marcel J. T. Reinders
19.
Fernando Diaz 《Information Retrieval》2007,10(6):531-562
We adapt the cluster hypothesis for score-based information retrieval by claiming that closely related documents should have
similar scores. Given a retrieval from an arbitrary system, we describe an algorithm which directly optimizes this objective
by adjusting retrieval scores so that topically related documents receive similar scores. We refer to this process as score
regularization. Because score regularization operates on retrieval scores, regardless of their origin, we can apply the technique
to arbitrary initial retrieval rankings. Document rankings derived from regularized scores, when compared to rankings derived
from un-regularized scores, consistently and significantly result in improved performance given a variety of baseline retrieval
algorithms. We also present several proofs demonstrating that regularization generalizes methods such as pseudo-relevance
feedback, document expansion, and cluster-based retrieval. Because of these strong empirical and theoretical results, we argue
for the adoption of score regularization as a general design principle or post-processing step for information retrieval systems.
Fernando Diaz
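Score regularization can be sketched as iterative smoothing of retrieval scores over a document-affinity graph. The sketch follows the graph-regularization iteration `f <- alpha * S f + (1 - alpha) * y` with symmetrically normalised affinities `S`; that choice, the affinity matrix `W` and the parameter `alpha` are assumptions for illustration, and the paper's exact objective and graph construction may differ.

```python
import math


def regularize_scores(scores, weights, alpha=0.5, iterations=100):
    """Smooth initial retrieval scores over a document-affinity graph (sketch).

    scores:  initial scores y from any retrieval system
    weights: symmetric n x n affinity matrix W with zeros on the diagonal
    alpha:   in [0, 1), trade-off between neighbours and the initial score
    """
    n = len(scores)
    degree = [sum(row) for row in weights]
    # Symmetrically normalised affinities S = D^{-1/2} W D^{-1/2}.
    norm = [[weights[i][j] / math.sqrt(degree[i] * degree[j])
             if degree[i] > 0 and degree[j] > 0 else 0.0
             for j in range(n)] for i in range(n)]
    f = list(scores)
    for _ in range(iterations):
        # f <- alpha * S f + (1 - alpha) * y; converges because alpha < 1
        # and the eigenvalues of S lie in [-1, 1].
        f = [alpha * sum(norm[i][j] * f[j] for j in range(n))
             + (1 - alpha) * scores[i]
             for i in range(n)]
    return f
```

With documents 0 and 1 linked and document 2 isolated, initial scores `[1.0, 0.0, 0.5]` converge to roughly `[2/3, 1/3, 0.25]`: document 1 rises above document 2 because it is closely related to the top-scoring document, while the isolated document is only rescaled by `1 - alpha`.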
20.
Oren Kurland 《Information Retrieval》2009,12(4):437-460
To obtain high precision at top ranks by a search performed in response to a query, researchers have proposed a cluster-based
re-ranking paradigm: clustering an initial list of documents that are the most highly ranked by some initial search, and using
information induced from these clusters (often called query-specific clusters) for re-ranking the list. However, results concerning the effectiveness of various automatic cluster-based re-ranking methods have been inconclusive. We show that using query-specific clusters for automatic re-ranking
of top-retrieved documents is effective with several methods in which clusters play different roles, among which is the smoothing of document language models. We do so by adapting previously-proposed cluster-based retrieval approaches, which are based on (static) query-independent
clusters for ranking all documents in a corpus, to the re-ranking setting wherein clusters are query-specific. The best performing
method that we develop outperforms both the initial document-based ranking and some previously proposed cluster-based re-ranking
approaches; furthermore, this algorithm consistently outperforms a state-of-the-art pseudo-feedback-based approach. In further
exploration we study the performance of cluster-based smoothing methods for re-ranking with various (soft and hard) clustering
algorithms, and demonstrate the importance of clusters in providing context from the initial list through a comparison to
using single documents to this end.
Oren Kurland
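One of the roles clusters play here, smoothing document language models, can be sketched as re-ranking by query likelihood, where each document's model is interpolated with the model of its query-specific cluster and a background model. Assumptions: clusters are given as a hard assignment, interpolation is linear with hypothetical weights `lam` and `beta`, and, for self-containment, the background model is estimated from the initial list itself rather than from the corpus. This is an illustration, not the paper's best-performing method.

```python
import math
from collections import Counter


def mle(counts):
    """Maximum-likelihood unigram model from term counts."""
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def cluster_smoothed_rerank(query, docs, cluster_of, lam=0.5, beta=0.1):
    """Re-rank an initial list with cluster-smoothed document models (sketch).

    query:      list of query terms
    docs:       doc id -> Counter of term counts (the initially retrieved list)
    cluster_of: doc id -> cluster id (query-specific clusters of that list)
    lam:        weight of the document model vs. its cluster's model
    beta:       weight of the background model
    """
    # Cluster model: pool the counts of all documents in each cluster.
    pooled = {}
    for d, counts in docs.items():
        pooled.setdefault(cluster_of[d], Counter()).update(counts)
    cluster_lm = {c: mle(cnt) for c, cnt in pooled.items()}
    # Assumption: background estimated from the initial list, not the corpus.
    background = mle(sum(docs.values(), Counter()))

    def log_likelihood(d):
        doc_lm = mle(docs[d])
        clus_lm = cluster_lm[cluster_of[d]]
        total = 0.0
        for q in query:
            p = ((1 - beta) * (lam * doc_lm.get(q, 0.0)
                               + (1 - lam) * clus_lm.get(q, 0.0))
                 + beta * background.get(q, 0.0))
            total += math.log(p) if p > 0 else float("-inf")
        return total

    return sorted(docs, key=log_likelihood, reverse=True)
```

In a toy list where document A lacks the query term but its cluster-mate B contains it, A is ranked above document C, whose cluster contains no query term: the cluster supplies context the single document lacks.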