1.

Decoding multi-click search behavior based on marginal utility

Hai-Tao Yu Adam Jatowt Roi Blanco Hideo Joho Joemon M. Jose 《Information Retrieval》2017,20(1):25-52

2.

Mining unstructured content for recommender systems: an ensemble approach

Marcelo G. Manzato Marcos A. Domingues Arthur C. Fortes Camila V. Sundermann Rafael M. D’Addio Merley S. Conrado Solange O. Rezende Maria G. C. Pimentel 《Information Retrieval》2016,19(4):378-415

3.

“If there are no records, there is no narrative”: the social justice impact of records of Scottish care-leavers

Heather MacNeil Wendy Duff Alicia Dotiwalla Karolina Zuchniak 《Archival Science》2018,18(1):1-28

In 2004, the Scottish Parliament commissioned an independent review of abuse in children’s residential establishments between 1950 and 1995. In 2007, the review’s findings were published in a report entitled Historical Abuse Systemic Review: Residential Schools and Children’s Homes in Scotland 1950 to 1995, also known as the Shaw Report. In this article, the Shaw Report provides the jumping off point for a case study of the social justice impact of records. Drawing on secondary literature, interviews, and care-related records, the study identifies narratives that speak to the social justice impact of care records on care-leavers seeking access to them; it also assesses the potential of the surviving administrative records to serve as a foundation on which to construct historical narratives that speak more generally to the experience of children in residential care.

4.

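Collaborative management of government electronic records: the US experience and its implications (total citations: 1; self-citations: 0; citations by others: 1)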

白文琳 安小米 《档案学通讯》2020,(4):103-112

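Strengthening the collaborative management of government electronic records is an effective path to promoting their efficient management and the realization of their value. Using a theoretical framework of collaborative innovation management, the study performs a textual analysis of the electronic records management policies issued by the US National Archives and Records Administration over the past five years, and interviews staff from five departments to gain an in-depth understanding of the current state of those policies. It summarizes the measures the United States has taken toward efficient collaborative management of government electronic records and identifies its experience in five areas of collaboration: goals, actors, objects, processes, and elements. Finally, in light of China's practical needs, it offers suggestions for drawing on these five areas of experience to make the collaborative management of government electronic records more efficient.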

5.

Harry Potter, Riding the Bullet and the Future of Books: Key Issues in the Anglophone Book Business

Iain Stevenson 《Publishing Research Quarterly》2008,24(4):277-284

This paper reviews the current status of the Anglophone (Anglo-American) publishing business and draws some comparisons with publishing in other languages. It then critically reviews the impact of the Harry Potter phenomenon and the questionable progress of e-books in the trade sector, using the example of Stephen King’s Riding the Bullet. It also comments on Amazon’s introduction of the Kindle e-book reader.

6.

Waves: a fast multi-tier top-k query processing algorithm

Caio Moura Daoud Edleno Silva de Moura David Fernandes Altigran Soares da Silva Cristian Rossi Andre Carvalho 《Information Retrieval》2017,20(3):292-316

In this paper, we present Waves, a novel document-at-a-time algorithm for fast computing of top-k query results in search systems. The Waves algorithm uses multi-tier indexes for processing queries. It performs successive tentative evaluations of results which we call waves. Each wave traverses the index, starting from a specific tier level i. Each wave i may insert only those documents that occur in that tier level into the answer. After processing a wave, the algorithm checks whether the answer achieved might be changed by successive waves or not. A new wave is started only if it has a chance of changing the top-k scores. We show through experiments that such lazy query processing strategy results in smaller query processing times when compared to previous approaches proposed in the literature. We present experiments to compare Waves’ performance to the state-of-the-art document-at-a-time query processing methods that preserve top-k results and show scenarios where the method can be a good alternative algorithm for computing top-k results.

7.

A unified score propagation model for web spam demotion algorithm

Xu Zhuang Yan Zhu Chin-Chen Chang Qiang Peng Faisal Khurshid 《Information Retrieval》2017,20(6):547-574

Web spam pages exploit the biases of search engine algorithms to get higher than their deserved rankings in search results by using several types of spamming techniques. Many web spam demotion algorithms have been developed to combat spam via the use of the web link structure, from which the goodness or badness score of each web page is evaluated. Those scores are then used to identify spam pages or punish their rankings in search engine results. However, most of the published spam demotion algorithms differ from their base models by only very limited improvements and still suffer from some common score manipulation methods. The lack of a general framework for this field makes the task of designing high-performance spam demotion algorithms very inefficient. In this paper, we propose a unified score propagation model for web spam demotion algorithms by abstracting the score propagation process of relevant models with a forward score propagation function and a backward score propagation function, each of which can further be expressed as three sub-functions: a splitting function, an accepting function and a combination function. On the basis of the proposed model, we develop two new web spam demotion algorithms named Supervised Forward and Backward score Ranking (SFBR) and Unsupervised Forward and Backward score Ranking (UFBR). Our experiments, conducted on three large-scale public datasets, show that (1) SFBR is very robust and apparently outperforms other algorithms and (2) UFBR can obtain results comparable to some well-known supervised algorithms in the spam demotion task even if the UFBR is unsupervised.

8.

Using word embeddings in Twitter election classification

Xiao Yang Craig Macdonald Iadh Ounis 《Information Retrieval》2018,21(2-3):183-207

Word embeddings and convolutional neural networks (CNN) have attracted extensive attention in various classification tasks for Twitter, e.g. sentiment classification. However, the effect of the configuration used to generate the word embeddings on the classification performance has not been studied in the existing literature. In this paper, using a Twitter election classification task that aims to detect election-related tweets, we investigate the impact of the background dataset used to train the embedding models, as well as the parameters of the word embedding training process, namely the context window size, the dimensionality and the number of negative samples, on the attained classification performance. By comparing the classification results of word embedding models that have been trained using different background corpora (e.g. Wikipedia articles and Twitter microposts), we show that the background data should align with the Twitter classification dataset both in data type and time period to achieve significantly better performance compared to baselines such as SVM with TF-IDF. Moreover, by evaluating the results of word embedding models trained using various context window sizes and dimensionalities, we find that large context window and dimension sizes are preferable to improve the performance. However, the number of negative samples parameter does not significantly affect the performance of the CNN classifiers. Our experimental results also show that choosing the correct word embedding model for use with CNN leads to statistically significant improvements over various baselines such as random, SVM with TF-IDF and SVM with word embeddings. Finally, for out-of-vocabulary (OOV) words that are not available in the learned word embedding models, we show that a simple OOV strategy to randomly initialise the OOV words without any prior knowledge is sufficient to attain a good classification performance among the current OOV strategies (e.g. a random initialisation using statistics of the pre-trained word embedding models).

9.

Document retrieval on repetitive string collections

Travis Gagie Aleksi Hartikainen Kalle Karhu Juha Kärkkäinen Gonzalo Navarro Simon J. Puglisi Jouni Sirén 《Information Retrieval》2017,20(3):253-291

Most of the fastest-growing string collections today are repetitive, that is, most of the constituent documents are similar to many others. As these collections keep growing, a key approach to handling them is to exploit their repetitiveness, which can reduce their space usage by orders of magnitude. We study the problem of indexing repetitive string collections in order to perform efficient document retrieval operations on them. Document retrieval problems are routinely solved by search engines on large natural language collections, but the techniques are less developed on generic string collections. The case of repetitive string collections is even less understood, and there are very few existing solutions. We develop two novel ideas, interleaved LCPs and precomputed document lists, that yield highly compressed indexes solving the problem of document listing (find all the documents where a string appears), top-k document retrieval (find the k documents where a string appears most often), and document counting (count the number of documents where a string appears). We also show that a classical data structure supporting the latter query becomes highly compressible on repetitive data. Finally, we show how the tools we developed can be combined to solve ranked conjunctive and disjunctive multi-term queries under the simple tf-idf model of relevance. We thoroughly evaluate the resulting techniques in various real-life repetitiveness scenarios, and recommend the best choices for each case.
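The three queries named in this abstract can be pinned down with a brute-force sketch. This is only an illustration of what each query returns, written as a linear scan over plain strings; it is not the compressed index the paper develops, and the function names and the tie-breaking rule (lower document index first) are assumptions made for the example.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, overlapping ones included."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def document_listing(docs: list[str], pattern: str) -> list[int]:
    """Document listing: indices of all documents where the pattern appears."""
    return [i for i, doc in enumerate(docs) if count_occurrences(doc, pattern) > 0]


def document_counting(docs: list[str], pattern: str) -> int:
    """Document counting: number of documents where the pattern appears."""
    return len(document_listing(docs, pattern))


def top_k_documents(docs: list[str], pattern: str, k: int) -> list[int]:
    """Top-k retrieval: the k documents where the pattern appears most often.

    Ties are broken by document index (an assumption for this sketch).
    """
    scored = [(count_occurrences(doc, pattern), i) for i, doc in enumerate(docs)]
    scored = [(c, i) for c, i in scored if c > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [i for _, i in scored[:k]]


# A tiny, deliberately repetitive toy collection.
docs = ["GATTACA", "GATTACAGATTACA", "CATTAGA", "TTTT"]
print(document_listing(docs, "TTA"))    # indices of documents containing "TTA"
print(document_counting(docs, "TTA"))   # how many documents contain "TTA"
print(top_k_documents(docs, "TTA", 2))  # the two documents with most occurrences
```

Each call scans every document, so the cost grows with the total collection size; the indexes described in the abstract exist precisely to avoid that scan.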

10.

Topic set size design

Tetsuya Sakai 《Information Retrieval》2016,19(3):256-283

Traditional pooling-based information retrieval (IR) test collections typically have n = 50–100 topics, but it is difficult for an IR researcher to say why the topic set size should really be n. The present study provides details on principled ways to determine the number of topics for a test collection to be built, based on a specific set of statistical requirements. We employ Nagata’s three sample size design techniques, which are based on the paired t test, one-way ANOVA, and confidence intervals, respectively. These topic set size design methods require topic-by-run score matrices from past test collections for the purpose of estimating the within-system population variance for a particular evaluation measure. While the previous work of Sakai incorrectly used estimates of the total variances, here we use the correct estimates of the within-system variances, which yield slightly smaller topic set sizes than those reported previously by Sakai. Moreover, this study provides a comparison across the three methods. Our conclusions nevertheless echo those of Sakai: as different evaluation measures can have vastly different within-system variances, they require substantially different topic set sizes under the same set of statistical requirements; by analysing the tradeoff between the topic set size and the pool depth for a particular evaluation measure in advance, researchers can build statistically reliable yet highly economical test collections.

11.

EveTAR: building a large-scale multi-task test collection over Arabic tweets

Maram Hasanain Reem Suwaileh Tamer Elsayed Mucahid Kutlu Hind Almerekhi 《Information Retrieval》2018,21(4):307-336

This article introduces a new language-independent approach for creating a large-scale high-quality test collection of tweets that supports multiple information retrieval (IR) tasks without running a shared-task campaign. The adopted approach (demonstrated over Arabic tweets) designs the collection around significant (i.e., popular) events, which enables the development of topics that represent frequent information needs of Twitter users for which rich content exists. That inherently facilitates the support of multiple tasks that generally revolve around events, namely event detection, ad-hoc search, timeline generation, and real-time summarization. The key highlights of the approach include diversifying the judgment pool via interactive search and multiple manually-crafted queries per topic, collecting high-quality annotations via crowd-workers for relevancy and in-house annotators for novelty, filtering out low-agreement topics and inaccessible tweets, and providing multiple subsets of the collection for better availability. Applying our methodology on Arabic tweets resulted in EveTAR, the first freely-available tweet test collection for multiple IR tasks. EveTAR includes a crawl of 355M Arabic tweets and covers 50 significant events for which about 62K tweets were judged with substantial average inter-annotator agreement (Kappa value of 0.71). We demonstrate the usability of EveTAR by evaluating existing algorithms in the respective tasks. Results indicate that the new collection can support reliable ranking of IR systems that is comparable to similar TREC collections, while providing strong baseline results for future studies over Arabic tweets.

12.

Efficient distributed selective search

Yubin Kim Jamie Callan J. Shane Culpepper Alistair Moffat 《Information Retrieval》2017,20(3):221-252

Simulation and analysis have shown that selective search can reduce the cost of large-scale distributed information retrieval. By partitioning the collection into small topical shards, and then using a resource ranking algorithm to choose a subset of shards to search for each query, fewer postings are evaluated. In this paper we extend the study of selective search into new areas using a fine-grained simulation, examining the difference in efficiency when term-based and sample-based resource selection algorithms are used; measuring the effect of two policies for assigning index shards to machines; and exploring the benefits of index-spreading and mirroring as the number of deployed machines is varied. Results obtained for two large datasets and four large query logs confirm that selective search is significantly more efficient than conventional distributed search architectures and can handle higher query rates. Furthermore, we demonstrate that selective search can be tuned to avoid bottlenecks, and thus maximize usage of the underlying computer hardware.
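The selective search idea summarized here (partition into topical shards, rank the shards for a query, search only the best few) can be sketched in a few lines. The shard scoring below is a stand-in chosen for brevity: it sums query-term frequencies per shard and is not one of the term-based or sample-based resource selection algorithms the paper evaluates. The shard names, the `rank_shards` and `selective_search` functions, and the term-overlap document score are all hypothetical.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase whitespace tokenizer, enough for a toy example."""
    return text.lower().split()


def rank_shards(shards: dict[str, list[str]], query: str) -> list[str]:
    """Rank shards by total frequency of the query terms (assumed scoring)."""
    terms = tokenize(query)
    scored = []
    for shard_id, docs in shards.items():
        tf = Counter(tok for doc in docs for tok in tokenize(doc))
        scored.append((sum(tf[t] for t in terms), shard_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    # Shards with no matching term are never worth searching.
    return [shard_id for score, shard_id in scored if score > 0]


def selective_search(shards: dict[str, list[str]], query: str, num_shards: int):
    """Search only the top-ranked shards and rank their documents by term overlap."""
    terms = set(tokenize(query))
    selected = rank_shards(shards, query)[:num_shards]
    hits = []
    for shard_id in selected:
        for doc in shards[shard_id]:
            overlap = len(terms & set(tokenize(doc)))
            if overlap:
                hits.append((overlap, doc))
    hits.sort(key=lambda pair: (-pair[0], pair[1]))
    return selected, [doc for _, doc in hits]


shards = {
    "sports": ["football match result", "tennis match schedule"],
    "cooking": ["pasta recipe", "bread recipe with yeast"],
    "travel": ["cheap flight schedule", "hotel booking"],
}
selected, results = selective_search(shards, "match schedule", num_shards=1)
print(selected)  # shards that were actually searched
print(results)   # documents returned from those shards only
```

With `num_shards=1` the travel shard is skipped even though it holds a partially matching document, which is the efficiency-for-recall trade that selective search makes.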

13.

Beyond entities: promoting explorative search with bundles

Ilaria Bordino Mounia Lalmas Yelena Mejova Olivier Van Laere 《Information Retrieval》2016,19(5):447-486

Search engines are increasingly going beyond the pure relevance of search results to entertain users with information items that are interesting and even surprising, albeit sometimes not fully related to their search intent. In this paper, we study this serendipitous search space in the context of entity search, which has recently emerged as a powerful paradigm for building semantically rich answers. Specifically, our work proposes to enhance an explorative search system that represents a large sample of Yahoo Answers as an entity network, with a result structuring that goes beyond ranked lists, using composite entity retrieval, which requires a bundling of the results. We propose and compare six bundling methods, which exploit topical categories, entity specializations, and sentiment, and go beyond simple entity clustering. Two large-scale crowd-sourced studies show that users find a bundled organization—especially based on the topical categories of the query entity—to be better at revealing the most useful results, as well as at organizing the results, helping to discover novel and interesting information, and promoting exploration. Finally, a third study of 30 simulated search tasks reveals the bundled search experience to be less frustrating and more rewarding, with more users willing to recommend it to others.

14.

Clinical and academic use of electronic and print books: the Health Sciences Library System e-book study at the University of Pittsburgh

Barbara L Folb Charles B Wessel Leslie J Czechowski 《Journal of the Medical Library Association》2011,99(3):218-228

15.

Archival appraisal in Brazil

Lara Mancuso 《Archives and Manuscripts》2013,41(2):146-159

16.

Neural information retrieval: at the end of the early years

Kezban Dilek Onal Ye Zhang Ismail Sengor Altingovde Md Mustafizur Rahman Pinar Karagoz Alex Braylan Brandon Dang Heng-Lu Chang Henna Kim Quinten McNamara Aaron Angert Edward Banner Vivek Khetan Tyler McDonnell An Thanh Nguyen Dan Xu Byron C. Wallace Maarten de Rijke Matthew Lease 《Information Retrieval》2018,21(2-3):111-182

A recent “third wave” of neural network (NN) approaches now delivers state-of-the-art performance in many machine learning tasks, spanning speech recognition, computer vision, and natural language processing. Because these modern NNs often comprise multiple interconnected layers, work in this area is often referred to as deep learning. Recent years have witnessed an explosive growth of research into NN-based approaches to information retrieval (IR). A significant body of work has now been created. In this paper, we survey the current landscape of Neural IR research, paying special attention to the use of learned distributed representations of textual units. We highlight the successes of neural IR thus far, catalog obstacles to its wider adoption, and suggest potentially promising directions for future research.

17.

Be/longing in the archival body: eros and the “Endearing” value of material lives

Jamie A. Lee 《Archival Science》2016,16(1):33-51

This paper explores the nature of the archival body and the ways in which it is temporally situated and yet also always in motion. Applying transdisciplinary logics, it argues that the affective nature of archival productions follows the machinations of metamorphoses and (un)becoming. Using two queer/ed and transgender archives as sites of inquiry, the paper explores the erotic and affective nature of accessing the archival body in its multimodal forms. Although touching, smelling and stroking what remains of distinct material lives might elucidate arousal and certain other affective and haptic responses within the visitor to the archives, the records themselves hold and cradle their creators and their storytelling techniques along with their relationships to longing for and belonging in the archival body of knowledge. This approach suggests that understanding of the record and its affects can be enriched by temporal perspectives that acknowledge distinct and diverse temporalities and promote generative understandings of potentially meaningful progressions of time and everyday rhythms embodied within archival materials.

18.

Linear feature extraction for ranking

Gaurav Pandey Zhaochun Ren Shuaiqiang Wang Jari Veijalainen Maarten de Rijke 《Information Retrieval》2018,21(6):481-506

We address the feature extraction problem for document ranking in information retrieval. We then propose LifeRank, a Linear feature extraction algorithm for Ranking. In LifeRank, we regard each document collection for ranking as a matrix, referred to as the original matrix. We try to optimize a transformation matrix, so that a new matrix (dataset) can be generated as the product of the original matrix and a transformation matrix. The transformation matrix projects high-dimensional document vectors into lower dimensions. Theoretically, there could be very large transformation matrices, each leading to a new generated matrix. In LifeRank, we produce a transformation matrix so that the generated new matrix can match the learning to rank problem. Extensive experiments on benchmark datasets show the performance gains of LifeRank in comparison with state-of-the-art feature selection algorithms.
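The projection step described in this abstract, a new matrix formed as the product of the original matrix and a transformation matrix, can be shown with a small pure-Python sketch. It covers only the matrix product: the transformation matrix here is hand-picked for illustration, whereas LifeRank optimizes it for the ranking task, and the `project` function name and the toy values are assumptions.

```python
def project(original: list[list[float]], transformation: list[list[float]]) -> list[list[float]]:
    """Return original @ transformation.

    original:       n_docs x n_features   (one row per document vector)
    transformation: n_features x n_new    (n_new < n_features reduces dimension)
    """
    n_features = len(transformation)
    n_new = len(transformation[0])
    for row in original:
        if len(row) != n_features:
            raise ValueError("each document vector must have n_features entries")
    return [
        [sum(row[j] * transformation[j][c] for j in range(n_features)) for c in range(n_new)]
        for row in original
    ]


# Two documents with three features each, projected down to two features.
X = [[1.0, 2.0, 3.0],
     [0.0, 1.0, 0.0]]
# Hand-picked transformation (hypothetical, not learned).
W = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]
X_new = project(X, W)
print(X_new)  # each document is now a 2-dimensional vector
```

Every column of the new matrix is a linear combination of the original features, which is what separates this kind of feature extraction from feature selection, where columns are only kept or dropped.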

19.

MEMORY and MEMORIES in lexical environment: Bibliometric analysis

I. V. Marshakova-Shaikevich 《Scientific and Technical Information Processing》2007,34(1):27-39

This study is devoted to detection of the lexical environment and demonstration of the thematic medium of the words MEMORY and MEMORIES in the social sciences on the basis of the bibliographic database Social Science Citation Index (SSCI) of the Institute for Scientific Information (USA). The amount of studied material is over 3000 documents in English. Corresponding corpora and subcorpora of summary texts are formed, general frequency dictionaries and frequency dictionaries of binary combinations for each corpus and subcorpus are constructed, words and combinations specific for each subcorpus are found, and corresponding factors (lexical markers) are calculated for them. The general statistical information on the usage of the words under study is given, the obtained results of lexical analysis are represented in a tabulated form, and the corresponding semantic maps are discussed.
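The two kinds of frequency dictionary mentioned in this abstract can be illustrated with the standard library. The sketch assumes that a "binary combination" means a pair of adjacent words within one text, which the abstract does not spell out, and it does not attempt the lexical-marker factors, whose formula is not given. The function name and the sample texts are made up for the example.

```python
import re
from collections import Counter


def frequency_dictionaries(texts: list[str]) -> tuple[Counter, Counter]:
    """Build a word frequency dictionary and an adjacent-word-pair frequency dictionary."""
    words: Counter = Counter()
    pairs: Counter = Counter()
    for text in texts:
        # Lowercase alphabetic tokens only; pairs never span two texts.
        tokens = re.findall(r"[a-z]+", text.lower())
        words.update(tokens)
        pairs.update(zip(tokens, tokens[1:]))
    return words, pairs


# Made-up stand-ins for abstract texts in one subcorpus.
subcorpus = [
    "Collective memory and social memory",
    "Memories of war shape collective memory",
]
words, pairs = frequency_dictionaries(subcorpus)
print(words.most_common(3))
print(pairs.most_common(1))
```

Running the same function on each subcorpus and comparing the resulting counters is one simple way to look for words and pairs that are unusually frequent in a given subcorpus.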

20.

Definitions of Electronic Records, the European Perspective

Maria Guercio 《Archives and Museum Informatics》1997,11(3-4):219-222

For the definition of electronic records, the use of new terms, like literary warrant, is not necessary, and from the European perspective not even understandable. If this expression simply means best practice and professional culture in recordkeeping, we only need to know what creators did for centuries and still do today and probably will do also in the future, by referring to archival science, diplomatics and archival practice for clarifying definitions in the recordkeeping environment. A multi-disciplinary approach is still required for the electronic recordkeeping system as it was in the past for traditional records, but the theory and the terminology should be consistent and based on the deep understanding of essential characteristics of records and essential requirements of good recordkeeping to produce in the first place and maintain reliable and authentic records. Of course, a record is more than recorded information created in the course of business activity: a record is the recorded representation of an act produced in a specific form – the form prescribed by the legal system – by a creator in the course of its activity.