Similar literature
 Found 20 similar documents (search time: 57 ms)
1.
The goal of this article is to understand why known-item search queries entered in a discovery system return zero hits. We analyze a sample of 708 known-item queries and classify them into four categories of zero hits according to whether the item is held by the library and whether the query is formulated correctly: (1) item in stock, but query incorrect, (2) item not in stock, (3) item in stock, but incomplete or erroneous metadata, (4) query is ambiguous or not understandable. The main causes of zero hits are acquisition (items the library does not hold) and erroneous search queries. We discuss possible system-side solutions for known-item queries resulting in zero hits and show that 30% of zero hits could easily be avoided by applying automatic spelling correction. We argue that libraries can improve their discovery systems or online catalogs by applying strategies to avoid or cope with zero hits inspired by web search engines and commercial search web sites.
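
The automatic spelling correction mentioned above can be illustrated with a minimal sketch. It is not the system studied in the article: it assumes a hypothetical list of catalog titles (`known_titles`) and uses Python's standard `difflib` fuzzy matching as a stand-in for whatever correction method a discovery system would actually use.

```python
import difflib


def suggest_title(query, known_titles, cutoff=0.8):
    """Return the catalog title closest to a zero-hit query, or None.

    `known_titles` and `cutoff` are illustrative assumptions, not values
    taken from the article.
    """
    # Compare case-insensitively, but return the title as stored in the catalog.
    by_lowercase = {title.lower(): title for title in known_titles}
    matches = difflib.get_close_matches(
        query.lower(), list(by_lowercase), n=1, cutoff=cutoff
    )
    return by_lowercase[matches[0]] if matches else None


# Hypothetical catalog and a misspelled known-item query.
catalog = ["Information Retrieval", "Data Mining"]
suggestion = suggest_title("informaton retrieval", catalog)
```

A query with no sufficiently close title returns `None`, which corresponds to the cases where the item is simply not held.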

2.
Document clustering offers the potential of supporting users in interactive retrieval, especially when users have problems in specifying their information need precisely. In this paper, we present a theoretic foundation for optimum document clustering. The key idea is to base cluster analysis and evaluation on a set of queries, by defining documents as being similar if they are relevant to the same queries. Three components are essential within our optimum clustering framework, OCF: (1) a set of queries, (2) a probabilistic retrieval method, and (3) a document similarity metric. After introducing an appropriate validity measure, we define optimum clustering with respect to the estimates of the relevance probability for the query-document pairs under consideration. Moreover, we show that well-known clustering methods are implicitly based on the three components, but that they use heuristic design decisions for some of them. We argue that with our framework more targeted research for developing better document clustering methods becomes possible. Experimental results demonstrate the potential of our considerations.
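
The idea of treating documents as similar when they are relevant to the same queries can be sketched as follows. This is an illustrative assumption, not the framework's actual metric: each document is represented by a vector of estimated relevance probabilities over a fixed query set, and similarity is taken to be the dot product of two such vectors. All probability values are invented.

```python
def query_based_similarity(relevance_a, relevance_b):
    """Dot product of two documents' relevance-probability vectors.

    Position i in each vector holds the estimated probability that the
    document is relevant to query i (same query order for both documents).
    """
    if len(relevance_a) != len(relevance_b):
        raise ValueError("both documents must be scored on the same queries")
    return sum(p * q for p, q in zip(relevance_a, relevance_b))


# Hypothetical relevance estimates for three queries.
doc1 = [0.9, 0.1, 0.0]
doc2 = [0.8, 0.2, 0.0]
doc3 = [0.0, 0.1, 0.9]

sim_12 = query_based_similarity(doc1, doc2)  # relevant to the same query
sim_13 = query_based_similarity(doc1, doc3)  # relevant to different queries
```

Under this choice, documents 1 and 2 score much higher than documents 1 and 3, because the first pair is likely relevant to the same query.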

3.
A better understanding of users' search interactions in library search systems is key to improving the result ranking. By focusing on known-item searches (searches for an item already known) and search tactics, vast improvement can be made. To better understand user behaviour, we conducted four transaction-log studies, comprising more than 4.2 million search sessions from two German library search systems. Results show that most sessions are rather short; users tend to issue short queries and usually do not go beyond the first search engine result page (SERP). The most frequently used search tactic was the extension of a query (‘Exhaust’). Looking at the known-item searches, it becomes clear that this query type is of great importance: between 38% and 57% of all queries are known-item queries. Titles or title parts were the most frequent elements of these queries, either alone or in combination with the author's name. Unsuccessful known-item searches were often caused by items not available in the system. Results can be applied by libraries and library system vendors to improve their systems, as well as when designing new systems. In addition to log data, future research should also include background information on usage, for example, through user surveys.

4.
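Sample queries were selected from the Sogou query log and manually annotated. Based on an analysis of the annotated news queries, we propose new features for identifying news intent: query expression features, features of the query's distribution over time, and click-result features. Using these three features, a decision tree classifier automatically identifies news intent in queries. The results show that: (1) the goals of news queries concentrate mainly on information about specific topics and entertainment information, and their topics are mostly entertainment, politics, sports, and economics; (2) compared with non-news queries, news queries are more likely to contain entities, fluctuate more in their distribution over time, and show higher similarity among clicked results; (3) the method identifies news intent in queries well, with macro-averaged precision, recall, and F-measure of 0.76, 0.73, and 0.74, respectively.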

5.
Users often issue all kinds of queries to look for the same target due to the intrinsic ambiguity and flexibility of natural languages. Some previous work clusters queries based on co-clicks; however, the intents of queries in one cluster are often only roughly related rather than equivalent. It is desirable to automatically mine queries with equivalent intents from large-scale search logs. In this paper, we take account of similarities between query strings. There are two issues associated with such similarities: it is too costly to compare every pair of queries in large-scale search logs, and two queries with a similar formulation, such as “SVN” (Apache Subversion) and “SVM” (support vector machine), are not necessarily similar in their intents. To address these issues, we propose applying query string similarities on top of the co-click based clustering results. Our method improves precision over the co-click based clustering method (lifting precision from 0.37 to 0.62), and outperforms a commercial search engine’s query alteration (lifting the \(F_1\) measure from 0.42 to 0.56). As an application, we consider web document retrieval. We aggregate similar queries’ click-throughs with the query’s click-throughs and evaluate them on a large-scale dataset. Experimental results indicate that our proposed method significantly outperforms the baseline method of using a query’s own click-throughs in all metrics.

6.
Selective exposure has been studied for more than half a century, but little research has systematically analyzed the implications of various methodological choices inherent in these designs. We examine how four choices affect results in studies of selectivity in political contexts: including an entertainment option, including or excluding moderates, post-hoc adjustment of the subjects through a question about likelihood of selecting content in the real world, and assessing selectivity on the basis of issue attitudes or political ideology. Relying on a large experimental survey (N = 2,300), we compare the effects of these choices on two results: probability of selective exposure to like-minded political news and predictors of selective exposure (attitude strength, political interest, knowledge, and participation). Our findings show that probability estimates and, to a lesser extent, predictors of selective exposure are sensitive to methodological choices. These findings provide guidance about how methodological choices may affect researchers’ assessments and conclusions.

7.
A primary argument for the widespread production of media violence is that audiences want to watch violent content. This assumption is examined in this meta-analytic review of existing research on both selective exposure to and enjoyment of violence. The results show that violence has a significant effect on both selective exposure and enjoyment, but in different directions. Specifically, violence increases selective exposure but decreases enjoyment of content. Potential explanations for these effects and moderators that could influence the results (e.g., sex, aggressive personality traits, type of content) are considered, and the practical implications of these findings are discussed.

8.
The retrieval of sentences that are relevant to a given information need is a challenging passage retrieval task. In this context, the well-known vocabulary mismatch problem arises severely because of the fine granularity of the task. Short queries, which are usually the rule rather than the exception, aggravate the problem. Consequently, effective sentence retrieval methods tend to apply some form of query expansion, usually based on pseudo-relevance feedback. Nevertheless, there are no extensive studies comparing different statistical expansion strategies for sentence retrieval. In this work we study thoroughly the effect of distinct statistical expansion methods on sentence retrieval. We start from a set of retrieved documents in which relevant sentences have to be found. In our experiments different term selection strategies are evaluated and we provide empirical evidence to show that expansion before sentence retrieval yields competitive performance. This is particularly novel because expansion for sentence retrieval is often done after sentence retrieval (i.e. expansion terms are mined from a ranked set of sentences) and there are no comparative results available between both types of expansion. Furthermore, this comparison is particularly valuable because there are important implications in time efficiency. We also carefully analyze expansion on weak and strong queries and demonstrate clearly that expanding queries before sentence retrieval is not only more convenient for efficiency purposes, but also more effective when handling poor queries.

9.
This paper addresses the problem of estimating the size of a deep web data source that is accessible by queries only. Since most deep web data sources are non-cooperative, a data source size can only be estimated by sending queries and analyzing the returning results. We propose an efficient estimator based on the capture–recapture method. First we derive an equation between the overlapping rate and the percentage of the data examined when random samples are retrieved from a uniform distribution. This equation is conceptually simple and leads to the derivation of an estimator for samples obtained by random queries. Since random queries do not produce random documents, it is well known that the traditional methods by random queries underestimate the size, i.e., those estimators have negative bias. Based on the simple estimator for random samples, we adjust the equation so that it can handle the samples returned by random queries. We conduct both simulation studies and experiments on corpora including Gov2, Reuters, Newsgroups, and Wikipedia. The results show that our method has small bias and standard deviation.
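
The capture–recapture idea behind this estimator can be illustrated with the classic Lincoln–Petersen formula, which estimates the population size as n1 × n2 / m for two samples of sizes n1 and n2 that share m items. This is a swapped-in textbook estimator for uniform random samples, not the paper's adjusted estimator for samples returned by random queries; the population and sample sizes below are invented.

```python
import random


def lincoln_petersen(sample_a, sample_b):
    """Estimate the population size from two samples of document IDs."""
    ids_a, ids_b = set(sample_a), set(sample_b)
    overlap = len(ids_a & ids_b)
    if overlap == 0:
        raise ValueError("samples do not overlap; size cannot be estimated")
    return len(ids_a) * len(ids_b) / overlap


# Simulated data source with 10,000 documents (an illustrative assumption).
rng = random.Random(0)
population = range(10_000)
first_capture = rng.sample(population, 1_000)
second_capture = rng.sample(population, 1_000)
estimate = lincoln_petersen(first_capture, second_capture)
```

With uniform samples the estimate should land near the true size; the abstract's point is that documents returned by random queries are not uniform samples, which is why this simple form tends to underestimate in that setting.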

10.
Word form normalization through lemmatization or stemming is a standard procedure in information retrieval because morphological variation needs to be accounted for and several languages are morphologically non-trivial. Lemmatization is effective but often requires expensive resources. Stemming is also effective in most contexts, generally almost as good as lemmatization and typically much less expensive; besides, it also has a query expansion effect. However, in both approaches the idea is to turn many inflectional word forms into a single lemma or stem both in the database index and in queries. This means extra effort in creating database indexes. In this paper we take an opposite approach: we leave the database index un-normalized and enrich the queries to cover for surface form variation of keywords. A potential penalty of the approach would be long queries and slow processing. However, we show that it only matters to cover a negligible number of possible surface forms even in morphologically complex languages to arrive at a performance that is almost as good as that delivered by stemming or lemmatization. Moreover, we show that, at least for typical test collections, it only matters to cover nouns and adjectives in queries. Furthermore, we show that our findings are particularly good for short queries that resemble normal searches of web users. Our approach is called FCG (for Frequent Case (form) Generation). It can be relatively easily implemented for Latin/Greek/Cyrillic alphabet languages by examining their (typically very skewed) nominal form statistics in a small text sample and by creating surface form generators for the 3–9 most frequent forms. We demonstrate the potential of our FCG approach for several languages of varying morphological complexity: Swedish, German, Russian, and Finnish in well-known test collections. Applications include in particular Web IR in languages poor in morphological resources.

11.
12.
The hospital librarians in Rochester, New York and a research team developed and administered a questionnaire to measure the impact of information provided by the librarian on physicians' clinical decision making. While the research was underway, the librarians also developed a publicity plan. The goal of the plan was to create awareness of the study results in the local client population, as well as in the health care community at large. The plan served to describe and put in priority order the types of media that the librarians would use to publicize the study to target groups. This article includes examples of a nationwide and an institution-specific publicity plan. Those developing publicity plans for future library research may want to allocate adequate funds to hire a media consultant to increase their prospects for national exposure.

13.
We propose a method for search privacy on the Internet, focusing on enhancing plausible deniability against search engine query-logs. The method approximates the target search results, without submitting the intended query and avoiding other exposing queries, by employing sets of queries representing more general concepts. We model the problem theoretically, and investigate the practical feasibility and effectiveness of the proposed solution with a set of real queries with privacy issues on a large web collection. The findings may have implications for other IR research areas, such as query expansion and fusion in meta-search. Finally, we discuss ideas for privacy, such as k-anonymity, and how these may be applied to search tasks.

14.
This paper presents a Graph Inference retrieval model that integrates structured knowledge resources, statistical information retrieval methods and inference in a unified framework. Key components of the model are a graph-based representation of the corpus and retrieval driven by an inference mechanism achieved as a traversal over the graph. The model is proposed to tackle the semantic gap problem: the mismatch between the raw data and the way a human being interprets it. We break down the semantic gap problem into five core issues, each requiring a specific type of inference in order to be overcome. Our model and evaluation are applied to the medical domain because search within this domain is particularly challenging and, as we show, often requires inference. In addition, this domain features both structured knowledge resources and unstructured text. Our evaluation shows that inference can be effective, retrieving many new relevant documents that are not retrieved by state-of-the-art information retrieval models. We show that many retrieved documents were not pooled by keyword-based search methods, prompting us to perform additional relevance assessment on these new documents. A third of the newly retrieved documents judged were found to be relevant. Our analysis provides a thorough understanding of when and how to apply inference for retrieval, including a categorisation of queries according to the effect of inference. The inference mechanism promoted recall by retrieving new relevant documents not found by previous keyword-based approaches. In addition, it promoted precision by an effective reranking of documents. When inference is used, performance gains can generally be expected on hard queries. However, inference should not be applied universally: for easy, unambiguous queries and queries with few relevant documents, inference adversely affected effectiveness. These conclusions reflect the fact that for retrieval as inference to be effective, a careful balancing act is involved. Finally, although the Graph Inference model is developed and applied to medical search, it is a general retrieval model applicable to other areas such as web search, where an emerging research trend is to utilise structured knowledge resources for more effective semantic search.

15.
The hospital librarians in Rochester, New York and a research team developed and administered a questionnaire to measure the impact of information provided by the librarian on physicians' clinical decision making. While the research was underway, the librarians also developed a publicity plan. The goal of the plan was to create awareness of the study results in the local client population, as well as in the health care community at large. The plan served to describe and put in priority order the types of media that the librarians would use to publicize the study to target groups. This article includes examples of a nationwide and an institution-specific publicity plan. Those developing publicity plans for future library research may want to allocate adequate funds to hire a media consultant to increase their prospects for national exposure.

16.
Examining the impact of various media sources on knowledge has a long tradition in political communication. Although much of the extant research focuses on the impact of traditional media on factual knowledge, research is expanding to include a variety of media sources and multiple dimensions of knowledge, in addition to understanding processes that better explain these relationships. Using a nationwide, opt-in online survey (n = 993), we examine the relationship between partisan media and structural knowledge, which assesses how interconnected people see political concepts. Utilizing understanding of the Affordable Care Act as the content area of interest, we examine whether exposure to partisan media has differential effects on attitudinal ambivalence (holding both positive and negative attitudes toward an object) based on the political ideology of the respondent, and whether this impact on ambivalence influenced structural knowledge. Our results show that exposure to attitude-consistent media decreased attitudinal ambivalence. This exposure to attitude-consistent media results in a positive indirect effect on structural knowledge through this decrease in ambivalence. We find the reverse effect for use of attitude-inconsistent media.

17.
Search engine results are often biased towards a certain aspect of a query or towards a certain meaning for ambiguous query terms. Diversification of search results offers a way to supply the user with a better balanced result set, increasing the probability that a user finds at least one document suiting her information need. In this paper, we present a reranking approach based on minimizing variance of Web search results to improve topic coverage in the top-k results. We investigate two different document representations as the basis for reranking: smoothed language models and topic models derived by Latent Dirichlet allocation. To evaluate our approach we selected 240 queries from Wikipedia disambiguation pages. This provides us with ambiguous queries together with a community generated balanced representation of their (sub)topics. For these queries we crawled two major commercial search engines. In addition, we present a new evaluation strategy based on Kullback-Leibler divergence and Wikipedia. We evaluate this method using the TREC sub-topic evaluation on the one hand, and manually annotated query results on the other hand. Our results show that minimizing variance in search results by reranking relevant pages significantly improves topic coverage in the top-k results with respect to Wikipedia, and gives a good overview of the overall search result. Moreover, latent topic models achieve competitive diversification with significantly less reranking. Finally, our evaluation reveals that our automatic evaluation strategy using Kullback-Leibler divergence correlates well with α-nDCG scores used in manual evaluation efforts.
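
The Kullback-Leibler divergence used in this evaluation strategy can be computed as in the sketch below. How the paper builds its two distributions is not stated in the abstract, so the setup here is an assumption: `reference` stands for a balanced (sub)topic distribution such as one derived from a Wikipedia disambiguation page, and `observed` for the topic distribution of a top-k result list. The numbers are invented.

```python
import math


def kl_divergence(p, q):
    """D_KL(P || Q) in nats for two discrete distributions of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0:
            continue  # terms with p_i = 0 contribute nothing
        if q_i == 0:
            raise ValueError("q must be non-zero wherever p is non-zero")
        total += p_i * math.log(p_i / q_i)
    return total


# Hypothetical distributions over three subtopics of an ambiguous query.
reference = [1 / 3, 1 / 3, 1 / 3]   # balanced coverage
biased_results = [0.8, 0.1, 0.1]    # results dominated by one meaning
diverse_results = [0.4, 0.3, 0.3]   # results after diversification

biased_gap = kl_divergence(reference, biased_results)
diverse_gap = kl_divergence(reference, diverse_results)
```

A smaller divergence from the reference indicates a result list whose topic coverage is closer to the balanced distribution.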

18.
This study explores the associative influence of pretrial publicity on (a) an individual’s perceptions of a criminal offender as malicious and (b) an individual’s judgment of the criminal offender’s punishment for the crime. Expanding on the uses of agenda-setting and framing theories, this research indicates that attention to pretrial news media about a specific criminal is significantly associated with negative judgments of a criminal offender. James Holmes, the offender in the Colorado theater shooting case, was the criminal offender used in this study. Results from a national survey (N = 236) indicated that pretrial publicity significantly influenced participants’ perceptions of James Holmes as malicious and resulted in increased retributivist support (i.e., views of the offender deserving a harsher punishment). Mediation scenarios were also detected, in which pretrial publicity exposure mediated the relationship between case interest and judgments of James Holmes as malicious and deserving retributivist punishment.

19.
When searching for health information, results quality can be judged against available scientific evidence: Do search engines return advice consistent with evidence based medicine? We compared the performance of domain-specific health and depression search engines against a general-purpose engine (Google) on both relevance of results and quality of advice. Over 101 queries, to which the term ‘depression’ was added if not already present, Google returned more relevant results than those of the domain-specific engines. However, over the 50 treatment-related queries, Google returned 70 pages recommending for or against a well studied treatment, of which 19 strongly disagreed with the scientific evidence. A domain-specific index of 4 sites selected by domain experts was only wrong in 5 of 50 recommendations. Analysis suggests a tension between relevance and quality. Indexing more pages can give a greater number of relevant results, but selective inclusion can give better quality.

20.
This paper reports findings from an analysis of medical or health queries to different web search engines. We report results (i) comparing samples of 10,000 web queries taken randomly from 1.2 million query logs from the AlltheWeb.com and Excite.com commercial web search engines in 2001 for medical or health queries, (ii) comparing the 2001 findings from Excite and AlltheWeb.com users with results from a previous analysis of medical and health related queries from the Excite Web search engine for 1997 and 1999, and (iii) on medical or health advice-seeking queries beginning with the word 'should'. Findings suggest that (i) a small percentage of web queries are medical or health related, (ii) the top five categories of medical or health queries were general health, weight issues, reproductive health and puberty, pregnancy/obstetrics, and human relationships, and (iii) over time, the medical and health queries may have declined as a proportion of all web queries, as the use of specialized medical/health websites and e-commerce-related queries has increased. Findings provide insights into medical and health-related web querying and suggest some implications for the use of general web search engines when seeking medical/health information.
