首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Recreational queries from users searching for places to go and things to do or see are very common in web and mobile search. Users specify constraints for what they are looking for, like suitability for kids, romantic ambiance or budget. Queries like “restaurants in New York City” are currently served by static local results or the thumbnail carousel. More complex queries like “things to do in San Francisco with kids” or “romantic places to eat in Seattle” require the user to click on every element of the search engine result page to read articles from Yelp, TripAdvisor, or WikiTravel to satisfy their needs. Location data, which is an essential part of web search, is even more prevalent with location-based social networks and offers new opportunities for many ways of satisfying information seeking scenarios.In this paper, we address the problem of recreational queries in information retrieval and propose a solution that combines search query logs with LBSNs data to match user needs and possible options. At the core of our solution is a framework that combines social, geographical, and temporal information for a relevance model centered around the use of semantic annotations on Points of Interest with the goal of addressing these recreational queries. A central part of the framework is a taxonomy derived from behavioral data that drives the modeling and user experience. We also describe in detail the complexity of assessing and evaluating Point of Interest data, a topic that is usually not covered in related work, and propose task design alternatives that work well.We demonstrate the feasibility and scalability of our methods using a data set of 1B check-ins and a large sample of queries from the real-world. Finally, we describe the integration of our techniques in a commercial search engine.  相似文献   

2.
Web queries in question format are becoming a common element of a user's interaction with Web search engines. Web search services such as Ask Jeeves – a publicly accessible question and answer (Q&A) search engine – request users to enter question format queries. This paper provides results from a study examining queries in question format submitted to two different Web search engines – Ask Jeeves that explicitly encourages queries in question format and the Excite search service that does not explicitly encourage queries in question format. We identify the characteristics of queries in question format in two different data sets: (1) 30,000 Ask Jeeves queries and 15,575 Excite queries, including the nature, length, and structure of queries in question format. Findings include: (1) 50% of Ask Jeeves queries and less than 1% of Excite were in question format, (2) most users entered only one query in question format with little query reformulation, (3) limited range of formats for queries in question format – mainly “where”, “what”, or “how” questions, (4) most common question query format was “Where can I find………” for general information on a topic, and (5) non-question queries may be in request format. Overall, four types of user Web queries were identified: keyword, Boolean, question, and request. These findings provide an initial mapping of the structure and content of queries in question and request format. Implications for Web search services are discussed.  相似文献   

3.
Query suggestion is generally an integrated part of web search engines. In this study, we first redefine and reduce the query suggestion problem as “comparison of queries”. We then propose a general modular framework for query suggestion algorithm development. We also develop new query suggestion algorithms which are used in our proposed framework, exploiting query, session and user features. As a case study, we use query logs of a real educational search engine that targets K-12 students in Turkey. We also exploit educational features (course, grade) in our query suggestion algorithms. We test our framework and algorithms over a set of queries by an experiment and demonstrate a 66–90% statistically significant increase in relevance of query suggestions compared to a baseline method.  相似文献   

4.
This paper presents a new approach to query expansion in search engines through the use of general non-topical terms (NTTs) and domain-specific semi-topical terms (STTs). NTTs and STTs can be used in conjunction with topical terms (TTs) to improve precision in retrieval results. In Phase I, 20 topical queries in two domains (Health and the Social Sciences) were carried out in Google and from the results of the queries, 800 pages were textually analysed. Of 1442 NTTs and STTs identified, 15% were shared between the two domains; 62% were NTTs and 38% were STTs; and approximately 64% occurred before while 36% occurred after their respective topical terms (TTs). Findings of Phase II showed that query expansion through NTTs (or STTs) particularly in the ‘exact title’ and URL search options resulted in more precise and manageable results. Statistically significant differences were found between Health and the Social Sciences vis-à-vis keyword and ‘exact phrase’ search results; however there were no significant differences in exact title and URL search results. The ratio of exact phrase, exact title, and URL search result frequencies to keyword search result frequencies also showed statistically significant differences between the two domains. Our findings suggest that web searching could be greatly enhanced combining NTTs (and STTs) with TTs in an initial query. Additionally, search results would improve if queries are restricted to the exact title or URL search options. Finally, we suggest the development and implementation of knowledge-based lists of NTTs (and STTs) by both general and specialized search engines to aid query expansion.  相似文献   

5.
This paper focuses on temporal retrieval of activities in videos via sentence queries. Given a sentence query describing an activity, temporal moment retrieval aims at localizing the temporal segment within the video that best describes the textual query. This is a general yet challenging task as it requires the comprehending of both video and language. Existing research predominantly employ coarse frame-level features as the visual representation, obfuscating the specific details (e.g., the desired objects “girl”, “cup” and action “pour”) within the video which may provide critical cues for localizing the desired moment. In this paper, we propose a novel Spatial and Language-Temporal Tensor Fusion (SLTF) approach to resolve those issues. Specifically, the SLTF method first takes advantage of object-level local features and attends to the most relevant local features (e.g., the local features “girl”, “cup”) by spatial attention. Then we encode the sequence of the local features on consecutive frames by employing LSTM network, which can capture the motion information and interactions among these objects (e.g., the interaction “pour” involving these two objects). Meanwhile, language-temporal attention is utilized to emphasize the keywords based on moment context information. Thereafter, a tensor fusion network learns both the intra-modality and inter-modality dynamics, which can enhance the learning of moment-query representation. Therefore, our proposed two attention sub-networks can adaptively recognize the most relevant objects and interactions in the video, and simultaneously highlight the keywords in the query for retrieving the desired moment. Experimental results on three public benchmark datasets (obtained from TACOS, Charades-STA, and DiDeMo) show that the SLTF model significantly outperforms current state-of-the-art approaches, and demonstrate the benefits produced by new technologies incorporated into SLTF.  相似文献   

6.
In this paper the features of a microprocessor based architecture for bibliographic retrieval system are illustrated. The proposed system consists of the following three functional blocks: the “query processor”, the “simple query executers” and the “answer composer”. The query processor parses the queries and breaks the complex query into simple queries. Each simple query executer is able to perform the operations satisfying a simple query. Finally, the answer composer puts together the results of all simple query executers and produces the response to the query originally raised. This machine will allow the implementation of a very powerfull query language. The basic design goals are the system modularity and a whatever complex query's fulfilment. This is achieved through the proposed query language and by means of the system architecture allowing high parallelism in the performed operations.  相似文献   

7.
Both general and domain-specific search engines have adopted query suggestion techniques to help users formulate effective queries. In the specific domain of literature search (e.g., finding academic papers), the initial queries are usually based on a draft paper or abstract, rather than short lists of keywords. In this paper, we investigate phrasal-concept query suggestions for literature search. These suggestions explicitly specify important phrasal concepts related to an initial detailed query. The merits of phrasal-concept query suggestions for this domain are their readability and retrieval effectiveness: (1) phrasal concepts are natural for academic authors because of their frequent use of terminology and subject-specific phrases and (2) academic papers describe their key ideas via these subject-specific phrases, and thus phrasal concepts can be used effectively to find those papers. We propose a novel phrasal-concept query suggestion technique that generates queries by identifying key phrasal-concepts from pseudo-labeled documents and combines them with related phrases. Our proposed technique is evaluated in terms of both user preference and retrieval effectiveness. We conduct user experiments to verify a preference for our approach, in comparison to baseline query suggestion methods, and demonstrate the effectiveness of the technique with retrieval experiments.  相似文献   

8.
9.
With the increasing popularity and social influence of search engines in IR, various studies have raised concerns on the presence of bias in search engines and the social responsibilities of IR systems. As an essential component of search engine, ranking is a crucial mechanism in presenting the search results or recommending items in a fair fashion. In this article, we focus on the top-k diversity fairness ranking in terms of statistical parity fairness and disparate impact fairness. The former fairness definition provides a balanced overview of search results where the number of documents from different groups are equal; The latter enables a realistic overview where the proportion of documents from different groups reflect the overall proportion. Using 100 queries and top 100 results per query from Google as the data, we first demonstrate how topical diversity bias is present in the top web search results. Then, with our proposed entropy-based metrics for measuring the degree of bias, we reveal that the top search results are unbalanced and disproportionate to their overall diversity distribution. We explore several fairness ranking strategies to investigate the relationship between fairness, diversity, novelty and relevance. Our experimental results show that using a variant of fair ε-greedy strategy, we could bring more fairness and enhance diversity in search results without a cost of relevance. In fact, we can improve the relevance and diversity by introducing the diversity fairness. Additional experiments with TREC datasets containing 50 queries demonstrate the robustness of our proposed strategies and our findings on the impact of fairness. We present a series of correlation analysis on the amount of fairness and diversity, showing that statistical parity fairness highly correlates with diversity while disparate impact fairness does not. This provides clear and tangible implications for future works where one would want to balance fairness, diversity and relevance in search results.  相似文献   

10.
Real time search is an increasingly important area of information seeking on the Web. In this research, we analyze 1,005,296 user interactions with a real time search engine over a 190 day period. Using query log analysis, we investigate searching behavior, categorize search topics, and measure the economic value of this real time search stream. We examine aggregate usage of the search engine, including number of users, queries, and terms. We then classify queries into subject categories using the Google Directory topical hierarchy. We next estimate the economic value of the real time search traffic using the Google AdWords keyword advertising platform. Results shows that 30% of the queries were unique (used only once in the entire dataset), which is low compared to traditional Web searching. Also, 60% of the search traffic comes from the search engine’s application program interface, indicating that real time search is heavily leveraged by other applications. There are many repeated queries over time via these application program interfaces, perhaps indicating both long term interest in a topic and the polling nature of real time queries. Concerning search topics, the most used terms dealt with technology, entertainment, and politics, reflecting both the temporal nature of the queries and, perhaps, an early adopter user-based. However, 36% of the queries indicate some geographical affinity, pointing to a location-based aspect to real time search. In terms of economic value, we calculate this real time search stream to be worth approximately US $33,000,000 (US $33 M) on the online advertising market at the time of the study. We discuss the implications for search engines and content providers as real time content increasingly enters the main stream as an information source.  相似文献   

11.
Query auto completion (QAC) models recommend possible queries to web search users when they start typing a query prefix. Most of today’s QAC models rank candidate queries by popularity (i.e., frequency), and in doing so they tend to follow a strict query matching policy when counting the queries. That is, they ignore the contributions from so-called homologous queries, queries with the same terms but ordered differently or queries that expand the original query. Importantly, homologous queries often express a remarkably similar search intent. Moreover, today’s QAC approaches often ignore semantically related terms. We argue that users are prone to combine semantically related terms when generating queries.We propose a learning to rank-based QAC approach, where, for the first time, features derived from homologous queries and semantically related terms are introduced. In particular, we consider: (i) the observed and predicted popularity of homologous queries for a query candidate; and (ii) the semantic relatedness of pairs of terms inside a query and pairs of queries inside a session. We quantify the improvement of the proposed new features using two large-scale real-world query logs and show that the mean reciprocal rank and the success rate can be improved by up to 9% over state-of-the-art QAC models.  相似文献   

12.
Caching search results is employed in information retrieval systems to expedite query processing and reduce back-end server workload. Motivated by the observation that queries belonging to different topics have different temporal-locality patterns, we investigate a novel caching model called STD (Static-Topic-Dynamic cache), a refinement of the traditional SDC (Static-Dynamic Cache) that stores in a static cache the results of popular queries and manages the dynamic cache with a replacement policy for intercepting the temporal variations in the query stream.Our proposed caching scheme includes another layer for topic-based caching, where the entries are allocated to different topics (e.g., weather, education). The results of queries characterized by a topic are kept in the fraction of the cache dedicated to it. This permits to adapt the cache-space utilization to the temporal locality of the various topics and reduces cache misses due to those queries that are neither sufficiently popular to be in the static portion nor requested within short-time intervals to be in the dynamic portion.We simulate different configurations for STD using two real-world query streams. Experiments demonstrate that our approach outperforms SDC with an increase up to 3% in terms of hit rates, and up to 36% of gap reduction w.r.t. SDC from the theoretical optimal caching algorithm.  相似文献   

13.
This paper is concerned with techniques for fuzzy query processing in a database system. By a fuzzy query we mean a query which uses imprecise or fuzzy predicates (e.g. AGE = “VERY YOUNG”, SALARY = “MORE OR LESS HIGH”, YEAR-OF-EMPLOYMENT = “RECENT”, SALARY ? 20,000, etc.). As a basis for fuzzy query processing, a fuzzy retrieval system based on the theory of fuzzy sets and linguistic variables is introduced. In our system model, the first step in processing fuzzy queries consists of assigning meaning to fuzzy terms (linguistic values), of a term-set, used for the formulation of a query. The meaning of a fuzzy term is defined as a fuzzy set in a universe of discourse which contains the numerical values of a domain of a relation in the system database.The fuzzy retrieval system developed is a high level model for the techniques which may be used in a database system. The feasibility of implementing such techniques in a real environment is studied. Specifically, within this context, techniques for processing simple fuzzy queries expressed in the relational query language SEQUEL are introduced.  相似文献   

14.
Large-scale web search engines are composed of multiple data centers that are geographically distant to each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times as the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.  相似文献   

15.
Students use general web search engines as their primary source of research while trying to find answers to school-related questions. Although search engines are highly relevant for the general population, they may return results that are out of educational context. Another rising trend; social community question answering websites are the second choice for students who try to get answers from other peers online. We attempt discovering possible improvements in educational search by leveraging both of these information sources. For this purpose, we first implement a classifier for educational questions. This classifier is built by an ensemble method that employs several regular learning algorithms and retrieval based approaches that utilize external resources. We also build a query expander to facilitate classification. We further improve the classification using search engine results and obtain 83.5% accuracy. Although our work is entirely based on the Turkish language, the features could easily be mapped to other languages as well. In order to find out whether search engine ranking can be improved in the education domain using the classification model, we collect and label a set of query results retrieved from a general web search engine. We propose five ad-hoc methods to improve search ranking based on the idea that the query-document category relation is an indicator of relevance. We evaluate these methods for overall performance, varying query length and based on factoid and non-factoid queries. We show that some of the methods significantly improve the rankings in the education domain.  相似文献   

16.
Most of the Peer-to-Peer search techniques proposed in the recent years have focused on the single-key retrieval. However, similarity search in metric spaces represents an important paradigm for content-based retrieval in many applications. In this paper we introduce an extension of the well-known Content-Addressable Network paradigm to support storage and retrieval of more generic metric space objects. In particular we address the problem of executing the nearest neighbors queries, and propose three different algorithms of query execution. An extensive experimental study on real-life data sets explores the performance characteristics of the proposed algorithms by showing their advantages and disadvantages.  相似文献   

17.
Most of the peer-to-peer search techniques proposed in the recent years have focused on the single-key retrieval. However, similarity search in metric spaces represents an important paradigm for content-based retrieval in many applications. In this paper we introduce an extension of the well-known Content-Addressable Network paradigm to support storage and retrieval of more generic metric space objects. In particular we address the problem of executing the nearest neighbors queries, and propose three different algorithms of query propagation. An extensive experimental study on real-life data sets explores the performance characteristics of the proposed algorithms by showing their advantages and disadvantages.  相似文献   

18.
A comparative evaluation has been carried out on the Philips “DIRECT” and the British “INSPEC” retrieval system. DIRECT is based on automatic indexing whereas INSPEC uses manual subject indexing.Two queries were submitted to both systems, using the same data base. The results are expressed in terms of recall and precision. Both recall and precision of INSPEC were found to be higher than those of DIRECT by 20%. It is concluded that this is mainly a result of the query formulation. The effectiveness obtained with automatic indexing of documents is equivalent to that of the manual procedure.  相似文献   

19.
We present a novel multimodal query expansion strategy, based on genetic programming (GP), for image search in visually-oriented e-commerce applications. Our GP-based approach aims at both: learning to expand queries with multimodal information and learning to compute the “best” ranking for the expanded queries. However, different from previous work, the query is only expressed in terms of the visual content, which brings several challenges for this type of application. In order to evaluate the effectiveness of our method, we have collected two datasets containing images of clothing products taken from different online shops. Experimental results indicate that our method is an effective alternative for improving the quality of image search results when compared to a genetic programming system based only on visual information. Our method can achieve gains varying from 10.8% against the strongest learning-to-rank baseline to 54% against an adhoc specialized solution for the particular domain at hand.  相似文献   

20.
The current study addresses the problem of retrieving a specific moment from an untrimmed video by a sentence query. Existing methods have achieved high performance by designing various structures to match visual-text relations. Yet, these methods tend to return an interval starting from 0s, which we named “0s bias”. In this paper, we propose a Circular Co-Teaching (CCT) mechanism using a captioner to improve an existing retrieval model (localizer) from two aspects: biased annotations and easy samples. Correspondingly, CCT contains two processes: (1) Pseudo Query Generation (captioner to localizer), aiming at transferring the knowledge from generated queries to the localizer to balance annotations; (2) Competence-based Curriculum Learning (localizer to captioner), training the captioner in an easy-to-hard fashion guided by localization results, making pairs of the false-positive moment and pseudo query become easy samples for the localizer. Extensive experiments show that our CCT can alleviate “0s bias” with even 4% improvement for existing approaches on average in two public datasets (ActivityNet-Captions, and Charades-STA), in terms of R@1,IoU=0.7. Notably, our method also outperforms baselines in an out-of-distribution scenario. We also quantitatively validate CCT’s ability to cope with “0s bias” by a proposed metric, DM. Our study not only theoretically contributes to detecting “0s bias”, but also provides a highly effective tool for video moment retrieval by alleviating such bias.  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号