首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
随着网络的飞速发展,网页数量急剧膨胀,近几年来更是以指数级进行增长,搜索引擎面临的挑战越来越严峻,很难从海量的网页中准确快捷地找到符合用户需求的网页。网页分类是解决这个问题的有效手段之一,基于网页主题分类和基于网页体裁分类是网页分类的两大主流,二者有效地提高了搜索引擎的检索效率。网页体裁分类是指按照网页的表现形式及其用途对网页进行分类。介绍了网页体裁的定义,网页体裁分类研究常用的分类特征,并且介绍了几种常用特征筛选方法、分类模型以及分类器的评估方法,为研究者提供了对网页体裁分类的概要性了解。  相似文献   

2.
Query suggestion is generally an integrated part of web search engines. In this study, we first redefine and reduce the query suggestion problem as “comparison of queries”. We then propose a general modular framework for query suggestion algorithm development. We also develop new query suggestion algorithms which are used in our proposed framework, exploiting query, session and user features. As a case study, we use query logs of a real educational search engine that targets K-12 students in Turkey. We also exploit educational features (course, grade) in our query suggestion algorithms. We test our framework and algorithms over a set of queries by an experiment and demonstrate a 66–90% statistically significant increase in relevance of query suggestions compared to a baseline method.  相似文献   

3.
There was a proliferation of electronic information sources and search engines in the 1990s. Many of these information sources became available through the ubiquitous interface of the Web browser. Diverse information sources became accessible to information professionals and casual end users alike. Much of the information was also hyperlinked, so that information could be explored by browsing as well as searching. While vast amounts of information were now just a few keystrokes and mouseclicks away, as the choices multiplied, so did the complexity of choosing where and how to look for the electronic information. Much of the complexity in information exploration at the turn of the twenty-first century arose because there was no common cataloguing and control system across the various electronic information sources. In addition, the many search engines available differed widely in terms of their domain coverage, query methods and efficiency.Meta-search engines were developed to improve search performance by querying multiple search engines at once. In principle, meta-search engines could greatly simplify the search for electronic information by selecting a subset of first-level search engines and digital libraries to submit a query to based on the characteristics of the user, the query/topic, and the search strategy. This selection would be guided by diagnostic knowledge about which of the first-level search engines works best under what circumstances. Programmatic research is required to develop this diagnostic knowledge about first-level search engine performance.This paper introduces an evaluative framework for this type of research and illustrates its use in two experiments. The experimental results obtained are used to characterize some properties of leading search engines (as of 1998). Significant interactions were observed between search engine and two other factors (time of day and Web domain). These findings supplement those of earlier studies, providing preliminary information about the complex relationship between search engine functionality and performance in different contexts. While the specific results obtained represent a time-dependent snapshot of search engine performance in 1998, the evaluative framework proposed should be generally applicable in the future.  相似文献   

4.
The Web and especially major Web search engines are essential tools in the quest to locate online information for many people. This paper reports results from research that examines characteristics and changes in Web searching from nine studies of five Web search engines based in the US and Europe. We compare interactions occurring between users and Web search engines from the perspectives of session length, query length, query complexity, and content viewed among the Web search engines. The results of our research shows (1) users are viewing fewer result pages, (2) searchers on US-based Web search engines use more query operators than searchers on European-based search engines, (3) there are statistically significant differences in the use of Boolean operators and result pages viewed, and (4) one cannot necessary apply results from studies of one particular Web search engine to another Web search engine. The wide spread use of Web search engines, employment of simple queries, and decreased viewing of result pages may have resulted from algorithmic enhancements by Web search engine companies. We discuss the implications of the findings for the development of Web search engines and design of online content.  相似文献   

5.
Large-scale web search engines are composed of multiple data centers that are geographically distant to each other. Typically, a user query is processed in a data center that is geographically close to the origin of the query, over a replica of the entire web index. Compared to a centralized, single-center search engine, this architecture offers lower query response times as the network latencies between the users and data centers are reduced. However, it does not scale well with increasing index sizes and query traffic volumes because queries are evaluated on the entire web index, which has to be replicated and maintained in all data centers. As a remedy to this scalability problem, we propose a document replication framework in which documents are selectively replicated on data centers based on regional user interests. Within this framework, we propose three different document replication strategies, each optimizing a different objective: reducing the potential search quality loss, the average query response time, or the total query workload of the search system. For all three strategies, we consider two alternative types of capacity constraints on index sizes of data centers. Moreover, we investigate the performance impact of query forwarding and result caching. We evaluate our strategies via detailed simulations, using a large query log and a document collection obtained from the Yahoo! web search engine.  相似文献   

6.
李江华  时鹏 《情报杂志》2012,31(4):112-116
Internet已成为全球最丰富的数据源,数据类型繁杂且动态变化,如何从中快速准确地检索出用户所需要的信息是一个亟待解决的问题.传统的搜索引擎基于语法的方式进行搜索,缺乏语义信息,难以准确地表达用户的查询需求和被检索对象的文档语义,致使查准率和查全率较低且搜索范围有限.本文对现有的语义检索方法进行了研究,分析了其中存在的问题,在此基础上提出了一种基于领域的语义搜索引擎模型,结合语义Web技术,使用领域本体元数据模型对用户的查询进行语义化规范,依据领域本体模式抽取文档中的知识并RDF化,准确地表达了用户的查询语义和作为被查询对象的文档语义,可以大大提高检索的准确性和检索效率,详细地给出了模型的体系结构、基本功能和工作原理.  相似文献   

7.
Recent research in the human computer interaction and information retrieval areas has revealed that search response latency exhibits a clear impact on the user behavior in web search. Such impact is reflected both in users’ subjective perception of the usability of a search engine and in their interaction with the search engine in terms of the number of search results they engage with. However, a similar impact analysis has been missing so far in the context of sponsored search. Since the predominant business model for commercial search engines is advertising via sponsored search results (i.e., search advertisements), understanding how response latency influences the user interaction with the advertisements displayed on the search engine result pages is crucial to increase the revenue of a commercial search engine. To this end, we conduct a large-scale analysis using query logs obtained from a commercial web search. We analyze the short-term and long-term impact of search response latency on the querying and clicking behaviors of users using desktop and mobile devices to access the search engine, as well as the corresponding impact on the revenue of the search engine. This analysis demonstrates the importance of serving sponsored search results with low latency and provides insight into the ad serving policy of commercial search engines to ensure long-term user engagement and search revenue.  相似文献   

8.
Queries submitted to search engines can be classified according to the user goals into three distinct categories: navigational, informational, and transactional. Such classification may be useful, for instance, as additional information for advertisement selection algorithms and for search engine ranking functions, among other possible applications. This paper presents a study about the impact of using several features extracted from the document collection and query logs on the task of automatically identifying the users’ goals behind their queries. We propose the use of new features not previously reported in literature and study their impact on the quality of the query classification task. Further, we study the impact of each feature on different web collections, showing that the choice of the best set of features may change according to the target collection.  相似文献   

9.
Both general and domain-specific search engines have adopted query suggestion techniques to help users formulate effective queries. In the specific domain of literature search (e.g., finding academic papers), the initial queries are usually based on a draft paper or abstract, rather than short lists of keywords. In this paper, we investigate phrasal-concept query suggestions for literature search. These suggestions explicitly specify important phrasal concepts related to an initial detailed query. The merits of phrasal-concept query suggestions for this domain are their readability and retrieval effectiveness: (1) phrasal concepts are natural for academic authors because of their frequent use of terminology and subject-specific phrases and (2) academic papers describe their key ideas via these subject-specific phrases, and thus phrasal concepts can be used effectively to find those papers. We propose a novel phrasal-concept query suggestion technique that generates queries by identifying key phrasal-concepts from pseudo-labeled documents and combines them with related phrases. Our proposed technique is evaluated in terms of both user preference and retrieval effectiveness. We conduct user experiments to verify a preference for our approach, in comparison to baseline query suggestion methods, and demonstrate the effectiveness of the technique with retrieval experiments.  相似文献   

10.
To improve search engine effectiveness, we have observed an increased interest in gathering additional feedback about users’ information needs that goes beyond the queries they type in. Adaptive search engines use explicit and implicit feedback indicators to model users or search tasks. In order to create appropriate models, it is essential to understand how users interact with search engines, including the determining factors of their actions. Using eye tracking, we extend this understanding by analyzing the sequences and patterns with which users evaluate query result returned to them when using Google. We find that the query result abstracts are viewed in the order of their ranking in only about one fifth of the cases, and only an average of about three abstracts per result page are viewed at all. We also compare search behavior variability with respect to different classes of users and different classes of search tasks to reveal whether user models or task models may be greater predictors of behavior. We discover that gender and task significantly influence different kinds of search behaviors discussed here. The results are suggestive of improvements to query-based search interface designs with respect to both their use of space and workflow.  相似文献   

11.
Web queries in question format are becoming a common element of a user's interaction with Web search engines. Web search services such as Ask Jeeves – a publicly accessible question and answer (Q&A) search engine – request users to enter question format queries. This paper provides results from a study examining queries in question format submitted to two different Web search engines – Ask Jeeves that explicitly encourages queries in question format and the Excite search service that does not explicitly encourage queries in question format. We identify the characteristics of queries in question format in two different data sets: (1) 30,000 Ask Jeeves queries and 15,575 Excite queries, including the nature, length, and structure of queries in question format. Findings include: (1) 50% of Ask Jeeves queries and less than 1% of Excite were in question format, (2) most users entered only one query in question format with little query reformulation, (3) limited range of formats for queries in question format – mainly “where”, “what”, or “how” questions, (4) most common question query format was “Where can I find………” for general information on a topic, and (5) non-question queries may be in request format. Overall, four types of user Web queries were identified: keyword, Boolean, question, and request. These findings provide an initial mapping of the structure and content of queries in question and request format. Implications for Web search services are discussed.  相似文献   

12.
This paper presents a novel IR-style keyword search model for semantic web data retrieval, distinguished from current retrieval methods. In this model, an answer to a keyword query is a connected subgraph that contains all the query keywords. In addition, the answer is minimal because any proper subgraph can not be an answer to the query. We provide an approximation algorithm to retrieve these answers efficiently. A special ranking strategy is also proposed so that answers can be appropriately ordered. The experimental results over real datasets show that our model outperforms existing possible solutions with respect to effectiveness and efficiency.  相似文献   

13.
This article proposes a process to retrieve the URL of a document for which metadata records exist in a digital library catalog but a pointer to the full text of the document is not available. The process uses results from queries submitted to Web search engines for finding the URL of the corresponding full text or any related material. We present a comprehensive study of this process in different situations by investigating different query strategies applied to three general purpose search engines (Google, Yahoo!, MSN) and two specialized ones (Scholar and CiteSeer), considering five user scenarios. Specifically, we have conducted experiments with metadata records taken from the Brazilian Digital Library of Computing (BDBComp) and The DBLP Computer Science Bibliography (DBLP). We found that Scholar was the most effective search engine for this task in all considered scenarios and that simple strategies for combining and re-ranking results from Scholar and Google significantly improve the retrieval quality. Moreover, we study the influence of the number of query results on the effectiveness of finding missing information as well as the coverage of the proposed scenarios.  相似文献   

14.
Question-answering has become one of the most popular information retrieval applications. Despite that most question-answering systems try to improve the user experience and the technology used in finding relevant results, many difficulties are still faced because of the continuous increase in the amount of web content. Questions Classification (QC) plays an important role in question-answering systems, with one of the major tasks in the enhancement of the classification process being the identification of questions types. A broad range of QC approaches has been proposed with the aim of helping to find a solution for the classification problems; most of these are approaches based on bag-of-words or dictionaries. In this research, we present an analysis of the different type of questions based on their grammatical structure. We identify different patterns and use machine learning algorithms to classify them. A framework is proposed for question classification using a grammar-based approach (GQCC) which exploits the structure of the questions. Our findings indicate that using syntactic categories related to different domain-specific types of Common Nouns, Numeral Numbers and Proper Nouns enable the machine learning algorithms to better differentiate between different question types. The paper presents a wide range of experiments the results show that the GQCC using J48 classifier has outperformed other classification methods with 90.1% accuracy.  相似文献   

15.
专题搜索引擎也称垂直搜索引擎,主要用来满足特定领域的用户需求。Heritrix是开源的网络爬虫,Heritrix的WebUI启动方式并不易用于广大用户。本文改变了往常对Heritrix用法,摒弃了Heritrix的WebUI启动方式,对Heritrix源码进行修改,将Lucene整合到Heritrix中,构建成一个完整的搜索引擎,并通过监听器监听搜索引擎状态,使搜索引擎能够进行自动爬取和数据更新。同时,本文添加了网页过滤模块以及对查询结果排序算法进行了改进,提高了搜索引擎的易用性和查询的准确率。  相似文献   

16.
搜索引擎搜索结果的评价技术   总被引:6,自引:0,他引:6  
陶跃华  孙茂松 《情报科学》2001,19(8):861-863,873
搜索引擎根据用户查询在自己的索引数据库中进行查找,并根据相关性分析将查询结果返回给用户。本文就传统信息检索系统的性能效率评价技术,针对Internet的特点,对搜索引擎搜索结果的评价进行了讨论。  相似文献   

17.
This paper reports results from a study exploring the multimedia search functionality of Chinese language search engines. Web searching in Chinese (Mandarin) is a growing research area and a technical challenge for popular commercial Web search engines. Few studies have been conducted on Chinese language search engines. We investigate two research questions: which Chinese language search engines provide multimedia searching, and what multimedia search functionalities are available in Chinese language Web search engines. Specifically, we examine each Web search engine’s (1) features permitting Chinese language multimedia searches, (2) extent of search personalization and user control of multimedia search variables, and (3) the relationships between Web search engines and their features in the Chinese context. Key findings show that Chinese language Web search engines offer limited multimedia search functionality, and general search engines provide a wider range of features than specialized multimedia search engines. Study results have implications for Chinese Web users, Website designers and Web search engine developers.  相似文献   

18.
Web search engines are beginning to offer access to multimedia searching, including audio, video and image searching. In this paper we report findings from a study examining the state of multimedia search functionality on major general and specialized Web search engines. We investigated 102 Web search engines to examine: (1) how many Web search engines offer multimedia searching, (2) the type of multimedia search functionality and methods offered, such as “query by example”, and (3) the supports for personalization or customization which are accessible as advanced search. Findings include: (1) few major Web search engines offer multimedia searching and (2) multimedia Web search functionality is generally limited. Our findings show that despite the increasing level of interest in multimedia Web search, those few Web search engines offering multimedia Web search, provide limited multimedia search functionality. Keywords are still the only means of multimedia retrieval, while other methods such as “query by example” are offered by less than 1% of Web search engines examined.  相似文献   

19.
搜索引擎自动分类功能评价   总被引:2,自引:0,他引:2  
朱剑俊 《情报科学》2006,24(5):754-757
本文分析了搜索引擎的检索结果自动分类的功能。通过模拟真实环境的检索实验。比较分析了“中国搜索”和“搜狗”在该项功能上的特点、区别和用户使用情况,并对此做出评价。  相似文献   

20.
曲琳琳 《情报科学》2021,39(8):132-138
【目的/意义】跨语言信息检索研究的目的即在消除因语言的差异而导致信息查询的困难,提高从大量纷繁 复杂的查找特定信息的效率。同时提供一种更加方便的途径使得用户能够使用自己熟悉的语言检索另外一种语 言文档。【方法/过程】本文通过对国内外跨语言信息检索的研究现状分析,介绍了目前几种查询翻译的方法,包括: 直接查询翻译、文献翻译、中间语言翻译以及查询—文献翻译方法,对其效果进行比较,然后阐述了跨语言检索关 键技术,对使用基于双语词典、语料库、机器翻译技术等产生的歧义性提出了解决方法及评价。【结果/结论】使用自 然语言处理技术、共现技术、相关反馈技术、扩展技术、双向翻译技术以及基于本体信息检索技术确保知识词典的 覆盖度和歧义性处理,通过对跨语言检索实验分析证明采用知识词典、语料库和搜索引擎组合能够提高查询效 率。【创新/局限】本文为了解决跨语言信息检索使用词典、语料库中词语缺乏的现象,提出通过搜索引擎从网页获 取信息资源来充实语料库中语句对不足的问题。文章主要针对中英文信息检索问题进行了探讨,解决方法还需要 进一步研究,如中文切词困难以及字典覆盖率低等严重影响检索的效率。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号