Similar Literature
20 similar documents retrieved.
1.
Changes in the vocabulary of information science over a period of eleven years were studied in order to determine the effect of such change on index vocabularies. The language of the Annual Review of Information Science and Technology was examined; the vocabulary was found to be changing at a rate of about 4% per year, with old terms leaving the vocabulary at about the same rate as new ones enter it. No change of usage among synonyms was found, but trends in the discipline showed up as changes in the emphasis of the vocabulary. In particular, hardware-oriented terms seem to be declining in importance, and there is some evidence that management and cataloging are becoming more important. The conclusions are: thesauri and other indexing vocabularies must be designed to accommodate such expected change, with the deletion of, or other provision for, terms that pass out of use being as important as the addition of new terms; change of usage among synonyms is not significant in the short term (eleven years); and vocabulary change indicates the direction of growth in a discipline.

2.
刘华梅  侯汉清 《情报科学》2007,25(1):93-96,112
The Entry Vocabulary Module (EVM) is a semantic tool developed in recent years at the University of California, Berkeley, whose goal is to map natural language onto controlled vocabulary. This paper describes the construction process and implementation mechanism of EVM in detail. First, a data-retrieval agent and a cleaning agent download records from remote databases; then an extraction agent and a construction agent apply statistical methods to the downloaded data to compute the relationships between natural-language terms and controlled-vocabulary terms and build an association dictionary; finally, a desktop agent and a domain agent present users with the subject domains they are interested in and assist them in searching. Several modules that have already been built, such as those for INSPEC, MEDLINE and PATENTS, are also described, and improvements the project still needs are proposed.
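
The statistical association step can be illustrated with a short sketch (a hypothetical illustration of the general idea, not Berkeley's actual EVM code). It assumes each downloaded record pairs the free-text tokens of a title or abstract with the record's assigned controlled descriptors, and it scores term-descriptor pairs with a simple Jaccard-style association measure:

```python
from collections import defaultdict

def build_association_dictionary(records, min_cooccurrence=2):
    """records: iterable of (free_text_tokens, assigned_descriptors) pairs,
    e.g. the title/abstract words of a bibliographic record together with
    its controlled descriptors.
    Returns {free_term: [(descriptor, score), ...]} ranked by a simple
    Jaccard-style association score."""
    cooccur = defaultdict(int)
    term_freq = defaultdict(int)
    desc_freq = defaultdict(int)

    for tokens, descriptors in records:
        tokens, descriptors = set(tokens), set(descriptors)
        for t in tokens:
            term_freq[t] += 1
        for d in descriptors:
            desc_freq[d] += 1
        for t in tokens:
            for d in descriptors:
                cooccur[(t, d)] += 1

    assoc = defaultdict(list)
    for (t, d), c in cooccur.items():
        if c < min_cooccurrence:
            continue
        # records containing both / records containing either
        score = c / (term_freq[t] + desc_freq[d] - c)
        assoc[t].append((d, score))
    for t in assoc:
        assoc[t].sort(key=lambda pair: pair[1], reverse=True)
    return dict(assoc)
```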

3.
4.
张桃 《科教文汇》2013,(25):132-132,136
When language is used for written or spoken expression, the unit in which the nominal elements are fully present, or which consists of a single word, is called a sentence. A sentence is a sequential combination of words arranged according to grammatical and semantic rules in a particular context; to understand a sentence, these elements must be analysed one by one and then considered together for the best result. This paper first discusses how sentences are constructed and then analyses sentences in terms of vocabulary, word order, grammar, semantics and context, so that learners of Japanese as a second foreign language can understand Japanese sentences better.

5.
The Chinese Thesaurus (《汉语主题词表》) is China's first large-scale, comprehensive and widely used descriptor-based retrieval-language vocabulary; achieving interoperability between it and other vocabularies, and promoting its application in the Semantic Web environment, is of great significance. Drawing on the Simple Knowledge Organization System (SKOS) published by the W3C, this paper discusses how to represent the Chinese Thesaurus in SKOS. To make the representation efficient, Java is used: on the basis of defining the database structure of the thesaurus and the logical relationships among its data tables, an approach for automatically converting the database into a SKOS representation is laid out, and code is designed to carry out the conversion.
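
The paper's converter is written in Java; the Python sketch below using rdflib (an assumption for illustration, not the authors' code) shows the shape of such a conversion, mapping each thesaurus record to a skos:Concept with prefLabel, broader and related links:

```python
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, SKOS

def thesaurus_to_skos(terms, base_uri="http://example.org/cct/"):
    """terms: list of dicts such as
       {"id": "T001", "label": "情报检索", "broader": ["T000"], "related": ["T005"]}
    Returns an rdflib Graph with one skos:Concept per thesaurus term."""
    g = Graph()
    ns = Namespace(base_uri)
    g.bind("skos", SKOS)

    for t in terms:
        concept = ns[t["id"]]
        g.add((concept, RDF.type, SKOS.Concept))
        g.add((concept, SKOS.prefLabel, Literal(t["label"], lang="zh")))
        for b in t.get("broader", []):
            g.add((concept, SKOS.broader, ns[b]))
        for r in t.get("related", []):
            g.add((concept, SKOS.related, ns[r]))
    return g

# g = thesaurus_to_skos(rows_loaded_from_the_thesaurus_database)
# print(g.serialize(format="turtle"))
```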

6.
Pseudo-relevance feedback (PRF) is a classical technique to improve search engine retrieval effectiveness, by closing the vocabulary gap between users’ query formulations and the relevant documents. While PRF is typically applied on the same target corpus as the final retrieval, in the past, external expansion techniques have sometimes been applied to obtain a high-quality pseudo-relevant feedback set using an external corpus. However, such external expansion approaches have only been studied for sparse (BoW) retrieval methods, and their effectiveness for recent dense retrieval methods remains under-investigated. Indeed, dense retrieval approaches such as ANCE and ColBERT, which conduct similarity search based on encoded contextualised query and document embeddings, are of increasing importance. Moreover, pseudo-relevance feedback mechanisms have been proposed to further enhance dense retrieval effectiveness. In particular, in this work, we examine the application of dense external expansion to improve zero-shot retrieval effectiveness, i.e. evaluation on corpora without further training. Zero-shot retrieval experiments with six datasets, including two TREC datasets and four BEIR datasets, when applying the MSMARCO passage collection as the external corpus, indicate that obtaining external feedback documents using ColBERT can significantly improve NDCG@10 for sparse retrieval (by up to 28%) and dense retrieval (by up to 12%). In addition, using ANCE on the external corpus brings up to 30% NDCG@10 improvement for sparse retrieval and up to 29% for dense retrieval.
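
A minimal sketch of external dense pseudo-relevance feedback in the spirit of this work, though not the ColBERT-PRF or ANCE-PRF formulations evaluated in the paper; the Rocchio-style interpolation and the 0.7 weight are illustrative assumptions:

```python
import numpy as np

def external_dense_prf(query_emb, external_doc_embs, target_doc_embs,
                       k_feedback=3, alpha=0.7):
    """query_emb          : (d,) encoded query
       external_doc_embs  : (n_ext, d) embeddings of the external corpus
       target_doc_embs    : (n_tgt, d) embeddings of the target corpus
       Returns target document indices ranked by the refined query."""
    # Step 1: take the top-k nearest documents from the *external* corpus
    ext_scores = external_doc_embs @ query_emb
    feedback = external_doc_embs[np.argsort(-ext_scores)[:k_feedback]]

    # Step 2: Rocchio-style interpolation of query and feedback centroid
    refined = alpha * query_emb + (1 - alpha) * feedback.mean(axis=0)

    # Step 3: score the *target* corpus with the refined query embedding
    return np.argsort(-(target_doc_embs @ refined))
```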

7.
Language modeling (LM), which provides a principled mechanism for assigning quantitative scores to sequences of words or tokens, has long been an interesting yet challenging problem in the field of speech and language processing. The n-gram model is still the predominant method, while a number of disparate LM methods, exploiting either lexical co-occurrence or topic cues, have been developed to complement the n-gram model with some success. In this paper, we explore a novel language modeling framework built on top of the notion of relevance for speech recognition, where the relationship between a search history and the word being predicted is discovered through different granularities of semantic context for relevance modeling. Empirical experiments on a large vocabulary continuous speech recognition (LVCSR) task suggest that the various language models deduced from our framework are comparable to existing language models in terms of both perplexity and recognition error rate reductions.
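
For readers unfamiliar with the baseline, the sketch below shows a conventional add-alpha smoothed bigram model and a perplexity computation of the kind such relevance-based models are compared against; it is an illustrative baseline, not the framework proposed in the paper:

```python
import math
from collections import defaultdict

def train_bigram_lm(sentences, alpha=1.0):
    """sentences: list of token lists. Returns prob(w1, w2) with add-alpha smoothing."""
    unigrams, bigrams, vocab = defaultdict(int), defaultdict(int), set()
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        vocab.update(tokens)
        for w1, w2 in zip(tokens, tokens[1:]):
            unigrams[w1] += 1
            bigrams[(w1, w2)] += 1
    V = len(vocab)

    def prob(w1, w2):
        return (bigrams[(w1, w2)] + alpha) / (unigrams[w1] + alpha * V)
    return prob

def perplexity(prob, sentences):
    """Per-token perplexity of held-out sentences under the bigram model."""
    log_sum, n_tokens = 0.0, 0
    for sent in sentences:
        tokens = ["<s>"] + sent + ["</s>"]
        for w1, w2 in zip(tokens, tokens[1:]):
            log_sum += math.log(prob(w1, w2))
            n_tokens += 1
    return math.exp(-log_sum / n_tokens)
```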

8.
[Purpose/Significance] This paper proposes a domain-dictionary-based method for automatically identifying public-safety emergency events in online public opinion, in order to effectively detect hot public-opinion events in the public safety domain, forestall opinion crises and improve government credibility. [Method/Process] First, high-frequency terms of the public safety domain are extracted and filtered from a corpus of public safety events taken from the China emergency service network (中国应急服务网). Then, with manual intervention, a subset of high-frequency terms that are highly domain-relevant is selected as seed words. Next, pointwise mutual information (PMI) is used to measure the co-occurrence strength between the seed words and the other terms in the corpus; terms with high PMI with respect to the seed words are taken as domain candidate terms, which are then adjusted through manual review. Finally, after the texts to be identified are given a text representation, they are matched against the domain terms in the dictionary, and the number and weight of public-safety terms appearing in a text determine whether it describes a public-safety emergency event. [Result/Conclusion] Experiments on an annotated corpus show that, compared with the classical Naive Bayes method, the proposed approach effectively improves the accuracy of identifying hot public-opinion events in the public safety domain.
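
The pointwise-mutual-information step can be sketched as follows (an illustration of the described procedure, with thresholds and data structures assumed for the example):

```python
import math
from collections import Counter

def pmi_candidates(documents, seed_words, min_cooccur=3):
    """documents : list of token lists; seed_words : manually chosen domain seeds.
    Scores every non-seed term by its best document-level PMI with any seed word
    and returns candidates ranked for manual review."""
    n_docs = len(documents)
    term_df = Counter()            # document frequency of every term
    cooccur = Counter()            # (term, seed) document co-occurrence
    seed_words = set(seed_words)

    for doc in documents:
        terms = set(doc)
        for t in terms:
            term_df[t] += 1
        for s in terms & seed_words:
            for t in terms - seed_words:
                cooccur[(t, s)] += 1

    scores = {}
    for (t, s), c in cooccur.items():
        if c < min_cooccur:
            continue
        pmi = math.log((c * n_docs) / (term_df[t] * term_df[s]))
        scores[t] = max(scores.get(t, float("-inf")), pmi)
    return sorted(scores.items(), key=lambda x: x[1], reverse=True)
```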

9.
An expert system was developed in the area of information retrieval, with the objective of performing the job of an information specialist who assists users in selecting the right vocabulary terms for a database search. The system is composed of two components. One is the knowledge base, represented as a semantic network, in which the nodes are words, concepts and phrases comprising the vocabulary of the application area, and the links express semantic relationships between those nodes. The second component is the rules, or procedures, which operate upon the knowledge base, analogous to the decision rules or work patterns of the information specialist. Two major stages comprise the consulting process of the system: during the “search” stage, relevant knowledge in the semantic network is activated, and search and evaluation rules are applied in order to find appropriate vocabulary terms to represent the user's problem; during the “suggest” stage, those terms are further evaluated, dynamically rank-ordered according to relevancy, and suggested to the user. Explanations of the findings can be provided by the system, and backtracking is possible in order to find alternatives in case a suggested term is rejected by the user. This article presents the principles, procedures and rules utilized in the expert system.
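
A toy sketch of the two-stage consultation process described above; the spreading-activation weights and decay are assumptions for illustration, not the rules of the original system:

```python
from collections import defaultdict

class VocabularyAdvisor:
    """Toy semantic-network advisor: a 'search' stage activates related
    vocabulary nodes, a 'suggest' stage ranks them for the user."""

    def __init__(self):
        # semantic network: node -> list of (related_node, link_weight)
        self.links = defaultdict(list)

    def add_link(self, a, b, weight=1.0):
        self.links[a].append((b, weight))
        self.links[b].append((a, weight))

    def search_stage(self, problem_terms, depth=2, decay=0.5):
        """Spread activation outward from the user's problem terms."""
        activation = defaultdict(float)
        frontier = {t: 1.0 for t in problem_terms}
        for _ in range(depth):
            next_frontier = defaultdict(float)
            for node, act in frontier.items():
                activation[node] = max(activation[node], act)
                for nbr, w in self.links[node]:
                    next_frontier[nbr] = max(next_frontier[nbr], act * w * decay)
            frontier = next_frontier
        for node, act in frontier.items():          # nodes reached at max depth
            activation[node] = max(activation[node], act)
        return activation

    def suggest_stage(self, activation, rejected=(), top_n=5):
        """Rank activated terms, skipping any the user has already rejected."""
        ranked = [(t, a) for t, a in activation.items() if t not in rejected]
        ranked.sort(key=lambda x: x[1], reverse=True)
        return ranked[:top_n]
```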

10.
Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem, where relevant documents for a given query might not be retrieved simply because different terminology is used to describe the same concepts. As such, semantic search techniques aim to address such limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while the sole consideration of semantic information might not lead to improved retrieval performance over keyword-based search, its consideration enables the retrieval of a set of relevant documents that cannot be retrieved by keyword-based methods. As such, building indices that store and provide access to semantic information during the retrieval process is important. While the process for building and querying keyword-based indices is quite well understood, the incorporation of semantic information within search indices is still an open challenge. Existing work has proposed either to build one unified index encompassing both textual and semantic information or to build separate yet integrated indices for each information type, but these approaches face limitations such as increased query processing time. In this paper, we propose to use neural embeddings-based representations of terms, semantic entities, semantic types and documents within the same embedding space to facilitate the development of a unified search index consisting of these four information types. We perform experiments on standard and widely used document collections, including Clueweb09-B and Robust04, to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives. Based on our experiments, we find that when neural embeddings are used to build inverted indices, thereby relaxing the requirement that the posting-list key be explicitly observed in the indexed document: (a) retrieval efficiency increases compared to a standard inverted index, with reduced index size and query processing time; and (b) while retrieval efficiency, the main objective of an efficient indexing mechanism, improves with our proposed method, retrieval effectiveness also remains competitive with the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.
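
The indexing idea can be sketched as follows, with a key (term, entity or type) receiving a posting list of its nearest documents in the shared embedding space rather than only the documents that explicitly contain it; the top_k cut-off and cosine scoring are illustrative assumptions, not the paper's exact construction:

```python
import numpy as np

def build_embedding_inverted_index(key_embs, doc_embs, doc_ids, top_k=100):
    """key_embs : dict {key_string: np.ndarray of shape (d,)}  (terms, entities, types)
       doc_embs : np.ndarray of shape (n_docs, d), rows L2-normalised
       doc_ids  : list of document identifiers aligned with doc_embs rows
    Returns {key: [(doc_id, score), ...]} where a document can be posted
    under a key it never explicitly mentions."""
    index = {}
    for key, emb in key_embs.items():
        scores = doc_embs @ emb                       # cosine similarity
        best = np.argsort(-scores)[:top_k]
        index[key] = [(doc_ids[i], float(scores[i])) for i in best]
    return index
```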

11.
Current Web-based search engines presume a category search for a specific group of users. This approach is appropriate for generalized information searches since it is based on statistically generated user profiles. However, in some applications, such as medicine and law, an individualized search for a specific user at a given point in time is desired. In addition, the use of specialized terminology in some fields necessitates guidance for the non-expert to be successful in locating the desired information. This paper presents a new decision support system enabled by the analytic hierarchy process and intelligent software agents that can be used by researchers and practitioners in technical fields to aid information retrieval and improve search results from a controlled vocabulary. An application from telemedicine is given to illustrate the potential improvements.
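
As background on the analytic hierarchy process used here, the sketch below derives priority weights from a reciprocal pairwise-comparison matrix and checks their consistency; it is a generic AHP step, not the paper's full decision support system:

```python
import numpy as np

# Saaty's random-index values for matrices of order 1..9
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_priorities(pairwise_matrix):
    """Derive criterion weights from a reciprocal pairwise-comparison matrix
    via its principal eigenvector and report the consistency ratio."""
    A = np.asarray(pairwise_matrix, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))
    weights = np.abs(eigvecs[:, k].real)
    weights /= weights.sum()                        # priority vector
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0  # consistency index
    cr = ci / RANDOM_INDEX[n] if RANDOM_INDEX.get(n) else 0.0
    return weights, cr

# Example: weighting three criteria used to rank controlled-vocabulary terms.
# weights, cr = ahp_priorities([[1, 3, 5], [1/3, 1, 2], [1/5, 1/2, 1]])
# A consistency ratio above ~0.1 usually signals inconsistent judgements.
```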

12.
Using genetic algorithms to evolve a population of topical queries
Systems for searching the Web based on thematic contexts can be built on top of a conventional search engine and benefit from the huge amount of content as well as from the functionality available through the search engine interface. The quality of the material collected by such systems is highly dependent on the vocabulary used to generate the search queries. In this scenario, selecting good query terms can be seen as an optimization problem where the objective function to be optimized is based on the effectiveness of a query in retrieving relevant material. Some characteristics of this optimization problem are: (1) the high dimensionality of the search space, where candidate solutions are queries and each term corresponds to a different dimension, (2) the existence of acceptable suboptimal solutions, (3) the possibility of finding multiple solutions, and in many cases (4) the quest for novelty. This article describes optimization techniques based on Genetic Algorithms to evolve “good query terms” in the context of a given topic. The proposed techniques place emphasis on searching for novel material that is related to the search context. We discuss the use of a mutation pool to allow the generation of queries with new terms, study the effect of different mutation rates on the exploration of query space, and discuss the use of an especially developed fitness function that favors the construction of queries containing novel but related terms.
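
A compact sketch of such a genetic algorithm over query-term populations, with a mutation pool for introducing novel terms; the selection scheme, crossover operator and rates are illustrative assumptions rather than the article's exact configuration:

```python
import random

def evolve_queries(vocabulary, fitness, mutation_pool,
                   pop_size=20, query_len=5, generations=30,
                   mutation_rate=0.2):
    """vocabulary    : list of candidate terms for the initial population
       fitness       : callable(query_terms) -> float, e.g. precision of the
                       results the query retrieves for the topic
       mutation_pool : list of novel terms that mutation may introduce
       Returns the fittest query found."""
    population = [random.sample(vocabulary, query_len) for _ in range(pop_size)]

    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        parents = scored[: pop_size // 2]            # truncation selection
        offspring = []
        while len(offspring) < pop_size - len(parents):
            p1, p2 = random.sample(parents, 2)
            cut = random.randint(1, query_len - 1)   # one-point crossover
            child = p1[:cut] + p2[cut:]
            if random.random() < mutation_rate:      # mutate from the pool
                child[random.randrange(query_len)] = random.choice(mutation_pool)
            offspring.append(child)
        population = parents + offspring

    return max(population, key=fitness)
```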

13.
A theory of indexing helps explain the nature of indexing, the structure of the vocabulary, and the quality of the index. Indexing theories formulated by Jonker, Heilprin, Landry and Salton are described. Each formulation has a different focus. Jonker, by means of the Terminological and Connective Continua, provided a basis for understanding the relationships between the size of the vocabulary, the hierarchical organization, and the specificity by which concepts can be described. Heilprin introduced the idea of a search path which leads from query to document. He also added a third dimension to Jonker's model; the three variables are diffuseness, permutivity and hierarchical connectedness. Landry made an ambitious and well conceived attempt to build a comprehensive theory of indexing predicated upon sets of documents, sets of attributes, and sets of relationships between the two. It is expressed in theorems and by formal notation. Salton provided both a notational definition of indexing and procedures for improving the ability of index terms to discriminate between relevant and nonrelevant documents. These separate theories need to be tested experimentally and eventually combined into a unified comprehensive theory of indexing.

14.
龙怡  云太真 《情报科学》2021,39(9):117-124
[Purpose/Significance] "Internet + government services" is developing rapidly in China and online government service resources are increasingly abundant; whether the public can locate government services through search engines is an important factor in the effectiveness of online government service platforms. Since the primary goal of searching for government service resources is precision, eight hypotheses are proposed concerning the search engine visibility of Chinese and US government service resources. [Method/Process] Chinese and English keywords were constructed for two typical government services, the personal matter "applying for a motor vehicle driving licence" and the corporate matter "registering a limited liability company". Search experiments were run on Baidu and Google respectively, targeting the most economically developed city in each Chinese province and each US state; first-page search results were collected and scored for relevance. On this basis, statistical analyses of search engine functionality were conducted and the hypotheses were tested with non-parametric tests. [Result/Conclusion] The study concludes that a search engine's ability to understand government-service vocabulary directly affects its search performance, and that government service platforms can also improve their visibility through search engine optimisation. [Innovation/Limitation] The innovation lies in constructing Chinese and English keywords and directly collecting Baidu and Google results for a cross-country comparison, extending the breadth and depth of previous studies of this kind; the main limitations are the subjectivity of the relevance judgements and the small scale of the search targets.

15.
This paper tackles the problem of how one might select further search terms, using relevance feedback, given the search terms in the query. These search terms are extracted from a maximum spanning tree connecting all the terms in the index term vocabulary. A number of different spanning trees are generated from a variety of association measures. The retrieval effectiveness for the different spanning trees is shown to be approximately the same. Effectiveness is measured in terms of precision and recall, and the retrieval tests are done on three different test collections.
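
A brief sketch of spanning-tree-based expansion in this spirit, using networkx; the association measure and the neighbour-selection rule are assumptions for illustration:

```python
import networkx as nx

def expansion_candidates(query_terms, association, top_n=10):
    """query_terms : terms from the user's query
       association : dict {(term_a, term_b): strength} over the index vocabulary
    Builds a maximum spanning tree over the term-association graph and
    proposes the tree neighbours of the query terms as further search terms."""
    g = nx.Graph()
    for (a, b), w in association.items():
        g.add_edge(a, b, weight=w)

    mst = nx.maximum_spanning_tree(g, weight="weight")

    candidates = {}
    for q in query_terms:
        if q not in mst:
            continue
        for nbr in mst.neighbors(q):
            if nbr not in query_terms:
                w = mst[q][nbr]["weight"]
                candidates[nbr] = max(candidates.get(nbr, 0.0), w)
    return sorted(candidates, key=candidates.get, reverse=True)[:top_n]
```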

16.
A critical challenge for Web search engines concerns how they present relevant results to searchers. The traditional approach is to produce a ranked list of results with title and summary (snippet) information, and these snippets are usually chosen based on the current query. Snippets play a vital sensemaking role, helping searchers to efficiently make sense of a collection of search results, as well as determine the likely relevance of individual results. Recently researchers have begun to explore how snippets might also be adapted based on searcher preferences as a way to better highlight relevant results to the searcher. In this paper we focus on the role of snippets in collaborative web search and describe a technique for summarizing search results that harnesses the collaborative search behaviour of communities of like-minded searchers to produce snippets that are more focused on the preferences of the searchers. We go on to show how this so-called social summarization technique can generate summaries that are significantly better adapted to searcher preferences and describe a novel personalized search interface that combines result recommendation with social summarization.

17.
The acquisition of information and the search interaction process are strongly influenced by a person's use of their knowledge of the domain and the task. In this paper we show that a user's level of domain knowledge can be inferred from their interactive search behaviors without considering the content of queries or documents. A technique is presented to model a user's information acquisition process during search using only measurements of eye movement patterns. In a user study (n = 40) of search in the domain of genomics, a representation of each participant's domain knowledge was constructed using self-ratings of knowledge of genomics-related terms (n = 409). Cognitive effort features associated with reading eye movement patterns were calculated for each reading instance during the search tasks. The results show correlations between the cognitive effort due to reading and an individual's level of domain knowledge. We construct exploratory regression models that suggest it is possible to make predictions of a user's level of knowledge based on real-time measurements of eye movement patterns during a task session.
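
An exploratory-regression sketch of the kind described, assuming per-session reading-effort features (e.g. mean fixation duration, regression rate, reading speed) and self-rated knowledge scores; the model choice and evaluation are illustrative, not the study's exact analysis:

```python
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

def fit_knowledge_model(effort_features, knowledge_scores):
    """effort_features  : array of shape (n_sessions, n_features) of
                          reading-effort measurements per task session
       knowledge_scores : array of shape (n_sessions,) of self-rated knowledge
    Returns the fitted model and its cross-validated R^2 as a rough check
    of predictive value."""
    model = LinearRegression()
    r2 = cross_val_score(model, effort_features, knowledge_scores,
                         cv=5, scoring="r2")
    model.fit(effort_features, knowledge_scores)
    return model, r2.mean()
```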

18.
Professional, workplace searching differs from general searching because it is typically limited to specific facets and targeted at a single answer. We have developed the semantic component (SC) model, a search feature that allows searchers to structure their search and target context-specific aspects of the main topic of the documents. We tested the model in an interactive searching study with family doctors in order to explore the doctors' querying behaviour, how they applied the means for specifying a search, and how these features contributed to the search outcome. In general, the doctors were capable of exploiting system features and search tactics during searching. Most searchers produced well-structured queries that contained appropriate search facets. When searches failed it was not due to query structure or query length; failures were mostly caused by the well-known vocabulary problem, exacerbated by the use of certain filters as Boolean filters. The best-performing queries were structured into 2–3 main facets out of 3–5 possible search facets and expressed with terms reflecting the focal view of the search task. These findings both support and extend previous results on query structure and exhaustivity, showing the importance of selecting central search facets and expressing them from the perspective of the search task. The SC model was applied in all of the highest-performing queries except one. The findings suggest that the model may be a helpful feature for structuring queries into central, appropriate facets and for returning highly relevant documents.

19.
陆凤珍 《科教文汇》2014,(23):117-118
Animals are closely bound up with human life, and animal terms appear constantly in everyday conversation, so the study of animal vocabulary has always been a focus of cognitive linguistics. Language is the carrier of culture; as the British linguist Palmer put it, "language faithfully reflects the entire history and culture of a nation." In this sense, anyone's feelings towards animals carry specific cultural connotations. This paper is a comparative study of animal vocabulary in Chinese and English language culture; the comparison shows that, owing to cultural differences, animal terms carry different connotations in Chinese and English.

20.
The inverted file is the most popular indexing mechanism for document search in an information retrieval system. Compressing an inverted file can greatly improve document search rate. Traditionally, the d-gap technique is used in inverted file compression by replacing document identifiers with usually much smaller gap values. However, fluctuating gap values cannot be efficiently compressed by some well-known prefix-free codes. To smoothen and reduce the gap values, we propose a document-identifier reassignment algorithm. This reassignment is based on a similarity factor between documents. We generate a reassignment order for all documents according to the similarity, so as to reassign closer identifiers to documents having closer relationships. Simulation results show that the average gap values of sample inverted files can be reduced by 30%, and the compression rate of a d-gapped inverted file with prefix-free codes can be improved by 15%.
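
A minimal sketch of the d-gap idea (not the paper's reassignment algorithm): sorted document identifiers are replaced by gaps, and identifier reassignment that places similar documents close together shrinks those gaps so that prefix-free codes compress them better:

```python
def dgap_encode(doc_ids):
    """Replace sorted document identifiers with gaps (first value kept as-is)."""
    gaps, prev = [], 0
    for d in sorted(doc_ids):
        gaps.append(d - prev)
        prev = d
    return gaps

def dgap_decode(gaps):
    """Recover the original identifiers by accumulating the gaps."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids

# If similar documents receive nearby identifiers, posting lists cluster and
# the gaps shrink, e.g. [4, 9, 11, 12] -> gaps [4, 5, 2, 1], which short
# prefix-free codes (gamma, delta) encode in fewer bits than scattered identifiers.
```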
