Similar literature
1.
Authors and searchers usually express the same things in many different ways, which causes problems in free text searching of text databases. Thus, a switching tool connecting the different names of one concept is needed. This study tests the effectiveness of a thesaurus as a search aid in free text searching of a full text database. A set of queries was searched against a large full text database of newspaper articles. The search-aid thesaurus constructed for the test contains the usual relationships of a thesaurus, namely equivalence, hierarchical, and associative relationships. Each query was searched in five distinct modes: basic search, synonym search, narrower term search, related term search, and the union of all previous searches. The basic searches contained only terms included in the original query statements. In the synonym searches, the terms of the basic search were extended by disjunction with the synonyms given by the search-aid thesaurus, without modifying the overall logic of the basic search. Likewise, the basic search was extended in turn with the narrower terms and with the related terms given by the search-aid thesaurus. The last search mode included the basic terms and all the terms used in the previous searches. The searches were analyzed in terms of relative recall and precision; relative recall was estimated by setting the recall of the union search to 100%. On average, relative recall was 47.2% in the basic search, compared with 100% in the union search; average precision decreased only from 62.5% in the basic search to 51.2% in the union search.
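The synonym search mode and the relative recall estimate described in abstract 1 can be illustrated with a minimal sketch. The toy thesaurus, the documents, and the function names (`expand_query`, `search`, `relative_recall`) are hypothetical and are not taken from the study; the sketch assumes a query is a conjunction of single terms and that matching is on whole words.

```python
from typing import Dict, List, Set

# Hypothetical toy thesaurus: term -> synonyms (equivalence relationship only).
THESAURUS: Dict[str, List[str]] = {
    "car": ["automobile", "motorcar"],
    "tax": ["levy"],
}


def expand_query(terms: List[str], thesaurus: Dict[str, List[str]]) -> List[Set[str]]:
    """Turn each query term into an OR-group of the term and its synonyms.

    The AND logic between the groups is left unchanged.
    """
    return [{term, *thesaurus.get(term, [])} for term in terms]


def search(groups: List[Set[str]], documents: Dict[int, str]) -> Set[int]:
    """Return ids of documents that contain at least one word from every group."""
    hits = set()
    for doc_id, text in documents.items():
        words = set(text.lower().split())
        if all(group & words for group in groups):
            hits.add(doc_id)
    return hits


def relative_recall(relevant_found: Set[int], relevant_in_union: Set[int]) -> float:
    """Recall relative to the union search, whose recall is set to 100%."""
    return len(relevant_found & relevant_in_union) / len(relevant_in_union)


documents = {
    1: "car tax rises",
    2: "automobile levy debated",
    3: "motorcar show opens",
}
basic = search([{"car"}, {"tax"}], documents)
expanded = search(expand_query(["car", "tax"], THESAURUS), documents)
print(basic, expanded, relative_recall(basic, expanded))
```

Narrower-term and related-term modes would follow the same pattern with a different thesaurus mapping; here every retrieved document is treated as relevant purely to keep the example short.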

2.
We report on the design and construction of features of an automated query system that will help pharmacologists who are not information specialists to access the Derwent Drug File (DDF) pharmacological database. Our approach was first to elucidate those search skills of the search intermediary which might prove tractable to automation. Modules were then produced which assist in the three important subtasks of search statement generation, namely vocabulary selection, the choice of context indicators, and query reformulation. Vocabulary selection is facilitated by approximate string matching, morphological analysis, browsing, and menu searching. The context of the study, such as treatment or metabolism, is determined using a system of advisory menus. Query reformulation is performed using user feedback on retrieved documents, thesaurus relations between document index terms, and term postings data. Use is made of diverse information sources, including electronic forms of printed search aids, a thesaurus, and a medical dictionary. The system will be of use both to semicasual users and to experienced intermediaries. Many of the ideas developed should prove transportable to domains other than pharmacology: the techniques for thesaurus manipulation are designed for use with any hierarchical thesaurus.

3.
4.
Direct end-user data entry and retrieval is a major factor in achieving an economical information retrieval system. To be effective, such a system would have to provide a thesaurus structure which leads novice end-users to browse subject areas before retrieval and yet provides control and coverage of terms in a domain. A faceted hierarchical thesaurus organization has been designed to accomplish this goal.

5.
6.
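This paper briefly introduces how to use MultiTes 2007 Pro. By building a small thesaurus for information science, it discusses the software's functions and features, the acquisition of information science subject terms, and the steps for creating a simple thesaurus. Finally, it briefly evaluates the strengths and weaknesses of MultiTes 2007 Pro.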

7.
The rate of citation duplication was examined in three databases: MEDLINE, BIOSIS, and LIFE SCIENCES COLLECTION. Duplicate citations were found to be more pertinent than unique citations. The duplicate citations came from a highly compact literature, while those from a single database were very widely scattered. The pertinent duplicated citations were more likely to be retrieved in searches that had more terms overall, had a higher percentage of thesaurus terms, and had terms which appeared in both title and abstract. These results suggest that the rate of duplication of citations in multidatabase searches may be used to rank output according to probable pertinence.
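The ranking idea suggested at the end of abstract 7 can be sketched as follows. This is only an illustration under stated assumptions: citations are taken to be already matched across databases by a shared identifier, and the function name `rank_by_duplication` and the sample identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, List


def rank_by_duplication(results_by_database: Dict[str, List[str]]) -> List[str]:
    """Order citations by the number of databases that returned them.

    Citations found in more databases come first; ties are broken
    alphabetically so that the ordering is deterministic.
    """
    counts: Counter = Counter()
    for citations in results_by_database.values():
        # Count each citation at most once per database.
        counts.update(set(citations))
    return sorted(counts, key=lambda citation: (-counts[citation], citation))


results = {
    "MEDLINE": ["a", "b", "c"],
    "BIOSIS": ["b", "c"],
    "LIFE SCIENCES COLLECTION": ["c", "d"],
}
print(rank_by_duplication(results))
```

In practice the hard part is deciding when two records from different databases are the same citation, which this sketch does not attempt.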

8.
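This paper proposes a scheme for automatically compiling domain thesauri for organizing information resources in a networked environment. It systematically describes the steps of the automatic compilation process and introduces the key techniques involved, including the principles and methods for collecting and selecting terms, and the methods and techniques for automatically identifying equivalence, hierarchical, and associative relationships. Finally, it points out that only continuous maintenance and updating can keep a thesaurus viable in the long term.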

9.
The relevance feedback process uses information obtained from a user about a set of initially retrieved documents to improve subsequent search formulations and retrieval performance. In extended Boolean models, relevance feedback implies not only that new query terms must be identified and re-weighted, but also that the terms must be properly connected with Boolean AND/OR operators. Salton et al. proposed a relevance feedback method, called the DNF (disjunctive normal form) method, for a well-established extended Boolean model. However, this method focuses mainly on generating Boolean queries and does not address the re-weighting of query terms. It also has some problems in generating reformulated Boolean queries. In this study, we investigate the problems of the DNF method and propose a relevance feedback method that uses hierarchical clustering techniques to solve them. We also propose a neural network model in which the term weights used in extended Boolean queries can be adjusted by the users’ relevance feedback.

10.
Term classifications and thesauri can be used for many purposes in automatic information retrieval. Normally a thesaurus is generated manually by subject experts; alternatively, the associations between the terms can be obtained automatically by using the occurrence characteristics of the terms across the documents of a collection. A third possibility consists in taking into account user relevance assessments of certain documents with respect to certain queries in order to build term classes designed to retrieve the relevant documents and simultaneously to reject the nonrelevant documents. This last strategy, known as pseudoclassification, produces a user-dependent term classification. A number of pseudoclassification studies are summarized in the present report, and conclusions are reached concerning the effectiveness and feasibility of constructing term classifications based on human relevance assessments.

11.
An information retrieval performance measure that is interpreted as the percent of perfect performance (PPP) can be used to study the effects of including specific document features, feature classes, or techniques in an information retrieval system. Using this measure, one can assess the relative quality of a new ranking algorithm or the result of incorporating specific types of metadata or folksonomies from natural language, or determine what happens when one modifies terms, for example by stemming or adding part-of-speech tags. For example, knowledge that removing stopwords in a specific system improves performance 5% of the way from the level of random performance to the best possible result is relatively easy to interpret and to use in decision making; using this percent-based measure also allows us to compute and interpret simply that 95% of the possible performance remains to be obtained using other methods. The PPP measure as used here is based on the average search length (ASL), a measure of the ordering quality of a set of data, and may be used when evaluating all the documents or just the first N documents in an ordered list of documents. Because the ASL may be computed empirically or estimated analytically, the PPP measure may also be computed empirically or estimated analytically. Different levels of upper bound performance are discussed.
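Abstract 11 does not give the formula for PPP, so the following is a sketch under explicit assumptions rather than the paper's definition. It assumes that ASL is the mean rank position of the relevant documents, that random ordering gives an expected ASL of (N + 1) / 2 for N documents, that the best ordering places all R relevant documents first for an ASL of (R + 1) / 2, and that PPP is the fraction of the distance from the random ASL to the best ASL that a ranking covers. The function names are hypothetical.

```python
from typing import List


def average_search_length(ranking: List[bool]) -> float:
    """Mean 1-based rank position of the relevant documents in a ranked list."""
    positions = [rank for rank, relevant in enumerate(ranking, start=1) if relevant]
    if not positions:
        raise ValueError("ranking contains no relevant documents")
    return sum(positions) / len(positions)


def percent_of_perfect(ranking: List[bool]) -> float:
    """Assumed PPP: share of the way from random to best ordering, in percent."""
    total = len(ranking)
    relevant = sum(ranking)
    asl = average_search_length(ranking)
    random_asl = (total + 1) / 2   # expected ASL under random ordering (assumption)
    best_asl = (relevant + 1) / 2  # all relevant documents ranked first (assumption)
    if random_asl == best_asl:
        raise ValueError("every document is relevant; PPP is undefined")
    return 100 * (random_asl - asl) / (random_asl - best_asl)


# True marks a relevant document at that rank position.
print(percent_of_perfect([True, False, True, False]))
```

Under these assumptions a ranking no better than random scores 0 and a ranking worse than random scores below 0; the abstract's other upper bounds are not modelled here.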

12.
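Wang Jian (汪建), Zhang Chi (张驰). Studies in Science of Science (《科学学研究》), 2020, 38(11): 2008-2019
In a mass customization environment, deciding the level of product diversification is an important problem. Growing R&D capability supports firms in creating innovative products; as the level of diversification rises, however, the R&D investment per product variety falls, which may hamper further growth in R&D capability. The mechanism by which the two interact is affected by factors such as the type of R&D capability and the R&D stage. To analyze this relationship, we take improving firms' economic performance as the objective and build regression models of diversification-level decisions for each of two types of R&D capability. An empirical analysis was carried out on 2017 survey data from high-tech enterprises in Shanghai. The results show that product diversification decisions moderate the relationship between R&D output and economic performance, and that this effect depends on the type of R&D, industry characteristics, and the time horizon of the economic performance target. In the short term, for industries with low product diversification, diversification positively moderates the path from exploratory innovation output to economic performance; for industries with high product diversification, diversification negatively moderates the path from exploitative innovation output to economic performance. In the long term, the moderating effects in the two classes of industries are reversed. The difference between long-term and short-term effects stems from the mechanism by which firms set R&D investment according to economic performance. Based on the results, recommendations on diversification-level decisions are offered for different types of R&D capability, different industry characteristics, and long-term versus short-term economic targets.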

13.
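Lü Meixiang (吕美香). Information Science (《情报科学》), 2012, (8): 1160-1166
Thesauri are the most important knowledge organization tools in libraries and information retrieval. The Chinese Classified Thesaurus (《中国分类主题词表》) is a traditional thesaurus whose updating and maintenance have always been done manually, which limits its use in digital libraries and networked information environments. This paper presents a statistics-based method that extracts keywords from metadata titles and locates them in the thesaurus. It consists of roughly three steps: extracting keywords from titles; determining the specificity of the extracted keywords; and locating highly specific technical terms in the thesaurus. Experiments on the Chinese Classified Thesaurus and on computer science and technology metadata provided by Shanghai Library show that the method is feasible. The method can be applied to automatic indexing or cataloguing, and has practical value and broad application prospects.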

14.
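Zhang Yi (张嶷), Wang Xuefeng (汪雪锋), Zhu Donghua (朱东华), Zhou Xiao (周潇). Studies in Science of Science (《科学学研究》), 2013, 31(11): 1615-1622
How to extract useful information from scientific and technical literature data and improve the capacity for knowledge discovery is a topic of keen interest in current science-of-science research. Many related analytical techniques and methods revolve around the "topic terms" obtained through natural language processing. In general, however, the number of topic terms obtained from scientific and technical literature data is huge: manual cleaning is nearly impossible, and software-based cleaning lacks credibility. Building on bibliometric methods, this paper constructs a semi-automatic "topic term cluster" methodology comprising stopword lists, fuzzy semantic processing, association rules, conversion between term frequency and document frequency, and cluster analysis. It implements a scheme for cleaning, merging, and clustering topic terms that relies mainly on quantitative methods, supplemented by qualitative ones, with the aim of providing more accurate topic term lists for technical competitive intelligence analysis. An empirical study on Chinese scientific and technical literature in the "photovoltaic cell" field from the Derwent patent database verifies the soundness and effectiveness of the method.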

15.
The present-day guidelines for thesaurus design recommend two different strategies, the committee and empirical approaches, for identifying candidate terms. An argument is made that the basis for the recommendation is the assumption that the knowledge based on the consensus of experts in a field is different from the knowledge expressed in the literature of that field. An experiment was conducted to test the validity of this assumption. The finding that the two strategies failed to generate two significantly different lists of terms challenges the validity of the assumption and raises several important questions for the theorists who write the guidelines for thesaurus design and for those who must put the guidelines into practice when designing a thesaurus.

16.
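Lei Xiao (雷晓), Chang Chun (常春), Liu Wei (刘伟). Information Science (《情报科学》), 2021, 39(1): 135-141
[Purpose/Significance] To keep a thesaurus's term coverage complete, new terms that have emerged in a field but are not yet included must be added promptly. How to combine the temporal and document-frequency characteristics of candidate terms, and to explore the distribution of new terms from a time-series perspective in order to guide new-term selection, is a question worth studying. [Method/Process] The paper studies the time series of candidate terms' document frequencies. The series are preprocessed by frequency normalization and alignment to equal length; the k-means clustering algorithm is then applied to cluster candidate terms by the trend of their time series, in order to explore the patterns of trend change in terms and non-terms and thus summarize the trend characteristics that new terms should satisfy. [Result/Conclusion] The clustering study finds that new terms are generally on a rising trend. In the empirical test, candidate terms in a growth state were selected and assessed by experts; the method effectively picks out from the candidates the new terms that can be added to the thesaurus, with relatively high accuracy. [Innovation/Limitation] The innovation is that both temporal change and document frequency are considered when selecting new thesaurus terms; the limitation is that, owing to the scale of data processing, only frequency data for paper keywords were counted in the empirical test.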
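The preprocessing and clustering steps named in abstract 16 can be sketched in a few lines. The abstract does not say how the series were normalized or aligned, so this sketch assumes division by each series' peak value and left-padding with zeros for terms that appear later; the plain k-means below is seeded with the first k series to stay deterministic. All function names and sample data are hypothetical.

```python
from typing import List


def pad(series: List[float], length: int) -> List[float]:
    """Left-pad with zeros so that all series cover the same time span (assumption)."""
    return [0.0] * (length - len(series)) + list(series)


def normalize(series: List[float]) -> List[float]:
    """Scale a series by its peak so that only its trend shape remains (assumption)."""
    peak = max(series)
    return [value / peak if peak else 0.0 for value in series]


def kmeans(points: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means with squared Euclidean distance; returns a label per point."""
    centroids = [list(point) for point in points[:k]]  # deterministic seeding
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(point, centroids[c])))
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [point for point, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(column) / len(members) for column in zip(*members)]
    return labels


# Yearly document frequencies of four hypothetical candidate terms.
raw = [[1, 2, 4, 8], [8, 4, 2, 1], [3, 6, 9], [9, 5, 3, 2]]
span = max(len(series) for series in raw)
points = [normalize(pad(series, span)) for series in raw]
print(kmeans(points, k=2))
```

Candidates falling into a cluster whose centroid rises over time would then be passed to experts for review, in line with the abstract's finding that new terms tend to show a growth trend.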

17.
18.
In this paper we demonstrate a new method for concentrating the set of key-words of a thesaurus. The method is based on a mathematical study that we have carried out of the distribution of characters in a defined natural language. We have built a concentration function f which generates only a few synonyms. In applying this function to the set of key-words of a thesaurus, we reduce each key-word to four characters without synonymity. (For three characters we have a rate of synonymity of approx. 1/1000.) A new structure of binary files allows the thesaurus to be contained in a table of less than 700 bytes.

19.
Information-systems are classified into two types, termed “Evidence-of-Existence” and “Presentation” of information. The objective of the evidence-type system lies in the domain of documentation and retrieval of information. The structure of this system-type is developed, with application of cybernetic concepts, as an isomorphic model in analogy to the system-structure of communication technology. The latter postulates three criteria of structuring: (1) Source-Channel-Sink, with input-output characteristics, (2) Filter-type communication-channel, (3) Reversible code. These criteria are applied to the structuring of information-systems of the evidence-of-existence type. For the purpose of two-way communication the information-systems have to be represented by closed-loop models. The selective-retrieval requirements necessitate that the system-channel be a filter of information. These information-filters are implemented by keyword-phrases, which are identical with the codewords. They yield a uniquely decodable code which is totally reversible, so as to serve adequately both the documentation and the retrieval of documents. It is proven that hierarchic information-systems, applying categorization or subject-heading objects of information, do not meet the mandatory code-requirements. The inherent coding-deficiencies of hierarchic systems generate intolerable retrieval ambiguities. The same critique applies to the thesaurus concept. The development of a novel species of thesaurus is suggested, realizing a kind of Linnéan encyclopedia of general human knowledge, presenting all relevant interrelations of objects of knowledge. Such a thesaurus would provide the much needed support for formulating efficient search queries. Other relevant features of communication technology, like the information-potential, should be isomorphically transformed into information-system models.

20.