首页 | 本学科首页   官方微博 | 高级检索  
相似文献
 共查询到20条相似文献,搜索用时 15 毫秒
1.
Determining requirements when searching for and retrieving relevant information suited to a user’s needs has become increasingly important and difficult, partly due to the explosive growth of electronic documents. The vector space model (VSM) is a popular method in retrieval procedures. However, the weakness in traditional VSM is that the indexing vocabulary changes whenever changes occur in the document set, or the indexing vocabulary selection algorithms, or parameters of the algorithms, or if wording evolution occurs. The major objective of this research is to design a method to solve the afore-mentioned problems for patent retrieval. The proposed method utilizes the special characteristics of the patent documents, the International Patent Classification (IPC) codes, to generate the indexing vocabulary for presenting all the patent documents. The advantage of the generated indexing vocabulary is that it remains unchanged, even if the document sets, selection algorithms, and parameters are changed, or if wording evolution occurs. Comparison of the proposed method with two traditional methods (entropy and chi-square) in manual and automatic evaluations is presented to verify the feasibility and validity. The results also indicate that the IPC-based indexing vocabulary selection method achieves a higher accuracy and is more satisfactory.  相似文献   

2.
一种智能型的信息检索方法:隐含语义索引法   总被引:3,自引:0,他引:3  
陶蕾 《情报理论与实践》2004,27(3):308-309,301
介绍了一种新的自动索引和检索方法——隐含语义索引法。隐含语义索引法是一种全自动的智能索引方法,通过挖掘文本与词汇之间的隐含关系来达到提高检索效率的目的。  相似文献   

3.
Vocabulary mining in information retrieval refers to the utilization of the domain vocabulary towards improving the user’s query. Most often queries posed to information retrieval systems are not optimal for retrieval purposes. Vocabulary mining allows one to generalize, specialize or perform other kinds of vocabulary-based transformations on the query in order to improve retrieval performance. This paper investigates a new framework for vocabulary mining that derives from the combination of rough sets and fuzzy sets. The framework allows one to use rough set-based approximations even when the documents and queries are described using weighted, i.e., fuzzy representations. The paper also explores the application of generalized rough sets and the variable precision models. The problem of coordination between multiple vocabulary views is also examined. Finally, a preliminary analysis of issues that arise when applying the proposed vocabulary mining framework to the Unified Medical Language System (a state-of-the-art vocabulary system) is presented. The proposed framework supports the systematic study and application of different vocabulary views in information retrieval.  相似文献   

4.
5.
Traditional information retrieval techniques that primarily rely on keyword-based linking of the query and document spaces face challenges such as the vocabulary mismatch problem where relevant documents to a given query might not be retrieved simply due to the use of different terminology for describing the same concepts. As such, semantic search techniques aim to address such limitations of keyword-based retrieval models by incorporating semantic information from standard knowledge bases such as Freebase and DBpedia. The literature has already shown that while the sole consideration of semantic information might not lead to improved retrieval performance over keyword-based search, their consideration enables the retrieval of a set of relevant documents that cannot be retrieved by keyword-based methods. As such, building indices that store and provide access to semantic information during the retrieval process is important. While the process for building and querying keyword-based indices is quite well understood, the incorporation of semantic information within search indices is still an open challenge. Existing work have proposed to build one unified index encompassing both textual and semantic information or to build separate yet integrated indices for each information type but they face limitations such as increased query process time. In this paper, we propose to use neural embeddings-based representations of term, semantic entity, semantic type and documents within the same embedding space to facilitate the development of a unified search index that would consist of these four information types. We perform experiments on standard and widely used document collections including Clueweb09-B and Robust04 to evaluate our proposed indexing strategy from both effectiveness and efficiency perspectives. Based on our experiments, we find that when neural embeddings are used to build inverted indices; hence relaxing the requirement to explicitly observe the posting list key in the indexed document: (a) retrieval efficiency will increase compared to a standard inverted index, hence reduces the index size and query processing time, and (b) while retrieval efficiency, which is the main objective of an efficient indexing mechanism improves using our proposed method, retrieval effectiveness also retains competitive performance compared to the baseline in terms of retrieving a reasonable number of relevant documents from the indexed corpus.  相似文献   

6.
A procedure for automated indexing of pathology diagnostic reports at the National Institutes of Health is described. Diagnostic statements in medical English are encoded by computer into the Systematized Nomenclature of Pathology (SNOP). SNOP is a structured indexing language constructed by pathologists for manual indexing. It is of interest that effective automatic encoding can be based upon an existing vocabulary and code designed for manual methods. Morphosyntactic analysis, a simple syntax analysis, matching of dictionary entries consisting of several words, and synonym substitutions are techniques utilized.  相似文献   

7.
8.
Traditional index weighting approaches for information retrieval from texts depend on the term frequency based analysis of the text contents. A shortcoming of these indexing schemes, which consider only the occurrences of the terms in a document, is that they have some limitations in extracting semantically exact indexes that represent the semantic content of a document. To address this issue, we developed a new indexing formalism that considers not only the terms in a document, but also the concepts. In this approach, concept clusters are defined and a concept vector space model is proposed to represent the semantic importance degrees of lexical items and concepts within a document. Through an experiment on the TREC collection of Wall Street Journal documents, we show that the proposed method outperforms an indexing method based on term frequency (TF), especially in regard to the few highest-ranked documents. Moreover, the index term dimension was 80% lower for the proposed method than for the TF-based method, which is expected to significantly reduce the document search time in a real environment.  相似文献   

9.
靖培栋  宋雯斐 《情报科学》2006,24(6):884-887
本文探讨了在基于关键词索引的中文全文检索系统中实现各种截词检索的方法,建立了关键词索引的Hash索引,这种方法即能节省内存又提高检索效率。  相似文献   

10.
The vector space model (VSM) is a textual representation method that is widely used in documents classification. However, it remains to be a space-challenging problem. One attempt to alleviate the space problem is by using dimensionality reduction techniques, however, such techniques have deficiencies such as losing some important information. In this paper, we propose a novel text classification method that neither uses VSM nor dimensionality reduction techniques. The proposed method is a space efficient method that utilizes the first order Markov model for hierarchical Arabic text classification. For each category and sub-category, a Markov chain model is prepared based on the neighboring characters sequences. The prepared models are then used for scoring documents for classification purposes. For evaluation, we used a hierarchical Arabic text data collection that contains 11,191 documents that belong to eight topics distributed into 3-levels. The experimental results show that the Markov chains based method significantly outperforms the baseline system that employs the latent semantic indexing (LSI) method. That is, the proposed method enhances the F1-measure by 3.47%. The novelty of this work lies on the idea of decomposing words into sequences of characters, which found to be a promising approach in terms of space and accuracy. Based on our best knowledge, this is the first attempt to conduct research for hierarchical Arabic text classification with such relatively large data collection.  相似文献   

11.
A variety of abstract automatic indexing models have been developed in recent times in an effort to produce indexing methods that are both effective and usable in practice. Among these are the term discrimination model and the term precision system. These two indexing systems are briefly described and experimental evidence is cited showing that a combination of both theories produces better retrieval performance than either one alone. Appropriate conclusions are reached concerning viable automatic indexing procedures usable in practice.  相似文献   

12.
The inverted file is the most popular indexing mechanism for document search in an information retrieval system. Compressing an inverted file can greatly improve document search rate. Traditionally, the d-gap technique is used in the inverted file compression by replacing document identifiers with usually much smaller gap values. However, fluctuating gap values cannot be efficiently compressed by some well-known prefix-free codes. To smoothen and reduce the gap values, we propose a document-identifier reassignment algorithm. This reassignment is based on a similarity factor between documents. We generate a reassignment order for all documents according to the similarity to reassign closer identifiers to the documents having closer relationships. Simulation results show that the average gap values of sample inverted files can be reduced by 30%, and the compression rate of d-gapped inverted file with prefix-free codes can be improved by 15%.  相似文献   

13.
由于信息的多样性和复杂性,使得对不确定信息的表示和建模都要通过相应的方法完成。由于方法的不统一,难以实现对异类不确定信息的融合。寻找一种统一的理论实现多源异类信息的表示与建模,已成为信息融合领域中的关键问题。已有研究成果表明,随机集理论有望解决该难题。介绍了一种新的不确定推理理论-DSm理论,引入了随机集理论的基本概念和性质,以随机集为工具,给出了经典DSm组合规则的随机集表示,从而将DSmT纳入基于随机集理论的不确定信息统一表示与建模架构中。  相似文献   

14.
The relevance feedback process uses information obtained from a user about a set of initially retrieved documents to improve subsequent search formulations and retrieval performance. In extended Boolean models, the relevance feedback implies not only that new query terms must be identified and re-weighted, but also that the terms must be connected with Boolean And/Or operators properly. Salton et al. proposed a relevance feedback method, called DNF (disjunctive normal form) method, for a well established extended Boolean model. However, this method mainly focuses on generating Boolean queries but does not concern about re-weighting query terms. Also, this method has some problems in generating reformulated Boolean queries. In this study, we investigate the problems of the DNF method and propose a relevance feedback method using hierarchical clustering techniques to solve those problems. We also propose a neural network model in which the term weights used in extended Boolean queries can be adjusted by the users’ relevance feedbacks.  相似文献   

15.
齐寻昕 《科教文汇》2014,(14):131-132
二语词义的习得很大程度上依靠和借助母语的词义系统。本文用心理词典理论、词汇语义学的相关理论从义项的角度,探讨义项对应空缺造成的母语干扰以及汉语对掌握缅语词汇聚合和组合关系的负面影响。探讨的过程是从汉族学习者所出现的错误出发,采用对比分析的方式进行。  相似文献   

16.
This paper describes a technique for automatic book indexing. The technique requires a dictionary of terms that are to appear in the index, along with all text strings that count as instances of the term. It also requires that the text be in a form suitable for processing by a text formatter. A program searches the text for each occurrence of a term or its associated strings and creates an entry to the index when either is found. The results of the experimental application to a portion of a book text are presented, including measures of precision and recall, with precision giving the ratio of terms correctly assigned in the automatic process to the total assigned, and recall giving the ratio of correct terms automatically assigned to the total number of term assignments according to a human standard. Results indicate that the technique can be applied successfully, especially for texts that employ a technical vocabulary and where there is a premium on indexing exhaustivity.  相似文献   

17.
基于BSC的ERP系统实施绩效评价研究   总被引:1,自引:0,他引:1  
通过对传统ERP绩效评价理论和方法的对比分析,提出基于BSC的ERP系统实施绩效评价方法.在对企业战略目标与ERP系统目标的因果关系分析基础之上,建立基于BSC的ERP系统绩效评价模型,并为ERP平衡计分卡建立关键绩效指标体系,希望能对ERP系统的实施效果从业务流程、客户、财务以及学习与创新四个角度进行全面而又综合的评价.  相似文献   

18.
方萍 《科教文汇》2012,(26):113-116
知识产权英语新词、新术语和惯用语的研究是有关知识产权文本以及知识产权领域内的典型案例中所运用的新出现的或未经汇编的专有词汇、术语和惯用语的研究,是法律英语语言研究的一个分支。本文介绍了一些知识产权领域出现的新词、新术语和惯用语,分析了它们的来源、定义、相关新理论和翻译,并特别说明了专利文献的词汇、术语和惯用语。  相似文献   

19.
20.
创造性地提出精益管理评价体系架构,建立一整套评价企业精益管理绩效水平的指标体系,通过在S汽车工业公司的实际应用分析,验证了评价体系的有效性和适用性。  相似文献   

设为首页 | 免责声明 | 关于勤云 | 加入收藏

Copyright©北京勤云科技发展有限公司  京ICP备09084417号