Similar Documents
Found 20 similar documents (search time: 203 ms)
1.
This paper proposes a solution for a knowledge question-answering system in the product design domain based on ontologies and design scenarios. An ontology is used to represent the product design knowledge base; a question classification algorithm based on matching the semantic features of questions is proposed, together with an ontology query transformation technique. For complex questions that carry a design scenario, a design-scenario question similarity algorithm is proposed that combines scenario similarity and question similarity through weighted aggregation. Finally, a gun-barrel design knowledge question-answering system is implemented, and the experimental results are analyzed to verify the feasibility of the solution.

2.
张瑾 《情报科学》2013,(8):71-76
Semantic-ontology similarity computation based on the Chinese Library Classification (《中图法》) combines the classification's content and structural system and uses semantic logical relations to compute semantic similarity. The inference rules thus established capture the semantic relations between terms well and improve the precision of term similarity computation.

3.
廖开际  杨彬彬 《情报杂志》2012,31(7):182-186
Traditional text similarity algorithms based on term-frequency statistics usually consider only the weights of feature terms within a text and ignore the semantic relations between terms. Taking into account both the importance of feature terms in a text and the semantic relations between them, this paper proposes building a weighted semantic network of text feature terms to compute inter-text similarity; during model construction, feature-term selection and weight computation are suitably improved. Experiments show that the weighted-semantic-network text similarity algorithm further improves the precision of similarity computation over traditional algorithms.
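The term-frequency baseline that item 3 improves upon can be sketched as TF-IDF weighting with cosine similarity. This is a minimal illustration of the traditional approach the abstract critiques, not the paper's weighted-semantic-net model; all names are illustrative:

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Compute sparse TF-IDF weight vectors for a list of tokenized documents."""
    n = len(docs)
    df = Counter()                      # document frequency per term
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: (tf[t] / len(doc)) * math.log(n / df[t]) for t in tf})
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

As the abstract notes, this baseline treats terms as independent symbols: two texts with no shared terms score zero even when the terms are semantically related.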

4.
This paper proposes a WordNet-based method for computing semantic similarity between concept lattices. The method uses the structural relations among terms in WordNet, taking their distance, density, and depth into account, to compute the semantic similarity between concepts and thereby reflect their semantic relations; the similarity between two concept lattices is then computed from the similarities between their concepts, laying groundwork for future research. The method is not yet complete, however, and requires further extension and refinement.
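The depth-and-distance idea in item 4 can be illustrated with a Wu-Palmer-style formula over a toy is-a taxonomy. This is a sketch under assumptions: the paper's exact combination of distance, density, and depth is not given in the abstract, and the tiny hand-built hierarchy below stands in for WordNet:

```python
# Toy is-a taxonomy (child -> parent); illustrative only, not WordNet itself.
PARENT = {
    "dog": "canine", "wolf": "canine", "canine": "mammal",
    "cat": "feline", "feline": "mammal", "mammal": "animal",
}

def path_to_root(concept):
    """Return the chain of concepts from `concept` up to the taxonomy root."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path

def depth(concept):
    """Depth of a concept; the root has depth 1."""
    return len(path_to_root(concept))

def wu_palmer(a, b):
    """Wu-Palmer similarity: 2*depth(LCS) / (depth(a) + depth(b))."""
    ancestors_a = set(path_to_root(a))
    lcs = next(c for c in path_to_root(b) if c in ancestors_a)
    return 2.0 * depth(lcs) / (depth(a) + depth(b))
```

Concepts sharing a deeper common ancestor (dog/wolf under canine) score higher than those meeting only near the root (dog/cat under mammal).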

5.
盛秋艳 《情报科学》2012,(8):1238-1241
As an effective tool for describing concept systems at the semantic and knowledge level, ontology technology brings new opportunities to computing similarity between terms. Term similarity is an important topic in knowledge representation and information retrieval. This paper uses an ontology to organize concepts and compute the semantic similarity between them, dividing semantic similarity into concept similarity and description similarity and merging the two into a final semantic similarity. A computer-science domain ontology built from the Chinese Classified Thesaurus (《中国分类主题词表》) is used to verify the effectiveness of the semantic similarity computation method.

6.
This paper surveys current research on word semantic similarity algorithms at home and abroad, analyzes and compares several representative computation methods, and applies several common word similarity algorithms to FAQ retrieval. Five metrics, precision, recall, F-measure, MRR, and MAP, are used for evaluation, and the relative merits of the algorithms are judged by the retrieval quality of similar questions.
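Two of the ranking metrics named in item 6, MRR and the per-query average precision underlying MAP, can be sketched as follows (a minimal illustration; input formats are assumptions):

```python
def mean_reciprocal_rank(ranked_results, relevant):
    """MRR: average over queries of 1/rank of the first relevant result."""
    total = 0.0
    for ranking, rel in zip(ranked_results, relevant):
        rr = 0.0
        for i, item in enumerate(ranking, start=1):
            if item in rel:
                rr = 1.0 / i
                break
        total += rr
    return total / len(ranked_results)

def average_precision(ranking, rel):
    """AP for one query: mean of precision values at each relevant hit.
    MAP is the mean of this value over all queries."""
    hits, score = 0, 0.0
    for i, item in enumerate(ranking, start=1):
        if item in rel:
            hits += 1
            score += hits / i
    return score / len(rel) if rel else 0.0
```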

7.
Research on a Document Retrieval System Based on a Semantic Vector Space Model   (Cited by: 1; self-citations: 0; citations by others: 1)
To address the vector space model's neglect of semantic similarity between terms, a semantic vector space model is established and a document retrieval system based on it is designed. Two core techniques, semantic similarity computation and query expansion, are studied in depth, and the effectiveness of the retrieval system is verified through examples.

8.
Clustering of semantic topic-tree features must be designed to improve the search of semantic features and the capacity for semantic generalization. Traditional approaches use ontology-mapping-based semantic feature clustering, establishing semantically equivalent mappings between heterogeneous ontology models, which leads to poor clustering performance and weak semantic generalization. This paper proposes an I/O-mapping clustering algorithm based on semantic coverage fusion. Domain knowledge and pattern matching are used to establish semantic mappings between ontologies; three kinds of semantically relevant learned knowledge are considered to derive a semantic similarity function; TF-IDF statistics are used to compute term feature weights; and semantic topic-tree feature matching improves the coverage I/O-mapping clustering of the search engine. Simulation experiments show that the algorithm improves semantic coverage fusion, yields better data clustering performance, completes semantic mapping tasks well, and raises the precision of semantic information retrieval to 98.7%.

9.
In content-based unstructured P2P search systems, two main problems directly affect query effectiveness and search cost: the computational complexity of text similarity caused by the high-dimensional semantic space, and the large volume of redundant messages produced by broadcast algorithms. This paper proposes a content-clustered P2P search model based on set difference degree to improve query efficiency and reduce redundant messages. The model defines text similarity via set difference degree, keeping text-similarity computation within linear time and thus effectively reducing query time; clustering nodes by their pairwise set difference degree realizes content-based clustering, which both lowers query time and reduces redundant messages. Simulation experiments show that the content-based search model built on set difference degree not only achieves high recall but also reduces search cost and query time to roughly 40% and 30%, respectively, of those of the Gnutella system.
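The abstract of item 9 does not give the exact definition of set difference degree; one common linear-time formulation, symmetric difference over union (an assumption here), can be sketched as:

```python
def set_difference_degree(a, b):
    """Symmetric difference over union: 0 = identical term sets, 1 = disjoint."""
    a, b = set(a), set(b)
    union = a | b
    return len(a ^ b) / len(union) if union else 0.0

def text_similarity(doc_a, doc_b):
    """Similarity as the complement of the difference degree."""
    return 1.0 - set_difference_degree(doc_a, doc_b)
```

Because it uses only set operations over the two term sets, the computation is linear in the number of distinct terms, matching the complexity claim in the abstract.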

10.
李慧 《现代情报》2015,35(4):172-177
Word similarity computation is widely used in natural language processing tasks such as information retrieval, word sense disambiguation, and machine translation. Existing word similarity algorithms fall into two broad classes: statistics-based methods, which estimate similarity from the contextual co-occurrence of words in large corpora, and semantic-resource-based methods, which use manually built semantic dictionaries or semantic networks. This paper compares and analyzes the two classes of algorithms, focusing on Web-corpus-based and Wikipedia-based methods, and summarizes their respective strengths and shortcomings. It concludes that, under the influence of advancing information technology, Wikipedia-based and hybrid word similarity algorithms, as well as linked-data-driven similarity computation, are promising directions for development.

11.
A main challenge in Cross-Language Information Retrieval (CLIR) is to estimate a proper translation model from available translation resources, since translation quality directly affects the retrieval performance. Among different translation resources, we focus on obtaining translation models from comparable corpora, because they provide appropriate translations for both languages and domains with limited linguistic resources. In this paper, we employ a two-step approach to build an effective translation model from comparable corpora, without requiring any additional linguistic resources, for the CLIR task. In the first step, translations are extracted by deriving correlations between source–target word pairs. These correlations are used to estimate word translation probabilities in the second step. We propose a language modeling approach for the first step, where modeling based on probability distribution provides two key advantages. First, our approach can be tuned more easily than heuristically adjusted previous work. Second, it provides a principled basis for integrating additional lexical and translational relations to improve the accuracy of translations from comparable corpora. As an indication, we integrate monolingual relations of word co-occurrences into the process of translation extraction, which helps to extract more reliable translations for low-frequency words in a comparable corpus. Experimental results on an English–Persian comparable corpus show that our method outperforms the previous approaches in terms of both translation quality and the performance of CLIR. Indeed, the proposed method is naturally applicable to any comparable corpus, regardless of its languages. In addition, we demonstrate the significant impact of word translation probabilities, estimated in the second step of our approach, on the performance of CLIR.
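The second step described in item 11, turning correlation scores into word translation probabilities, amounts to a per-source-word normalization. A minimal sketch (the English-Persian word pairs and the flat correlation-score input are illustrative assumptions, not the paper's estimator):

```python
def translation_probabilities(correlations):
    """Normalize source->target correlation scores into P(target | source),
    so that the scores for each source word sum to 1."""
    probs = {}
    for src, targets in correlations.items():
        total = sum(targets.values())
        probs[src] = {t: w / total for t, w in targets.items()} if total else {}
    return probs
```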

12.
With the popularity of online educational platforms, English learners can learn and practice no matter where they are and what they do. English grammar is one of the important components in learning English. To learn English grammar effectively, it requires students to practice questions containing focused grammar knowledge. In this paper, we study a novel problem of retrieving English grammar questions with similar grammatical focus. Since the grammatical focus similarity is different from textual similarity or sentence syntactic similarity, existing approaches cannot be applied directly to our problem. To address this problem, we propose a syntactic based approach for English grammar question retrieval which can retrieve related grammar questions with similar grammatical focus effectively. In the proposed syntactic based approach, we first propose a new syntactic tree, namely parse-key tree, to capture English grammar questions’ grammatical focus. Next, we propose two kernel functions, namely relaxed tree kernel and part-of-speech order kernel, to compute the similarity between two parse-key trees of the query and grammar questions in the collection. Then, the retrieved grammar questions are ranked according to the similarity between the parse-key trees. In addition, if a query is submitted together with answer choices, conceptual similarity and textual similarity are also incorporated to further improve the retrieval accuracy. The performance results have shown that our proposed approach outperforms the state-of-the-art methods based on statistical analysis and syntactic analysis.

13.
In this paper, we present a comparison of collocation-based similarity measures: Jaccard, Dice and Cosine similarity measures for the proper selection of additional search terms in query expansion. In addition, we consider two more similarity measures: average conditional probability (ACP) and normalized mutual information (NMI). ACP is the mean value of two conditional probabilities between a query term and an additional search term. NMI is a normalized value of the two terms' mutual information. All these similarity measures are the functions of any two terms' frequencies and the collocation frequency, but are different in the methods of measurement. The selected measure changes the order of additional search terms and their weights, hence has a strong influence on the retrieval performance. In our experiments of query expansion using these five similarity measures, the additional search terms of Jaccard, Dice and Cosine similarity measures include more frequent terms with lower similarity values than ACP or NMI. In overall assessments of query expansion, the Jaccard, Dice and Cosine similarity measures are better than ACP and NMI in terms of retrieval effectiveness, whereas, NMI and ACP are better in terms of execution efficiency.
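As item 13 notes, all five measures are simple functions of two term frequencies and their collocation frequency. A minimal sketch (`fa` and `fb` are the two terms' frequencies, `fab` their collocation frequency; the NMI normalization shown is one common choice and an assumption, since the abstract does not give the exact formula):

```python
import math

def jaccard(fa, fb, fab):
    """Collocation frequency over the size of the union of occurrences."""
    return fab / (fa + fb - fab)

def dice(fa, fb, fab):
    """Twice the collocation frequency over the sum of frequencies."""
    return 2 * fab / (fa + fb)

def cosine_coll(fa, fb, fab):
    """Collocation frequency over the geometric mean of frequencies."""
    return fab / math.sqrt(fa * fb)

def acp(fa, fb, fab):
    """Average conditional probability: mean of P(b|a) and P(a|b)."""
    return 0.5 * (fab / fa + fab / fb)

def nmi(fa, fb, fab, n):
    """Pointwise mutual information normalized by -log of the joint
    probability (one common normalization); n is the corpus size, fab < n."""
    if fab == 0:
        return 0.0
    pmi = math.log((n * fab) / (fa * fb))
    return pmi / -math.log(fab / n)
```

Jaccard, Dice, and cosine rise directly with `fab`, which is why, as the abstract reports, they tend to admit frequent terms with looser association than ACP or NMI.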

14.
While image-to-image translation has been extensively studied, there are a number of limitations in existing methods designed for transformation between instances of different shapes from different domains. In this paper, a novel approach was proposed (hereafter referred to as ObjectVariedGAN) to handle geometric translation. One may encounter large and significant shape changes during image-to-image translation, especially object transfiguration. Thus, we focus on synthesizing the desired results to maintain the shape of the foreground object without requiring paired training data. Specifically, our proposed approach learns the mapping between source domains and target domains, where the shapes of objects differ significantly. Feature similarity loss is introduced to encourage generative adversarial networks (GANs) to obtain the structure attribute of objects (e.g., object segmentation masks). Additionally, to satisfy the requirement of utilizing unaligned datasets, cycle-consistency loss is combined with context-preserving loss. Our approach feeds the generator with source image(s), incorporated with the instance segmentation mask, and guides the network to generate the desired target domain output. To verify the effectiveness of the proposed approach, extensive experiments are conducted on pre-processed examples from the MS-COCO datasets. A comparative summary of the findings demonstrates that ObjectVariedGAN outperforms other competing approaches, in terms of Inception Score, Frechet Inception Distance, and human cognitive preference.

15.
Wikipedia provides a huge collaboratively made semi-structured taxonomy called Wikipedia category graph (WCG), which can be utilized as a Knowledge Graph (KG) to measure the semantic similarity (SS) between Wikipedia concepts. Previously, several Most Informative Common Ancestor-based (MICA-based) SS methods have been proposed by intrinsically manipulating the taxonomic structure of WCG. However, some basic structural issues in WCG such as huge size, branching factor and multiple inheritance relations hamper the applicability of traditional MICA-based and multiple inheritance-based approaches in it. Therefore, in this paper, we propose a solution to handle these structural issues and present a new multiple inheritance-based SS approach, called Neighborhood Ancestor Semantic Contribution (NASC). In this approach, firstly, we define the neighborhood of a category (a taxonomic concept in WCG) to define its semantic space. Secondly, we describe the semantic value of a category by aggregating the intrinsic IC-based semantic contribution weights of its semantically relevant multiple ancestors. Thirdly, based on our approach, we propose six different methods to compute the SS between Wikipedia concepts. Finally, we evaluate our methods on gold standard word similarity benchmarks for English, German, Spanish and French languages. The experimental evaluation demonstrates that the proposed NASC-based methods remarkably outperform traditional MICA-based and multiple inheritance-based approaches.
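The intrinsic IC weights that item 15's NASC approach aggregates can be sketched with the standard structure-only formula (a Seco-style estimate computed from descendant counts alone; this shows only the IC building block, not the paper's full neighborhood aggregation):

```python
import math

def intrinsic_ic(num_descendants, total_concepts):
    """Intrinsic information content from taxonomy structure alone:
    leaf concepts (0 descendants) are maximally informative (IC = 1),
    the root (covering all concepts) is minimally informative (IC = 0)."""
    return 1.0 - math.log(num_descendants + 1) / math.log(total_concepts)
```

No corpus statistics are needed, which is what makes such measures applicable to a huge taxonomy like the WCG.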

16.
The task of answering complex questions requires inferencing and synthesizing information from multiple documents, and can be seen as a kind of topic-oriented, informative multi-document summarization. In generic summarization, the stochastic, graph-based random walk method for computing the relative importance of textual units (i.e. sentences) has proved very successful. However, the major limitation of the TF*IDF approach is that it only retains the frequency of the words and does not take into account the sequence, syntactic and semantic information. This paper presents the impact of syntactic and semantic information in the graph-based random walk method for answering complex questions. Initially, we apply tree kernel functions to perform the similarity measures between sentences in the random walk framework. Then, we extend our work further to incorporate the Extended String Subsequence Kernel (ESSK) to perform the task in a similar manner. Experimental results show the effectiveness of the use of kernels to include the syntactic and semantic information for this task.

17.
Measuring the similarity between the semantic relations that exist between words is an important step in numerous tasks in natural language processing such as answering word analogy questions, classifying compound nouns, and word sense disambiguation. Given two word pairs (A, B) and (C, D), we propose a method to measure the relational similarity between the semantic relations that exist between the two words in each word pair. Typically, a high degree of relational similarity can be observed between proportional analogies (i.e. analogies that exist among the four words: A is to B as C is to D). We describe eight different types of relational symmetries that are frequently observed in proportional analogies and use those symmetries to robustly and accurately estimate the relational similarity between two given word pairs. We use automatically extracted lexical-syntactic patterns to represent the semantic relations that exist between two words and then match those patterns in Web search engine snippets to find candidate words that form proportional analogies with the original word pair. We define eight types of relational symmetries for proportional analogies and use those as features in a supervised learning approach. We evaluate the proposed method using the Scholastic Aptitude Test (SAT) word analogy benchmark dataset. Our experimental results show that the proposed method can accurately measure relational similarity between word pairs by exploiting the symmetries that exist in proportional analogies. The proposed method achieves an SAT score of 49.2% on the benchmark dataset, which is comparable to the best results reported on this dataset.

18.
Question categorization, which suggests one of a set of predefined categories to a user’s question according to the question’s topic or content, is a useful technique in user-interactive question answering systems. In this paper, we propose an automatic method for question categorization in a user-interactive question answering system. This method includes four steps: feature space construction, topic-wise words identification and weighting, semantic mapping, and similarity calculation. We firstly construct the feature space based on all accumulated questions and calculate the feature vector of each predefined category which contains certain accumulated questions. When a new question is posted, the semantic pattern of the question is used to identify and weigh the important words of the question. After that, the question is semantically mapped into the constructed feature space to enrich its representation. Finally, the similarity between the question and each category is calculated based on their feature vectors. The category with the highest similarity is assigned to the question. The experimental results show that our proposed method achieves good categorization precision and outperforms the traditional categorization methods on the selected test questions.

19.
This article describes a framework for cross-language information retrieval that efficiently leverages statistical estimation of translation probabilities. The framework provides a unified perspective into which some earlier work on techniques for cross-language information retrieval based on translation probabilities can be cast. Modeling synonymy and filtering translation probabilities using bidirectional evidence are shown to yield a balance between retrieval effectiveness and query-time (or indexing-time) efficiency that seems well suited to large-scale applications. Evaluations with six test collections show consistent improvements over strong baselines.

20.
In the KL divergence framework, the extended language modeling approach has a critical problem of estimating a query model, which is the probabilistic model that encodes the user’s information need. For query expansion in initial retrieval, the translation model had been proposed to involve term co-occurrence statistics. However, the translation model was difficult to apply, because the term co-occurrence statistics must be constructed offline. Especially in a large collection, constructing such a large matrix of term co-occurrence statistics prohibitively increases time and space complexity. In addition, reliable retrieval performance cannot be guaranteed because the translation model may comprise noisy non-topical terms in documents. To resolve these problems, this paper investigates an effective method to construct co-occurrence statistics and eliminate noisy terms by employing a parsimonious translation model. The parsimonious translation model is a compact version of a translation model that can reduce the number of terms containing non-zero probabilities by eliminating non-topical terms in documents. Through experimentation on seven different test collections, we show that the query model estimated from the parsimonious translation model significantly outperforms not only the baseline language modeling, but also the non-parsimonious models.
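In the KL divergence framework that item 20 builds on, a document is scored by the divergence of the estimated query model from the document's language model, with lower divergence meaning a better match. A minimal sketch (the flat dictionary models and epsilon floor are simplifying assumptions; real systems use proper smoothing):

```python
import math

def kl_divergence(query_model, doc_model, epsilon=1e-10):
    """KL(query || document) over the query model's support.
    Documents are ranked by increasing divergence."""
    return sum(p * math.log(p / doc_model.get(t, epsilon))
               for t, p in query_model.items() if p > 0)
```

A parsimonious translation model, as described in the abstract, would sharpen `query_model` by zeroing out non-topical terms before this scoring step, shrinking both the model and the noise it contributes.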


Copyright©北京勤云科技发展有限公司  京ICP备09084417号