Similar Literature
 19 similar documents found (search time: 46 ms)
1.
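A Hierarchical Clustering Method for Large-Scale Chinese Search Logs   (Total citations: 1; self-citations: 0; citations by others: 1)
Sun Rui, Jin Peng. 《科技通报》 2012, 28(8): 83-85
This paper proposes a hierarchical clustering algorithm for analyzing search engine query logs. Working on the query log data released by Sogou Labs, the algorithm clusters query texts in three partitioning passes, each of which reduces dimensionality to a different degree. The similarity parameter can be adjusted to suit different clustering needs, and the algorithm is highly scalable. The experimental results provide a solid basis for query recommendation, relevance ranking, and similar tasks.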

2.
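Understanding users' query intent is crucial to improving search engine quality, and analyzing the queries of users with specific interests helps a search engine understand their real needs. This paper clusters web query logs, grouping highly similar query terms, and builds a user interest model to analyze user interests. A query-term graph is built from the content overlap between query terms and combined with a PageRank computation over the query terms, yielding an evaluation method based on the probability distribution of a user's query terms that scores the terms the user is interested in. Finally, the query terms of greatest interest are recommended to the user according to that probability distribution.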

3.
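Liang Shaoxing. 《现代情报》 2015, 35(8): 151-156
The quality of the similarity computation directly affects the effectiveness of information retrieval and recommendation. Based on the characteristics of attribute sequences in the ontology graph model, and taking both hierarchical and attribute relations into account, this paper analyzes the factors that influence path association similarity, hierarchy-intersection association similarity, and attribute-intersection association similarity, and then gives a method for computing the comprehensive semantic similarity between instances. The paper closes by discussing how this method applies to the over-specialization problem in content-based recommendation, the sparsity problem in collaborative filtering recommendation, and the recall and precision problems in retrieval.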

4.
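Collaborative filtering is currently the most widely used and most successful personalized recommendation algorithm in e-commerce recommender systems. Because of data sparsity, the similarity that traditional collaborative filtering computes over a small set of co-rated items does not accurately reflect the similarity between users, which seriously degrades recommendation accuracy. To address this, after analyzing the distribution of co-ratings and its relationship to similarity, the authors propose a collaborative filtering algorithm based on co-ratings that does not compute similarity at all and instead uses co-ratings directly as the criterion for selecting nearest neighbors. Experiments on MovieLens show that the algorithm clearly improves both the accuracy and the coverage of predictions.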
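
The neighbor-selection idea in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the paper's prediction formula, so the mean-centered average below, the function names `predict` and `mean`, and the parameter `k` are all hypothetical. The only part taken from the abstract is that neighbors are ranked by the number of co-rated items rather than by a similarity score.

```python
def mean(values):
    """Arithmetic mean of a non-empty collection of ratings."""
    values = list(values)
    return sum(values) / len(values)


def predict(ratings, user, item, k=2):
    """Predict `user`'s rating of `item`.

    ratings: dict mapping user -> dict mapping item -> rating.
    Neighbors are the k users who rated `item` and share the most
    co-rated items with `user`; no similarity score is computed.
    The mean-centered average is an assumption, not the paper's formula.
    """
    target = ratings.get(user, {})
    if not target:
        return None
    candidates = []
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        common = len(set(target) & set(other_ratings))
        if common > 0:
            candidates.append((common, other))
    if not candidates:
        return None
    # Most co-rated items first; ties broken by user id for determinism.
    candidates.sort(key=lambda pair: (-pair[0], pair[1]))
    neighbors = [other for _, other in candidates[:k]]
    deviation = mean(
        ratings[n][item] - mean(ratings[n].values()) for n in neighbors
    )
    return mean(target.values()) + deviation


ratings = {
    "a": {"i1": 4, "i2": 2},
    "b": {"i1": 5, "i2": 3, "i3": 5},
    "c": {"i1": 1, "i3": 2},
}
print(predict(ratings, "a", "i3", k=1))
```

With `k=1` the only neighbor is user `b`, who shares two co-rated items with `a`, while `c` shares one.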

5.
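This paper proposes an efficient data algorithm based on data-skew association degree and, for the first time, introduces the concept of data skewness. Inter-class data features are used to judge the degree of skew, similar data are clustered across classes, and information similarity is computed over the data features in the database; when counting the total occurrences of a concept, the occurrences of all its sub-concepts are added in. During querying, the diversity of a single data attribute is fully considered, and the number of data features admitted to the model is constrained so that unnecessary, cumbersome features are dropped. The data classes that satisfy the query conditions are identified and queries are run per class, which optimizes the data structure. Experimental results show that applying this algorithm to database query optimization effectively improves query efficiency on massive databases.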

6.
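Zhang Chengmin, Ju Haiyan. 《情报杂志》 2005, 24(11): 101-103, 105
This paper presents a method for computing query string similarity that combines information from three levels of the keywords in a query string: word form, semantics, and pragmatics. First, a literal similarity algorithm computes the word-form similarity of the query strings. Next, keywords are matched at the semantic level using a thesaurus of semantic classes, which gives the semantic similarity. Then, with a search engine as the corpus source, the query strings are submitted to the search engine and the overlap among the returned results is analyzed statistically to compute the pragmatic similarity. Finally, the three similarities are combined into the overall similarity. Experimental results show that the algorithm is effective.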
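
The three-level combination described in this abstract can be sketched as follows. The abstract names the levels but not the formulas, so everything concrete here is an assumption: the Dice coefficient over characters for word form, Jaccard overlap of thesaurus class codes for semantics, Jaccard overlap of returned result identifiers for pragmatics, the equal default weights, and all function and parameter names.

```python
def jaccard(a, b):
    """Jaccard overlap of two collections, 0.0 when both are empty."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def literal_sim(q1, q2):
    """Word-form level: Dice coefficient over the characters of each query."""
    a, b = set(q1), set(q2)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def semantic_sim(keywords1, keywords2, thesaurus):
    """Semantic level: overlap of the class codes of the keywords.

    thesaurus: dict mapping keyword -> semantic class code (toy stand-in
    for a real semantic-class dictionary).
    """
    classes1 = {thesaurus[k] for k in keywords1 if k in thesaurus}
    classes2 = {thesaurus[k] for k in keywords2 if k in thesaurus}
    return jaccard(classes1, classes2)


def pragmatic_sim(results1, results2):
    """Pragmatic level: overlap of the result lists a search engine returned."""
    return jaccard(results1, results2)


def query_similarity(q1, q2, keywords1, keywords2, thesaurus,
                     results1, results2, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted sum of the three levels; the weights are an assumption."""
    w_lit, w_sem, w_prag = weights
    return (w_lit * literal_sim(q1, q2)
            + w_sem * semantic_sim(keywords1, keywords2, thesaurus)
            + w_prag * pragmatic_sim(results1, results2))
```

In this sketch the result lists are passed in rather than fetched, so the example stays independent of any particular search engine API.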

7.
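Zeng Qun, Cheng Xiao. 《现代情报》 2016, 36(11): 50-54
In the Internet era, personalized recommender systems are being applied in more and more fields, and personalized recommendation algorithms have become a hot research topic. Traditional recommendation algorithms, however, often suffer from cold start and data sparsity. Building on a study of traditional algorithms, this paper proposes a collaborative filtering recommendation algorithm based on similarity propagation and context clustering: users are clustered by the context similarity computed between them, the similarity propagation principle is then used to find more nearest neighbors for the target user, and recommendations are finally made from the target user's predicted ratings of items. The algorithm was implemented in Matlab on a public online dataset and its effectiveness was verified. Experimental results show that the proposed algorithm is more accurate than traditional algorithms and alleviates their cold-start and data-sparsity problems.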

8.
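The distance functions commonly used in nearest-neighbor collaborative filtering to measure how similar users' access behaviors are give only an average measure over the whole space of tested attributes; similar patterns in subspaces of the attribute set are not mined effectively, and problems such as sparse user rating data lower recommendation quality. To address these problems, a collaborative filtering recommendation algorithm based on user pattern clustering is proposed. It generates clusters with a subspace clustering method based on user pattern similarity and uses pattern similarity to improve collaborative filtering, thereby producing personalized recommendations for users. Experimental results show that the method improves the efficiency and accuracy of the recommender system.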

9.
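Ma Xin, Wang Fang. 《现代情报》 2023, (1): 6-18
[Purpose/Significance] Neighbor-based collaborative filtering is one of the most widely used algorithms in recommender systems, but data sparsity and computational scalability problems leave its recommendation quality unsatisfactory. [Method/Process] To address these problems, an improved recommendation algorithm is proposed (Category Preferred Data Field Clustering Based Collaborative Filtering Recommendation, CPDFC-CFR). First, the algorithm discards user ratings and builds the user-item matrix from review sentiment, to strengthen its ability to represent user preferences. Second, it introduces the concepts of category preference and semantic preference, uses the category preference ratio to reduce the dimensionality of the high-dimensional user-item matrix, and incorporates review sentiment preference, item category preference, and semantic preference into the user similarity computation, to reduce data sparsity. Finally, it uses a data field as a preprocessing step for user clustering, feeding the data field output (the maximum points) into the K-means algorithm as input, to improve real-time performance and stability. [Result/Conclusion] The experimental results show that: (1) the lower the item category level, the better the accuracy (F-measure) and timeliness (number of similarity computations and recommendation time) of CPDFC-CFR; (2) compared with other recommendation algorithms, CPDFC-CFR effectively improves recommendation accuracy and computational efficiency, and for the construction of collaborative filtering recommender systems it has ...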

10.
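The three factors that influence social recommendation are each quantified, and a one-to-one mapping to the microblog social network is established. Then, using the Karhunen-Loève (KL) transform, the average distance between positive and negative texts on the same topic is computed. Finally, social network information is combined with sentiment similarity to form a corrected sentiment similarity measure, which is used to build a new social recommender system. Empirical computation and analysis on microblog data show that user similarity improves to varying degrees after the transform, and that the microblog social recommender system built with the corrected similarity measure better matches users' psychological preferences.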

11.
One of the major problems in information retrieval is the formulation of queries by the user. This entails specifying a set of words or terms that express the user's information need. However, it is well known that two people can use different terms to refer to the same concept. The techniques that attempt to reduce this problem generally start from a first search and then study how the initial query can be modified to obtain better results. In general, constructing the new query involves expanding the terms of the initial query and recalculating the importance of each term in the expanded query. Several strategies are distinguished, depending on the technique used to formulate the new query. These strategies are based on the idea that if two terms are similar (with respect to any criterion), the documents in which both terms appear frequently will also be related. The technique used in this study is known as query expansion using similarity thesauri.
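
A toy sketch of query expansion with a similarity thesaurus follows. It assumes the usual construction in which each term is represented by a vector over the documents it occurs in, and candidate terms are ranked by their summed cosine similarity to the query terms. The raw term-frequency weighting, the uniform query-term weights, and the names `term_vectors`, `cosine`, and `expand` are assumptions for illustration, not details taken from the abstract.

```python
import math
from collections import defaultdict


def term_vectors(docs):
    """Represent each term as a sparse vector over documents (raw counts)."""
    vectors = defaultdict(dict)
    for doc_id, text in enumerate(docs):
        for token in text.split():
            vectors[token][doc_id] = vectors[token].get(doc_id, 0) + 1
    return vectors


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v[key] for key, weight in u.items() if key in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand(query_terms, docs, top_r=1):
    """Append the top_r terms most similar to the query as a whole."""
    vectors = term_vectors(docs)
    scores = {}
    for candidate, cand_vec in vectors.items():
        if candidate in query_terms:
            continue
        scores[candidate] = sum(
            cosine(cand_vec, vectors[q]) for q in query_terms if q in vectors
        )
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda term: (-scores[term], term))
    return list(query_terms) + ranked[:top_r]


docs = [
    "car engine repair",
    "automobile engine repair",
    "banana bread recipe",
]
print(expand(["car"], docs, top_r=2))
```

Ranking candidates against the whole query rather than against each query term separately is what keeps a term that matches only one ambiguous query word from dominating the expansion.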

12.
This paper proposes an efficient and effective solution to the problem of choosing the queries to suggest to web search engine users in order to help them rapidly satisfy their information needs. By exploiting a weak function for assessing the similarity between the current query and the knowledge base built from historical users’ sessions, we reduce the suggestion generation phase to the processing of a full-text query over an inverted index. The resulting query recommendation technique is very efficient and scalable, and is less affected by the data-sparsity problem than most state-of-the-art proposals. Thus, it is particularly effective in generating suggestions for rare queries occurring in the long tail of the query popularity distribution. The quality of the generated suggestions is assessed by evaluating their effectiveness in forecasting the users’ behavior recorded in historical query logs, and on the basis of the results of a reproducible user study conducted on publicly available, human-assessed data. The experimental evaluation shows that our proposal remarkably outperforms two other state-of-the-art solutions, and that it can generate useful suggestions even for rare and never-seen queries.

13.
In this paper, we describe a model of an information retrieval system that is based on a document re-ranking method using document clusters. In the first step, we retrieve documents with the inverted-file method. Next, we analyze the retrieved documents using document clusters and re-rank them. In this step, we use static clusters and a dynamic cluster view. Consequently, we can produce clusters that are tailored to the characteristics of the query. We combine the merits of the inverted-file method and cluster analysis: we retrieve documents with the inverted-file method and analyze all terms in each document through cluster analysis. With these two steps, the retrieved results take into account the context of all terms in a document as well as the query terms. We show that our method achieves significant improvements over the method based on similarity search ranking alone.

14.
In information retrieval, cluster-based retrieval is a well-known attempt to resolve the problem of term mismatch. Clustering requires similarity information between the documents, which is difficult to calculate in a feasible time. The adaptive document clustering scheme has been investigated by researchers to resolve this problem. However, its theoretical basis has not been fully explored. In this regard, we provide a conceptual view of adaptive document clustering based on query-based similarities, by regarding the user’s query as a concept. As a result, the adaptive document clustering scheme can be viewed as an approximation of this similarity. Based on this idea, we derive three new query-based similarity measures in the language modeling framework and evaluate them in the context of cluster-based retrieval, comparing them with K-means clustering and full document expansion. The evaluation shows that retrieval based on query-based similarities significantly improves on the baseline, while being comparable to other methods. This implies that the newly developed query-based similarities are feasible criteria for adaptive document clustering.

15.
Unknown words such as proper nouns, abbreviations, and acronyms are a major obstacle in text processing. Abbreviations, in particular, are difficult to read and process because they are often domain specific. In this paper, we propose a method for automatic expansion of abbreviations by using context and character information. In previous studies, dictionaries were used to search for abbreviation expansion candidates (candidate words for the original form of an abbreviation). We instead use a corpus from the same field that contains few abbreviations. We calculate the adequacy of abbreviation expansion candidates based on the similarity between the context of the target abbreviation and that of its expansion candidate. The similarity is calculated using a vector space model in which each vector element consists of words surrounding the target abbreviation and those of its expansion candidate. Experiments using approximately 10,000 documents in the field of aviation showed that the accuracy of the proposed method is 10% higher than that of previously developed methods.
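
The context-similarity step in this abstract can be illustrated with a small vector space sketch. The assumptions are loud ones: the window size, the raw-count weighting, the use of cosine similarity, and the character filter (the abbreviation's letters must appear in order in the candidate) are illustrative choices, and the names `context_vector`, `is_subsequence`, and `best_expansion` are hypothetical. The abstract confirms only that context and character information are both used.

```python
import math
from collections import Counter


def context_vector(tokens, target, window=2):
    """Count the words within `window` positions of each occurrence of target."""
    vec = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            vec.update(tokens[max(0, i - window):i])
            vec.update(tokens[i + 1:i + 1 + window])
    return vec


def cosine(u, v):
    """Cosine similarity of two Counter vectors."""
    dot = sum(count * v[word] for word, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def is_subsequence(abbr, word):
    """Character check: the letters of abbr appear in order inside word."""
    remaining = iter(word.lower())
    return all(ch in remaining for ch in abbr.lower())


def best_expansion(abbr, abbr_tokens, corpus_tokens, candidates, window=2):
    """Pick the candidate whose corpus context best matches the abbreviation's."""
    abbr_vec = context_vector(abbr_tokens, abbr, window)
    scored = [
        (cosine(abbr_vec, context_vector(corpus_tokens, cand, window)), cand)
        for cand in candidates
        if is_subsequence(abbr, cand)
    ]
    return max(scored)[1] if scored else None


abbr_tokens = "check the alt before landing".split()
corpus_tokens = (
    "check the altitude before landing and set the altimeter after takeoff"
).split()
print(best_expansion("alt", abbr_tokens, corpus_tokens, ["altitude", "altimeter"]))
```

Both candidates pass the character filter here, so the choice is made by context alone: "altitude" shares all four surrounding words with "alt", while "altimeter" shares only "the".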

16.
Log parsing is a critical task that converts unstructured raw logs into structured data for downstream tasks. Existing methods often rely on manual string-matching rules to extract template tokens, which lowers their adaptability to different log datasets. To address this issue, we propose an automated log parsing method, PVE, which leverages a Variational Auto-Encoder (VAE) to build a semi-supervised model for categorizing log tokens. Inspired by the observation that log template tokens often consist of words, we choose common words and their combinations as training data to enhance the diversity of the structure features of template tokens. Specifically, PVE constructs two types of embedding vectors, the sum embedding and the n-gram embedding, for each word and word combination. The structure features of template tokens can be learned by training the VAE on these embeddings. During log parsing, PVE categorizes a token as a template token if it is similar to the training data. To improve efficiency, we use the average similarity between the token embedding and VAE samples, rather than the reconstruction error, to determine the token type. Evaluations on 16 real-world log datasets demonstrate that our method has an average accuracy of 0.878 and outperforms the comparison methods in parsing accuracy and adaptability.

17.
18.
We propose an approach to the retrieval of entities that have a specific relationship with the entity given in a query. Our research goal is to investigate whether the related entity finding problem can be addressed by combining a measure of the relatedness of candidate answer entities to the query with the likelihood that a candidate answer entity belongs to the target entity category specified in the query. An initial list of candidate entities, extracted from the top-ranked documents retrieved for the query, is refined using a number of statistical and linguistic methods. The proposed method extracts the category of the target entity from the query, identifies instances of this category as seed entities, and computes the similarity between candidate and seed entities. The evaluation was conducted on the Related Entity Finding task of the Entity Track of TREC 2010, as well as the QA list questions from TREC 2005 and 2006. Evaluation results demonstrate that the proposed methods are effective in finding related entities.

19.
With the popularity of online educational platforms, English learners can learn and practice wherever they are and whatever they are doing. English grammar is one of the important components of learning English. To learn English grammar effectively, students need to practice questions that contain the grammar knowledge in focus. In this paper, we study a novel problem of retrieving English grammar questions with a similar grammatical focus. Since grammatical focus similarity is different from textual similarity or sentence syntactic similarity, existing approaches cannot be applied directly to our problem. To address this problem, we propose a syntactic-based approach for English grammar question retrieval that can effectively retrieve related grammar questions with a similar grammatical focus. In this approach, we first propose a new syntactic tree, namely the parse-key tree, to capture the grammatical focus of English grammar questions. Next, we propose two kernel functions, namely the relaxed tree kernel and the part-of-speech order kernel, to compute the similarity between the parse-key trees of the query and of the grammar questions in the collection. Then, the retrieved grammar questions are ranked according to the similarity between the parse-key trees. In addition, if a query is submitted together with answer choices, conceptual similarity and textual similarity are also incorporated to further improve retrieval accuracy. The performance results show that our proposed approach outperforms state-of-the-art methods based on statistical analysis and syntactic analysis.
