Similar Documents
20 similar documents found.
1.
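Research on Automatic Text Clustering Techniques   Total citations: 1, self-citations: 0, citations by others: 1
As a highly automated, unsupervised machine learning technique, automatic clustering is widely used in information retrieval and data mining. This paper discusses the definition and steps of text clustering and, following those steps, describes text preprocessing, automatic clustering algorithms, and the evaluation of text clustering results.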
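The first step named above, text preprocessing, usually turns each document into a weighted term vector before any clustering algorithm runs. As an illustration only (not this paper's method), the sketch below assumes the documents are already tokenized and uses TF-IDF weights with cosine similarity, one common choice; the function names are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Turn pre-tokenized documents into sparse TF-IDF dicts (term -> weight)."""
    n = len(docs)
    # document frequency: how many documents contain each term
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # a term that occurs in every document gets weight log(1) = 0
        vectors.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return vectors

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is all zeros."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0
```

The resulting similarities can feed any of the clustering algorithms discussed in the entries below.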

2.
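This paper gives a fairly systematic review of current research on spatial clustering algorithms. Based on their characteristics, the algorithms are grouped into two classes: partitioning algorithms and hierarchical algorithms. For partitioning algorithms, the analysis focuses on PAM, CLARA, and CLARANS; for hierarchical algorithms, it focuses on BIRCH and CURE. The paper compares the complexity of these algorithms and introduces related applications.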
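PAM, the first partitioning algorithm named above, represents each cluster by a medoid (an actual data point) and keeps swapping a medoid with a non-medoid while the total distance from points to their nearest medoid decreases. The sketch below is a simplified, hypothetical rendering of that swap loop: it starts from the first k points instead of PAM's BUILD phase and recomputes the full cost for every candidate swap, which is slow but easy to check. CLARA runs the same idea on samples of the data, and CLARANS examines swaps at random instead of exhaustively.

```python
import math

def total_cost(points, medoids, dist):
    # sum of distances from each point to its nearest medoid
    return sum(min(dist(p, points[m]) for m in medoids) for p in points)

def pam(points, k, dist=math.dist):
    """Simplified PAM-style k-medoids: greedy medoid/non-medoid swaps."""
    medoids = list(range(k))  # naive start (assumption), not PAM's BUILD phase
    best = total_cost(points, medoids, dist)
    improved = True
    while improved:  # stop when no single swap lowers the cost
        improved = False
        for i in range(k):
            for h in range(len(points)):
                if h in medoids:
                    continue
                candidate = medoids[:i] + [h] + medoids[i + 1:]
                cost = total_cost(points, candidate, dist)
                if cost < best:
                    medoids, best, improved = candidate, cost, True
    # label each point with the index of its nearest medoid
    labels = [min(medoids, key=lambda m: dist(p, points[m])) for p in points]
    return medoids, labels
```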

3.
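An Improved K-means Algorithm Based on Optimized Initial Cluster Centers   Total citations: 2, self-citations: 0, citations by others: 2
K-means is an important clustering algorithm that is widely used in network information processing. Because K-means terminates at a local optimum, the choice of initial cluster centers strongly affects the clustering result. This paper proposes an improved K-means algorithm that first detects relatively dense regions in the data set and then uses these regions to generate the initial cluster centers. The method effectively excludes the influence of cluster-boundary points and noise points, adapts to data sets whose actual classes have unbalanced density distributions, and ultimately yields good clustering results.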
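The abstract does not spell out how dense regions are detected. A minimal sketch under stated assumptions: count the neighbors of each point within a radius eps (a hypothetical parameter), take the mean of the densest neighborhood as a center, remove that neighborhood, and repeat until k centers exist. Isolated points such as noise never form the densest neighborhood, which mirrors the claimed robustness to boundary and noise points.

```python
import math

def dense_region_centers(points, k, eps):
    """Hypothetical initial-center picker: means of successively densest neighbourhoods."""
    remaining = list(points)
    centers = []
    while remaining and len(centers) < k:
        # neighbourhood of every remaining point (includes the point itself)
        hoods = [[q for q in remaining if math.dist(p, q) <= eps] for p in remaining]
        densest = max(hoods, key=len)
        dim = len(densest[0])
        centers.append(tuple(sum(q[d] for q in densest) / len(densest) for d in range(dim)))
        # drop this dense region so the next center comes from elsewhere
        remaining = [q for q in remaining if q not in densest]
    return centers
```

The returned centers would then seed a standard K-means run.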

4.
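郭文娟 (Guo Wenjuan), 《科技风》, 2022, (4): 63-65
Because the result of the traditional K-means algorithm depends on the initial number of clusters and the initial cluster centers, this paper proposes a K-means algorithm based on optimized initial cluster centers. The algorithm determines the number of clusters K by quantifying the distances between samples and the compactness of clusters, and selects data points that lie far apart, according to the distribution of the data set, as the initial cluster centers. This avoids the random choice of both the number of clusters and the cluster centers in traditional K-means. UCI machine learning data...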
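Selecting initial centers that lie far apart is often done with a max-min distance (farthest-first) rule. The sketch below is one such rule, not necessarily the paper's: it starts from the sample closest to the data mean (an assumption) and then repeatedly adds the point whose distance to its nearest chosen center is largest. The paper's procedure for estimating K is not reproduced; k is passed in.

```python
import math

def farthest_first_centers(points, k):
    """Max-min distance initialisation for K-means (illustrative sketch)."""
    dim = len(points[0])
    mean = tuple(sum(p[d] for p in points) / len(points) for d in range(dim))
    # first center: the sample nearest the data mean (one common choice, assumed here)
    centers = [min(points, key=lambda p: math.dist(p, mean))]
    while len(centers) < k:
        # next center: the point whose nearest existing center is farthest away
        nxt = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(nxt)
    return centers
```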

5.
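This paper introduces the clustering process and the classification of cluster validity indices, reviews several clustering algorithms found in commonly used scientometrics software, analyzes their characteristics, and examines the validity of the clustering results with indices based on intra-cluster compactness and inter-cluster separation. It summarizes the performance of each algorithm and presents a case study of the results produced by the corresponding software.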
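Validity indices of the compactness/separation kind reward clusters that are tight internally and far from one another. As a hypothetical example (the specific indices reviewed in the paper are not given here), the ratio below divides the mean point-to-centroid distance by the smallest centroid-to-centroid distance, so lower values are better. It needs at least two clusters.

```python
import math
from itertools import combinations

def centroid(cluster):
    dim = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dim))

def compactness_separation(clusters):
    """Illustrative validity ratio: mean within-cluster spread / closest centroid pair."""
    cents = [centroid(c) for c in clusters]
    n = sum(len(c) for c in clusters)
    # compactness: mean distance from each point to its own centroid
    compact = sum(math.dist(p, cen) for c, cen in zip(clusters, cents) for p in c) / n
    # separation: distance between the two closest centroids
    sep = min(math.dist(a, b) for a, b in combinations(cents, 2))
    return compact / sep  # smaller means tighter, better separated clusters
```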

6.
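付淇 (Fu Qi), 黎虹 (Li Hong), 李广振 (Li Guangzhen), 《科技广场》, 2010, (1): 237-240
Data stream mining is one of the newer research directions in data mining, and clustering is an important part of it. This paper introduces the basic characteristics of data streams, summarizes existing data stream clustering algorithms within a unified representation model for stream clustering, and outlines future research directions and prospects for data stream clustering.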
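Many stream clustering algorithms keep compact summaries instead of raw points. A widely used summary is the cluster feature (CF) vector from BIRCH: the point count N, the linear sum LS, and the sum of squares SS, which can be updated one point at a time and merged by simple addition. Whether the unified model in this paper uses exactly this representation is not stated; the class below is a sketch of the CF idea only.

```python
import math
from dataclasses import dataclass, field

@dataclass
class ClusterFeature:
    """Additive summary (N, LS, SS) of the points absorbed so far."""
    dim: int
    n: int = 0
    ls: list = field(default_factory=list)  # linear sum per dimension
    ss: float = 0.0                          # sum of squared norms

    def __post_init__(self):
        if not self.ls:
            self.ls = [0.0] * self.dim

    def add(self, x):
        # O(dim) update; the raw point does not need to be stored
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def merge(self, other):
        # CF vectors are additive, so two micro-clusters merge by summing
        self.n += other.n
        self.ls = [a + b for a, b in zip(self.ls, other.ls)]
        self.ss += other.ss

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius(self):
        # RMS distance of absorbed points from the centroid: sqrt(SS/N - |LS/N|^2)
        c2 = sum(v * v for v in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```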

7.
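Quality Evaluation of Text Clustering Algorithms   Total citations: 4, self-citations: 0, citations by others: 4
Text clustering is one effective way to build instances of a classification scheme for large text collections. This paper discusses how to evaluate clustering quality quantitatively with standard classification test collections, and experimentally compares the k-Means algorithm, the STC (suffix tree clustering) algorithm, and an ant-based clustering algorithm. The analysis shows that STC performs better because it fully accounts for the phrase structure of text; that the results of the ant-based algorithm depend strongly on its input parameters; and that introducing text-specific features into the ant-based algorithm improves the quality of its results.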
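One simple way to score clusters against a standard classification test collection is purity: label each cluster with its most frequent class and count how many documents match that label. The abstract does not list the paper's exact measures, so purity is an illustrative assumption here.

```python
from collections import Counter

def purity(cluster_ids, class_labels):
    """Fraction of documents that belong to the majority class of their cluster."""
    by_cluster = {}
    for c, y in zip(cluster_ids, class_labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    # each cluster contributes the size of its largest class
    majority_total = sum(cnt.most_common(1)[0][1] for cnt in by_cluster.values())
    return majority_total / len(class_labels)
```

Purity rises trivially as clusters shrink (one document per cluster scores 1.0), so it is usually reported alongside other measures.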

8.
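Research on a Microblog Topic Retrieval Model Combining Text Clustering and LDA   Total citations: 1, self-citations: 0, citations by others: 1
As microblogs grow in popularity, searching microblog content has become a way for people to get first-hand news. Text clustering and topic discovery are effective methods in information retrieval, and choosing a suitable method is key to the quality of short-text retrieval on microblogs. Exploiting the complementary features of text clustering and the LDA topic model, and taking into account both the distinctive genre of microblog posts and the efficiency of short-text clustering, this paper proposes a microblog retrieval method that combines text clustering based on frequent word sets with cluster-based LDA topic mining, yielding a new topic retrieval model for microblog text. Experiments show that the method not only partitions microblog texts effectively but also clearly uncovers the latent topics within clusters.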

9.
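Data mining is an emerging, application-oriented discipline that draws on knowledge from many fields. It is an effective way to extract useful knowledge from large volumes of information and to support decision-making, and it has broad application prospects. Clustering is an important data mining technique for discovering data distributions and hidden patterns. This paper summarizes the main characteristics of most commonly used clustering algorithms and compares several classic ones.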

10.
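郭伟光 (Guo Weiguang), 汪本强 (Wang Benqiang), 杨学春 (Yang Xuechun), 《情报杂志》, 2015, (2): 159-163, 158
Social tags are semantically ambiguous, and the traditional K-medoids algorithm is sensitive to the initial cluster centers, converges slowly, and can assign each object to only one cluster. To address these problems, this paper proposes a two-stage clustering algorithm for socially annotated resources based on an improved K-medoids. The algorithm uses a new, simple, and fast rule for selecting initial cluster centers together with an improved clustering criterion function. It first clusters the tags, then provisionally places the web resources annotated with tags from the same tag cluster into the same resource cluster, and finally clusters the resources again within these resource clusters. Experiments show that the algorithm determines initial cluster centers autonomously and reasonably, converges quickly, and produces more accurate clustering results.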

11.
In this article, we investigate the use of a probabilistic model for unsupervised clustering in text collections. Unsupervised clustering has become a basic module for many intelligent text processing applications, such as information retrieval, text classification or information extraction.

12.
In information retrieval, cluster-based retrieval is a well-known attempt at resolving the problem of term mismatch. Clustering requires similarity information between documents, which is difficult to compute in a feasible time. Researchers have investigated the adaptive document clustering scheme to resolve this problem; however, its theoretical basis has not been fully explored. In this regard, we provide a conceptual view of adaptive document clustering based on query-based similarities, by regarding the user's query as a concept. As a result, the adaptive document clustering scheme can be viewed as an approximation of this similarity. Based on this idea, we derive three new query-based similarity measures in the language modeling framework and evaluate them in the context of cluster-based retrieval, comparing them with K-means clustering and full document expansion. The evaluation results show that retrieval based on query-based similarities significantly improves on the baseline while being comparable to the other methods. This implies that the newly developed query-based similarities are feasible criteria for adaptive document clustering.

13.
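王华秋 (Wang Huaqiu), 王重阳 (Wang Chongyang), 聂珍 (Nie Zhen), 《现代情报》, 2016, 36(2): 129-134
Image clustering provides new technical support for image management in digital libraries and can uncover information of interest to users in large volumes of image data. Feature extraction algorithms traditionally used for image clustering often ignore the spatial distribution of image colors and adapt poorly. This paper partitions each image into equal-area rectangular rings, computes the correlation between spatial regions, derives the importance of each region from these correlations, and thereby fuses spatial information with color information. It also improves the choice of cutoff distance in the clustering-by-fast-search-of-density-peaks algorithm, speeding up convergence while preserving clustering accuracy. Finally, the density clustering algorithm is applied to image retrieval in a digital library. Experiments show that the proposed method is feasible and effective.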
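In density-peaks clustering, each point gets a local density rho (the number of neighbors within the cutoff distance d_c) and a distance delta to the nearest point of higher density; cluster centers are points where both are large, and every other point joins its nearest denser neighbor. The sketch below is the baseline method with a fixed, caller-supplied dc; the paper's improved cutoff distance and its color-spatial image features are not reproduced.

```python
import math

def density_peaks(points, dc, k):
    """Baseline density-peaks clustering with a cut-off kernel and k centers."""
    n = len(points)
    d = [[math.dist(p, q) for q in points] for p in points]
    # local density rho: number of other points closer than the cutoff distance dc
    rho = [sum(1 for j in range(n) if j != i and d[i][j] < dc) for i in range(n)]
    order = sorted(range(n), key=lambda i: -rho[i])  # decreasing density
    rank = {i: pos for pos, i in enumerate(order)}
    delta, nearest_higher = [0.0] * n, [-1] * n
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(d[i])  # convention for the densest point
        else:
            j = min(order[:pos], key=lambda j: d[i][j])
            delta[i], nearest_higher[i] = d[i][j], j
    # centers: largest gamma = rho * delta; ties go to the denser point,
    # so the densest point (maximal rho and delta) is always a center
    centers = sorted(range(n), key=lambda i: (-rho[i] * delta[i], rank[i]))[:k]
    labels = [-1] * n
    for c_id, c in enumerate(centers):
        labels[c] = c_id
    # in density order, each remaining point joins its nearest denser neighbour
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels, centers
```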

14.
To resolve some of the lexical disagreement problems between queries and FAQs, we propose a reliable FAQ retrieval system using query log clustering. At indexing time, the proposed system clusters the logs of users' queries into predefined FAQ categories. To increase the precision and recall of clustering, the system adopts a new similarity measure based on a machine-readable dictionary. At search time, the system calculates the similarities between users' queries and each cluster in order to smooth FAQs. By virtue of the cluster-based retrieval technique, the system can partially bridge the lexical chasm between queries and FAQs. In addition, the proposed system outperforms traditional information retrieval systems in FAQ retrieval.

15.
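王敏 (Wang Min), 嵇绍春 (Ji Shaochun), 《现代情报》, 2016, 36(4): 52-56
To improve personalized recommendation in libraries, this paper builds a personalized recommendation system for digital libraries using fuzzy clustering and fuzzy pattern recognition. By analyzing users' information literacy, interests, and use of web and electronic resource search, it performs a fuzzy cluster analysis of readers, determines the optimal threshold λ, and obtains the best clustering. It then performs fuzzy recognition on an individual user's profile and makes personalized recommendations for the current user according to the class the user is recognized as belonging to. Experiments show that the recommendation scheme based on fuzzy clustering and fuzzy recognition is feasible and effective, offering a new approach to personalized services in digital libraries.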
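The threshold λ mentioned above is typical of fuzzy equivalence clustering: build a fuzzy similarity matrix, make it transitive by repeated max-min composition, then cut it at λ so that objects whose similarity is at least λ share a class. How the paper builds the matrix from user attributes is not given, so the sketch takes a reflexive, symmetric matrix as input (an assumption) and does not include the search for the best λ.

```python
def max_min_compose(a, b):
    """Fuzzy (max-min) composition of two square matrices."""
    n = len(a)
    return [[max(min(a[i][k], b[k][j]) for k in range(n)) for j in range(n)] for i in range(n)]

def transitive_closure(r):
    # square repeatedly until R o R == R, giving a fuzzy equivalence matrix
    while True:
        r2 = max_min_compose(r, r)
        if r2 == r:
            return r
        r = r2

def lambda_cut_clusters(r, lam):
    """Label objects so that i and j share a class iff closure[i][j] >= lam."""
    t = transitive_closure(r)
    n = len(t)
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        if labels[i] == -1:
            # with a transitive matrix, row i's lambda-cut is exactly i's class
            for j in range(n):
                if t[i][j] >= lam:
                    labels[j] = next_label
            next_label += 1
    return labels
```

Raising λ splits the data into more, smaller classes; lowering it merges them.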

16.
Hierarchic document clustering has been widely applied to information retrieval (IR) on the grounds of its potentially improved effectiveness over inverted file search (IFS). However, previous research has been inconclusive as to whether clustering does bring improvements. In this paper we take the view that if hierarchic clustering is applied to search results (query-specific clustering), then it has the potential to increase the retrieval effectiveness compared both to that of static clustering and of conventional IFS. We conducted a number of experiments using five document collections and four hierarchic clustering methods. Our results show that the effectiveness of query-specific clustering is indeed higher, and suggest that there is scope for its application to IR.
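Query-specific clustering, as described here, clusters only the documents returned for a query rather than the whole collection. The sketch below assumes a precomputed similarity matrix over the top-ranked results and applies group-average agglomerative clustering, one of several hierarchic methods (which four the paper used is not stated here); the stopping threshold is a hypothetical parameter.

```python
def group_average_clusters(sim, threshold):
    """Agglomerate items (e.g. top-ranked search results) while the best
    average pairwise similarity between two clusters is >= threshold."""
    clusters = [[i] for i in range(len(sim))]

    def avg_sim(a, b):
        return sum(sim[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        pairs = [(avg_sim(clusters[x], clusters[y]), x, y)
                 for x in range(len(clusters)) for y in range(x + 1, len(clusters))]
        best, x, y = max(pairs)
        if best < threshold:
            break
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```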

17.
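[Purpose/Significance] Deep learning has been one of the research hotspots in artificial intelligence in recent years. Understanding the current state of research on deep learning in information organization and retrieval can provide a reference for further work in this area. [Method/Process] By reviewing Chinese literature on deep-learning-based information organization and retrieval, this paper analyzes the relevant deep learning models, describes the hot research topics of deep learning in information organization and retrieval, and, combining the characteristics of deep learning with the research content of information organization and retrieval, forecasts the application prospects of deep learning in this field. [Result/Conclusion] The study shows that current research on deep learning in information organization and retrieval concentrates on four topics: intelligent information extraction, automatic text classification, sentiment analysis, and text clustering. In the future, deep learning in this field is expected to move toward heterogeneous information processing, intelligent information retrieval, and personalized information recommendation.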

18.
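Automatic text classification is a fundamental task in text information processing. This work applies case-based reasoning to text classification: it uses word co-occurrence information to extract subject terms and frequent word co-occurrence itemsets from texts, and indexes the case base with a clustering algorithm, yielding an automatic text classification system based on case-based reasoning. Experiments show that, compared with TFIDF-based text representation and the nearest-neighbor classification algorithm, the co-occurrence-based text representation and the clustered case-base index effectively improve classification accuracy and efficiency, broadening the application domain of case-based reasoning.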

19.
Contextual document clustering is a novel approach that uses information-theoretic measures to cluster semantically related documents bound together by an implicit set of concepts or themes of narrow specificity. It facilitates cluster-based retrieval by assessing the similarity between a query and the probability distributions of the cluster themes. In this paper, we assess a relevance feedback mechanism, based on query refinement, that modifies the query's probability distribution using a small number of documents judged relevant to the query. We demonstrate that with only one relevance judgment, a performance improvement of 33% was obtained.
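The abstract does not give the exact update rule for the query's probability distribution. One plausible form, shown purely as an assumption, mixes the original query distribution with the average term distribution of the judged-relevant documents; alpha is a hypothetical mixing weight. Cluster themes can then be ranked by KL divergence from the refined query (lower is closer), with a crude epsilon standing in for proper smoothing.

```python
import math
from collections import Counter

def distribution(tokens):
    """Maximum-likelihood term distribution of a token list."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def refine_query(query_dist, relevant_docs, alpha=0.5):
    """Mix the query distribution with the mean distribution of judged-relevant docs."""
    fb = Counter()
    for doc in relevant_docs:
        for t, p in distribution(doc).items():
            fb[t] += p / len(relevant_docs)
    terms = set(query_dist) | set(fb)
    return {t: alpha * query_dist.get(t, 0.0) + (1 - alpha) * fb[t] for t in terms}

def kl_divergence(p, q, eps=1e-9):
    # KL(p || q); eps stands in for terms missing from q (crude smoothing)
    return sum(pv * math.log(pv / q.get(t, eps)) for t, pv in p.items() if pv > 0)
```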

20.
We are interested in how ideas from document clustering can be used to improve the retrieval accuracy of ranked lists in interactive systems. In particular, we are interested in ways to evaluate the effectiveness of such systems to decide how they might best be constructed. In this study, we construct and evaluate systems that present the user with ranked lists and a visualization of inter-document similarities. We first carry out a user study to evaluate the clustering/ranked list combination on instance-oriented retrieval, the task of the TREC-6 Interactive Track. We find that although users generally prefer the combination, they are not able to use it to improve effectiveness. In the second half of this study, we develop and evaluate an approach that more directly combines the ranked list with information from inter-document similarities. Using the TREC collections and relevance judgments, we show that it is possible to realize substantial improvements in effectiveness by doing so, and that although users can use the combined information effectively, the system can provide hints that substantially improve on the user's solo effort. The resulting approach shares much in common with an interactive application of incremental relevance feedback. Throughout this study, we illustrate our work using two prototype systems constructed for these evaluations. The first, AspInQuery, is a classic information retrieval system augmented with a specialized tool for recording information about instances of relevance. The other system, Lighthouse, is a Web-based application that combines a ranked list with a portrayal of inter-document similarity. Lighthouse can work with collections such as TREC, as well as the results of Web search engines.
