Similar literature
 Found 20 similar documents (search time: 31 ms)
1.
Graph-based multi-view clustering aims to exploit the graph information of multiple views to produce clustering solutions. The consistency constraint across views is the key to multi-view graph clustering. Most existing studies generate fusion graphs and constrain multi-view consistency through a clustering loss. We argue that local pair-view consistency enables fine-grained modeling of the consensus information in multiple views. To this end, we propose a novel Contrastive and Attentive Graph Learning framework for multi-view clustering (CAGL). Specifically, we design a contrastive fine-modeling scheme for multi-view graph learning that maximizes the similarity between pairs of views to guarantee multi-view consistency. Meanwhile, an Att-weighted refined fusion graph module based on attention networks is designed to capture the capacity differences among views dynamically and to further facilitate the mutual reinforcement of the single views and the fusion view. In addition, CAGL learns a specialized representation for clustering via a self-training clustering module. Finally, we develop a joint optimization objective that balances all modules and iteratively optimize CAGL within a graph encoder–decoder framework. Experimental results on six benchmarks across different modalities and sizes demonstrate that CAGL outperforms state-of-the-art baselines.

2.
Many problems in data mining involve datasets with multiple views, where the feature space consists of multiple feature groups. Previous studies employed view-weighting methods to find a shared cluster structure underlying the different views. However, most of these studies applied gradient-based optimization to update the cluster centroids and feature weights iteratively, so the final partition was only locally optimal. In this work, we propose a novel bi-level weighted multi-view clustering method that applies fuzzy weighting to both views and features. Furthermore, an efficient global search strategy that combines particle swarm optimization with gradient optimization is proposed to minimize the resulting non-convex loss function. In the experimental analysis, the proposed method is compared with five state-of-the-art weighted clustering algorithms on three real-world high-dimensional multi-view datasets.

3.
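Stemming and lemmatization are important steps in English text processing. This paper uses three clustering algorithms to carry out a fairly comprehensive experiment on two stemming algorithms and one lemmatization algorithm. The results show that both stemming and lemmatization can improve the effectiveness and efficiency of English text clustering, but their influence on the clustering results is not significant. Compared with the Snowball Stemmer and the Stanford Lemmatizer, the Porter Stemmer performs better, and more stably, in terms of Entropy and Purity.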
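
The Entropy and Purity measures named in the abstract above can be illustrated with a minimal sketch. It assumes the common textbook definitions (purity as the share of items in the majority class of their cluster, entropy as the size-weighted class entropy per cluster, in bits); the paper may normalise them differently, and the toy data is hypothetical.

```python
import math
from collections import Counter


def _class_counts(clusters, labels):
    """Group the true labels by cluster id and count them."""
    by_cluster = {}
    for c, y in zip(clusters, labels):
        by_cluster.setdefault(c, Counter())[y] += 1
    return by_cluster


def purity(clusters, labels):
    """Fraction of items that belong to the majority class of their cluster."""
    by_cluster = _class_counts(clusters, labels)
    majority = sum(max(counts.values()) for counts in by_cluster.values())
    return majority / len(labels)


def entropy(clusters, labels):
    """Size-weighted average of the class entropy (in bits) of each cluster."""
    by_cluster = _class_counts(clusters, labels)
    n = len(labels)
    total = 0.0
    for counts in by_cluster.values():
        size = sum(counts.values())
        h = -sum((k / size) * math.log2(k / size) for k in counts.values())
        total += (size / n) * h
    return total


# Hypothetical toy data: six documents, two clusters, two true classes.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["a", "a", "b", "b", "b", "b"]
print(purity(clusters, labels), entropy(clusters, labels))
```

Higher purity and lower entropy both indicate clusters that agree better with the reference classes.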

4.
Deep multi-view clustering (MVC) mines and exploits the complex relationships among views to learn compact data clusters with deep neural networks in an unsupervised manner. Recent deep contrastive learning (CL) methods have shown promising performance in MVC by learning cluster-oriented deep feature representations, which is achieved by contrasting positive and negative sample pairs. However, most existing deep contrastive MVC methods focus on only one side of contrastive learning, such as feature-level or cluster-level contrast, and fail to integrate the two sides or to bring in other important aspects of contrast. Additionally, most of them work in a separate two-stage manner, i.e., first feature learning and then data clustering, so the two stages cannot benefit each other. To address these challenges, we propose a novel joint contrastive triple-learning framework that learns multi-view discriminative feature representations for deep clustering. It is threefold: feature-level alignment-oriented CL, feature-level commonality-oriented CL, and cluster-level consistency-oriented CL. The former two submodules contrast the encoded feature representations of data samples at different feature levels, while the last contrasts the data samples in the cluster-level representations. Benefiting from the triple contrast, more discriminative representations of the views can be obtained. Meanwhile, a view weight learning module is designed to learn and exploit the quantitative complementary information across the learned discriminative features of each view. The contrastive triple-learning module, the view weight learning module, and the data clustering module operating on the fused features are thus performed jointly, so that these modules are mutually beneficial. Extensive experiments on several challenging multi-view datasets show the superiority of the proposed method over many state-of-the-art methods, in particular large accuracy improvements of 15.5% and 8.1% on Caltech-4V and CCV. Owing to its promising performance on visual datasets, the proposed method can be applied to many practical visual applications such as visual recognition and analysis. The source code of the proposed method is provided at https://github.com/ShizheHu/Joint-Contrastive-Triple-learning.

5.
Recently, the augmented complex-valued normalized subband adaptive filtering (ACNSAF) algorithm has been proposed to process colored non-circular signals. However, its performance deteriorates severely under impulsive noise interference. To overcome this issue, a robust augmented complex-valued normalized M-estimate subband adaptive filtering (ACNMSAF) algorithm is proposed. It is obtained by modifying the subband constraints of the ACNSAF algorithm using the complex-valued modified Huber (MH) function and is derived based on CR calculus and Lagrange multipliers. To improve both the convergence speed and the steady-state accuracy of the fixed step size ACNMSAF algorithm, a variable step size (VSS) strategy based on the minimum mean squared deviation (MSD) criterion is devised. It allocates an individual adaptive step size to each subband, fully exploiting the structural advantages of SAF and significantly improving the convergence performance of the ACNMSAF algorithm as well as its tracking capability in non-stationary environments. Then, the stability and the transient and steady-state MSD performance of the ACNMSAF algorithm in the presence of colored non-circular inputs and impulsive noise are analyzed, and the stability conditions and the transient and steady-state MSD formulas are derived. Computer simulations in impulsive noise environments verify the accuracy of the theoretical analysis and the effectiveness of the proposed algorithms compared with other existing complex-valued adaptive algorithms.

6.
A hybrid text/citation-based method is used to cluster journals covered by the Web of Science database in the period 2002–2006. The objective is to use this clustering to validate and, if possible, to improve existing journal-based subject-classification schemes. Cross-citation links are determined on an item-by-paper procedure for individual papers assigned to the corresponding journal. Text mining for the textual component is based on the same principle; textual characteristics of individual papers are attributed to the journals in which they have been published. In a first step, the 22-field subject-classification scheme of the Essential Science Indicators (ESI) is evaluated and visualised. In a second step, the hybrid clustering method is applied to classify the roughly 8300 journals that meet the selection criteria concerning continuity, size and impact. The hybrid method proves superior to its two components when they are applied separately. The choice of 22 clusters also allows a direct field-to-cluster comparison, and we substantiate that the science areas resulting from cluster analysis form a more coherent structure than the “intellectual” reference scheme, the ESI subject scheme. Moreover, the textual component of the hybrid method allows labelling the clusters using cognitive characteristics, while the citation component allows visualising the cross-citation graph and determining representative journals suggested by the PageRank algorithm. Finally, the analysis of journal ‘migration’ allows the improvement of existing classification schemes on the basis of the concordance between fields and clusters.

7.
In recent years, sparse subspace clustering (SSC) has demonstrated its advantages in the field of subspace clustering. Generally, SSC first learns a representation matrix of the data by self-expression, and then constructs an affinity matrix from the resulting sparse representation. Finally, the clustering result is obtained by applying spectral clustering to the affinity matrix. As this description shows, existing SSC algorithms usually learn the sparse representation and the affinity matrix separately. Because the two steps are independent, they may not lead to the optimal clustering result. To address this, we propose a novel clustering algorithm that learns the representation and the affinity matrix jointly. With the proposed method, the sparse representation and the affinity matrix are learned in a unified framework, where the procedure is driven by a graph regularizer derived from the affinity matrix. Experimental results show that the proposed method achieves better clustering results than other subspace clustering approaches.
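
The middle step of the conventional SSC pipeline described above, turning a self-expression coefficient matrix into an affinity matrix, can be sketched as follows. The symmetrisation W = |C| + |C|^T is a common choice in the SSC literature and is an assumption here, not necessarily what this paper uses; the coefficient values are hypothetical.

```python
def affinity_from_coefficients(C):
    """Symmetrise a self-expression coefficient matrix into an affinity matrix.

    C[i][j] is the weight with which point j is used to reconstruct point i.
    Assumed rule: W = |C| + |C|^T.
    """
    n = len(C)
    return [[abs(C[i][j]) + abs(C[j][i]) for j in range(n)] for i in range(n)]


# Hypothetical coefficients: points 0 and 1 express each other, as do 2 and 3.
C = [
    [0.0, 0.9, 0.0, 0.0],
    [-0.8, 0.0, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.7],
    [0.0, 0.0, 0.6, 0.0],
]
W = affinity_from_coefficients(C)
for row in W:
    print(row)
```

Spectral clustering would then be applied to `W`; in this toy case the strong entries already separate {0, 1} from {2, 3}.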

8.
Research on clustering algorithms in synonymy graphs of a single language has yielded promising results; however, this idea has not yet been explored in a multilingual setting. Moving the problem to a multilingual translation graph enables the use of more clues and of techniques that are not possible in a monolingual synonymy graph. This article explores the potential of sense induction methods in a massively multilingual translation graph. For this purpose, the performance of graph clustering methods in synset detection is investigated. In the context of translation graphs, the use of existing Wordnets in different languages is an important clue for synset detection that cannot be utilized in a monolingual setting. Casting the problem as an unsupervised synset expansion task rather than a clustering or community detection task improves the results substantially. Furthermore, instead of a greedy unsupervised expansion algorithm guided by heuristics, we devise a supervised learning algorithm that learns synset expansion patterns from the words in existing Wordnets and achieves superior results. Because the training data consists of already existing Wordnets, manual labeling is not required, in contrast to previous work. To evaluate our methods, Wordnets for Slovenian, Persian, German and Russian are built from scratch and compared to their manually built Wordnets or labeled test sets. Results reveal a clear improvement over two state-of-the-art algorithms targeting massively multilingual Wordnets, and results competitive with Wordnet construction methods targeting a single language. The system is able to produce Wordnets from scratch with a Wordnet base concept coverage ranging from 20% to 88% for 51 languages, and expands existing Wordnets by up to 30%.

9.
In this paper, we provide a new insight into clustering based on spring–mass dynamics and propose a resulting hierarchical clustering algorithm. To realize spectral graph partitioning as clustering, we model the weighted graph of a data set as a mass–spring dynamical system, in which a cluster is regarded as a single oscillating entity made up of data with similar properties. We then describe how the oscillation modes are related to the eigenvectors of the graph Laplacian matrix of the data set. In each step of the clustering, we select the group of clusters that has the largest number of constituent clusters. This group is divided into sub-clusters by examining an eigenvector that minimizes a cost function, which is constructed so that the subdivided clusters are balanced and large. To find k clusters in non-spherical or complex data, we first transform the data into spherical clusters located on the unit sphere in (k−1)-dimensional space, and then apply the previous procedure to the transformed data. Computational experiments demonstrate that the proposed method works quite well on a variety of data sets, although its performance degrades as the overlap between clusters increases.
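
The link between Laplacian eigenvectors and cluster splits mentioned above can be illustrated with standard spectral bisection. This is a minimal sketch, not the paper's cost-function-based procedure: it assumes a connected graph, splits by the sign of the Fiedler vector (the eigenvector of L = D − W with the second-smallest eigenvalue), and finds that vector by plain power iteration. The function name and the toy graph are hypothetical.

```python
def fiedler_split(W, iters=500):
    """Split a connected weighted graph in two by the sign of the Fiedler vector.

    W is a symmetric adjacency matrix given as a list of lists. Power iteration
    is run on (c*I - L), restricted to vectors orthogonal to the constant
    vector, so it converges to the eigenvector of L = D - W with the
    second-smallest eigenvalue.
    """
    n = len(W)
    deg = [sum(row) for row in W]
    c = 2.0 * max(deg)  # Gershgorin bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        mean = sum(x) / n
        x = [v - mean for v in x]  # remove the constant (eigenvalue 0) component
        # y = (c*I - L) x = (c - deg) * x + W x
        y = [
            (c - deg[i]) * x[i] + sum(W[i][j] * x[j] for j in range(n))
            for i in range(n)
        ]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return [0 if v < 0 else 1 for v in x]


# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
W = [[0.0] * 6 for _ in range(6)]
for a, b in edges:
    W[a][b] = W[b][a] = 1.0
parts = fiedler_split(W)
print(parts)
```

For larger graphs a sparse eigensolver would be the usual choice; power iteration is used here only to keep the sketch dependency-free.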

10.
Because wireless sensor networks are promising for gathering information from unreachable locations, they pose an immense challenge for data routing, which must maximize communication while remaining energy efficient. For the design of energy-efficient routing, optimization-based clustering protocols are widely preferred in wireless sensor networks. In this paper, we propose an optimization-based algorithm called the Fractional lion (FLION) clustering algorithm for creating an energy-efficient routing path. The proposed clustering algorithm is used to increase the energy and lifetime of the network nodes by selecting the rapid cluster head. In addition, we propose a multi-objective FLION clustering algorithm with a new fitness function based on five objectives: intra-cluster distance, inter-cluster distance, cluster head energy, normal node energy, and delay. The proposed fitness function is used to find the rapid cluster centroid for an efficient routing path. Finally, the performance of the proposed clustering algorithm is compared with existing clustering algorithms such as low energy adaptive clustering hierarchy (LEACH), particle swarm optimization (PSO), artificial bee colony (ABC), and the Fractional ABC clustering algorithm. The results show that the proposed FLION-based multi-objective clustering algorithm achieves a longer lifetime for the wireless sensor nodes than the existing protocols.

11.
Search task success rate is an important indicator of search engine performance. In contrast to most previous approaches, which rely on labeled search tasks provided by users or third-party editors, this paper attempts to improve search task success evaluation by exploiting the unlabeled search tasks that exist in search logs together with a small number of labeled ones. Concretely, the Multi-view Active Semi-Supervised Search task Success Evaluation (MA4SE) approach is proposed, which exploits labeled and unlabeled data by integrating the advantages of both semi-supervised learning and active learning with a multi-view mechanism. In the semi-supervised learning part of MA4SE, we employ a multi-view semi-supervised learning approach that uses different parameter configurations to achieve disagreement between base classifiers. The base classifiers are trained separately on the pre-defined action and time views. In the active learning part of MA4SE, each classifier obtained from semi-supervised learning is applied to unlabeled search tasks, and the search tasks that need to be manually annotated are selected based on both the degree of disagreement between base classifiers and a regional density measurement. We evaluate the proposed approach on open datasets with two different definitions of search task success. The experimental results show that MA4SE outperforms the state-of-the-art semi-supervised search task success evaluation approach.

12.
This paper focuses on constructing a conjugate gradient-based (CGB) method to solve the generalized periodic coupled Sylvester matrix equations in complex space. The presented method is developed from the viewpoint of conjugate gradient methods. It is proved by theoretical derivation that, in the absence of round-off errors, the presented method finds the solution of the considered matrix equations within a finite number of iteration steps. Numerical examples are provided to verify the convergence performance of the presented method, which is superior to some existing numerical algorithms in both iteration steps and computation time.

13.
Automatic text summarization has been an active field of research for many years. Several approaches have been proposed, ranging from simple position and word-frequency methods to learning and graph-based algorithms. The advent of human-generated knowledge bases like Wikipedia offers a further possibility in text summarization: they can be used to understand the input text in terms of salient concepts from the knowledge base. In this paper, we study a novel approach that leverages Wikipedia in conjunction with graph-based ranking. Our approach is to first construct a bipartite sentence–concept graph, and then rank the input sentences using iterative updates on this graph. We consider several models for the bipartite graph, and derive convergence properties under each model. Then, we take up personalized and query-focused summarization, where the sentence ranks additionally depend on user interests and queries, respectively. Finally, we present a Wikipedia-based multi-document summarization algorithm. An important feature of the proposed algorithms is that they enable real-time incremental summarization: users can first view an initial summary, and then request additional content if interested. We evaluate the performance of our proposed summarizer using the ROUGE metric, and the results show that leveraging Wikipedia can significantly improve summary quality. We also present results from a user study, which suggest that incremental summarization can help readers better understand news articles.
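
Ranking sentences by iterative updates on a bipartite sentence–concept graph can be sketched with a HITS-style mutual reinforcement scheme. The abstract does not state the actual update rules, so the rule below is an assumption chosen for illustration, and the function name, edge weights, and graph are hypothetical.

```python
def rank_sentences(edges, n_sentences, n_concepts, iters=50):
    """HITS-style mutual reinforcement on a bipartite sentence-concept graph.

    edges maps (sentence_index, concept_index) -> non-negative weight and must
    contain at least one positive weight. Returns sentence scores summing to 1.
    """
    s = [1.0] * n_sentences
    for _ in range(iters):
        # A concept is salient if salient sentences mention it.
        c = [0.0] * n_concepts
        for (i, j), w in edges.items():
            c[j] += w * s[i]
        # A sentence is salient if it mentions salient concepts.
        new_s = [0.0] * n_sentences
        for (i, j), w in edges.items():
            new_s[i] += w * c[j]
        total = sum(new_s)
        s = [v / total for v in new_s]  # normalise so the scores sum to 1
    return s


# Hypothetical graph: sentence 0 mentions both concepts, sentence 1 mentions
# concept 0, and sentence 2 mentions concept 1 only weakly.
edges = {(0, 0): 1.0, (0, 1): 1.0, (1, 0): 1.0, (2, 1): 0.5}
scores = rank_sentences(edges, n_sentences=3, n_concepts=2)
print(scores)
```

A summary would then take the top-scoring sentences first, which also fits the incremental setting: further sentences can be released in rank order on request.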

14.
Recently, a new family of semi-supervised clustering algorithms, known as constrained clustering, has emerged. These algorithms can incorporate a priori domain knowledge into the clustering process, allowing the user to guide the method. The vast majority of studies on the effectiveness of these approaches have used information, in the form of constraints, that was entirely accurate. This would be the ideal case, but such a situation is impossible in most realistic settings because of errors in the constraint creation process, misjudgements by the user, inconsistent information, etc. Hence, the robustness of constrained clustering algorithms to erroneous constraints is bound to play an important role in their final effectiveness.

15.
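Quality Evaluation of Text Clustering Algorithms   Total citations: 4 (self-citations: 0, citations by others: 4)
Text clustering is one of the effective means of building instances of a classification scheme for large-scale text collections. This paper discusses how to evaluate clustering quality quantitatively using standard classification test collections, and experimentally compares the k-Means clustering algorithm, the STC (suffix tree clustering) algorithm, and an Ant-based clustering algorithm. Analysis of the experimental results shows that the STC algorithm clusters comparatively well because it takes full account of the phrase characteristics of text; the results of the Ant-based clustering algorithm are strongly affected by its input parameters; and introducing text characteristics into the Ant clustering algorithm can improve the quality of its results.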

16.
Semi-supervised multi-view learning has recently achieved appealing performance by using the consensus relation between samples. However, in addition to the relation between samples, the relation between samples and their assemble centroid is also important for learning. In this paper, we propose a novel model based on orthogonal non-negative matrix factorization, which explores both the consensus relations between samples and those between samples and their assemble centroid. Since this model uses more consensus information to guide the multi-view learning, it can lead to better performance. Meanwhile, we theoretically derive a proposition about the equivalence between partial orthogonality and full orthogonality. Based on this proposition, the orthogonality constraint and the label constraint are implemented simultaneously in the proposed model. Experimental evaluations on five real-world datasets show that our approach outperforms the state-of-the-art methods, with an average improvement of 6% in terms of the ARI.

17.
We consider a challenging clustering task: the clustering of multi-word terms without document co-occurrence information in order to form coherent groups of topics. For this task, we developed a methodology that takes as input multi-word terms and the lexico-syntactic relations between them. Our clustering algorithm, named CPCL, is implemented in the TermWatch system. We compared CPCL to other existing clustering algorithms, namely hierarchical and partitioning (k-means, k-medoids) algorithms. This out-of-context clustering task led us to adapt the multi-word term representation for statistical methods and also to refine an existing cluster evaluation metric, the editing distance, in order to evaluate the methods. Evaluation was carried out on a list of multi-word terms from the genomic field that comes with a hand-built taxonomy. Results showed that while k-means and k-medoids obtained good scores on the editing distance, they were very sensitive to term length. CPCL, on the other hand, obtained a better cluster homogeneity score and was less sensitive to term length. CPCL also showed good adaptability for handling very large and sparse matrices.

18.
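A Novel Direct Optimization Method   Total citations: 1 (self-citations: 0, citations by others: 1)
Yin Guihu (尹贵虎), Pang Wenyao (庞文尧). Bulletin of Science and Technology (《科技通报》), 2002, 18(4): 289-294
A novel direct optimization method is proposed. Building on a random search with global variation, the algorithm uses clustering to partition the search space and a parallel optimization mechanism to refine the search step by step. This both ensures the quality of the optimization and makes the solution converge as quickly as possible. Worked examples show that, compared with direct optimization algorithms such as simulated annealing and genetic algorithms, the algorithm greatly improves search efficiency.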

19.
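Research on Hot-Topic Analysis Methods Based on Keyword Co-occurrence Frequency   Total citations: 2 (self-citations: 0, citations by others: 2)
Keyword co-occurrence can effectively reflect the research hotspots of a discipline and provide auxiliary support for scientific research. This article systematically reviews the relatedness measures, clustering algorithms, and visualization methods used in co-word analysis based on co-occurrence frequency, evaluates existing clustering algorithms, and proposes ideas for improving the k-means clustering algorithm.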
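
The first step of the co-word analysis discussed above, counting keyword co-occurrences and turning them into a relatedness score, can be sketched as follows. The equivalence index E = c_ab² / (c_a · c_b) is one measure commonly used in co-word analysis and is an assumption here; the article may review other measures. The keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists):
    """Count how often each keyword and each unordered keyword pair occurs."""
    freq = Counter()
    pairs = Counter()
    for keywords in keyword_lists:
        unique = sorted(set(keywords))  # ignore duplicates within one paper
        freq.update(unique)
        pairs.update(combinations(unique, 2))  # pairs are in sorted order
    return freq, pairs


def equivalence_index(a, b, freq, pairs):
    """Equivalence index E = c_ab^2 / (c_a * c_b), a value between 0 and 1.

    Both keywords must occur at least once in the counted lists.
    """
    key = tuple(sorted((a, b)))
    return pairs[key] ** 2 / (freq[a] * freq[b])


# Hypothetical keyword lists from four papers.
papers = [
    ["clustering", "k-means", "co-word analysis"],
    ["clustering", "k-means"],
    ["co-word analysis", "visualization"],
    ["clustering", "visualization"],
]
freq, pairs = cooccurrence(papers)
print(equivalence_index("clustering", "k-means", freq, pairs))
```

The resulting pairwise scores form the similarity matrix that a clustering algorithm such as k-means (after conversion to feature vectors) or hierarchical clustering would take as input.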

20.
With the popularity of social platforms such as Sina Weibo and Twitter, a large number of public events spread rapidly on social networks, and a huge amount of textual data is generated as netizens discuss them. Social text clustering has become one of the most critical methods for helping people find relevant information, and it provides quality data for subsequent timely public opinion analysis. Most existing neural clustering methods rely on manually labeled training sets and take a long time to train. Because social media data is bursty and large in scale, it is a challenge for social text clustering to satisfy users' demand for timeliness. This paper proposes a novel unsupervised event-oriented graph clustering framework (EGC), which achieves efficient clustering on large-scale datasets with less time overhead and does not require any labeled data. Specifically, EGC first mines the potential relations in social text data and transforms the textual data into an event-oriented graph, taking advantage of the graph structure to represent complex relations. Secondly, EGC uses a keyword-based local importance method to accurately measure the weights of the relations in the event-oriented graph. Finally, a bidirectional depth-first clustering algorithm based on the interrelations is proposed to cluster the nodes of the event-oriented graph. By projecting the relations of the graph into a smaller domain, EGC achieves fast convergence. The experimental results show that the clustering performance of EGC on the Weibo dataset reaches 0.926 (NMI), 0.926 (AMI), and 0.866 (ARI), which is 13%–30% higher than other clustering methods. In addition, the average query time on EGC-clustered data is 16.7 ms, which is 90% less than on the original data.
