Similar literature
 Found 20 similar documents (search time: 171 ms)
1.
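This paper describes a dimensionality-reduction method based on feature-word clustering. The main idea is to treat the occurrence of a word in a text as an event: a search algorithm first computes the distribution of each feature word, and feature words that play a similar role in classification are then merged, which reduces the feature dimensionality. Finally, based on experimental analysis, an improved similarity formula that takes global cluster information into account is proposed and applied to text classification; experiments show that it improves classification accuracy.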

2.
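Multi-class support vector machines are an important problem in practical applications such as OCR and face recognition. Widely used multi-class SVM methods include one-versus-one, one-versus-rest, and DAG. Many experiments show that one-versus-one usually achieves higher classification accuracy, but its long testing time limits its use in recognition tasks with large amounts of data. An improved one-versus-one multi-class SVM is proposed: a coarse classification step first quickly selects candidate classes, and voting is then carried out among the candidate classes according to the original one-versus-one scheme. Experimental results show that the method improves classification efficiency and, to some extent, classification accuracy.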
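
The two-stage idea in this abstract (coarse candidate selection, then one-versus-one voting restricted to the candidates) can be sketched roughly as follows. This is only an illustration under stated assumptions: the abstract does not say how the coarse step works, so nearest class centroids are assumed here, and the `pairwise` function is a hypothetical stand-in for trained binary SVMs.

```python
import math
from collections import Counter
from itertools import combinations

def coarse_candidates(x, centroids, m):
    """Coarse step (assumed): keep the m classes whose centroids are closest to x."""
    ranked = sorted(centroids, key=lambda c: math.dist(centroids[c], x))
    return ranked[:m]

def one_vs_one_vote(x, candidates, pairwise):
    """Run the pairwise deciders only among the candidate classes and count votes."""
    if len(candidates) == 1:
        return candidates[0]
    votes = Counter()
    for a, b in combinations(candidates, 2):
        votes[pairwise(a, b, x)] += 1
    return votes.most_common(1)[0][0]

# Hypothetical class centroids for four classes in a 2-D feature space.
centroids = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (5.0, 5.0), "D": (9.0, 9.0)}

def pairwise(a, b, x):
    # Stand-in for a trained binary SVM between classes a and b (hypothetical).
    return a if math.dist(centroids[a], x) <= math.dist(centroids[b], x) else b

x = (0.8, 0.1)
candidates = coarse_candidates(x, centroids, m=3)
label = one_vs_one_vote(x, candidates, pairwise)
print(candidates, label)
```

With four classes, full one-versus-one needs six pairwise evaluations per test sample; restricting the vote to three candidates needs three, which is where the time saving would come from.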

3.
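Traditional feature selection algorithms do not consider correlations between features and assume balanced classes, so on imbalanced problems they favor the majority class and neglect the minority class. To address these shortcomings, this paper considers feature correlation and class imbalance together and proposes CDHI, a feature selection algorithm for high-dimensional imbalanced data based on class discrimination. The algorithm clusters the features with k-means, computes the class discrimination of every feature in each cluster, ranks the features within each cluster by this value, and then selects the features with higher class discrimination from each cluster to form the feature subset. This both removes redundancy among high-dimensional features and handles imbalanced data. Experimental results show that, compared with traditional feature selection methods, CDHI effectively reduces the dimensionality of the feature space and improves the recognition rate of the minority class.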

4.
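An uneven distribution of data across classes is the main reason for poor classification results on imbalanced datasets. To overcome this imbalance, this paper proposes a classification algorithm for imbalanced data based on the classes of neighboring samples. For each sample to be classified, its k nearest neighbors are computed first, and the sample is then assigned to the class that holds the majority among those k neighbors. Because the algorithm considers only the classes of a small number of neighboring samples when deciding, rather than all training samples, it can better limit the effect of class imbalance on the classification of the minority class. Simulation experiments on a customer churn dataset show that the algorithm handles imbalanced data classification well.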
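
The decision rule described here (find the k nearest neighbors, then take the majority class among them) can be sketched as below. The toy data, the label names, and the choice of Euclidean distance are assumptions made for illustration; the abstract does not specify them.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Assign `query` the majority label among its k nearest training samples."""
    # train: list of (feature_vector, label) pairs; Euclidean distance is assumed.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Toy imbalanced set: five "stay" customers and two "churn" customers.
train = [
    ((0.0, 0.0), "stay"), ((0.1, 0.2), "stay"), ((0.2, 0.1), "stay"),
    ((0.3, 0.3), "stay"), ((0.2, 0.4), "stay"),
    ((2.0, 2.0), "churn"), ((2.1, 1.9), "churn"),
]

print(knn_predict(train, (1.9, 2.0), k=3))
print(knn_predict(train, (0.1, 0.1), k=3))
```

Only the local neighborhood votes, so the five majority-class samples far from the query do not outvote the two nearby minority-class samples.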

5.
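In traditional clustering, feature weights are either all equal or must be supplied by experts, and they are applied identically in every cluster. Given the importance of feature weights in clustering, and to highlight the different influence of each feature dimension, a method is proposed in which the weights are generated automatically, optimized during the dynamic clustering process, and differ from one subset to another. Experiments on the real IRIS dataset show that this feature-weighted clustering improves clustering accuracy.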

6.
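A feature dimensionality-reduction method that combines feature selection with feature extraction is proposed. An improved k-means clustering algorithm is first used to select features, and SVD is then used to compress the feature space at the semantic level. Experimental results show that this dimensionality-reduction scheme performs well in terms of text classification accuracy.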

7.
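An improved K-means algorithm based on optimized initial cluster centers   Total citations: 2, self-citations: 0, citations by others: 2
K-means is an important clustering algorithm that is widely used in network information processing. Because K-means terminates at a local optimum, the choice of initial cluster centers strongly affects the clustering result. This paper proposes an improved K-means algorithm that first detects relatively dense regions in the dataset and then uses these dense regions to generate the initial cluster centers. The method effectively excludes the influence of cluster-boundary points and noise points, adapts to datasets in which the actual classes have unbalanced density distributions, and ultimately yields better clustering results.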
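
One way to picture initialization from dense regions is the heuristic below. It is a sketch under assumptions, not the paper's procedure: density is taken to be the number of points within a fixed `radius`, and a candidate is rejected if it lies within `2 * radius` of an already chosen center. Both choices are hypothetical.

```python
import math

def dense_initial_centers(points, k, radius):
    """Pick up to k initial centers from dense regions (illustrative heuristic)."""
    # Density of a point = number of points within `radius` of it (itself included).
    density = [sum(math.dist(p, q) <= radius for q in points) for p in points]
    order = sorted(range(len(points)), key=lambda i: -density[i])
    centers = []
    for i in order:
        # Skip candidates that fall inside a region that is already represented.
        if all(math.dist(points[i], c) > 2 * radius for c in centers):
            centers.append(points[i])
        if len(centers) == k:
            break
    return centers

# Two tight groups plus one isolated noise point at (20, 20).
points = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
          (5, 5), (5.1, 5), (5, 5.1),
          (20, 20)]
centers = dense_initial_centers(points, k=2, radius=0.5)
print(centers)
```

Because candidates are visited in order of decreasing density, the isolated point is considered last and is not selected here. The function can return fewer than k centers if the data has fewer than k well-separated dense regions at the chosen radius.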

8.
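Pattern recognition is a basic human intelligence, and it is also a discipline that reasons from perceivable data, drawing mainly on statistics, probability theory, computational geometry, machine learning, signal processing, and algorithm design. It is related to statistics, psychology, linguistics, computer science, biology, and cybernetics, and it overlaps with research in artificial intelligence and image processing. The classification problem in pattern recognition is to assign an object to a class based on the observed values of its features. Statistical decision theory is one of the basic theories for pattern classification and offers practical guidance for pattern analysis and classifier design. Bayesian decision theory is a basic method in statistical pattern recognition. Classifying with it requires that (a) the probability distribution of each class population is known and (b) the number of classes to decide among is fixed. In the continuous case, assume that the physical object to be recognized has d observed features; all possible values of these features span a d-dimensional feature space. These assumptions mean that the problem has c classes, with class states denoted $\omega_i$, $i = 1, 2, \ldots, c$, and that the prior probability $P(\omega_i)$ of each class and the class-conditional probability density $p(x \mid \omega_i)$ are known. If a vector $x$, that is, a point in the d-dimensional feature space, has been observed, how to classify $x$ is the question this paper discusses.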
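
For the setting described above (known priors and known class-conditional densities), the minimum-error Bayes rule picks the class with the largest posterior, which is the class with the largest product of prior and class-conditional density. The sketch below uses d = 1 and Gaussian class-conditional densities purely as assumptions for illustration; the class names, priors, and parameters are made up.

```python
import math

def gaussian_pdf(x, mean, std):
    """Univariate normal density, used as an assumed class-conditional density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2 * math.pi))

def bayes_decide(x, priors, params):
    """Return the class with the largest prior * class-conditional density at x."""
    # The evidence p(x) is the same for every class, so it can be ignored.
    scores = {c: priors[c] * gaussian_pdf(x, *params[c]) for c in priors}
    return max(scores, key=scores.get)

priors = {"w1": 0.7, "w2": 0.3}                  # assumed prior probabilities
params = {"w1": (0.0, 1.0), "w2": (3.0, 1.0)}    # assumed (mean, std) per class

for x in (0.5, 1.5, 2.5):
    print(x, bayes_decide(x, priors, params))
```

At x = 1.5 the two densities are equal, so the larger prior of `w1` decides the outcome; this shows how the prior shifts the decision boundary away from the midpoint between the class means.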

9.
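Wu Tongbao, Yuan Haiyan, Huang Zunzhi, Chen Zhiwei. 《科技通报》 (Bulletin of Science and Technology), 2019, 35(7): 143-146, 151
Traditional feature mapping methods suffer from long mapping times and low conversion rates for high-dimensional data. To address this, a minimum-entropy-based visual feature mapping method for high-dimensional power data is proposed. The high-dimensional power data are modeled spatially, and visual feature classification is carried out through data preprocessing, transformation, discretization analysis, and feature classification. Scatter matrices of the power data classes are built, and the relative feature values and discriminant values of the high-dimensional data are computed from them to complete feature extraction. Based on these classification and extraction results, entropy is used to describe the separability of the classes of high-dimensional power data: the entropy of the data is defined and used as the separability criterion, the data features with minimum entropy are selected, and the overall entropy of the power data is used to map the high-dimensional data to low-dimensional data. Experimental results show that the proposed method achieves high feature classification accuracy, an average high-dimensional data conversion rate of about 78%, and a short mapping time, clearly outperforming traditional methods.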

10.
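Faced with the massive multidimensional data in power systems, traditional visual data mining cannot meet the needs of spatial data processing, and multidimensional data visualization does not help users acquire knowledge. A new visual data mining model for power grids based on SOM (self-organizing feature map network) clustering, VSDMmodel, is therefore proposed. The model uses an improved SOM clustering algorithm to reduce the dimensionality of high-dimensional grid data and introduces a color-mapping-based visualization method to present the clustering results in low dimensions. This speeds up users' understanding of the mining results and lets them analyze regions of interest in depth, enabling visual mining of massive power system data.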

11.
Text documents usually contain high-dimensional non-discriminative (irrelevant and noisy) terms, which lead to steep computational costs and poor learning performance in text classification. One effective solution to this problem is feature selection, which aims to identify discriminative terms in text data. This paper proposes a method termed “Hebb rule based feature selection (HRFS)”. HRFS is based on the supervised Hebb rule: it treats terms and classes as neurons and selects terms under the assumption that a term is discriminative if it keeps “exciting” the corresponding classes. In terms of the original Hebb postulate, this means that a term is highly correlated with a class if it is able to keep “exciting” that class. Six benchmark datasets are used to compare HRFS with seven other feature selection methods. Experimental results indicate that HRFS achieves better performance than the compared methods. HRFS identifies discriminative terms from the viewpoint of synapses between neurons. Moreover, HRFS is efficient because it can be expressed as matrix operations, which reduces the complexity of feature selection.
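
The matrix view mentioned in the abstract can be illustrated with a plain Hebbian co-activation product between a document-term matrix and a document-class matrix. This is a rough sketch, not the HRFS scoring itself: the abstract does not give the formula, so the `selectivity` score below (the share of a term's co-activations that go to its strongest class) is a hypothetical choice, as are the toy terms and labels.

```python
def hebb_weights(X, Y):
    """Hebbian co-activation W = X^T Y: W[t][c] sums X[d][t] * Y[d][c] over documents d."""
    n_terms, n_classes = len(X[0]), len(Y[0])
    W = [[0] * n_classes for _ in range(n_terms)]
    for x_row, y_row in zip(X, Y):
        for t, x in enumerate(x_row):
            for c, y in enumerate(y_row):
                W[t][c] += x * y
    return W

def selectivity(row):
    """Hypothetical score: fraction of a term's co-activations on its strongest class."""
    total = sum(row)
    return max(row) / total if total else 0.0

terms = ["goal", "vote", "the"]
# Rows are documents, columns are terms (1 = term occurs in the document).
X = [[1, 0, 1], [1, 0, 1], [0, 1, 1], [0, 1, 1]]
# One-hot class labels per document: column 0 = sports, column 1 = politics.
Y = [[1, 0], [1, 0], [0, 1], [0, 1]]

W = hebb_weights(X, Y)
scores = {term: selectivity(row) for term, row in zip(terms, W)}
print(W, scores)
```

In this toy example, "goal" and "vote" each co-occur with a single class, while "the" co-occurs equally with both and receives the lowest score.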

12.
Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, an approach generally known as supervised learning. However, supervised learning approaches have some problems. The most notable is that they require a large number of labeled training documents for accurate learning. While unlabeled documents are plentiful and easy to collect, labeled documents are difficult to generate because the labeling must be done by humans. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier using bootstrapping and feature projection techniques. Experimental results show that the proposed method achieves reasonably useful performance compared to a supervised method. If the proposed method is used in a text classification task, building text classification systems will become significantly faster and less expensive.

13.
Zero-shot object classification aims to recognize objects of unseen classes for which no supervised data are available in the training stage. Recent zero-shot learning (ZSL) methods usually generate new supervised data for unseen classes by designing various deep generative networks. In this paper, we propose an end-to-end deep generative ZSL approach that trains the data generation module and the object classification module jointly, rather than separately as in the majority of existing generation-based ZSL methods. Because ZSL assumes that unseen data are unavailable in the training stage, the distribution of the generated unseen data shifts toward the distribution of the seen data, which causes the projection domain shift problem. We therefore design a novel meta-learning optimization model to improve the proposed generation-based ZSL approach, in which the parameter initialization and the parameter update algorithm are meta-learned to assist model convergence. We evaluate the proposed approach on five standard ZSL datasets. The proposed joint training strategy increases average accuracy by 2.7% and 23.0% on the standard ZSL task and the generalized ZSL task respectively, and the meta-learning optimization further improves accuracy by 5.0% and 2.1% on the two tasks respectively. Experimental results demonstrate that the proposed approach is clearly superior across various ZSL tasks.

14.
Deep hashing, which uses deep learning to boost the performance of hash learning, has become an important research topic. Most existing deep supervised hashing methods focus on preserving similarity in hash codes and rely solely on pairwise supervision. However, such a pairwise similarity-preserving strategy cannot fully exploit the semantic information in most cases, which results in information loss. To address this problem, this paper proposes a discriminative dual-stream deep hashing (DDDH) method, which integrates the pairwise similarity loss and the classification loss into a unified framework to take full advantage of label information. Specifically, the pairwise similarity loss aims to preserve the similarity and structural information of the high-dimensional original data, while the designed classification loss enlarges the margin between different classes, which improves the discrimination of the learned binary codes. Moreover, an effective optimization algorithm is employed to train the hash code learning framework in an end-to-end manner. The results of extensive experiments on three image datasets demonstrate that the method is superior to several state-of-the-art deep and non-deep hashing methods. Ablation studies and analysis further show the effectiveness of introducing the classification loss into the overall hash learning framework.

15.
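In many cases, not all classes are equally important or practically meaningful, and feature extraction should reflect these differences between classes. Traditional feature extraction methods cannot do so. A feature extraction method based on subjective guidance is therefore used to extract features from spectral data and classify them. The main idea is to incorporate the user's subjective intent into feature extraction so that the extracted features favor the classification of the classes given priority. The results show that, compared with traditional linear discriminant analysis (LDA), the method obtains more effective classification features.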

16.
Recognition of handwritten Arabic alphabet via hand motion tracking   Total citations: 1, self-citations: 0, citations by others: 1
This paper proposes an online video-based approach to handwritten Arabic alphabet recognition. Various temporal and spatial feature extraction techniques are introduced. The motion information of the hand movement is projected onto two static accumulated difference images according to the motion directionality. The temporal analysis is followed by a two-dimensional discrete cosine transform and zonal coding, or by a Radon transform and low-pass filtering. The resulting feature vectors are time-independent and can therefore be classified by a simple technique such as K Nearest Neighbor (KNN). The solution is further enhanced by introducing the notion of superclasses, where similar classes are grouped together for the purpose of multiresolutional classification. Experimental results indicate a 99% recognition rate in user-dependent mode. To validate the proposed technique, we conducted a series of experiments using Hidden Markov models (HMM), the classical way of classifying data with temporal dependencies. The results show that the proposed feature extraction scheme combined with simple KNN yields better results than the classical HMM-based scheme.

17.
A particle swarm classifier is integrated with intelligent control of the PSO search process to develop an efficient swarm-intelligence-based classifier, called the intelligent particle swarm classifier (IPS-classifier). The classifier is designed to find the decision hyperplanes that separate patterns of different classes in the feature space. An intelligent fuzzy controller is designed to improve the performance and efficiency of the proposed classifier by adapting three important parameters of PSO (inertia weight, cognitive parameter, and social parameter). Three pattern recognition problems with different feature vector dimensions are used to demonstrate the effectiveness of the classifier: Iris data classification, Wine data classification, and radar target classification from backscattered signals. The experimental results show that the performance of the IPS-classifier is comparable to or better than that of the k-nearest neighbor (k-NN) and multi-layer perceptron (MLP) classifiers, two conventional classifiers.

18.
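Feature variable selection for crop classification by remote sensing: current status and prospects   Total citations: 5, self-citations: 0, citations by others: 5
Jia Kun, Li Qiangzi. 《资源科学》 (Resources Science), 2013, 35(12): 2507-2516
Crop classification by remote sensing is the core problem in estimating crop planting area and is key to improving the accuracy of such estimates. Selecting feature variables is an important step in crop classification, and using multiple feature variables effectively is key to improving classification accuracy. As multi-source data become easier to acquire, electromagnetic spectrum features, spatial features, temporal features, and auxiliary data features play an important role in crop classification by remote sensing. This paper briefly reviews and analyzes the feature variables used in crop classification, including electromagnetic spectrum features (multispectral features, microwave scattering features, multi-source data features, and hyperspectral data features) as well as spatial features, temporal features, and auxiliary data features, and it analyzes the problems and development trends in feature variable selection. The key problems at present fall into two areas: insufficient theoretical research on feature variable selection, and shortcomings in the combined use of feature variables. Future research on feature selection should focus on three areas: mining new feature variables such as biochemical component features and canopy structure features, combining classification feature variables, and studying the sensitivity and uncertainty of feature variables for crop classification by remote sensing.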

19.
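Xu Xiaona, Weng Gangmin. 《软科学》 (Soft Science), 2007, 21(1): 14-16, 21
Using the clustering capability of the self-organizing feature map neural network (Kohonen network), a type of artificial neural network (ANN), seven feature indicators reflecting the development of tourism demand are extracted and used to classify the tourism demand of urban residents in China, dividing 39 cities into 6 classes. The classification results are analyzed and the method is discussed. The Kohonen network avoids some shortcomings that traditional clustering methods find hard to overcome; it is a clustering method with strong self-learning ability and good self-organization and adaptivity that produces clustering results quickly and objectively.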

20.
In recent years, Zero-shot Node Classification (ZNC), an emerging and more difficult task in which the classes of the testing nodes are unobserved in the training stage, has started to attract attention. Existing studies of ZNC mainly use Graph Neural Networks (GNNs) to construct a feature subspace aligned with the semantic subspace of the classes, thus enabling knowledge transfer from seen classes to unseen classes. However, the node feature is modeled from a single view, e.g., a bag-of-words vector, which is not enough to fully describe the characteristics of the node itself. To address this, we propose the Multi-View Enhanced zero-shot node classification paradigm (MVE), which aims to improve the model's generality and bring it closer to a human-like mode of thinking. Specifically, multi-view features are obtained from different sources, such as pre-trained model embeddings, knowledge graphs, and statistical methods, and are then fused by a contrastive learning module into a compositional node representation. Meanwhile, a Graph Convolutional Network (GCN) is used to let nodes fully absorb information from their neighbors, while the over-smoothing issue is alleviated by the multi-view features and the proposed contrastive learning mechanism. Experimental results on three public datasets show an average 25% improvement over baseline methods, supporting the advantage of the multi-view learning framework. The code and data can be found at https://github.com/guaiqihen/MVE.
