Found 20 similar documents; search took 171 ms.
1.
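This paper describes a dimensionality-reduction method based on feature-word clustering. The main idea is to treat the occurrence of a word in a text as an event: a search algorithm first computes the distribution of each feature word, and feature words that play a similar role in classification are then merged, which reduces the feature dimensionality. Finally, based on experimental analysis, the paper proposes an improved similarity formula that takes global cluster information into account and applies it to text classification; experiments show that it improves classification accuracy.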
2.
3.
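Traditional feature selection algorithms ignore correlations between features and assume balanced classes, so on imbalanced problems they favor the majority class and neglect the minority class. To address these shortcomings, this paper considers both feature correlation and class imbalance and proposes CDHI, a class-discrimination-based feature selection algorithm for high-dimensional imbalanced data. The algorithm clusters features with k-means, computes the class discrimination degree of each feature in a cluster, ranks the features in each cluster by that measure, and then selects the features with higher class discrimination from each cluster to form the feature subset, thereby both removing redundancy among high-dimensional features and handling imbalanced data. Experimental results show that, compared with traditional feature selection methods, CDHI effectively reduces the dimensionality of the feature space and improves the recognition rate for the minority class.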
4.
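An uneven distribution of data across classes is the main reason for poor classification performance on imbalanced datasets. To overcome this between-class imbalance, this paper proposes an imbalanced-data classification algorithm based on the classes of neighboring samples. First, for the sample to be classified, its k nearest samples are computed; the sample is then assigned to the class that holds the majority among those k nearest neighbors. Because the proposed algorithm considers only the classes of a small number of neighboring samples when making a decision, rather than all training samples, it can better overcome the effect of class imbalance on the classification of the minority class. Simulation experiments on a customer churn dataset show that the algorithm handles imbalanced data classification well.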
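The neighbor-vote decision described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `knn_predict`, the Euclidean distance, and the toy "stay"/"churn" data are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Assign `query` the majority label among its k nearest training points."""
    # Euclidean distance from the query to every training sample (assumed metric).
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Keep only the k closest samples; the rest of the training set is ignored.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical imbalanced toy data: six "stay" samples, three "churn" samples.
points = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (0, 0.5),
          (5, 5), (5, 6), (6, 5)]
labels = ["stay"] * 6 + ["churn"] * 3

print(knn_predict(points, labels, (5.2, 5.1), k=3))
print(knn_predict(points, labels, (0.2, 0.2), k=3))
```

Because only the k closest samples vote, a query near the minority cluster is decided by minority samples even though the majority class dominates the training set overall.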
5.
6.
7.
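An improved K-means algorithm based on optimized initial cluster centers  Total citations: 2, self-citations: 0, citations by others: 2
K-means is an important clustering algorithm that is widely used in network information processing. Because K-means terminates at a local optimum, the choice of initial cluster centers strongly affects the clustering result. This paper proposes an improved K-means algorithm that first detects the relatively dense regions of the dataset and then uses these dense regions to generate the initial cluster centers. The method effectively excludes the influence of cluster-boundary points and noise points, adapts to datasets in which the actual classes have unbalanced density distributions, and ultimately yields better clustering results.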
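One way to picture density-driven initialization is the sketch below. The abstract does not say how dense regions are detected, so the density rule here (count of neighbors within a fixed `radius`) and the separation rule (new centers must be more than `2 * radius` from earlier ones) are assumptions, and `dense_region_centers` is a hypothetical name.

```python
import math

def dense_region_centers(points, k, radius):
    """Pick up to k initial centers from dense regions (assumed density rule)."""
    # Density of a point = number of points within `radius` of it (assumption).
    density = [
        sum(1 for q in points if math.dist(p, q) <= radius)
        for p in points
    ]
    # Visit points from densest to sparsest.
    order = sorted(range(len(points)), key=lambda i: density[i], reverse=True)
    centers = []
    for i in order:
        # Skip candidates lying in a region already represented by a chosen center.
        if all(math.dist(points[i], c) > 2 * radius for c in centers):
            centers.append(points[i])
        if len(centers) == k:
            break
    return centers

# Hypothetical toy data: two clusters of different density plus one noise point.
data = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1),
        (5, 5), (5.1, 5), (5, 5.1),
        (10, 0)]
centers = dense_region_centers(data, k=2, radius=0.5)
print(centers)
```

The isolated point has the lowest density, so it is visited last and never becomes a center when enough dense regions exist. If fewer than k sufficiently separated regions are found, the function returns fewer than k centers.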
8.
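Pattern recognition is a basic human intelligence and also a discipline that reasons from perceivable data, mainly using tools such as statistics, probability theory, computational geometry, machine learning, signal processing, and algorithm design. It is related to statistics, psychology, linguistics, computer science, biology, and cybernetics, and it overlaps with research in artificial intelligence and image processing. The classification problem in pattern recognition is to assign an object to a class according to the observed values of its features. Statistical decision theory is one of the basic theories for handling pattern classification, and it offers practical guidance for pattern analysis and classifier design. Bayesian decision theory is a basic method in statistical pattern recognition. Classification with this method requires that (a) the probability distribution of each class population is known, and (b) the number of classes to decide among is fixed. In the continuous case, suppose the physical object to be recognized has d feature observations; these form a d-dimensional feature vector $x$, and all of their possible values span the d-dimensional feature space. These assumptions state that the problem has c classes, with class states denoted $\omega_i$, $i = 1, 2, \ldots, c$, and that the prior probability $P(\omega_i)$ and the class-conditional probability density function $p(x \mid \omega_i)$ of each class are known. If a vector $x$, that is, a point in the d-dimensional feature space, has been observed, how $x$ should be classified is the question this paper discusses.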
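For reference, the standard minimum-error-rate form of the Bayes decision rule is given below. The notation is the conventional one and is assumed here: $x$ is the observed d-dimensional feature vector, $\omega_i$ is the i-th of c classes, $P(\omega_i)$ is its prior, and $p(x \mid \omega_i)$ is its class-conditional density. The paper itself may use a different variant, such as a minimum-risk rule.

```latex
P(\omega_i \mid x) = \frac{p(x \mid \omega_i)\,P(\omega_i)}{\sum_{j=1}^{c} p(x \mid \omega_j)\,P(\omega_j)},
\qquad
\text{decide } \omega_i \ \text{if} \ P(\omega_i \mid x) > P(\omega_j \mid x) \ \text{for all } j \neq i .
```

Since the denominator is the same for every class, the rule amounts to choosing the class with the largest product $p(x \mid \omega_i)\,P(\omega_i)$.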
9.
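To address the long mapping time and low high-dimensional data conversion rate of traditional feature mapping methods, a minimum-entropy-based visual feature mapping method for high-dimensional electric power data is proposed. The high-dimensional power data are spatially modeled, and visual feature classification is carried out through data preprocessing, transformation, discretization analysis, and feature classification. Scatter matrices of the power data classes are constructed, and the relative and discriminant feature values of the data are computed from these matrices to complete feature extraction. Based on the classification and extraction results, entropy is used to describe the separability of the classes: the data entropy is defined and used as the class separability criterion, the features with minimum entropy are selected, and the overall entropy of the power data is used to map the high-dimensional data to low-dimensional data. Experimental results show that the proposed method achieves high feature classification accuracy, an average high-dimensional data conversion rate of about 78%, and a short mapping time, and that it clearly outperforms traditional methods.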
10.
11.
Text documents usually contain high-dimensional non-discriminative (irrelevant and noisy) terms, which lead to steep computational costs and poor learning performance in text classification. One effective solution to this problem is feature selection, which aims to identify discriminative terms in text data. This paper proposes a method termed “Hebb rule based feature selection (HRFS)”. HRFS is based on the supervised Hebb rule: it treats terms and classes as neurons and selects terms under the assumption that a term is discriminative if it keeps “exciting” the corresponding classes. Following the original Hebb postulate, this assumption can be explained as “a term is highly correlated with a class if it is able to keep ‘exciting’ the class”. Six benchmark datasets are used to compare HRFS with seven other feature selection methods. Experimental results indicate that HRFS achieves better performance than the compared methods. HRFS can identify discriminative terms from the viewpoint of synapses between neurons. Moreover, HRFS is also efficient because it can be expressed as matrix operations, which decreases the complexity of feature selection.
12.
Many machine learning algorithms have been applied to text classification tasks. In the machine learning paradigm, a general inductive process automatically builds a text classifier by learning, generally known as supervised learning. However, supervised learning approaches have some problems. The most notable is that they require a large number of labeled training documents for accurate learning. While unlabeled documents are plentiful and easily collected, labeled documents are difficult to generate because the labeling task must be done by human developers. In this paper, we propose a new text classification method based on unsupervised or semi-supervised learning. The proposed method starts a text classification task with only unlabeled documents and the title word of each category, and then automatically learns a text classifier by using bootstrapping and feature projection techniques. Experimental results show that the proposed method achieves reasonably useful performance compared to a supervised method. If the proposed method is used in a text classification task, building text classification systems will become significantly faster and less expensive.
13.
Information Processing & Management, 2023, 60(2): 103233
Zero-shot object classification aims to recognize objects of unseen classes for which supervised data are unavailable in the training stage. Recent zero-shot learning (ZSL) methods usually generate new supervised data for unseen classes by designing various deep generative networks. In this paper, we propose an end-to-end deep generative ZSL approach that trains the data generation module and the object classification module jointly, rather than separately as in the majority of existing generation-based ZSL methods. Because ZSL assumes that unseen data are unavailable in the training stage, the distribution of generated unseen data shifts toward the distribution of seen data, which causes the projection domain shift problem. Therefore, we further design a novel meta-learning optimization model to improve the proposed generation-based ZSL approach, in which the parameter initialization and the parameter update algorithm are meta-learned to assist model convergence. We evaluate the proposed approach on five standard ZSL datasets. The proposed joint training strategy increases average accuracy by 2.7% and 23.0% on the standard ZSL task and the generalized ZSL task, respectively, and the meta-learning optimization further improves accuracy by 5.0% and 2.1% on the two tasks, respectively. Experimental results demonstrate that the proposed approach is significantly superior in various ZSL tasks.
14.
Information Processing & Management, 2020, 57(6): 102288
Deep hashing, which uses deep learning to boost the performance of hash learning, has been an important research topic. Most existing deep supervised hashing methods focus on how to effectively preserve similarity in hash coding while depending solely on pairwise supervision. However, such a pairwise similarity-preserving strategy cannot fully exploit the semantic information in most cases, which results in information loss. To address this problem, this paper proposes a discriminative dual-stream deep hashing (DDDH) method, which integrates the pairwise similarity loss and the classification loss into a unified framework to take full advantage of label information. Specifically, the pairwise similarity loss aims to preserve the similarity and structural information of the high-dimensional original data. Meanwhile, the designed classification loss enlarges the margin between different classes, which improves the discrimination of the learned binary codes. Moreover, an effective optimization algorithm is employed to train the hash code learning framework in an end-to-end manner. The results of extensive experiments on three image datasets demonstrate that our method is superior to several state-of-the-art deep and non-deep hashing methods. Ablation studies and analysis further show the effectiveness of introducing the classification loss into the overall hash learning framework.
15.
16.
This paper proposes an online video-based approach to handwritten Arabic alphabet recognition. Various temporal and spatial feature extraction techniques are introduced. The motion information of the hand movement is projected onto two static accumulated difference images according to the motion directionality. The temporal analysis is followed by a two-dimensional discrete cosine transform and zonal coding, or by a Radon transformation and low-pass filtering. The resulting feature vectors are time-independent and can thus be classified by a simple classification technique such as K Nearest Neighbor (KNN). The solution is further enhanced by introducing the notion of superclasses, where similar classes are grouped together for the purpose of multiresolutional classification. Experimental results indicate a 99% recognition rate in user-dependent mode. To validate the proposed technique, we conducted a series of experiments using Hidden Markov models (HMM), the classical way of classifying data with temporal dependencies. The results show that the proposed feature extraction scheme combined with simple KNN yields superior results to those obtained by the classical HMM-based scheme.
17.
A particle swarm classifier is integrated with intelligent control of the PSO search process to develop an efficient swarm-intelligence-based classifier, called the intelligent particle swarm classifier (IPS-classifier). The classifier finds the decision hyperplanes that separate patterns of different classes in the feature space. An intelligent fuzzy controller is designed to improve the performance and efficiency of the proposed classifier by adapting three important parameters of PSO (inertia weight, cognitive parameter, and social parameter). Three pattern recognition problems with different feature vector dimensions are used to demonstrate the effectiveness of the introduced classifier: Iris data classification, Wine data classification, and radar target classification from backscattered signals. The experimental results show that the performance of the IPS-classifier is comparable to or better than that of the k-nearest neighbor (k-NN) and multi-layer perceptron (MLP) classifiers, which are two conventional classifiers.
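To make the three PSO parameters named in this abstract concrete, the sketch below shows one update step of the standard PSO rule for a single particle. It is only an illustration under stated assumptions: the function name `pso_step` is hypothetical, the update is the textbook form rather than the paper's exact variant, and the fuzzy controller that adapts `inertia`, `cognitive`, and `social` is not reproduced.

```python
import random

def pso_step(position, velocity, personal_best, global_best,
             inertia, cognitive, social, rng=random):
    """One standard PSO update for a single particle (lists of floats)."""
    new_position = []
    new_velocity = []
    for x, v, pb, gb in zip(position, velocity, personal_best, global_best):
        r1, r2 = rng.random(), rng.random()  # fresh random factors per dimension
        v_next = (inertia * v                   # momentum from the previous velocity
                  + cognitive * r1 * (pb - x)   # pull toward the particle's own best
                  + social * r2 * (gb - x))     # pull toward the swarm's best
        new_velocity.append(v_next)
        new_position.append(x + v_next)
    return new_position, new_velocity

# Hypothetical usage: with both pulls switched off, only inertia acts.
pos, vel = pso_step([1.0, 2.0], [0.2, -0.4], [0.0, 0.0], [3.0, 3.0],
                    inertia=0.5, cognitive=0.0, social=0.0)
print(pos, vel)
```

A larger inertia weight favors exploration along the current direction, while larger cognitive and social parameters pull the particle more strongly toward known good positions. That trade-off is the kind of behavior a controller adapting these values would be tuning.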
18.
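Feature variable selection for remote sensing crop classification: current status and prospects  Total citations: 5, self-citations: 0, citations by others: 5
Remote sensing classification of crops is a core problem in estimating crop planting area and is key to improving the accuracy of such estimates. Selecting feature variables is an important step in crop classification, and making effective use of multiple feature variables is key to improving classification accuracy. As multi-source data become easier to obtain, electromagnetic spectral features, spatial features, temporal features, and auxiliary data features play an important role in remote sensing crop classification. This paper briefly reviews and analyzes the feature variables used in remote sensing crop classification, including electromagnetic spectral features such as multispectral features, microwave scattering features, multi-source data features, and hyperspectral data features, as well as spatial features, temporal features, and auxiliary data features, and it analyzes the open problems and development trends in feature variable selection. It points out that the key problems at present are mainly insufficient theoretical research on feature variable selection and deficiencies in integrated application. Future research will focus on three areas: mining new feature variables such as biochemical component features and canopy structure features, the integrated application of classification feature variables, and the sensitivity and uncertainty of feature variables for remote sensing crop classification.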
19.
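Using the clustering capability of the self-organizing feature map (Kohonen) network, a type of artificial neural network (ANN), seven feature indicators reflecting the development of tourism demand are extracted to classify the tourism demand of urban residents in China, dividing 39 cities into 6 classes. The classification results are analyzed and the method is discussed. The paper points out that the Kohonen network avoids some shortcomings that traditional clustering methods find hard to overcome, and that it is a clustering method with strong self-learning ability, good self-organization and adaptivity, and the ability to produce clustering results quickly and objectively.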
20.
Information Processing & Management, 2023, 60(6): 103479
In recent years, Zero-shot Node Classification (ZNC), an emerging and more difficult task in which the classes of testing nodes are unobserved in the training stage, has started to attract attention. Existing studies of ZNC mainly use Graph Neural Networks (GNNs) to construct a feature subspace that aligns with the classes’ semantic subspace, thus enabling knowledge transfer from seen classes to unseen classes. However, the modeling of node features is single-view and one-sided, e.g., a bag-of-words vector, which is not enough to fully describe the characteristics of the node itself. To address this dilemma, we propose the Multi-View Enhanced zero-shot node classification paradigm (MVE), which aims to improve the model’s generality and bring it closer to a human-like way of thinking. Specifically, multi-view features are obtained from different sources such as pre-trained model embeddings, knowledge graphs, and statistical methods, and are then fused by a contrastive learning module into a compositional node representation. Meanwhile, a modified Graph Convolutional Network (GCN) lets the nodes fully absorb information from their neighbors, while the over-smoothing issue is alleviated by the multi-view features and the proposed contrastive learning mechanism. Experiments conducted on three public datasets show an average 25% improvement over baseline methods, demonstrating the superiority of our multi-view learning framework. The code and data can be found at https://github.com/guaiqihen/MVE.