20 similar documents found (search time: 218 ms)
1.
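This paper takes a new approach to decision tree classification algorithms based on the minimum Gini index. It briefly introduces the CART algorithm and the definition of the Gini index, and analyzes the SLIQ and SPRINT decision tree classification techniques in depth. It also analyzes the time complexity of SLIQ and compares the memory management and performance of the two algorithms.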
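The Gini criterion named in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the SLIQ or SPRINT implementation: the function names `gini` and `best_numeric_split` and the toy data are hypothetical, and the split search is the plain exhaustive scan over one numeric attribute.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_numeric_split(values, labels):
    """Return (threshold, weighted_gini) for the binary split with minimum weighted Gini."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between two equal values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best

# Hypothetical toy data: one numeric attribute and a binary class label.
ages = [23, 35, 41, 52, 60]
buys = ["no", "no", "yes", "yes", "yes"]
threshold, score = best_numeric_split(ages, buys)
```

On this toy data the chosen threshold separates the two classes perfectly, so the weighted Gini of the split is zero.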
2.
3.
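The decision tree algorithm is an important classification algorithm in data mining systems; choosing reasonable and effective test attributes and pruning the tree appropriately are among its key elements. This paper introduces the decision tree algorithm into an academic affairs management mining system and improves both the test attribute selection algorithm and the pre-pruning algorithm. Using the Band 4 examination records of Jiujiang University students as an example, the results show that the improved decision tree algorithm is more reliable and effective for data mining.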
4.
5.
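Credit evaluation of tourist attractions is a typical classification problem. This paper outlines rough set theory and decision tree theory and, building on both, proposes a credit evaluation method that combines rough set theory from data mining with decision tree classification to build a credit evaluation model for tourist attractions. The rough set concept of knowledge reduction is used to preprocess the sample data and remove the influence of redundant attributes on the classification model; a decision tree method then builds the classification model. Finally, the model is implemented with an attribute reduction algorithm based on Pawlak attribute significance and the ID3 decision tree algorithm.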
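The attribute choice that ID3 makes at each node can be sketched as an information gain computation. This is only an illustration of the standard ID3 criterion; the attribute names and records are hypothetical, and the rough set reduction step described in the abstract is not reproduced here.

```python
import math
from collections import Counter, defaultdict

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy of the target minus the weighted entropy after splitting on attribute."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[attribute]].append(row[target])
    n = len(rows)
    remainder = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy([row[target] for row in rows]) - remainder

# Hypothetical records; "credit" is the class to predict.
rows = [
    {"rating": "high", "season": "peak", "credit": "good"},
    {"rating": "high", "season": "off", "credit": "good"},
    {"rating": "low", "season": "peak", "credit": "bad"},
    {"rating": "low", "season": "off", "credit": "bad"},
]
gains = {a: information_gain(rows, a, "credit") for a in ("rating", "season")}
best_attribute = max(gains, key=gains.get)
```

ID3 would split on `best_attribute` first and then recurse on each resulting subset with the remaining attributes.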
6.
7.
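Decision tree classification holds an important place in data mining and machine learning. To cope with ever-growing data volumes, traditional decision tree classification algorithms need fuzzy computation to handle multimodal, multidimensional data. This paper introduces the classic ID3 algorithm and presents a method for fuzzifying it.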
8.
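Pruning is an important step in decision tree classification learning: it simplifies the tree, improves its generalization ability, and avoids overfitting the training data set. Building on the PEP algorithm, this paper proposes an improved decision tree pruning algorithm, IPEP. Experimental results show that it prunes more effectively than PEP.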
9.
10.
11.
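Multiple classifier fusion is applied to customer classification in CRM to improve classification performance. Decision trees serve as the base classifiers, and a least squares technique is introduced to fuse the classifiers linearly. Empirical results show that all four fusion schemes outperform every individual base classifier and even outperform neural network fusion based on a genetic algorithm, which demonstrates the feasibility and effectiveness of the method.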
12.
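This paper introduces the basic concepts of association rules, summarizes the categories of association rules and the various mining algorithms, describes several typical algorithms, and finally looks ahead to future research directions in association rule mining.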
13.
14.
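This paper introduces the basic concepts of association rules, summarizes the categories of association rules and the various mining algorithms, describes several typical algorithms, and finally looks ahead to future research directions in association rule mining.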
15.
16.
Automatic text classification is the problem of automatically assigning predefined categories to free-text documents, which reduces the manual labor required by traditional classification methods. When binary classification is applied to multi-class text classification, the one-against-the-rest method is usually used. In this method, a document that belongs to a particular category is regarded as a positive example of that category; otherwise, it is regarded as a negative example. Each category thus ends up with a positive data set and a negative data set. This method has a problem, however: the documents in a negative data set are not labeled manually, while those in a positive set are labeled by humans. The negative data set therefore probably includes a lot of noisy data. In this paper, we propose applying the sliding window technique and a revised EM (Expectation Maximization) algorithm to binary text classification to solve this problem. Binary text classification improves when potentially noisy documents are extracted from the negative data set with the sliding window technique and the actually noisy documents are removed with the revised EM algorithm. In our experiments, our method achieved better performance than the original one-against-the-rest method on all the data sets and with all the classifiers used.
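The one-against-the-rest construction described in this abstract can be sketched as below. The function name `one_vs_rest_sets` and the sample documents are hypothetical, and the sketch covers only how the positive and negative sets arise; it does not implement the sliding window technique or the revised EM algorithm.

```python
def one_vs_rest_sets(documents):
    """Build a (positives, negatives) pair of document ids for every category.

    documents: list of (doc_id, set_of_categories) pairs.
    """
    categories = sorted({c for _, cats in documents for c in cats})
    result = {}
    for category in categories:
        positives = [d for d, cats in documents if category in cats]
        # Negatives are assigned by default, not verified by a human labeler,
        # which is the source of noise the abstract points out.
        negatives = [d for d, cats in documents if category not in cats]
        result[category] = (positives, negatives)
    return result

# Hypothetical labeled documents.
docs = [
    ("d1", {"sports"}),
    ("d2", {"politics"}),
    ("d3", {"sports", "politics"}),
    ("d4", {"science"}),
]
sets_by_category = one_vs_rest_sets(docs)
```

Each category then trains its own binary classifier on its pair of sets.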
17.
Language modeling is an effective and theoretically attractive probabilistic framework for text information retrieval. The basic idea of this approach is to estimate a language model of a given document (or document set), and then do retrieval or classification based on this model. A common language modeling approach assumes the data D is generated from a mixture of several language models. The core problem is to find the maximum likelihood estimate of one component of the language model mixture, given the fixed mixture weights and the other component. The EM algorithm is usually used to find the solution.
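The estimation problem described in this abstract can be sketched with the textbook EM updates for a two-component unigram mixture. This is a sketch under stated assumptions rather than the paper's own algorithm: the function name `em_mixture`, the mixing weight `lam`, the iteration count, and the word counts are all hypothetical.

```python
def em_mixture(counts, background, lam=0.5, iterations=50):
    """Estimate theta in p(w) = (1 - lam) * theta[w] + lam * background[w].

    counts: observed word counts; background: fixed known language model;
    lam: fixed mixing weight of the background component.
    """
    vocab = list(counts)
    theta = {w: 1.0 / len(vocab) for w in vocab}  # uniform starting point
    for _ in range(iterations):
        # E-step: posterior probability that each word came from theta.
        post = {}
        for w in vocab:
            topic = (1 - lam) * theta[w]
            post[w] = topic / (topic + lam * background[w])
        # M-step: re-estimate theta from the expected counts.
        total = sum(counts[w] * post[w] for w in vocab)
        theta = {w: counts[w] * post[w] / total for w in vocab}
    return theta

# Hypothetical counts and background model.
counts = {"the": 10, "tree": 5, "gini": 5}
background = {"the": 0.8, "tree": 0.1, "gini": 0.1}
theta = em_mixture(counts, background)
```

Because the background model already explains most occurrences of the common word, the estimated model shifts probability toward the rarer, more informative words.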
18.
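Band selection for hyperspectral images based on an improved genetic algorithm (total citations: 3; self-citations: 0; citations by others: 3)
Hyperspectral images are widely used in Earth observation, but they suffer from large data volumes and high correlation between bands. To address these problems, this paper analyzes existing band selection methods and proposes a genetic algorithm for hyperspectral band selection based on information content and a between-class separability criterion: a band cross-correlation coefficient matrix is constructed to partition the bands into subspaces; joint entropy serves as the measure of combined information content and the Bhattacharyya distance as the measure of between-class separability in constructing the fitness function of the genetic algorithm; and the selection operator of the genetic algorithm is improved. Finally, the proposed algorithm is tested on AVIRIS images, and the optimal band combination is classified with the maximum likelihood classifier, reaching an overall classification accuracy of 94.24% and a Kappa coefficient of 0.94.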
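The between-class separability measure named in this abstract can be illustrated in its simplest form. The sketch below assumes each class is normally distributed on a single band, which is a simplification chosen for illustration; the abstract does not state which form of the Bhattacharyya distance the paper uses, and the function name and sample statistics are hypothetical.

```python
import math

def bhattacharyya_gaussian(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate normal distributions."""
    mean_term = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    var_term = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return mean_term + var_term

# Hypothetical per-class band statistics (mean, variance).
same = bhattacharyya_gaussian(10.0, 4.0, 10.0, 4.0)
apart = bhattacharyya_gaussian(10.0, 4.0, 20.0, 4.0)
```

A larger distance indicates that the two classes are easier to separate on that band, which is why it can serve as one term of a fitness function.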
19.
《Information processing & management》2005,41(2):313-330
The paper proposes a new approach to creating a patent classification system that replaces the IPC or UPC system for patent analysis and management. The new approach is based on bibliometric co-citation analysis. The traditional approach to patent management, which is based on either the IPC or the UPC, is too general to meet the needs of specific industries. In addition, some patents are placed in incorrect categories, making it difficult for enterprises to carry out R&D planning, technology positioning, patent strategy-making and technology forecasting. It is therefore essential to develop a patent classification system that adapts to the characteristics of a specific industry. The analysis is divided into three phases. Phase I selects appropriate databases for patent searches according to the subject and objective of the study and then selects the basic patents. Phase II uses the co-citation frequency of the basic patent pairs to assess their similarity. Phase III uses factor analysis to establish a classification system and assess the efficiency of the proposed approach. The main contribution of this approach is a patent classification system based on patent similarities that helps patent managers understand the basic patents of a specific industry, the relationships among categories of technologies and the evolution of a technology category.
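The co-citation frequency used in Phase II can be sketched as a pair count over citing documents. This is a minimal illustration, assuming the citation data is available as a mapping from each citing patent to the basic patents it cites; the function name and patent identifiers are hypothetical, and the factor analysis of Phase III is not shown.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(citing_to_cited):
    """Count how often each pair of patents is cited together by the same citing patent."""
    counts = Counter()
    for cited in citing_to_cited.values():
        # Deduplicate and sort so each unordered pair has one canonical key.
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical citation data: citing patent -> basic patents it cites.
citations = {
    "P10": ["A", "B", "C"],
    "P11": ["A", "B"],
    "P12": ["B", "C"],
}
pair_counts = cocitation_counts(citations)
```

The resulting pair counts form a symmetric similarity matrix over the basic patents, which is the kind of input a factor analysis step would take.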
20.
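Waste-sorting education is an important part of education for all citizens and also an important component of vocational education. The foundation of waste sorting lies in popularizing waste-sorting education, and applying "Internet Plus" concepts and technologies to it can improve both educational outcomes and teaching efficiency. Starting from the current state of waste-sorting education in the "Internet Plus" context, this paper examines its existing problems and proposes some "Internet Plus" educational approaches to waste sorting, with the aim of raising citizens' awareness of waste sorting and promoting it widely.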