Application of fuzzy equivalence theory in data cleaning (模糊等值理论在数据清理中的应用)

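Cite this article:	Li Huayang, Liu Yubao, Li Youkui. Application of fuzzy equivalence theory in data cleaning[J]. Journal of Southeast University, 2004, 20(4): 454-457.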

Authors:	Li Huayang (李华旸), Liu Yubao (刘玉葆), Li Youkui (李又奎)

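Affiliations:	[1] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074; Yonyou Software School (用友软件学院), Jiangxi University of Finance and Economics, Nanchang 330013 [2] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074 [3] Nanjing Research Institute, Huawei Technologies Co., Ltd., Nanjing 210001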

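Abstract:	A rule-merging optimization method and a clustering method for removing duplicate records are proposed. Applying fuzzy equivalence theory avoids the rigid either-or approach of traditional equivalence theory. During cleaning, however, some rules may stand in an including/included relation to one another, and the equivalence degree of an included rule is relatively small. A rule-merging optimization method based on a user-specified threshold is therefore proposed, which reduces the computing time for duplicate records. For the same reason, the including/included relation between rules affects the deviation analysis of the fuzzy equivalence degree, so an improved deviation analysis method for fuzzy equivalence theory is proposed that raises the precision of the analysis by ignoring the equivalence degrees of included rules. Verifying duplicate records usually requires manual, record-by-record checking, which is error-prone; the clustering algorithm proposed here saves users a great deal of labor. Finally, an experiment is presented that demonstrates the feasibility of rule optimization.
Keywords:	equivalence theory; equivalence degree; data cleaning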
Application of fuzzy equivalence theory in data cleaning

Li Huayang, Liu Yubao, Li Youkui. Application of fuzzy equivalence theory in data cleaning[J]. Journal of Southeast University (English Edition), 2004, 20(4): 454-457.

Authors:	Li Huayang, Liu Yubao, Li Youkui

Abstract:	This paper presents a rule merging and simplifying method and an improved deviation analysis algorithm. Fuzzy equivalence theory avoids the rigid either-or approach of traditional equivalence theory. During a data cleaning task, some rules stand in an including/included relation to one another. The equivalence degree of an included rule is smaller than that of the rule that includes it, so a rule merging and simplifying method is introduced to reduce the total computing time. This relation also affects the deviation of the fuzzy equivalence degree, so an improved deviation analysis algorithm that omits the influence of the included rules' equivalence degrees is also presented. Normally, duplicate records are logged in a file, and users have to check and verify them one by one, which is time-consuming. The proposed algorithm saves users' labor during duplicate record checking. Finally, an experiment is presented that demonstrates the feasibility of the rule optimization.

Keywords:	equivalence theory; equivalence degree; data cleaning
This article is indexed in CNKI, VIP, Wanfang Data, and other databases.
