
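Application of Fuzzy Equivalence Theory in Data Cleaning
Cite this article: Li Huayang, Liu Yubao, Li Youkui. Application of fuzzy equivalence theory in data cleaning[J]. Journal of Southeast University, 2004, 20(4): 454-457.
Authors: Li Huayang, Liu Yubao, Li Youkui
Affiliations: [1] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074; Yonyou Software School, Jiangxi University of Finance and Economics, Nanchang 330013 [2] School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074 [3] Nanjing Research Institute, Huawei Technologies Co., Ltd., Nanjing 210001
Abstract: An optimized rule-merging method and a clustering-based method for eliminating duplicate records are proposed. Applying fuzzy equivalence theory avoids the rigid either-or judgment of traditional equivalence theory. During cleaning, however, some rules may stand in an including/included relationship, and an included rule naturally has a relatively smaller equivalence degree. Based on a user-specified threshold, an optimized rule-merging method is therefore proposed that reduces the computation time for detecting duplicate records. For the same reason, the including/included relationships between rules affect the error analysis of the fuzzy equivalence degree, so an improved error-analysis method for fuzzy equivalence theory is proposed that raises its accuracy by ignoring the equivalence degrees of included rules. Verifying duplicate records normally requires checking them manually one by one, which is error-prone; the clustering algorithm proposed in this paper can save users a great deal of labor. Finally, an experiment is presented that demonstrates the feasibility of rule optimization.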
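The abstract states the idea but not the paper's formulas, so the following Python sketch only illustrates why an included rule can be dropped. Everything concrete in it is an assumption for illustration: a rule is modeled as a set of field names, a record pair's equivalence degree under a rule is the minimum (fuzzy AND) of per-field similarities, the per-field similarity is difflib's `SequenceMatcher.ratio()`, and a pair is flagged when any rule reaches the user threshold. Under these assumptions, a rule whose fields strictly contain another rule's fields matches only a subset of that rule's pairs and never has a larger degree, so removing it cannot change any decision. The function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, FrozenSet, List

Record = Dict[str, str]
Rule = FrozenSet[str]  # hypothetical model: a rule = non-empty set of fields that must all match


def field_similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; SequenceMatcher is a stand-in chosen for illustration."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def equivalence_degree(r1: Record, r2: Record, rule: Rule) -> float:
    """Fuzzy AND (min) over the rule's fields: an assumed aggregation, not the paper's formula."""
    return min(field_similarity(r1[f], r2[f]) for f in rule)


def prune_included_rules(rules: List[Rule]) -> List[Rule]:
    """Drop every rule whose field set strictly contains another rule's field set.

    Such a rule is 'included': it matches a subset of the pairs the looser rule matches,
    and under min-aggregation its degree never exceeds the looser rule's degree,
    so it cannot change an any-rule-reaches-the-threshold decision.
    """
    unique = list(dict.fromkeys(rules))  # remove exact duplicates, keep order
    return [r for r in unique if not any(other < r for other in unique)]


def is_duplicate(r1: Record, r2: Record, rules: List[Rule], threshold: float) -> bool:
    """Flag the pair if any rule's equivalence degree reaches the user threshold."""
    return any(equivalence_degree(r1, r2, rule) >= threshold for rule in rules)
```

For example, with the rules `{name}`, `{name, city}` and `{phone}`, the pruning step keeps only `{name}` and `{phone}`, so the `city` comparison is never computed for any pair.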

Keywords: equivalence theory; equivalence degree; data cleaning

Application of fuzzy equivalence theory in data cleaning
Li Huayang, Liu Yubao, Li Youkui. Application of fuzzy equivalence theory in data cleaning[J]. Journal of Southeast University (English Edition), 2004, 20(4): 454-457.
Authors: Li Huayang, Liu Yubao, Li Youkui
Abstract: This paper presents a method for merging and simplifying rules and an improved deviation-analysis algorithm. Fuzzy equivalence theory avoids the rigid either-or judgment of traditional equivalence theory. During a data cleaning task, some rules may stand in including/included relationships with one another. The equivalence degree of an included rule is smaller than that of the rule that includes it, so a rule merging and simplifying method is introduced to reduce the total computing time. Such relationships also affect the deviation of the fuzzy equivalence degree, so an improved deviation-analysis algorithm that ignores the equivalence degrees of included rules is also presented. Normally, duplicate records are logged in a file, and users must check and verify them one by one, which is time-consuming. The proposed algorithm can save users much of this labor when checking duplicate records. Finally, an experiment is presented that demonstrates the feasibility of rule optimization.
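The abstract does not say which clustering algorithm the paper uses. One common way to turn a log of flagged duplicate pairs into groups that a user can verify at once is to take their transitive closure with a union-find (disjoint-set) structure; the sketch below uses that technique as a stand-in, with hypothetical record IDs and function names.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


class UnionFind:
    """Disjoint-set forest with path halving and union by size."""

    def __init__(self) -> None:
        self.parent: Dict[Hashable, Hashable] = {}
        self.size: Dict[Hashable, int] = {}

    def find(self, x: Hashable) -> Hashable:
        if x not in self.parent:  # lazily create a singleton set for unseen IDs
            self.parent[x] = x
            self.size[x] = 1
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a: Hashable, b: Hashable) -> None:
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return
        if self.size[ra] < self.size[rb]:  # attach the smaller tree under the larger
            ra, rb = rb, ra
        self.parent[rb] = ra
        self.size[ra] += self.size[rb]


def cluster_duplicates(pairs: Iterable[Tuple[Hashable, Hashable]]) -> List[List[Hashable]]:
    """Group flagged duplicate pairs into clusters (their transitive closure)."""
    uf = UnionFind()
    for a, b in pairs:
        uf.union(a, b)
    groups: Dict[Hashable, List[Hashable]] = defaultdict(list)
    for x in uf.parent:  # find() only updates values here, never adds keys
        groups[uf.find(x)].append(x)
    # Sort for a stable, reviewer-friendly order (assumes IDs are mutually comparable).
    return sorted(sorted(g) for g in groups.values())
```

With the pairs `(1, 2)`, `(2, 3)` and `(5, 6)`, the user reviews two clusters, `[1, 2, 3]` and `[5, 6]`, instead of three separate pairs. Transitive closure can chain records that are not directly similar (1 and 3 above were never flagged together), so large clusters still deserve a closer look.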
Keywords: equivalence theory; equivalence degree; data cleaning
This article is indexed in CNKI, Weipu (VIP), Wanfang Data and other databases.