
 A layer-based algorithm for mining maximal frequent itemsets

Electronic Design Engineering [ISSN: 1674-6236 / CN: 61-1477/TN]

Issue:
 2010, No. 03
Pages:
 9-11
Section:
 Computer Technology Applications
Publication date:
 2010-03-05

Article Info

Title:
 Algorithm for mining maximal frequent itemsets based on layer
Author(s):
 MEI Deng-hua (梅登华), CAI Shao-wei (蔡少伟)
 South China University of Technology, Guangzhou 510006, Guangdong, China
Keywords:
 data mining; association rules; maximal frequent itemsets
CLC number:
 TP311
DOI:
 -
Document code:
 A
Abstract:
 Association rule mining is an important problem in data mining, and efficiently discovering frequent itemsets is the key step in it. Drawing on statistical regularities of database transactions, and building on the Apriori algorithm and its variants, this paper proposes a new layer-based algorithm for discovering maximal frequent itemsets. The algorithm first judges the frequency of the candidate set as a whole; then, while discovering maximal frequent itemsets, it applies a holistic strategy, a sorting strategy, and a minimum strategy to reduce the number of comparisons between candidate itemsets and database transactions. Experimental results show that, for maximal frequent itemset discovery over a large number of database transactions, the algorithm is significantly more efficient than Apriori.
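 For orientation, the level-wise search that the abstract builds on can be sketched as below. This is a minimal Apriori-style baseline under stated assumptions, not the authors' algorithm: it omits the holistic, sorting, and minimum strategies, whose details the abstract does not give. The function name `maximal_frequent_itemsets` and the use of an absolute `min_support` count are hypothetical choices made for illustration.

```python
from itertools import combinations


def maximal_frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search, then keep only maximal itemsets.

    Illustrative baseline only (an assumption, not the paper's method):
    min_support is an absolute transaction count.
    """
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        # Number of transactions that contain every item of the itemset.
        return sum(1 for t in transactions if itemset <= t)

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level = {frozenset([i]) for i in items
             if support(frozenset([i])) >= min_support}
    frequent = set(level)

    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
        # Count step: compare each surviving candidate with the transactions.
        level = {c for c in candidates if support(c) >= min_support}
        frequent |= level
        k += 1

    # A frequent itemset is maximal if it has no frequent proper superset.
    return {f for f in frequent if not any(f < g for g in frequent)}
```

 The count step is where candidates are compared against database transactions, which is the cost the abstract says the proposed strategies reduce. In this baseline every surviving candidate is checked against every transaction.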

References

 [1] Agrawal R, Srikant R. Fast algorithms for mining association rules[C]//Proceedings of the 20th International Conference on VLDB, Santiago, 1994: 487-498.
 [2] Houtsma M, Swami A. Set-oriented mining for association rules in relational databases[C]//Proceedings of the International Conference on Data Engineering, Los Alamitos: IEEE Computer Press, 1995: 25-34.
 [3] Savasere A, Omiecinski E, Navathe SM. An efficient algorithm for mining association rules[C]//Proceedings of the 21st International Conference on VLDB, Zurich, 1995: 420-445.
 [4] Bayardo R. Efficiently mining long patterns from databases[C]//Proceedings of the ACM SIGMOD International Conference on Management of Data, New York: ACM Press, 1998: 84-94.
 [5] Lin Dao-I, Kedem Z M. Pincer-Search: A new algorithm for discovering the maximum frequent set[C]//Proceedings of the 6th European Conference on DT, Heidelberg: Springer-Verlag, 1998: 105-120.
 [6] Lu Songfeng, Lu Zhengding. Fast mining of maximum frequent itemsets[J]. Journal of Software, 2001, 12(2): 293-297. (in Chinese)
 [7] Song Yuqing, Zhu Yuquan, Sun Zhihui, et al. FP-Tree-based algorithms for mining and updating maximum frequent itemsets[J]. Journal of Software, 2003, 14(9): 1586-1592. (in Chinese)

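Memo

 Received: 2009-07-30
 Manuscript No.: 200907096
 Funding: Jointly funded project of the General Administration of Civil Aviation of China (60776816)
 About the author: MEI Deng-hua (born 1967), male, native of Nanchang, Jiangxi; Ph.D., associate professor. Research interests: software reliability, intelligent robotics.
Last Update: 2010-03-05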