:::

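Title: 高效率探勘關聯規則之演算法--EFI (An Efficient Algorithm for Mining Association Rules: EFI)
Journal: 資訊管理學報 (Journal of Information Management)
Authors: 黃仁鵬; 藍國誠
Authors (romanized): Huang, Jen-peng; Lan, Guo-cheng
Publication date: 2007
Volume/issue: 14:2
Pages: 139-167
Keywords: 資料探勘; 關聯法則; 二階段過濾; Data mining; Association rules; Two phase filtrations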
Citation counts:
  • Times cited: journal articles (2), doctoral dissertations (0), books (0), book chapters (0)
  • Excluding self-citations: 2
  • Co-citations: 4
  • Views: 91
Data mining has become increasingly important in recent years and is widely applied to business forecasting and decision support. Association rule mining plays an important role in the field, and many association rule algorithms have been proposed and refined to improve efficiency or reduce memory usage. This study pursues the same goal.
This study targets the weaknesses of the QDI algorithm. Although QDI is already one of the more efficient association rule mining algorithms, it has two major problems. First, it cannot mine transaction databases whose transactions are very long. Second, it uses memory inefficiently. Both problems limit its practical usefulness.
For these reasons, this study proposes EFI (An Efficient Approach for Filtering Infrequent Itemsets), a new algorithm that improves on the core idea QDI uses to generate itemsets. Its distinguishing feature is two-phase filtration, which greatly reduces the number of infrequent itemsets and therefore makes the algorithm better suited to databases with long transactions. EFI scans the database only four times and generates no candidate itemsets while finding association rules.
EFI also addresses a drawback of ICI-like algorithms, which need large amounts of memory to store the many sub-itemsets decomposed from the transaction records. After a transaction passes through the two-phase filtration, only the itemsets most likely to be frequent are generated, so the itemset table needs far less memory and memory utilization improves.
Real-world databases are usually larger than main memory. To handle this, EFI partitions a large database into sub-databases and mines association rules from each of them. Each sub-database requires only four I/O passes, and the number of passes does not grow with the length of the frequent itemsets. This avoids excessive I/O time and improves both efficiency and practicality.
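The abstract does not say what the two filters actually are, so the following is only a minimal Python sketch of the general idea of filtering before decomposition, not the published EFI procedure. It assumes that the first filter drops infrequent single items from each transaction and that the second drops any itemset containing an infrequent pair. The function name `mine_frequent_itemsets` and the absolute `min_support` threshold are hypothetical, and the sketch uses three scans rather than the four reported for EFI.

```python
from collections import Counter
from itertools import combinations


def mine_frequent_itemsets(transactions, min_support):
    """Illustrative two-stage filter (assumed design, not the paper's EFI)."""
    # Scan 1: count single items and keep the frequent ones.
    item_counts = Counter(item for t in transactions for item in set(t))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Scan 2: count pairs built only from frequent items.
    pair_counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        pair_counts.update(combinations(kept, 2))
    frequent_pairs = {p for p, c in pair_counts.items() if c >= min_support}

    # Scan 3: decompose each filtered transaction into itemsets, counting
    # only those whose every 2-item subset is frequent. No candidate set is
    # generated up front; itemsets come straight from the transactions.
    counts = Counter()
    for t in transactions:
        kept = sorted(set(t) & frequent_items)
        for k in range(1, len(kept) + 1):
            for itemset in combinations(kept, k):
                if all(p in frequent_pairs for p in combinations(itemset, 2)):
                    counts[itemset] += 1

    return {s: c for s, c in counts.items() if c >= min_support}


if __name__ == "__main__":
    demo = [["a", "b", "c"], ["a", "b"], ["a", "c"], ["b", "c"],
            ["a", "b", "c", "d"]]
    for itemset, support in sorted(mine_frequent_itemsets(demo, 3).items()):
        print(itemset, support)
```

Because every subset of a frequent itemset is itself frequent, neither filter can discard a frequent itemset, so the supports returned are exact. What the filters save is the work and memory spent on itemsets that could never reach the threshold, which is the benefit the abstract attributes to two-phase filtration.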
Journal articles
1. Han, J., Pei, J., Yin, Y., & Mao, R. (2004). Mining frequent patterns without candidate generation: a frequent pattern tree approach. Data Mining and Knowledge Discovery, 8(1), 53-87.
2. Park, J. S., Chen, M. S., & Yu, P. S. (1995). An effective hash-based algorithm for mining association rules. ACM SIGMOD Record, 24(2), 175-186.
Conference papers
1. Brin, S., Motwani, R., Ullman, J. D., & Tsur, S. (1997). Dynamic Itemset Counting and Implication Rules for Market Basket Data. The 1997 ACM SIGMOD International Conference on Management of Data, 255-264.
2. Agrawal, Rakesh, & Srikant, Ramakrishnan (1994). Fast algorithms for mining association rules. The 20th International Conference on Very Large Data Bases, 487-499.
3. 黃仁鵬, 熊浩志, & 郭煌政 (2004). 直覺拆解之關聯法則演算法-IDA [An intuitive-decomposition association rule algorithm: IDA]. 1857-1866.
4. 黃仁鵬, 錢依佩, & 吳聲弘 (2003). 高效率之關聯規則探勘演算法-ICI [An efficient association rule mining algorithm: ICI]. 155.
5. Tseng, F. C., & Hsu, C. C. (2001). Generating Frequent Patterns with the Frequent Pattern List. 376-386.
Theses
1. 熊浩志 (2005). 快速資料探勘演算法與相關應用 [Fast data mining algorithms and related applications] (Master's thesis). 南台科技大學.
2. 黃南傑 (2004). 高效率拆解之關聯規則探勘 [Efficient decomposition-based association rule mining] (Master's thesis). 南台科技大學.
3. 陳秀如 (2004). 高效率之關聯法則規則探勘演算法QSD [An efficient association rule mining algorithm: QSD].
 
 
 
 
:::