:::

詳目顯示

回上一頁
題名:有效率的跨交易關聯規則探勘演算法
作者:王春笙
作者(外文):Chen-Sheng Wang
校院名稱:臺灣大學
系所名稱:資訊管理學研究所
指導教授:李瑞庭
學位類別:博士
出版日期:2007
主題關鍵詞:關聯規則封閉項目集資料探勘跨交易關聯規則Association rulesClosed itemsetData miningInter-transaction association rules
原始連結:連回原系統網址new window
相關次數:
  • 被引用次數被引用次數:期刊(0) 博士論文(0) 專書(0) 專書論文(0)
  • 排除自我引用排除自我引用:0
  • 共同引用共同引用:0
  • 點閱點閱:58
在本論文中,我們針對探勘跨交易關聯規則,提出了二個有效率的演算法,稱為ITP-Miner (Inter-Transaction Patterns Miner)以及CITP-Miner (Closed Inter-Transaction Patterns Miner)。我們對這二個演算法設計了三種資料結構: ID-pair儲存跨交易樣式探勘所需的資訊; ITP-tree能夠列舉並找出所有的頻繁跨交易樣式;以及CITP-tree能夠列舉並找出所有的封閉式頻繁跨交易樣式。
ITP-Miner演算法使用ID-pair以及ITP-tree以探勘所有頻繁跨交易樣式。由於使用了ITP-tree,ITP-Miner只需讀取資料庫一遍並且將合併,修剪,及支持度的計算限定於少量的ID-pair。實驗結果顯示ITP-Miner演算法比FITI演算法快了數十倍。
CITP-Miner演算法使用ID-pair以及CITP-tree以探勘所有封閉式頻繁跨交易樣式。由於使用了CITP-tree,CITP-Miner能夠使用有效的修剪策略以避免耗時的候選樣式產生以及重覆的支持度計算。實驗結果顯示CITP-Miner演算法比FITI以及ClosedPROWL演算法快了數十倍。
另外,我們比較ITP-Miner以及CITP-Miner演算法,由於CITP-Miner使用有效的修剪策略找出封閉式頻繁跨交易樣式,而封閉式頻繁跨交易樣式的數量通常少於ITP-Miner所找出所有的頻繁跨交易樣式,所以實驗結果顯示在大部份的情況下,CITP-Miner演算法執行時間較ITP-Miner演算法快,然而卻會消秏較多的記憶體。
In this dissertation, we propose two efficient algorithms, called ITP-Miner (Inter-Transaction Patterns Miner) and CITP-Miner (Closed Inter-Transaction Patterns Miner), for mining inter-transaction association rules. We devise three data structures for both algorithms: an ID-pair stores the information used to find inter-transaction patterns; an ITP-tree enumerates and discovers frequent inter-transaction patterns; and a CITP-tree enumerates and discovers closed frequent inter-transaction patterns.
The ITP-Miner algorithm uses the ID-pairs and the ITP-tree to mine all frequent inter-transaction patterns. By using the ITP-tree, the ITP-Miner requires only one database scan and can localize joining, pruning, and support counting to a small number of ID-pairs. The experiment results show that the ITP-Miner algorithm outperforms the FITI (First Intra Then Inter) algorithm by one order of magnitude.
The CITP-Miner algorithm uses the ID-pairs and the CITP-tree to mine all closed frequent inter-transaction patterns. By using the CITP-tree, the CITP-Miner can embed effective pruning strategies to avoid costly candidate generation and repeated support counting. The experiment results show that the CITP-Miner algorithm outperforms the FITI and ClosedPROWL algorithms by one order of magnitude.
We also compare the ITP-Miner and CITP-Miner algorithms. Since the CITP-Miner uses effective pruning strategies for mining all closed frequent inter-transaction patterns, and the number of patterns mined by the CITP-Miner may be much less than that mined by the ITP-Miner, the experiment results show that in most of the cases, the CITP-Miner algorithm outperforms the ITP-Miner algorithm in terms of execution time, but it consumes more amount of main memory.
[1]R.C. Agarwal, C.C. Aggarwal, V.V.V. Prasad, A tree projection algorithm for generation of frequent itemsets, Journal of Parallel and Distributed Computing 61 (3) (2001) 350-371.
[2]R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer, and A. Swami, An interval classifier for database mining applications, in: Proceedings of the International Conference on Very Large Data Bases, 1992, pp. 560-573.
[3]R. Agrawal, T. Imielinski, A. Swami, Mining association rules between sets of items in large databases, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 1993, pp. 207–216.
[4]R. Agrawal, R. Srikant, Fast algorithms for mining association rules, in: Proceedings of International Conference on Very Large Data Bases, 1994, pp. 487–499.
[5]C. Apte, B. Liu, E. P.D. Pednault, and P. Smyth, Business applications of data mining, Communications of the ACM 45(8) (2002) 49–53.
[6]R.J. Bayardo, Efficiently mining long patterns from databases, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 1998, pp. 85–93.
[7]B. Berendt, M. Spiliopoulou, Analysis of navigation behaviour in web sites integrating multiple information systems, The VLDB Journal 9 (2000) 56-75.
[8]S. Brin, R. Motwani, J. Ullman, S. Tsur, Dynamic itemset counting and implication rules for market basket data, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 1997, pp. 255-264.
[9]G. Chen, Q. Wei, Fuzzy association rules and the extended mining algorithms, Information Sciences 147 (1–4) (2002) 201–228.
[10]M.S. Chen, J. Han, P. Yu, Data mining: an overview from a database perspective, IEEE Transactions on Knowledge and Data Engineering 8 (6) (1996) 866-883.
[11]M.S. Chen, J.S Park, and P.S. Yu, Efficient data mining for path traversal patterns, IEEE Transactions on Knowledge and Data Engineering 10(2) (1998) 209-221.
[12]Y.L. Chen, S.S. Chen, P.Y. Hsu, Mining hybrid sequential patterns and sequential rules, Information Systems 27 (5) (2002) 345-362.
[13]Y.L. Chen, J.M. Chen, C.W. Tung, A data mining approach for retail knowledge discovery with consideration of the effect of shelf-space adjacency on sales, Decision Support Systems 42 (3) (2006) 1503-1520.
[14]Y.L. Chen, W.H. Hsu, Y.H. Lee, TASC: Two-attribute-set clustering through decision tree construction, European Journal of Operational Research 174 (2) (2006) 930-944.
[15]Y.L. Chen, H.L. Hu, An overlapping cluster algorithm to provide non-exhaustive clustering, European Journal of Operational Research 173 (3) (2006) 762-780.
[16]Y.L. Chen, Y.H. Hu, Constraint-based sequential pattern mining: The consideration of recency and compactness, Decision Support Systems 42 (2) (2006) 1203-1215.
[17]Y.L. Chen, T.C.K. Huang, A new approach for discovering fuzzy quantitative sequential patterns in sequence databases, Fuzzy Sets and Systems 157 (12) (2006) 1641-1661.
[18]Y.L. Chen, C.C. Shen, Mining generalized knowledge from ordered data through attribute-oriented induction techniques, European Journal of Operational Research 166 (1) (2005) 221-245.
[19]Y.L. Chen, K. Tang, R.J. Shen, Y.H. Hu, Market basket analysis in a multiple store environment, Decision Support Systems 40 (2) (2005) 339-354.
[20]Y.L. Chen, C.C. Yu, Mining sequential patterns from multidimensional sequence data, IEEE Transactions on Knowledge and Data Engineering 17(1) (2005) 136-140.
[21]D.K.Y. Chiu, A.K.C. Wong, Multiple pattern associations for interpreting structural and functional characteristics of biomolecules, Information Sciences 167 (1–4) (2004) 23–39.
[22]U. Fayyad, R. Uthurusamy, Data mining and knowledge discovery in databases, Communications of the ACM 39(11) (1996) 24-25.
[23]L. Feng, T.S. Dillon, J. Liu, Inter-transactional association rules for multi-dimensional contests for prediction and their application to studying meteorological data, Data and Knowledge Engineering 37 (1) (2001) 85–115.
[24]L. Feng, J.X. Yu, H. Lu, J. Han, A template model for multidimensional inter-transactional association rules, The VLDB Journal 11 (2) (2002) 153–175.
[25]V. Ganti, J. Gehrke, R. Ranakrishnan, Mining very large databases, IEEE Computer 32 (8) (1999) 38-45.
[26]F. Geerts, B. Goethals, J. Bussche, A tight upper bound on the number of candidate patterns, in: Proceedings of International Conference on Data Mining, 2001, pp. 155–162.
[27]P. Giudici, R. Castelo, Association models for web mining, Data Mining and Knowledge Discovery 5 (2001) 183-196.
[28]G. Grahne, J. Zhu, Efficiently using prefix-trees in mining frequent itemsets, in: Proceedings of the IEEE ICDM Workshop in Frequent Itemset Mining Implementations, 2003, pp. 123–132.
[29]T. Hamrouni, S. BenYahia, Y. Slimani, Avoiding the itemset closure computation “pitfall”, in: Proceedings of the 3rd International Conference on Concept Lattices and their Applications, 2005, pp. 46-59.
[30]J. Han, Y. Fu, Discovery of multiple-level association rules from large databases, in: Proceedings of International Conference on Very Large Data Bases, 1995, pp. 420-431.
[31]J. Han, M. Kamber, Data mining: Concepts and Techniques, Morgan Kaufmann, San Fransisco, 2001.
[32]J. Han, M. Kamber, Data mining: Concepts and Techniques Second Edition, Morgan Kaufmann, San Fransisco, 2006.
[33]J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, in: Proceedings of the ACM-SIGMOD International Conference on Management of Data, 2000, pp. 1-12.
[34]J. Han, J. Wang, Y. Lu, P. Tzvetkov, Mining top-k frequent closed patterns without minimum support, in: Proceedings of the 2nd International Conference on Data Mining, 2002, pp. 211-218.
[35]P.Y. Hsu, Y.L. Chen, C.C. Ling, Algorithms for mining association rules in bag databases, Information Sciences 166 (1–4) (2004) 31–47.
[36]W.H. Hsu, J.A. Jao, Y.L. Chen, Discovering conjecturable rules through tree-based clustering analysis, Expert Systems with Applications 29 (3) (2005) 493-505.
[37]Y.H. Hu, Y.L. Chen, Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism, Decision Support Systems 42 (1) (2006) 1-24.
[38]K.Y. Huang, C.H. Chang, K.Z. Lin, ClosedPROWL: Efficient mining of closed frequent continuities by projected window list technology, in: Proceedings of the 5th SIAM International Conference on Data Mining, 2005.
[39]R. Kohavi, C. Broadley, B. Frasca, L. Mason, Z. Zheng, KDD-cup 2000 organizers’ report: peeling the onion, SIGKDD Explorations 2 (2) (2000) 86–98.
[40]A.J.T. Lee, R.W. Hong, W.M. Ko, W.K. Tsao, H.H. Lin, Mining spatial association rules in image databases, Information Sciences 177 (7) (2007) 1593–1608.
[41]A.J.T. Lee, W.C. Lin, C.S. Wang, Mining association rules with multi-dimensional constraints, The Journal of Systems and Software 79 (1) (2006) 79–92.
[42]A.J.T. Lee, C.S. Wang, An efficient algorithm for mining frequent inter-transaction patterns, Information Sciences 177 (17) (2007) 3453–3476.
[43]A.J.T. Lee, Y.T. Wang, Efficient data mining for calling path patterns in GSM networks, Information Systems 28 (8) (2003) 929–948.
[44]S.J. Lee, K. Siau, A review of data mining techniques, Industrial Management & Data Systems 101(1) (2001) 41-46.
[45]Q. Li, L. Feng, A. Wong, From intra-transaction to generalized inter-transaction: landscaping multidimensional contexts in association rule mining, Information Sciences 172 (3–4) (2005) 361–395.
[46]T.S. Lim, W.Y. Loh, Y.S. Shih, A comparison of prediction accuracy complexity and training time of thirty-three old and new classification algorithms, Machine Learning 40(3) (2000) 203-228.
[47]D. Littau, D. Boley, Streaming data reduction using low-memory factored representations, Information Sciences 176 (14) (2006) 2016–2041.
[48]D.R. Liu, C. Hsu, Project-based knowledge maps: combining project mining and xml-enabled topic maps, Internet Research 14 (3) (2004) 254-266.
[49]D.R. Liu, C.K. Ke, Knowledge support for problem-solving in a production process: A hybrid of knowledge discovery and case-based reasoning, Expert Systems with Applications 33(1) (2007) 147-161.
[50]D.R. Liu, C.K. Ke, J.Y. Lee, C.F. Lee, Knowledge maps for composite e-services: A mining-based system platform coupling with recommendations, Expert Systems with Applications 34 (1) (2008) 700-716.
[51]D.R. Liu, Y.Y. Shih, Integrating AHP and data mining for product recommendation based on customer lifetime value, Information & Management 42 (3) (2005) 387-400.
[52]D.R. Liu, Y.Y. Shih, Hybrid approaches to product recommendation based on customer lifetime value and purchase preferences, Journal of Systems and Software 77 (2) (2005) 181-191.
[53]D.R. Liu, I.C. Wu, K.S. Yang, Task-based k-support system: disseminating and sharing task-relevant knowledge, Expert Systems with Applications 29 (2) (2005) 408-423.
[54]H. Lu, J. Han, L. Feng, Stock movement prediction and n-dimensional inter-transaction association rules, in: Proceedings of ACM SIGMOD Workshop on Research Issues on Data Mining and Knowledge, 1998, pp. 1–7.
[55]H. Lu, L. Feng, J. Han, Beyond intratransaction association analysis: mining multidimensional inter-transaction association rules, ACM Transactions on Information Systems 18 (4) (2000) 423–454.
[56]B. Lucchese, S. Orlando, R. Perego, Fast and memory efficient mining of frequent closed itemsets, IEEE Transactions on Knowledge and Data Engineering 18 (1) (2006) 21-36.
[57]H.D.K. Moonestinghe, S. Fodeh, P. N. Tan, Frequent closed itemset mining using prefix graphs with an efficient flow-based pruning strategy, in: Proceedings of the 6th International Conference on Data Mining, 2006, pp. 426-435.
[58]J.S. Park, M.S. Chen, P.S. Yu, An effective hash-based algorithm for mining association rules, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 1995, pp. 175-186.
[59]N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Efficient mining of association rules using closed itemset lattices, Information Systems 24 (1) 1999 25-46.
[60]N. Pasquier, Y. Bastide, R. Taouil, L. Lakhal, Discovering frequent closed itemsets for association rules, in: Proceedings of the 5th International Conference on Database Theory, 1999, pp. 398-416.
[61]J. Pei, J. Han, R. Mao, Closet: An efficient algorithm for mining frequent closed itemsets, in: Proceedings of the 5th ACM-SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery, 2000, pp. 11-20.
[62]J. Pei, J. Han, L.V.S. Lakshmanan, Mining frequent itemsets with convertible constraints, in: Proceedings of IEEE International Conference on Data Engineering, 2001, pp. 433–332.
[63]A. Savasere, E. Omiecinski, S. Navathe, An efficient algorithm for mining association rules in large databases, in: Proceedings of International Conference on Very Large Data Bases, 1995, pp. 432–443.
[64]Li. Shen, Hong Shen, Ling Cheng, New algorithms for efficient mining of association rules, Information Sciences 118 (1–4) (1999) 251–268.
[65]M. Shen, D.R. Liu, Discovering role-relevant process-views for disseminating process knowledge, Expert Systems with Applications 26 (3) (2004) 301-310.
[66]P. Shenoy, J.R. Haritsa, S. Sudarshan, G. Bhalotia, M. Bawa, D. Shah, Turbo-charging vertical mining of large databases, in: Proceedings of ACM-SIGMOD International Conference on Management of Data, 2000, pp. 22–33.
[67]N.G. Singh, S.R. Singh, A.K. Mahanta, CloseMiner: Discovering frequent closed itemsets using frequent closed tidsets, in: Proceedings of the 5th International Conference on Data Mining, 2005, pp. 633-636.
[68]R. Srikant, R. Agrawal, Mining generalized association rules, in: Proceedings of International Conference on Very Large Data Bases, 1995, pp. 407-419.
[69]G. Sudipto, R. Rajeev, and S. Kyuseok, ROCK: A robust clustering algorithm for categorical attributes, Information Systems 25 (5) (2000) 345-366.
[70]H. Toivonen, Sampling large databases for association rules, in: Proceedings of the International Conference on Very Large Data Bases, 1996, pp. 134-145.
[71]Y.J. Tsay, J.Y. Chiang, An efficient cluster and decomposition algorithm for mining association rules, Information Sciences 160 (1–4) (2004) 161–171.
[72]A.K.H. Tung, H. Lu, J. Han, L. Feng, Breaking the barrier of transactions: mining inter-transaction association rules, in: Proceedings of ACM International Conference on Knowledge and Data Discovery, 1999, pp. 423–454.
[73]A.K.H. Tung, H. Lu, J. Han, L. Feng, Efficient mining of intertransaction association rules, IEEE Transactions on Knowledge and Data Engineering 15 (1) (2003) 43–56.
[74]T. Uno, T. Asai, Y. Uchida, H. Arimura, An efficient algorithm for enumerating closed patterns in transaction databases, in: Proceedings of the 7th International Conference on Discovery Science, 2004, pp. 16-31.
[75]C.Y. Wang, S.S. Tseng, T.P. Hong, Flexible online association rule mining based on multidimensional pattern relations, Information Sciences 176 (12) (2006) 1752–1780.
[76]J. Wang, J. Han, J. Pei, Closet+: Searching for the best strategies for mining frequent closed itemsets, in: Proceedings of the 9th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 236-245.
[77]S.B. Yahia, T. Hamrouni, E.M. Nguifo, Frequent closed itemset based algorithms: a thorough structural and analytical survey, ACM SIGKDD Explorations Newsletter 8 (1) (2006) 93-104.
[78]X. Yin, J. Han, CPAR: Classification based on predictive association rules, in: Proceedings of 2003 SIAM International Conference on Data Mining, 2003, pp. 331-335.
[79]M.J. Zaki, Scalable algorithms for association mining, IEEE Transactions on Knowledge and Data Engineering, 12 (3) (2000) 372-390.
[80]M.J. Zaki, Generating non-redundant association rules, in: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000, pp. 34-43.
[81]M.J. Zaki, K. Gouda, Fast vertical mining using diffsets, in: Proceedings of the 9th ACM-SIGKDD International Conference on Knowledge Discovery and Data Mining, 2003, pp. 326-335.
[82]M.J. Zaki, C.J. Hsiao, Efficient algorithms for mining closed itemsets and their lattice structure, IEEE Transactions on Knowledge and Data Engineering 17 (4) (2005) 462-478.
[83]Yahoo, Finance, Available from: http://yahoo.com.
 
 
 
 
第一頁 上一頁 下一頁 最後一頁 top
QR Code
QRCODE