:::

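Title: A Study of Efficient Methods for Mining Customer Consumption Behaviors (有效率的探勘客戶消費行為方法的研究)
Author: 王秋光 (Chiu-Kuang Wang)
University: Tamkang University (淡江大學)
Department: Department of Management Sciences, Doctoral Program (管理科學學系博士班)
Advisors: 顏秀珍 (S. J. Yen); 歐陽良裕 (L. Y. Ouyang)
Degree: Doctoral
Publication date: 2012
Keywords: Data mining; Frequent itemset; Frequent closed itemset; Data stream; Consumption behaviors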
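Since digitization, data-rich databases have become commonplace, and extracting important information from them is the central task of data mining. In business applications, we can analyze a transaction database to find products that are often purchased together, as well as associations in which customers who buy certain products are also likely to buy others; that is, we mine frequent itemsets. In this dissertation, we propose a highly efficient algorithm for mining frequent itemsets that outperforms previous work in both execution time and memory usage.
However, new transactions are generated continuously and old transactions must be removed; re-mining the original data wastes time rediscovering information that is already known. An environment in which new data keeps arriving and old data keeps being removed is called a data stream, and researchers have therefore begun to study how to find all frequent itemsets in a data stream. In addition, transaction data may contain a very large number of frequent itemsets, and this volume of information can be so overwhelming that decisions cannot be made. Researchers therefore proposed closed itemsets: an itemset is closed if its support is greater than the support of every one of its proper supersets. All frequent itemsets can be derived from the frequent closed itemsets. We also propose an efficient algorithm for this setting: when data is added or removed, the original database does not need to be read again; operating on the added or removed data together with the existing closed itemsets is enough to produce the updated closed itemsets.
Consumable products are usually purchased very frequently. Although the profit on a single sale may be lower than that of home appliances or electronics, the accumulated profit is not small, so promoting consumable products at the right moment can contribute greatly to profit. Frequent itemsets, however, provide no information about promotion timing. We therefore propose a novel data mining approach that, for a given consumable product, finds the consumption behaviors of customers with different characteristics. Given a customer's background attribute values and the quantity of the product bought in the current purchase, this consumption behavior can be used to predict accurately when the customer will need the product again, and thus to time the marketing of the product to that customer.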
Mining frequent itemsets means discovering groups of items whose frequency of appearing together meets a user-specified threshold. Many approaches have been proposed that mine frequent itemsets using the FP-tree structure to improve on the efficiency of the FP-Growth algorithm, which needs to construct sub-trees recursively. Although these approaches do not need to construct many sub-trees recursively, they still suffer from a large search space, so their performance degrades when the database is massive or the mining threshold is low. To reduce the search space and speed up mining, we propose an efficient algorithm for mining frequent itemsets based on the frequent pattern tree. Our algorithm generates a sub-tree for each frequent item and then generates candidates in batches from this sub-tree. Each candidate generation step produces only a small set of candidates, which significantly reduces the search space.
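To make the notion of a frequent itemset concrete, the following is a minimal sketch of a plain level-wise (Apriori-style) miner. It is not the FP-tree-based algorithm proposed in the dissertation, whose details are not given in this abstract; it only illustrates what "frequent" means. The function name `frequent_itemsets`, the parameter `min_count`, and the convention that an itemset is frequent when it occurs in at least `min_count` transactions are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_itemsets(transactions, min_count):
    """Return {itemset: support count} for itemsets found in >= min_count transactions.

    Illustrative level-wise search; NOT the dissertation's FP-tree algorithm.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items and keep the frequent ones.
    item_counts = Counter(item for t in transactions for item in t)
    current = {frozenset([i]): c for i, c in item_counts.items() if c >= min_count}
    result = dict(current)

    k = 2
    while current:
        # Join step: merge frequent (k-1)-itemsets into k-item candidates and
        # keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) == k and all(
                frozenset(s) in current for s in combinations(union, k - 1)
            ):
                candidates.add(union)

        # Count step: one pass over the transactions per candidate.
        current = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_count:
                current[cand] = count
        result.update(current)
        k += 1
    return result
```

Every candidate here costs a full pass over the transactions, which is the kind of search-space cost the abstract says tree-based methods aim to reduce.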
However, a transaction database may contain so many frequent itemsets that it is difficult for a decision maker to reach a decision. Mining frequent closed itemsets has recently become a major research issue, since the set of frequent closed itemsets is a condensed and complete representation of the frequent itemsets: all frequent itemsets can be derived from it. The transactions in a transaction database can grow rapidly in a short time, and some of them may become obsolete. Consequently, the frequent closed itemsets may change when new transactions are added to, or old transactions are deleted from, the database. The challenge is how to update the previous closed itemsets when transactions are added or removed. We propose an efficient algorithm for incrementally mining closed itemsets without scanning the original database. Our algorithm updates the closed itemsets by performing operations on the previous closed itemsets and the added or deleted transactions, without searching the previous closed itemsets.
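The idea of updating closed itemsets from the previous closed itemsets and one new transaction can be sketched as follows. This is a textbook-style illustration under stated assumptions, not the dissertation's algorithm: it scans all previous closed itemsets (which the proposed method is said to avoid), handles additions only, and ignores the empty itemset. It relies on the property that every closed itemset is the intersection of the transactions containing it. The names `closure_support` and `add_transaction` are hypothetical.

```python
def closure_support(closed, itemset):
    """Support count of any itemset, derived from the closed itemsets alone.

    It equals the largest support among closed supersets (0 if there is none).
    """
    itemset = frozenset(itemset)
    return max((s for c, s in closed.items() if itemset <= c), default=0)


def add_transaction(closed, transaction):
    """Return the closed itemsets (with support counts) after adding one transaction.

    `closed` maps frozenset -> support count for the database so far.
    The original transactions are never re-read.
    """
    t = frozenset(transaction)
    if not t:
        return dict(closed)

    # A closed itemset of the enlarged database is either an old closed itemset,
    # the new transaction itself, or the intersection of the new transaction
    # with an old closed itemset.
    candidates = {t} | {c & t for c in closed}
    candidates.discard(frozenset())

    updated = dict(closed)
    for x in candidates:
        # x is contained in t, so its support is its old support plus one.
        updated[x] = closure_support(closed, x) + 1
    return updated
```

Starting from an empty dictionary and calling `add_transaction` once per transaction builds the closed itemsets of a whole database, and `closure_support` then recovers the support of any other itemset, which is what makes the closed itemsets a complete representation.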
Compared with other commodities, consumable products are purchased very frequently. Although the profit on a single sale of a consumable product may be lower than that of appliances or electronic products, the cumulative profit is large. Choosing the right time to promote consumable products is therefore an important task. Sequential pattern mining considers only the sequential purchasing behavior of most customers; it cannot predict when a customer will need a product in the future. For consumable products, the time of the next purchase is usually related to the quantity bought in the current transaction. We propose a novel data mining algorithm to find the consumption behaviors of most customers. From this information, we can predict the next purchase time for an item based on the quantity of that item purchased this time.
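The link between purchase quantity and next purchase time can be illustrated with a deliberately simple estimate: the average number of days one unit lasts, computed from a single customer's purchase history for one item. This is only a sketch under that assumption; it is not the mining algorithm proposed in the dissertation, and it ignores the customer background attributes mentioned in the abstract. The function names and the `(date, quantity)` history format are hypothetical.

```python
from datetime import date, timedelta


def days_per_unit(history):
    """Average number of days one unit lasts for one customer and one item.

    `history` is a list of (purchase_date, quantity) pairs sorted by date.
    Each gap between consecutive purchases is attributed to the quantity
    bought at the start of that gap.
    """
    total_days = 0
    total_qty = 0
    for (d0, q0), (d1, _) in zip(history, history[1:]):
        total_days += (d1 - d0).days
        total_qty += q0
    if total_qty == 0:
        raise ValueError("need at least two purchases with positive quantities")
    return total_days / total_qty


def predict_next_purchase(history):
    """Estimate when the customer will need the item again after the last purchase."""
    rate = days_per_unit(history)
    last_date, last_qty = history[-1]
    return last_date + timedelta(days=round(rate * last_qty))
```

For example, a customer whose 2 units lasted 20 days and whose 1 unit lasted 10 days consumes about one unit every 10 days, so a purchase of 3 units suggests a promotion window roughly 30 days later.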

 
 
 
 