:::

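Title: Mining Closed Patterns in Time-Series Databases
Author: 吳惠雯
Author (English): Huei-Wen Wu
Institution: National Taiwan University
Department: Graduate Institute of Information Management
Advisor: 李瑞庭
Degree: Doctoral
Publication year: 2010
Keywords: Data mining; Closed pattern; Time-series database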
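In recent years, mining closed patterns has become an important research topic in knowledge discovery; its main goal is to find representative patterns hidden in large amounts of data. This dissertation proposes three efficient algorithms for mining closed patterns from time-series databases: CMP (Closed Multi-sequence Patterns mining), CFP (Closed Flexible Patterns mining), and CNP (multi-resolution Closed Numerical Patterns mining).
The CMP algorithm focuses on analyzing patterns in multi-sequence time-series databases, while the CFP algorithm mines closed patterns with flexible gaps from time-series databases. Both CMP and CFP first transform the time series into symbolic sequences and then mine closed patterns. This transformation reduces noise and simplifies the mining process, but it may cause patterns to be lost or allow very different sequences to support the same pattern.
To avoid the problems caused by transforming time series into symbolic sequences, the CNP algorithm finds representative patterns directly from multi-sequence time-series databases. In addition, CNP incorporates the concept of multi-resolution when finding closed patterns, letting users examine the data from different perspectives.
All three algorithms mine in a depth-first manner and use projected databases to reduce the search space. Besides speeding up the mining process, effective pruning strategies and closure-checking mechanisms avoid generating unnecessary candidate patterns.
The results show that the CMP algorithm is tens of times faster than the modified Apriori and BIDE algorithms, that the CFP algorithm is more efficient than the modified Apriori algorithm, and that the CNP algorithm also runs faster than the modified A-Close algorithm.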
Closed pattern mining is a critical research issue in knowledge discovery and data mining; its aim is to discover interesting patterns hidden in large amounts of data. In this dissertation, we propose three algorithms, called CMP (Closed Multi-sequence Patterns mining), CFP (Closed Flexible Patterns mining), and CNP (multi-resolution Closed Numerical Patterns mining), to address several extensions of the closed pattern mining problem.
The CMP algorithm is designed to find closed patterns in a multi-sequence time-series database. The CFP algorithm is developed to mine closed flexible patterns in a time-series database. Both algorithms first transform time-series sequences into symbolic sequences. Although analyzing symbolic sequences reduces the effect of noise and eases the mining process, this approach may lose patterns, and sequences supporting the same pattern may look quite different.
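The drawback of the symbolic transformation can be illustrated with a minimal sketch. It assumes a simple rule that maps each consecutive change to `U` (up), `D` (down), or `S` (stable); the function name `to_symbols` and the `threshold` parameter are hypothetical, and the abstract does not specify the discretization the dissertation actually uses.

```python
def to_symbols(series, threshold=0.5):
    """Map each consecutive difference to 'U' (up), 'D' (down), or 'S' (stable).

    Assumed discretization for illustration only, not the dissertation's method.
    """
    symbols = []
    for prev, curr in zip(series, series[1:]):
        change = curr - prev
        if change > threshold:
            symbols.append("U")
        elif change < -threshold:
            symbols.append("D")
        else:
            symbols.append("S")
    return "".join(symbols)


# Two series with very different magnitudes of change.
gentle = [10.0, 11.0, 11.2, 10.0]
steep = [10.0, 50.0, 50.3, 5.0]

print(to_symbols(gentle))
print(to_symbols(steep))
```

Under this rule both series become the same symbolic sequence, so any pattern supported by one is supported by the other even though the raw values differ greatly.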
To overcome the problems raised by symbolic sequence analysis, the CNP algorithm mines closed patterns without transforming time-series sequences into symbolic sequences. The method also employs the Haar wavelet transform to discover patterns at multiple resolutions, providing different perspectives on the data.
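As a rough illustration of how a Haar wavelet transform yields multiple resolutions, the sketch below repeatedly replaces adjacent pairs of values with their average. It uses the unnormalized averaging form of the Haar transform and assumes the series length is a power of two; the function names are hypothetical, and how CNP uses the resulting levels is not described in the abstract.

```python
def haar_step(values):
    """One level of the unnormalized Haar transform: pairwise averages and details."""
    if len(values) % 2 != 0:
        raise ValueError("length must be even")
    averages = [(values[i] + values[i + 1]) / 2 for i in range(0, len(values), 2)]
    details = [(values[i] - values[i + 1]) / 2 for i in range(0, len(values), 2)]
    return averages, details


def resolutions(values):
    """Return approximations from finest to coarsest (length must be a power of two)."""
    levels = [list(values)]
    while len(levels[-1]) > 1:
        averages, _ = haar_step(levels[-1])
        levels.append(averages)
    return levels


series = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
for level in resolutions(series):
    print(level)
```

Each successive level halves the number of points, so a pattern can be sought in the fine-grained series or in a smoothed, coarser view of the same data.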
All the proposed algorithms employ projected databases to localize pattern extension, which leads to a significant runtime improvement. Moreover, each algorithm has its own closure-checking scheme and pruning strategies to avoid generating redundant candidates.
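The idea of growing patterns over projected databases can be sketched as follows. This is a generic PrefixSpan-style depth-first miner over symbolic strings, followed by a naive post-hoc closure filter; it is not the CMP, CFP, or CNP algorithm, and the dissertation's own closure-checking and pruning schemes are not reproduced here. All function names are hypothetical.

```python
from collections import defaultdict


def mine_frequent(sequences, min_support):
    """Depth-first pattern growth; support = number of sequences containing the pattern."""
    results = {}

    def grow(prefix, projected):
        # projected: (sequence index, position just after the last matched symbol)
        occurrences = defaultdict(dict)
        for seq_id, start in projected:
            seq = sequences[seq_id]
            for pos in range(start, len(seq)):
                # Keep only the earliest occurrence of each symbol per sequence.
                occurrences[seq[pos]].setdefault(seq_id, pos + 1)
        for symbol, hits in sorted(occurrences.items()):
            if len(hits) >= min_support:
                pattern = prefix + symbol
                results[pattern] = len(hits)
                # Recurse only into the suffixes that follow this pattern.
                grow(pattern, list(hits.items()))

    grow("", [(i, 0) for i in range(len(sequences))])
    return results


def is_subsequence(small, large):
    """True if the symbols of `small` appear in `large` in order (gaps allowed)."""
    remaining = iter(large)
    return all(symbol in remaining for symbol in small)


def closed_only(frequent):
    """Naive closure filter: drop patterns that have a longer super-pattern with equal support."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(
            len(q) > len(p) and t == s and is_subsequence(p, q)
            for q, t in frequent.items()
        )
    }


sequences = ["USD", "UDS", "UD"]
frequent = mine_frequent(sequences, min_support=2)
print(frequent)
print(closed_only(frequent))
```

Because each recursive call scans only the projected suffixes, extensions are searched locally rather than over the whole database. The closure filter here compares all frequent patterns after mining, which is simple but costly; a dedicated closure check during growth avoids materializing the non-closed patterns at all.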
The experimental results show that the CMP algorithm significantly outperforms the modified Apriori and BIDE algorithms. The CFP algorithm achieves better performance than the modified Apriori algorithm in all cases. The CNP algorithm demonstrates a significant runtime improvement over the modified A-Close algorithm.
 
 
 
 
:::