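Title: A Rough Set-Based Rule Induction Approach for Big Data Analytics (適用於巨量資料分析的約略集合規則歸納法)
Author: 范有寧 (Yu-Neng Fan)
Institution: National Taiwan University
Department: Graduate Institute of Information Management
Advisor: 陳靜枝
Degree: Doctoral
Publication date: 2013
Keywords: Rough Set Theory; Rule Induction; Incremental Algorithm; Big Data; Data Mining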
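Rough set-based rule induction is a scientific method suited to handling uncertain and incomplete data. By analyzing and reasoning over the data, it can discover implicit knowledge and reveal latent rules without requiring additional statistical assumptions. This analytical tool has attracted considerable attention in recent years and has been applied widely and successfully in many fields.
However, in recent years businesses and practitioners have faced the impact of big data. When a system is built to process the transaction data generated by operations, the data grow and accumulate in large quantities within a short time, and both the volume and the rate of growth exceed what existing analysis tools can handle. Moreover, looking at the dimensions of a dataset, we find that it is not only the objects that increase rapidly within a short period; the attribute dimension shows the same trend. In response, this study proposes an incremental rough set-based rule induction approach suited to big data analytics. The model addresses both dimensions of growth in a dataset: added objects and added attributes. By exploiting the properties of incremental algorithms, it updates rules efficiently and saves a large amount of computation time.
This study takes data from a well-known Taiwanese TV shopping channel as an example. The results show that the proposed incremental rule induction approach can update rules effectively in a short time in response to newly added data, and that its efficiency, classification accuracy, and coverage are all better than those of traditional methods. This indicates that incremental rule induction can serve as a solution for enterprises analyzing big data, and that the rules it produces can serve as important indicators for decision support and strategy evaluation.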
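Accuracy and coverage are often defined per decision rule in the rough set literature: accuracy is the share of objects matching the rule's conditions that also carry its decision, and coverage is the share of objects in the decision class that the rule matches. The record does not state which definitions the thesis uses, so the sketch below is only an assumption-laden illustration. The function name, the attribute names, and the four toy objects are hypothetical and are not taken from the thesis data.

```python
def rule_accuracy_coverage(objects, conditions, decision_attr, decision_value):
    """Return (accuracy, coverage) of the rule `conditions -> decision_value`.

    Assumed definitions (not confirmed by the source):
      accuracy = |matched and in class| / |matched|
      coverage = |matched and in class| / |in class|
    """
    matched = [o for o in objects
               if all(o[attr] == value for attr, value in conditions.items())]
    in_class = [o for o in objects if o[decision_attr] == decision_value]
    both = [o for o in matched if o[decision_attr] == decision_value]
    accuracy = len(both) / len(matched) if matched else 0.0
    coverage = len(both) / len(in_class) if in_class else 0.0
    return accuracy, coverage


# Hypothetical toy decision table.
toy_objects = [
    {"discount": "yes", "buys": "yes"},
    {"discount": "yes", "buys": "yes"},
    {"discount": "yes", "buys": "no"},
    {"discount": "no", "buys": "yes"},
]

# Rule under test: IF discount = yes THEN buys = yes.
accuracy, coverage = rule_accuracy_coverage(
    toy_objects, {"discount": "yes"}, "buys", "yes"
)
```

Here three objects match the condition and two of them buy, while three objects buy overall and two of those are matched, so both measures come out at two thirds.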
Rough set-based rule induction can generate decision rules from a database and has mechanisms to handle noise and uncertainty in data. These decision rules, in turn, support managerial decision-making. However, databases that are used to run the day-to-day operations of a business must process data quickly, and large volumes of data are continually updated within a short period of time. The infrastructure required to analyze such large amounts of data must be able to support deeper analysis, deal with extreme data volumes, allow faster response times, and automate decisions based on analytical models.
This study proposes a rough set-based rule induction approach that considers both incremental objects and incremental attributes. It addresses the big data issue for rule induction when data are incrementally added to the dataset. The method eliminates the need to re-compute the entire dataset when the database is updated. As a result, large amounts of computation time and memory space are saved. The proposed model is composed of five main steps: case determination, reduct generation, significance calculation, rule induction, and rule tuning.
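The general idea of avoiding full re-computation can be illustrated with a minimal sketch. It is not the thesis's five-step algorithm: it covers only the arrival of new objects (not new attributes), skips reduct generation, significance calculation, and rule tuning, and derives only certain rules from consistent equivalence classes. The class name, method names, and toy attributes are all hypothetical.

```python
from collections import defaultdict


class IncrementalRuleInducer:
    """Toy incremental rule inducer over equivalence classes (illustrative only)."""

    def __init__(self, condition_attrs, decision_attr):
        self.condition_attrs = list(condition_attrs)
        self.decision_attr = decision_attr
        # Maps a tuple of condition values to counts of each decision value.
        self.classes = defaultdict(lambda: defaultdict(int))

    def add_object(self, obj):
        """Add one object; only its own equivalence class is updated."""
        key = tuple(obj[attr] for attr in self.condition_attrs)
        self.classes[key][obj[self.decision_attr]] += 1
        return key

    def rule_for(self, key):
        """Return (conditions, decision) if the class is consistent, else None."""
        decisions = self.classes.get(key)
        if not decisions or len(decisions) != 1:
            return None  # Unknown or inconsistent class: no certain rule.
        (decision,) = decisions
        return dict(zip(self.condition_attrs, key)), decision

    def certain_rules(self):
        """Collect the certain rules of all consistent equivalence classes."""
        rules = []
        for key in self.classes:
            rule = self.rule_for(key)
            if rule is not None:
                rules.append(rule)
        return rules


# Hypothetical toy data, not taken from the thesis case study.
inducer = IncrementalRuleInducer(["channel", "discount"], "buys")
inducer.add_object({"channel": "tv", "discount": "yes", "buys": "yes"})
inducer.add_object({"channel": "web", "discount": "no", "buys": "no"})
rules_before = inducer.certain_rules()

# A conflicting object arrives; only the ("tv", "yes") class changes.
inducer.add_object({"channel": "tv", "discount": "yes", "buys": "no"})
rules_after = inducer.certain_rules()
```

Each new object touches exactly one equivalence class, so only the rule derived from that class needs to be re-checked. In the example, the conflicting third object makes the `("tv", "yes")` class inconsistent, and its certain rule disappears while the other rule is untouched.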
A case study of a home shopping company is used to show the validity and efficiency of this method. The results show that the proposed model considerably reduces the computing time for inducing decision rules while maintaining the quality of the rules. Since this topic has rarely been addressed in previous studies, it is believed that this study will form a basis for solving many other similar problems in big data analytics.