Title: 資源有限下的決策樹建構 (Decision Tree Construction under Limited Resources)
Author: 吳家齊
Author (English): Chia-Chi Wu
University: 國立中央大學 (National Central University)
Department: 資訊管理研究所 (Graduate Institute of Information Management)
Advisor: 陳彥良
Degree: Doctoral
Publication Date: 2010
Keywords: decision tree (決策樹), data mining (資料探勘), classification (分類), cost-sensitive learning (成本感知學習)
Abstract: Classification is one of the most important research domains in data mining. Among existing classifiers, decision trees are probably the most popular and most widely used classification models. Most decision tree algorithms aim to maximize classification accuracy and minimize classification error. In many real-world applications, however, various types of cost or resource consumption are involved at every stage, from inducing a decision tree on existing data to classifying future instances with it, and the problem at hand may require the classification task to be completed with limited resources. How to build the most suitable decision tree under resource constraints is therefore an important issue. In this study, we first propose two algorithms that improve on traditional TDIDT (Top-Down Induction of Decision Trees) algorithms. We then adopt a new approach to handle multiple resource constraints: it first extracts all valid association classification rules from the training dataset and then builds a decision tree from the extracted rules. Empirical evaluations were carried out on real datasets, and the results indicate that the proposed methods perform satisfactorily under different resource constraints.
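The abstract describes both approaches only at a high level, so the sketches below are illustrative rather than reconstructions of the thesis's actual algorithms. The first is a minimal TDIDT-style builder that respects a single test-cost budget; the gain-per-unit-cost split criterion and every name in it (build_tree, Node, budget) are assumptions introduced for illustration.

```python
# Illustrative sketch only: one generic way a TDIDT-style builder could
# respect a single test-cost budget. The gain-per-cost criterion is an
# assumption, not the thesis's published method.
import math
from collections import Counter

class Node:
    def __init__(self, label=None, attribute=None):
        self.label = label          # class label (leaf nodes)
        self.attribute = attribute  # attribute tested (internal nodes)
        self.children = {}          # attribute value -> subtree

def entropy(rows, target):
    counts = Counter(r[target] for r in rows)
    n = len(rows)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def gain(rows, attr, target):
    # Information gain of splitting rows on attr.
    n = len(rows)
    remainder = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        remainder += len(subset) / n * entropy(subset, target)
    return entropy(rows, target) - remainder

def build_tree(rows, attrs, costs, budget, target="class"):
    labels = Counter(r[target] for r in rows)
    majority = labels.most_common(1)[0][0]
    # Only attributes we can still afford on this root-to-leaf path.
    affordable = [a for a in attrs if costs[a] <= budget]
    if len(labels) == 1 or not affordable:
        return Node(label=majority)
    # Assumed criterion: information gain per unit of test cost.
    best = max(affordable, key=lambda a: gain(rows, a, target) / costs[a])
    node = Node(attribute=best)
    rest = [a for a in attrs if a != best]
    for v in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == v]
        node.children[v] = build_tree(subset, rest, costs,
                                      budget - costs[best], target)
    return node
```

Under the same caveat, a second sketch illustrates the rule-based idea: given classification rules already extracted from the training data, grow a tree by repeatedly testing the affordable attribute that the surviving rules mention most often.

```python
# Equally hypothetical sketch of building a tree from extracted rules.
# A rule is a pair ({attribute: value, ...}, class_label),
# e.g. ({"fever": "yes"}, "flu").
def tree_from_rules(rules, costs, budget):
    # rules is assumed non-empty; ties are broken arbitrarily.
    majority = Counter(cls for _, cls in rules).most_common(1)[0][0]
    mentioned = Counter(a for conds, _ in rules for a in conds
                        if costs[a] <= budget)
    if not mentioned:
        return Node(label=majority)
    best = mentioned.most_common(1)[0][0]
    node = Node(attribute=best)
    for v in {conds[best] for conds, _ in rules if best in conds}:
        # Keep rules compatible with this branch, dropping the tested
        # condition. (A real system would also add a default branch
        # for attribute values no rule mentions.)
        surviving = [
            ({a: w for a, w in conds.items() if a != best}, cls)
            for conds, cls in rules
            if conds.get(best, v) == v
        ]
        node.children[v] = tree_from_rules(surviving, costs,
                                           budget - costs[best])
    return node
```

As a usage example under these assumptions, with costs = {"temperature": 1, "biopsy": 8} and budget = 5, neither builder can ever place the biopsy test on any root-to-leaf path, since its cost exceeds the remaining budget everywhere.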