

Author (English name): Chang-Ling Hsu
Keywords: data mining; multi-valued attribute; multiple labels; classification; decision tree
Presently, decision tree classifiers require that the attributes and class labels of a data set be single-valued. However, some real-world classification problems involve multi-valued and multi-labeled data. To handle such data, this research first developed a decision tree classifier named MMC (Multi-valued and Multi-labeled Classifier). Then, by redesigning the algorithm, it developed a second classifier named MMDT (Multi-valued and Multi-labeled Decision Tree) to improve on the accuracy of MMC.
MMC and MMDT differ from traditional decision tree classifiers in several major functions: growing a decision tree, selecting attributes, assigning labels to represent a leaf, and making a prediction for a new record. The development strategy of MMC is based mainly on measuring similarity among multiple labels; that of MMDT is based mainly on both measuring similarity and scoring among multiple labels.
The experimental results show that MMC and MMDT can not only mine classification rules from a large multi-valued and multi-labeled data set but also achieve convincing accuracy and goodness of rules.
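The abstract says that both classifiers rely on measuring similarity among multiple labels but does not give the measure itself. The sketch below illustrates the general idea only: it represents multi-valued, multi-labeled records as sets and uses Jaccard similarity as an assumed stand-in for the thesis's measure. The record fields, the sample data, and the function names `label_similarity` and `mean_pairwise_similarity` are hypothetical and are not taken from MMC or MMDT.

```python
from itertools import combinations
from typing import Iterable, Set


def label_similarity(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two label sets.

    Assumption: Jaccard is used here purely for illustration; the
    abstract does not state which similarity measure MMC/MMDT use.
    """
    if not a and not b:
        return 1.0  # two empty label sets are treated as identical
    return len(a & b) / len(a | b)


def mean_pairwise_similarity(label_sets: Iterable[Set[str]]) -> float:
    """Average similarity over all pairs of label sets in a group.

    A higher value means the records in the group carry more
    homogeneous labels, which is the kind of signal a tree-growing
    procedure could use to compare candidate splits.
    """
    pairs = list(combinations(list(label_sets), 2))
    if not pairs:
        return 1.0  # zero or one record: trivially homogeneous
    return sum(label_similarity(a, b) for a, b in pairs) / len(pairs)


# Hypothetical records: one multi-valued attribute and a set of labels each.
records = [
    {"hobbies": {"hiking", "chess"}, "labels": {"outdoor", "games"}},
    {"hobbies": {"chess"}, "labels": {"games"}},
    {"hobbies": {"hiking", "cycling"}, "labels": {"outdoor"}},
]

group_score = mean_pairwise_similarity(r["labels"] for r in records)
print(f"mean pairwise label similarity: {group_score:.3f}")
```

Sets are a natural fit here because both attribute values and labels are unordered and may hold any number of items per record; any other set-based measure could be swapped in for `label_similarity` without changing the rest of the sketch.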