Title: A Study of Combining Classification and Clustering Techniques to Construct Conjecturable Rules (結合分類分群技術建立推測法則之研究)
Author: Wu-hsien Hsu (許武先)
Institution: National Central University
Department: Graduate Institute of Information Management
Advisor: Yen-Liang Chen (陳彥良)
Degree: Doctoral
Year of publication: 2011
Keywords: data mining, classification, clustering, conjecturable rules, decision tree, numerical analysis, fuzzy theory; Data Mining, Cluster Analysis, Conceptual Clustering
The primary goal of data mining is to discover hidden or unknown knowledge. Classification techniques analyze training data that carry class labels and derive rules for classifying new observations in the future. When a data set has no known class labels, however, classification cannot be applied. Clustering techniques, by contrast, divide unlabeled data into groups according to the similarity of the observations; because the members of each group share highly similar attribute values, every group can be regarded as a concept. Yet although clustering can organize unlabeled data into a number of concepts, it does not, as classification does, retain the grouping rules for later conjecture.
Here, "conjecture" means analyzing two different attribute sets of a data set that is unfamiliar or for which no class labels are available, in order to uncover the relationship between the two attribute sets and thereby construct conjecturable rules.
This study extends our earlier work and proposes a new method for discovering recessive rules and improving conjecture accuracy. Classification is used to build decision trees that serve as conjecturable rules, while clustering resolves the absence of class labels. With the further addition of fuzzy theory and outlier handling, the method yields significant improvements in both the discovery of recessive rules and the accuracy achieved. The experimental results show that the proposed method can effectively construct conjecturable rules, and the rules it discovers remedy the shortcomings of earlier approaches.
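To make the preceding definition concrete, the following is a minimal sketch, assuming pandas and scikit-learn, of how conjecturable rules can be obtained in this spirit: the observations are clustered on one attribute set to create surrogate labels, and a decision tree is then grown on the other attribute set so that its paths can be read as rules. The column names, parameters, and the choice of k-means are illustrative assumptions, not the dissertation's exact algorithm.

```python
# Minimal sketch: cluster on the target attribute set, then learn tree rules
# on the known attribute set. Column names and parameters are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

def conjecturable_rules(df: pd.DataFrame, attrs_known, attrs_target, n_clusters=3):
    # Step 1: cluster the observations on the target attribute set; each
    # cluster is a "concept", and the cluster index acts as a surrogate label.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(df[attrs_target])

    # Step 2: grow a decision tree that maps the known attribute set to the
    # surrogate labels; each root-to-leaf path is a candidate conjecturable rule.
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    tree.fit(df[attrs_known], labels)
    return export_text(tree, feature_names=list(attrs_known))

# Hypothetical usage with made-up column names:
# print(conjecturable_rules(df, ["age", "income"], ["spend_food", "spend_travel"]))
```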
Discovering hidden or unknown knowledge is the major theme of most data mining studies. In this dissertation, we propose a new approach to discovering conjecturable rules, which categorize observations of a data set into classes of similar attribute values rather than classes of crisp labels. The proposed approach builds on the two most widely used data mining techniques: classification and clustering.
Classification is the problem of identifying the sub-population to which a new observation belongs. The decision is made according to a set of rules discovered from a training set of observations whose sub-populations are known. The technique is a form of supervised learning: pre-defined labels are required, and the result is a set of rules able to predict the label of a new observation. When no labels exist in the data set, however, the technique cannot be applied. Clustering, on the other hand, is the process of grouping a set of objects into classes of similar objects. No pre-defined labels are needed, so it is a form of unsupervised learning, yet no rules are preserved after the process for future prediction.
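The contrast between the two techniques can be seen in a small sketch, assuming scikit-learn and using the bundled Iris data purely as a stand-in data set: the supervised decision tree retains explicit if-then rules, whereas the unsupervised k-means result keeps only a partition and its centroids.

```python
# Illustrative contrast between supervised and unsupervised learning.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised learning: labels are required, but the fitted tree preserves
# explicit if-then rules that can classify future observations.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))

# Unsupervised learning: no labels are needed, yet only a partition (here,
# the cluster centroids) remains; no symbolic rules are retained for prediction.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
```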
The objective of this dissertation is to discover conjecturable rules from data sets that have no predefined class labels. Furthermore, the technique extends our two previous studies with fuzzy concepts and outlier handling, so that recessive conjecturable rules can be discovered and accuracy is improved. The proposed technique combines the convenience of unsupervised learning with the predictive ability of decision trees. The experimental results show that the proposed approach is capable of discovering both conjecturable rules and recessive rules. A sensitivity analysis is also provided for practitioners' reference.
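As a rough illustration of the fuzzy concept and outlier handling mentioned above, the following NumPy sketch computes fuzzy c-means style memberships; observations whose strongest membership is weak can be flagged as outliers, while strong secondary memberships hint at the overlapping groupings from which recessive rules could be drawn. The update scheme and thresholds are illustrative assumptions, not the dissertation's exact procedure.

```python
# Rough fuzzy c-means style membership computation with outlier flagging.
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
    # X: array of shape (n_samples, n_features)
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(X)))
    u /= u.sum(axis=0)                      # each point's memberships sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = um @ X / um.sum(axis=1, keepdims=True)
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-10
        u = dist ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=0)
    return centers, u

# Hypothetical usage: points whose strongest membership is weak are treated as
# outliers, and strong secondary memberships indicate the overlapping groups
# from which recessive rules could be read.
# centers, u = fuzzy_c_means(X, c=3)
# is_outlier = u.max(axis=0) < 0.5
# has_secondary = (u > 0.3).sum(axis=0) > 1
```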
Agrawal, R. and Srikant, R. (1994). Fast Algorithms for Mining Association Rules in Large Databases. Proceedings of the 20th International Conference on Very Large Data Bases. 487-499.
Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P. (1998). Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the ACM SIGMOD Int'l Conference on Management of Data, 94-105.
Ankerst, M., Breunig, M., Kriegel, H.-P., and Sander, J. (1999). OPTICS: Ordering Points to Identify the Clustering Structure. Proceedings of ACM SIGMOD International Conference on Management of Data. 322-331.
Basak, J. and Krishnapuram, R. (2005). Interpretable Hierarchical Clustering by Constructing an Unsupervised Decision Tree. IEEE Transactions on Knowledge and Data Engineering, 17(1), 121- 132.
Berkhin, P., (2002). Survey of clustering data mining techniques. Technical Report, CA: Accrue Software.
Berson, A., Smith, S., and Thearling, K. (2000). Building data mining applications for CRM. McGraw-Hill New York.
Bezdek, J., (1981). Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York.
Bezdek, J.C., Ehrlich, R., and Full, W. (1984). FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences Vol. 10, Issue 2-3, 191-203.
Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. (1984). Classification and Regression Trees. London: Chapman and Hall.
Chan, P.K., Fan, W., Prodromidis, A.L. and Stolfo, S.J. (1999). Distributed data mining in credit card fraud detection. Intelligent Systems and Their Applications, IEEE (IEEE Intelligent Systems). 14(6). 67-74.
Chen, M.S., Han, J., and Yu, P. S. (1996). Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering. 8(6). 866-883.
Chen, N., Chen, A. and Zhou, L. Lu. (2001). A graph-based clustering algorithm in large transaction databases. Intelligent Data Analysis. 5(4). 327-338.
Chen, Y.L., Hsu, C.L., and Chou, S.C. (2003). Constructing a multi-valued and multi-labeled decision tree. Expert Systems with Applications, 25 (2), 199-209.
Chen, Y.L., Hsu, W.H., and Lee, Y.H. (2006). TASC: two-attribute-set clustering through decision tree construction. European Journal of Operational Research, 174, 930-944.
Chen, Y.L., and Hu, H.L. (2006). An overlapping cluster algorithm to provide non-exhaustive clustering. European Journal of Operational Research, 173, 762-780.
Cheng, C.H., Fu, A.W., and Zhang, Y., (1999). Entropy-based subspace clustering for mining numerical data. Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 84-93.
Dunn, J. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact, well-separated clusters. Journal of Cybernetics, 3(3), 32-57.
Ester, M., Kriegel, H.P., Sander, J. and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining. 226-231
Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37-54.
Fisher, D.H. (1987). Knowledge Acquisition via Incremental Conceptual Clustering, Machine Learning. 2, 139-172.
Friedman, J.H., and Rafsky, L.C. (1979). Multivariate generalizations of the Wald–Wolfowitz and Smirnov two-sample tests. The Annals of Statistics, 17, 697–717.
Friedman, J.H., and Rafsky, L.C. (1981). Graphics for the multivariate two-sample problem. Journal of American Statistics Association, 76, 277–293.
Friedman, J.H., and Rafsky, L.C. (1983). Graph-theoretic measures of multivariate association and prediction. The Annals of Statistics, 11(2), 377–391.
Friedman, J.H. and Fisher, N.I. (1999). Bump Hunting in High-dimensional Data, Statistics and Computing, Vol. 9, Issue 2, 123-143.
Gehrke, J., Ganti, V., Ramakrishnan, R., and Loh, W.-Y. (1999). BOAT – optimistic decision tree construction. Proceedings of ACM SIGMOD International Conference on Management of Data. 169-180.
Giannotti, F., Gozzi, C., and Manco, G. (2001). Clustering Transactional Data. Proceedings of SEBD-01 National Conference on Advanced Database Systems. 163-176.
Giudici, P. (2003) Applied data mining: statistical methods for business and industry. Wiley.
Gonzalez-Barrios, J.M., and Quiroz, A.J., (2003). A clustering procedure based on the comparison between the k nearest neighbors graph and the minimal spanning tree. Statistics & Probability Letters, 62, 23-24.
Grabmeier, J., and Rudolph, A. (2002). Techniques of cluster algorithms in data mining. Data Mining and Knowledge Discovery, 6(4), 303-360.
Guha, S., and Rastogi, R. (2000). ROCK: A Clustering Algorithm for Categorical Attributes. Information System Journal, 25(5), 345-366.
Guha, S., Rastogi, R., and Shim, K. (1998). CURE: An efficient clustering algorithm for large databases. In: Proceedings of the ACM SIGMOD Conference, 73-84.
Guha, S., Rastogi, R., and Shim, K., (2001). CURE: an efficient clustering algorithm for large databases. Information Systems, 26(1), 35-58.
Guo, L., Zhang, M., Sun, L., and Wang, Z., (2006). Fuzzy clustering model of CRM in securities trade. Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA). 6052-6054.
Halkidi, M., Batistakis, Y., and Vazirgiannis, M., (2001). Clustering algorithms and validity measures. Proceedings of the Thirteenth International Conference on Scientific and Statistical Database Management. 3 -22.
Han, J., and Kamber, M., (2006). Data Mining: Concepts and Techniques., 2nd edition, Morgan Kaufmann.
Hsu, W.H., Jao, J.A., and Chen, Y.L. (2005). Discovering conjecturable rules through tree-based clustering analysis. Expert Systems with Applications, 29, 493-505.
Jain, A.K., Murty, M.N., and Flynn, P.J., (1999). Data clustering: a review. ACM Computing Surveys, 31(3): 264-323.
Kantardzic, M., (2003). Data Mining: Concepts, Models, Methods, and Algorithms. NJ: John Wiley & Sons.
Karypis, G., Han, E.H., and Kumar, V., (1999). Chameleon: Hierarchical Clustering Using Dynamic Modeling. IEEE Computer, (32) 68-74.
Kaufman, L., and Rousseeuw, P.J. (1990). Finding Groups in Data: an Introduction to Cluster Analysis. NJ: John Wiley & Sons.
Keim, D., and Hinneburg, A. (1999). Clustering techniques for large data sets: from the past to the future. KDD Tutorial Notes 1999: 141-181.
Klawonn, F., and Kruse, R. (1997). Constructing a fuzzy controller from data. Fuzzy Sets and Systems 85. 177-193.
Lenard, M. J., Alam, P., and Booth, D., (2000). An analysis of fuzzy clustering and a hybrid model for the auditor’s going concern assessment. Decision Sciences, vol. 31(4), 861-884.
Liu, B., Xia, Y., and Yu, P., (2000). Clustering through decision tree construction. Proceedings of Ninth International Conference on Information and Knowledge Management. 290-297.
MacQueen, J. (1967). Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. 281-297.
Mattison, R. (1997). Data warehousing and data mining for telecommunications. Artech House, Inc.
Mehta, M., Rissanen, J., and Agrawal, R. (1995). MDL-based decision tree pruning. Proceedings of the First International Conference on Knowledge Discovery and Data Mining. 216-221.
Ng, R., and Han, J. (2002). CLARANS: A Method for Clustering Objects for Spatial Data Mining. IEEE Transactions on Knowledge and Data Engineering. 14(5). 1003-1016.
Ozer, M., (2001). User segmentation of online music services using fuzzy clustering. Omega: the International Journal of Management Science, vol. 29, 193-206.
Ozer, M., (2005). Fuzzy c-means clustering and Internet portals: a case study. European Journal of Operational Research, vol. 164, 696-714.
Quinlan, J.R., (1986). Induction of decision trees. Machine Learning. 1, 81-106.
Quinlan, J.R. (1987). Simplifying decision trees. International Journal of Man-Machine Studies. 27(3). 221-234.
Quinlan, J.R., (1993). C4.5: Programs for Machine Learning. CA: Morgan Kaufmann.
Quinlan, J.R. (1996). Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, 77-90.
Ralambondrainy, H., (1995). A Conceptual Version of the k-means Algorithm, Pattern Recognition Letters, 16, pp.1147-1157.
Rastogi, R. and Shim, K. (1998). PUBLIC: A decision tree classifier that integrates building and pruning. Proc. VLDB-98, pp. 404-415.
Ruggieri, S. (2002). Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering, 14 (2), 438-444.
Salton, G., (1989). Automatic text processing: the transformation, analysis and retrieval of information by computer, PA: Addison Wesley.
Shafer, J., Agrawal, R., and Mehta, M. (1996). SPRINT: A scalable parallel classifier for data mining. Proceedings of 22nd International Conference on Very Large Data Bases. 544-555.
Shoji, H., Sun, X., and Shusaku, T. (2004). Comparison of clustering methods for clinical databases, Information Sciences, Vol.159, Issue: 3-4, 155-165.
Spangler, W.E., May, J.H., and Vargas, L.G., (1999). Choosing data-mining methods for multiple classification: representational and performance measurement implications for decision support. Journal of Management Information Systems, vol. 16(1), 37-62.
Sullivan, R., Timmermann, A., and White, H. (1998). The dangers of data-driven inference: the case of calendar effects in stock returns. LSE Financial Markets Group.
Theodoridis, S. & Koutroumbas, K. (2006). Pattern Recognition 3rd Ed., 635.
Wang, W., Yang, J., and Muntz, R. (1997). STING: A Statistical Information Grid Approach to Spatial Data Mining. Proceedings of 23rd International Conference on Very Large Data Bases. 186-195.
Wu, K.L. and Yang, M.S. (2002). Alternative c-means clustering algorithms, Pattern Recognition 35, 2267–2278.
Yao, Y.Y., (1998). A comparative study of fuzzy sets and rough sets. Journal of Information Sciences 109, 227-242.
Ye, N. and Li, X. (2002). A scalable, incremental learning algorithm for classification problems, Computers & Industrial Engineering Journal, 43(4): 677-692.
Zhang, T., Ramakrishnan, R., and Livny, M. (1997), BIRCH: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery, 1, 141–182.