Title: A Study of Combining Classification and Clustering Techniques to Construct Conjecturable Rules (結合分類分群技術建立推測法則之研究)
Author: Wu-hsien Hsu (許武先)
Institution: National Central University
Department: Graduate Institute of Information Management
Advisor: Yen-Liang Chen (陳彥良)
Degree: Doctoral
Year of publication: 2011
Keywords: data mining, classification, clustering, conjecturable rules, decision tree, numerical analysis, fuzzy theory; Data Mining, Cluster Analysis, Conceptual Clustering
The primary goal of data mining is to discover hidden or unknown knowledge. Classification techniques analyze training data that carry class labels and derive rules for classifying new observations in the future. When a data set has no known class labels, however, classification cannot be applied. Clustering techniques, by contrast, divide unlabeled data into groups according to the similarity of the observations; because the members of each group share highly similar attribute values, every group can be regarded as a concept. Yet although clustering can organize unlabeled data into a number of concepts, it does not, as classification does, retain the grouping rules for later conjecture.
Here, "conjecture" means analyzing two different attribute sets of a data set that is unfamiliar or for which no class labels are available, in order to uncover the relationship between the two attribute sets and thereby construct conjecturable rules.
This study extends our earlier work and proposes a new method for discovering recessive rules and improving conjecture accuracy. Classification is used to build decision trees that serve as conjecturable rules, while clustering resolves the absence of class labels. With the further addition of fuzzy theory and outlier handling, the method yields significant improvements in both the discovery of recessive rules and the accuracy achieved. The experimental results show that the proposed method can effectively construct conjecturable rules, and the rules it discovers remedy the shortcomings of earlier approaches.
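To make the preceding definition concrete, the following is a minimal sketch, assuming pandas and scikit-learn, of how conjecturable rules can be obtained in this spirit: the observations are clustered on one attribute set to create surrogate labels, and a decision tree is then grown on the other attribute set so that its paths can be read as rules. The column names, parameters, and the choice of k-means are illustrative assumptions, not the dissertation's exact algorithm.

```python
# Minimal sketch: cluster on the target attribute set, then learn tree rules
# on the known attribute set. Column names and parameters are hypothetical.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.tree import DecisionTreeClassifier, export_text

def conjecturable_rules(df: pd.DataFrame, attrs_known, attrs_target, n_clusters=3):
    # Step 1: cluster the observations on the target attribute set; each
    # cluster is a "concept", and the cluster index acts as a surrogate label.
    labels = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(df[attrs_target])

    # Step 2: grow a decision tree that maps the known attribute set to the
    # surrogate labels; each root-to-leaf path is a candidate conjecturable rule.
    tree = DecisionTreeClassifier(max_depth=4, random_state=0)
    tree.fit(df[attrs_known], labels)
    return export_text(tree, feature_names=list(attrs_known))

# Hypothetical usage with made-up column names:
# print(conjecturable_rules(df, ["age", "income"], ["spend_food", "spend_travel"]))
```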
Discovering hidden or unknown knowledge is the major theme of most data mining studies. In this dissertation, we propose a new approach to discovering conjecturable rules, which categorize observations of a data set into classes of similar attribute values rather than classes of crisp labels. The proposed approach builds on the two most widely used data mining techniques: classification and clustering.
Classification is the problem of identifying the sub-population to which a new observation belongs. The decision is made according to a set of rules discovered from a training set of observations whose sub-populations are known. The technique is a form of supervised learning: pre-defined labels are required, and the result is a set of rules able to predict the label of a new observation. When no labels exist in the data set, however, the technique cannot be applied. Clustering, on the other hand, is the process of grouping a set of objects into classes of similar objects. No pre-defined labels are needed, so it is a form of unsupervised learning, yet no rules are preserved after the process for future prediction.
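The contrast between the two techniques can be seen in a small sketch, assuming scikit-learn and using the bundled Iris data purely as a stand-in data set: the supervised decision tree retains explicit if-then rules, whereas the unsupervised k-means result keeps only a partition and its centroids.

```python
# Illustrative contrast between supervised and unsupervised learning.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text
from sklearn.cluster import KMeans

X, y = load_iris(return_X_y=True)

# Supervised learning: labels are required, but the fitted tree preserves
# explicit if-then rules that can classify future observations.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree))

# Unsupervised learning: no labels are needed, yet only a partition (here,
# the cluster centroids) remains; no symbolic rules are retained for prediction.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)
```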
The objective of this dissertation is to discover conjecturable rules from data sets that have no predefined class labels. Furthermore, the technique extends our two previous studies with fuzzy concepts and outlier handling, so that recessive conjecturable rules can be discovered and accuracy is improved. The proposed technique combines the convenience of unsupervised learning with the predictive ability of decision trees. The experimental results show that the proposed approach is capable of discovering both conjecturable rules and recessive rules. A sensitivity analysis is also provided for practitioners' reference.
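As a rough illustration of the fuzzy concept and outlier handling mentioned above, the following NumPy sketch computes fuzzy c-means style memberships; observations whose strongest membership is weak can be flagged as outliers, while strong secondary memberships hint at the overlapping groupings from which recessive rules could be drawn. The update scheme and thresholds are illustrative assumptions, not the dissertation's exact procedure.

```python
# Rough fuzzy c-means style membership computation with outlier flagging.
import numpy as np

def fuzzy_c_means(X, c=3, m=2.0, n_iter=100, seed=0):
    # X: array of shape (n_samples, n_features)
    rng = np.random.default_rng(seed)
    u = rng.random((c, len(X)))
    u /= u.sum(axis=0)                      # each point's memberships sum to 1
    for _ in range(n_iter):
        um = u ** m
        centers = um @ X / um.sum(axis=1, keepdims=True)
        dist = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2) + 1e-10
        u = dist ** (-2.0 / (m - 1.0))
        u /= u.sum(axis=0)
    return centers, u

# Hypothetical usage: points whose strongest membership is weak are treated as
# outliers, and strong secondary memberships indicate the overlapping groups
# from which recessive rules could be read.
# centers, u = fuzzy_c_means(X, c=3)
# is_outlier = u.max(axis=0) < 0.5
# has_secondary = (u > 0.3).sum(axis=0) > 1
```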
Agrawal, R. and Srikant, R. (1994). Fast Algorithms for Mining Association Rules in Large Databases. Proceedings of the 20th International Conference on Very Large Data Bases. 487-499.
Agrawal, R., Gehrke, J., Gunopulos, D., and Raghavan, P. (1998). Automatic subspace clustering of high dimensional data for data mining applications. Proceedings of the ACM SIGMOD Int'l Conference on Management of Data, 94-105.
Ankerst, M., Breunig, M., Kriegel, H.-P., and Sander, J. (1999). OPTICS: Ordering Points to Identify the Clustering Structure. Proceedings of ACM SIGMOD International Conference on Management of Data. 322-331.
Basak, J. and Krishnapuram, R. (2005). Interpretable Hierarchical Clustering by Constructing an Unsupervised Decision Tree. IEEE Transactions on Knowledge and Data Engineering, 17(1), 121- 132.
Berkhin, P., (2002). Survey of clustering data mining techniques. Technical Report, CA: Accrue Software.
Berson, A., Smith, S., and Thearling, K. (2000). Building data mining applications for CRM. McGraw-Hill New York.
Bezdek, J., (1981). Pattern recognition with fuzzy objective function algorithms. Plenum Press, New York.
Bezdek, J.C., Ehrlich, R., and Full, W. (1984). FCM: The fuzzy c-means clustering algorithm. Computers & Geosciences Vol. 10, Issue 2-3, 191-203.
Breiman, L., Friedman, J.H., Olshen, R.A., and Stone, C.J. (1984). Classification and Regression Trees. London: Chapman and Hall.
Chan, P.K., Fan, W., Prodromidis, A.L. and Stolfo, S.J. (1999). Distributed data mining in credit card fraud detection. Intelligent Systems and Their Applications, IEEE (IEEE Intelligent Systems). 14(6). 67-74.
Chen, M.S., Han, J., and Yu, P. S. (1996). Data mining: an overview from a database perspective. IEEE Transactions on Knowledge and Data Engineering. 8(6). 866-883.
Chen, N., Chen, A. and Zhou, L. Lu. (2001). A graph-based clustering algorithm in large transaction databases. Intelligent Data Analysis. 5(4). 327-338.
Chen, Y.L., Hsu, C.L., and Chou, S.C. (2003). Constructing a multi-valued and multi-labeled decision tree. Expert Systems with Applications, 25 (2), 199-209.
Chen, Y.L., Hsu, W.H., and Lee, Y.H. (2006). TASC: two-attribute-set clustering through decision tree construction. European Journal of Operational Research, 174, 930-944.
Chen, Y.L., and Hu, H.L. (2006). An overlapping cluster algorithm to provide non-exhaustive clustering. European Journal of Operational Research, 173, 762-780.
Cheng, C.H., Fu, A.W., and Zhang, Y., (1999). Entropy-based subspace clustering for mining numerical data. Proceedings of the fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 84-93.
Dunn, J. (1973). A fuzzy relative of the ISODATA process and its use in detecting compact, well-separated clusters. Journal of Cybernetics, 3(3), 32-57.
Ester, M., Kriegel, H.P., Sander, J. and Xu, X. (1996). A density-based algorithm for discovering clusters in large spatial databases with noise. Proc. 2nd Int. Conf. on Knowledge Discovery and Data Mining. 226-231
Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996). From data mining to knowledge discovery in databases. AI Magazine, 17(3), 37-54.
Fisher, D.H. (1987). Knowledge Acquisition via Incremental Conceptual Clustering, Machine Learning. 2, 139-172.
Friedman, J.H., and Rafsky, L.C. (1979). Multivariate generalizations of the Wald–Wolfowitz and Smirnov two-sample tests. The Annals of Statistics, 17, 697–717.
Friedman, J.H., and Rafsky, L.C. (1981). Graphics for the multivariate two-sample problem. Journal of American Statistics Association, 76, 277–293.
Friedman, J.H., and Rafsky, L.C. (1983). Graph-theoretic measures of multivariate association and prediction. The Annals of Statistics, 11(2), 377–391.
Friedman, J.H. and Fisher, N.I. (1999). Bump Hunting in High-dimensional Data, Statistics and Computing, Vol. 9, Issue 2, 123-143.
Gehrke, J., Ganti, V., Ramakrishnan, R., and Loh, W.-Y. (1999). BOAT – optimistic decision tree construction. Proceedings of ACM SIGMOD International Conference on Management of Data. 169-180.
Giannotti, F., Gozzi, C., and Manco, G. (2001). Clustering Transactional Data. Proceedings of SEBD-01 National Conference on Advanced Database Systems. 163-176.
Giudici, P. (2003) Applied data mining: statistical methods for business and industry. Wiley.
Gonzalez-Barrios, J.M., and Quiroz, A.J., (2003). A clustering procedure based on the comparison between the k nearest neighbors graph and the minimal spanning tree. Statistics & Probability Letters, 62, 23-24.
Grabmeier, J., and Rudolph, A. (2002). Techniques of cluster algorithms in data mining. Data Mining and Knowledge Discovery, 6(4), 303-360.
Guha, S., and Rastogi, R. (2000). ROCK: A Clustering Algorithm for Categorical Attributes. Information System Journal, 25(5), 345-366.
Guha, S., Rastogi, R., and Shim, K. (1998). CURE: An efficient clustering algorithm for large databases. In: Proceedings of the ACM SIGMOD Conference, 73-84.
Guha, S., Rastogi, R., and Shim, K., (2001). CURE: an efficient clustering algorithm for large databases. Information Systems, 26(1), 35-58.
Guo, L., Zhang, M., Sun, L., and Wang, Z., (2006). Fuzzy clustering model of CRM in securities trade. Proceedings of the 6th World Congress on Intelligent Control and Automation (WCICA). 6052-6054.
Halkidi, M., Batistakis, Y., and Vazirgiannis, M., (2001). Clustering algorithms and validity measures. Proceedings of the Thirteenth International Conference on Scientific and Statistical Database Management. 3 -22.
Han, J., and Kamber, M., (2006). Data Mining: Concepts and Techniques., 2nd edition, Morgan Kaufmann.
Hsu, W.H., Jao, J.A., and Chen, Y.L. (2005). Discovering conjecturable rules through tree-based clustering analysis. Expert Systems with Applications, 29, 493-505.
Jain, A.K., Murty, M.N., and Flynn, P.J., (1999). Data clustering: a review. ACM Computing Surveys, 31(3): 264-323.
Kantardzic, M., (2003). Data Mining: Concepts, Models, Methods, and Algorithms. NJ: John Wiley & Sons.
Karypis, G., Han, E.H., and Kumar, V., (1999). Chameleon: Hierarchical Clustering Using Dynamic Modeling. IEEE Computer, (32) 68-74.
Kaufman, L., and Rousseeuw, P.J. (1990). Finding Groups in Data: an Introduction to Cluster Analysis. NJ: John Wiley & Sons.
Keim, D., and Hinneburg, A. (1999). Clustering techniques for large data sets: from the past to the future. KDD Tutorial Notes 1999: 141-181.
Klawonn, F., and Kruse, R. (1997). Constructing a fuzzy controller from data. Fuzzy Sets and Systems 85. 177-193.
Lenard, M. J., Alam, P., and Booth, D., (2000). An analysis of fuzzy clustering and a hybrid model for the auditor’s going concern assessment. Decision Sciences, vol. 31(4), 861-884.
Liu, B., Xia, Y., and Yu, P., (2000). Clustering through decision tree construction. Proceedings of Ninth International Conference on Information and Knowledge Management. 290-297.
MacQueen, J. (1967). Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of the fifth Berkeley symposium on mathematical statistics and probability. 281-297.
Mattison, R. (1997). Data warehousing and data mining for telecommunications. Artech House, Inc.
Mehta, M., Rissanen, J., and Agrawal, R. (1995). MDL-based decision tree pruning. Proceedings of the First International Conference on Knowledge Discovery and Data Mining. 216-221.
Ng, R., and Han, J. (2002). CLARANS: A Method for Clustering Objects for Spatial Data Mining. IEEE Transactions on Knowledge and Data Engineering. 14(5). 1003-1016.
Ozer, M., (2001). User segmentation of online music services using fuzzy clustering. Omega: the International Journal of Management Science, vol. 29, 193-206.
Ozer, M., (2005). Fuzzy c-means clustering and Internet portals: a case study. European Journal of Operational Research, vol. 164, 696-714.
Quinlan, J.R., (1986). Induction of decision trees. Machine Learning. 1, 81-106.
Quinlan, J.R. (1987). Simplifying decision trees. International Journal of Man-Machine Studies. 27(3). 221-234.
Quinlan, J.R., (1993). C4.5: Programs for Machine Learning. CA: Morgan Kaufmann.
Quinlan, J.R. (1996). Improved use of continuous attributes in C4.5. Journal of Artificial Intelligence Research, 4, 77-90.
Ralambondrainy, H., (1995). A Conceptual Version of the k-means Algorithm, Pattern Recognition Letters, 16, pp.1147-1157.
Rastogi, R. and Shim, K. (1998). PUBLIC: A decision tree classifier that integrates building and pruning. Proc. VLDB-98, pp. 404-415.
Ruggieri, S. (2002). Efficient C4.5. IEEE Transactions on Knowledge and Data Engineering, 14 (2), 438-444.
Salton, G., (1989). Automatic text processing: the transformation, analysis and retrieval of information by computer, PA: Addison Wesley.
Shafer, J., Agrawal, R., and Mehta, M. (1996). SPRINT: A scalable parallel classifier for data mining. Proceedings of 22nd International Conference on Very Large Data Bases. 544-555.
Shoji, H., Sun, X., and Shusaku, T. (2004). Comparison of clustering methods for clinical databases, Information Sciences, Vol.159, Issue: 3-4, 155-165.
Spangler, W.E., May, J.H., and Vargas, L.G., (1999). Choosing data-mining methods for multiple classification: representational and performance measurement implications for decision support. Journal of Management Information Systems, vol. 16(1), 37-62.
Sullivan, R., Timmermann, A., and White, H. (1998). The dangers of data-driven inference: the case of calendar effects in stock returns. LSE Financial Markets Group.
Theodoridis, S. & Koutroumbas, K. (2006). Pattern Recognition 3rd Ed., 635.
Wang, W., Yang, J., and Muntz, R. (1997). STING: A Statistical Information Grid Approach to Spatial Data Mining. Proceedings of 23rd International Conference on Very Large Data Bases. 186-195.
Wu, K.L. and Yang, M.S. (2002). Alternative c-means clustering algorithms, Pattern Recognition 35, 2267–2278.
Yao, Y.Y., (1998). A comparative study of fuzzy sets and rough sets. Journal of Information Sciences 109, 227-242.
Ye, N. and Li, X. (2002). A scalable, incremental learning algorithm for classification problems, Computers & Industrial Engineering Journal, 43(4): 677-692.
Zhang, T., Ramakrishnan, R., and Livny, M. (1997), BIRCH: A New Data Clustering Algorithm and Its Applications. Data Mining and Knowledge Discovery, 1, 141–182.