

Title: Exploring Text-Mining Methods to Predict ICD-9-CM Codes Using Electronic Patient Records
Authors: 王惠嘉; 黃天祥; 劉姿蘭
Authors (English): Wang, Hei-chia; Huang, Tian-hsiang; Liu, Tzu-lan
Keywords: Electronic patient records; Support vector machine; TF-IDF; ICD-9-CM
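Since 1995, Taiwan has operated a National Health Insurance program, which aims to regulate funding so that people are not left unable to pay for necessary medical services. Under this program, medical institutions must claim reimbursement of patients' medical expenses from the Bureau of National Health Insurance according to the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). Medical institutions therefore hire specialists to code each patient's discharge record, and this manual coding process is very time-consuming and tedious. Automatic classification methods promise to make the process faster and more reliable. To improve automatic ICD-9-CM coding, this study examines three well-known methods, Naïve Bayes, support vector machine (SVM) and vector space model (VSM), together with two feature selection methods, term frequency (TF) and TF multiplied by the inverse document frequency (TF-IDF). The study uses electronic discharge records from six medical departments of a medical-center-level hospital in southern Taiwan. It also examines how ontology-based synonym replacement affects coding accuracy. The experimental results show that SVM without feature selection performs best, whereas VSM combined with TF-IDF feature selection at a threshold of 0.1 is suitable only for the cardiology department. Although TF-IDF feature selection improved performance somewhat compared with TF feature selection, ontology-based synonym replacement did not substantially improve coding prediction. In summary, SVM is recommended for automatic ICD-9-CM coding.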
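The abstract names TF-IDF feature selection with a 0.1 threshold and a VSM classifier but does not give the formulas. The following minimal sketch shows one common way these pieces fit together, under assumptions that are not from the paper. It assumes tf is a term's count divided by record length and idf is log(N/df). It keeps a term if its highest TF-IDF weight in any training record reaches the threshold. Each ICD-9-CM code is represented by the centroid of its records and matched by cosine similarity. All function names and the toy records are hypothetical, and the paper's own weighting, thresholding rule and VSM matching may differ.

```python
import math
from collections import Counter


def document_frequencies(docs):
    """Count how many training records contain each term."""
    return Counter(term for doc in docs for term in set(doc))


def tfidf(doc, df, n_docs):
    """TF-IDF weights for one tokenized record.

    Assumption: tf = count / len(doc), idf = log(n_docs / df).
    Terms never seen in training (df == 0) are dropped.
    """
    counts = Counter(doc)
    return {
        term: (count / len(doc)) * math.log(n_docs / df[term])
        for term, count in counts.items()
        if df[term] > 0
    }


def select_features(vectors, threshold=0.1):
    """Hypothetical rule: keep a term if its best weight in any record >= threshold."""
    best = {}
    for vec in vectors:
        for term, weight in vec.items():
            best[term] = max(best.get(term, 0.0), weight)
    return {term for term, weight in best.items() if weight >= threshold}


def train_centroids(vectors, labels, features):
    """Average the feature-filtered vectors of each ICD-9-CM code into a centroid."""
    sums = {}
    counts = Counter(labels)
    for vec, label in zip(vectors, labels):
        centroid = sums.setdefault(label, {})
        for term, weight in vec.items():
            if term in features:
                centroid[term] = centroid.get(term, 0.0) + weight
    return {
        label: {term: w / counts[label] for term, w in centroid.items()}
        for label, centroid in sums.items()
    }


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def predict(query_vec, centroids):
    """Return the code whose centroid is most similar to the query record."""
    return max(centroids, key=lambda label: cosine(query_vec, centroids[label]))


# Toy, made-up discharge records (already tokenized) and their codes.
train = [
    ["chest", "pain", "angina", "stent"],
    ["angina", "stent", "coronary"],
    ["cough", "fever", "pneumonia"],
    ["fever", "pneumonia", "sputum"],
]
codes = ["413.9", "413.9", "486", "486"]

df = document_frequencies(train)
vectors = [tfidf(doc, df, len(train)) for doc in train]
features = select_features(vectors, threshold=0.1)
centroids = train_centroids(vectors, codes, features)
print(predict(tfidf(["angina", "stent", "pain"], df, len(train)), centroids))  # prints 413.9
```

Raising the threshold shrinks the vocabulary. On this toy data, a threshold of 0.4 keeps only the three terms that occur in a single short record.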
In 1995, Taiwan's government initiated the National Health Insurance (NHI) program to marshal resources and resolve the difficulties people may encounter when paying for health care. Under this program, most medical organizations apply to the Bureau of the NHI for medical treatment fees according to diagnosis-related group (DRG) codes based on the International Classification of Diseases, Ninth Revision, Clinical Modification (ICD-9-CM). The application process requires specialists to assign ICD-9-CM codes from doctors' discharge diagnoses. This process is inefficient, time-consuming and tedious, especially when performed manually. Automatic classification methods can potentially reduce these problems. To improve the efficiency of ICD-9-CM prediction, we explored three well-known methods, Naïve Bayes, support vector machine (SVM) and vector space model (VSM). We applied them together with two feature-weighting schemes for feature selection, term frequency (TF) and TF multiplied by the inverse document frequency (TF-IDF), to the discharge diagnoses of six hospital departments. This paper also explores whether the use of an ontology influences prediction accuracy. The experimental results show that the preferred method is SVM without feature weighting. Its mean macro-averaged F-measure (F) across hospital departments is 0.7937, ranging from 0.7374 to 0.9009. Among the selected departments, VSM with TF-IDF at a threshold of 0.1 was appropriate only for the cardiology department, while the models for the other departments were not modified. Regarding the use of an ontology, synonym replacement did not work very effectively, although TF-IDF showed less improvement than TF. In summary, SVM is recommended for predicting ICD-9-CM codes.
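The abstract reports results as a macro-averaged F-measure. The sketch below assumes one common definition for a single department: precision and recall are computed for each ICD-9-CM code, each code's F1 is 2PR/(P + R), and the per-code scores are averaged without weighting. The paper may define the macro average differently, for example as the F of macro-averaged precision and recall. The function name and labels are illustrative only.

```python
def macro_f_measure(y_true, y_pred):
    """Macro-averaged F-measure: per-class F1, then an unweighted mean over classes.

    Classes are the union of true and predicted labels; a class with
    precision + recall == 0 contributes F1 = 0.
    """
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores) if scores else 0.0


# Made-up gold and predicted codes for four records.
y_true = ["413.9", "413.9", "486", "486"]
y_pred = ["413.9", "486", "486", "486"]
print(round(macro_f_measure(y_true, y_pred), 4))  # prints 0.7333
```

Because every code counts equally, rare codes weigh as much as frequent ones. Averaging such per-department scores gives a figure of the same kind as the reported mean of 0.7937, assuming the same definition.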