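Title: Estimating the Modified Logistic-Weibull Model with Time-Dependent Covariates via the Generalized Method of Moments, and Modified Mixture Multiclass Models
Author: Pan, Yu-Yi (潘宥亦)
Institution: National Cheng Kung University
Department: Department of Statistics
Advisor: 馬瀰嘉
Degree: Doctoral
Publication date: 2023
Keywords: Time-dependent covariates; Generalized method of moments; Multiple states; Support vector machine; Random forest; Probability calibration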
In studies of chronic diseases or cancers, a certain proportion of patients may never experience the event of interest during the study period, so the survival curve plateaus at the end of the observation period. A mixture cure model is appropriate for fitting such heavy-tailed survival time data. The common mixture cure model comprises an incidence part, which models a patient's cure status, and a latency part, which models the survival function of uncured patients. This dissertation uses a modified mixture cure model in which the survival function of a cured patient is not fixed at 1 but is given a specific function. This modification has no bearing on parameter estimation; instead, it adjusts a patient's survival probability so that it can reach 0.

The dissertation explores two issues that arise when using the modified mixture cure model. The first arises when survival times are influenced by time-dependent covariates. In such cases, parameter estimators obtained from generalized estimating equations may be inconsistent if an incorrect working correlation matrix is used. I introduce a novel algorithm that uses three modified restricted known basis matrices to estimate the parameters of the modified mixture cure model with time-dependent covariates via the generalized method of moments. Based on the concept of the quadratic inference function, the inverse of the working correlation matrix is reconstructed from the modified restricted known basis matrices to ensure the consistency of the parameter estimators. To estimate the variances of these estimators, I derive sandwich variance estimators with Louis' method. The simulation study compares two algorithms that use different modified restricted known basis matrices. The first uses the averaged all-ones modified restricted known basis matrix and the original all-ones restricted known basis matrix to handle Type I time-dependent covariates. The second uses the averaged lower-triangular modified restricted known basis matrix and the original lower-triangular restricted known basis matrix to handle Type II time-dependent covariates. The practical utility of the proposed methodology is illustrated with a primary biliary cirrhosis (PBC) data set.

The second issue is that the incidence part of the modified mixture cure model must be modified when the latent variable has multiple states. One way to address multiclass problems is to employ the multinomial logistic regression model (MLRM), an extension of the logistic regression model. Methods such as the support vector machine (SVM) and random forest (RF) are also suitable for handling both binary and multiclass response variables. I propose modified mixture multiclass models in which the survival distribution of cured patients follows a Weibull distribution. To accommodate the determination of multiple states, I introduce three models: the modified multinomial logistic-Weibull model, the modified SVM-Weibull model, and the modified RF-Weibull model. Furthermore, I propose Bayesian Parallel Model (BPM) calibration to enhance the smoothness of the RF voting results when estimating class probabilities. The simulations compare the class-probability prediction performance of the three modified mixture multiclass models under three settings: a multinomial linear logit, a multinomial quadratic logit, and a nonlinear form. The practical use of the proposed modified mixture multiclass models is demonstrated with a colorectal cancer (CRC) data set.
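The difference between the common and the modified mixture cure model described in the abstract can be illustrated with a small sketch. It is not the dissertation's implementation: it assumes a logistic incidence part with a single covariate, Weibull survival functions for both the uncured and the cured component, and hypothetical parameter values and function names chosen only for illustration.

```python
import math


def weibull_survival(t, shape, scale):
    """Weibull survival function S(t) = exp(-(t / scale) ** shape)."""
    return math.exp(-((t / scale) ** shape))


def uncured_probability(x, beta0, beta1):
    """Incidence part (assumed logistic): probability that a patient is uncured."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))


def standard_cure_survival(t, p_uncured, shape_u, scale_u):
    """Common mixture cure model: cured patients have survival fixed at 1."""
    return p_uncured * weibull_survival(t, shape_u, scale_u) + (1.0 - p_uncured)


def modified_cure_survival(t, p_uncured, shape_u, scale_u, shape_c, scale_c):
    """Modified model: cured patients get their own (assumed Weibull) survival."""
    uncured_part = p_uncured * weibull_survival(t, shape_u, scale_u)
    cured_part = (1.0 - p_uncured) * weibull_survival(t, shape_c, scale_c)
    return uncured_part + cured_part


if __name__ == "__main__":
    # Hypothetical parameter values, not taken from the dissertation.
    p = uncured_probability(x=0.0, beta0=0.0, beta1=1.0)
    for t in (0.0, 5.0, 50.0, 500.0):
        s_std = standard_cure_survival(t, p, shape_u=1.5, scale_u=2.0)
        s_mod = modified_cure_survival(t, p, 1.5, 2.0, shape_c=1.2, scale_c=50.0)
        print(f"t={t:6.1f}  standard={s_std:.4f}  modified={s_mod:.4f}")
```

Under these assumptions the common model levels off at the cured fraction, while the modified model continues to decline toward 0 at long follow-up times, which is the behaviour the abstract attributes to the modification.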