Title: The Extension and Application of Higher-Order Item Response Theory Models (高階層試題反應理論模式延伸與應用)
Author: Su, Chi-Ming (蘇啟明)
Institution: National Chung Cheng University
Department: Graduate Institute of Psychology
Advisors: 王文中 (Wang, Wen-Chung); 陳淑英 (Chen, Shu-Ying)
Degree: Doctoral dissertation
Year of publication: 2011
Keywords: higher-order multidimensional IRT model; Bayesian estimation; computerized adaptive testing; modern item selection rules; item exposure rate
This study extends multidimensional item response models to a higher-order latent trait structure, applies them to computerized adaptive testing, and examines their effectiveness. The dissertation comprises three studies.
The first study examined parameter recovery for the extended higher-order item response theory models using Markov chain Monte Carlo estimation within the Bayesian framework. The bias and root mean square error (RMSE) of the estimates were small, and most absolute relative biases (ARB) were below .05, indicating that the extended higher-order models recover their parameters well under Bayesian estimation and that WinBUGS is a suitable estimation program.
The second study compared the fit of several models to data with a hierarchical structure. The pseudo-Bayes factor (PsBF) and the deviance information criterion (DIC) proved to be sensitive indices that identified the model most appropriate for hierarchical data, and analyses of real data illustrated applications of the higher-order item response theory models.
The third study compared four estimation approaches in multidimensional computerized adaptive testing and found the higher-order CAT approach to be the best. The traditional maximum-information item selection rule was also modified by adding a random component at the early stage of the test to control early ability estimation error. These modern item selection rules increased item pool usage and reduced item exposure rates and test mean overlap rates, supporting the claim that the new rules improve item bank security while maintaining good measurement precision. Finally, the author offers several suggestions for future research and practical applications.
The purpose of this study is to extend the higher-order latent trait structure within the multidimensional item response theory (MIRT) framework, to implement the resulting models in the computerized adaptive testing (CAT) context, and to assess the effectiveness of several modern item selection rules under the higher-order multidimensional CAT (MCAT) procedure. Sheng and Wikle (2008) proposed MIRT models with a hierarchical structure, and Huang (2009) modified these models and applied the resulting hierarchical MIRT models to the CAT procedure. The present research further extends both hierarchical MIRT models to generalized higher-order models. Three studies based on the higher-order MIRT models were conducted.
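As a rough illustration of the kind of higher-order structure involved (the notation below is chosen here for exposition and need not match the dissertation's exact parameterization), a second-order extension of the 2PL model can be written as

P(X_{vij} = 1 \mid \theta_{vi}) = \frac{\exp[a_{vj}(\theta_{vi} - b_{vj})]}{1 + \exp[a_{vj}(\theta_{vi} - b_{vj})]}, \qquad \theta_{vi} = \gamma_v \, \theta^{(2)}_i + \varepsilon_{vi}, \qquad \theta^{(2)}_i \sim N(0, 1), \quad \varepsilon_{vi} \sim N(0, \sigma^2_v),

where \theta_{vi} is examinee i's trait on domain v, \theta^{(2)}_i is the overall (second-order) trait, and \gamma_v acts as a factor loading; a third-order model adds one more regression layer above \theta^{(2)}_i.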
In the first study, the two-parameter logistic model (2PLM) and the generalized partial credit model (GPCM) were adopted, and second- and third-order hierarchies of latent traits were considered in the higher-order MIRT models. Parameter recovery for the higher-order MIRT models was evaluated through simulation. The magnitudes of bias and RMSE were generally small, indicating that the parameters were recovered fairly well, and most of the ARBs were smaller than .05, indicating high estimation precision. WinBUGS proved to be a suitable program for estimating models with a hierarchical latent trait structure.
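The recovery indices mentioned here (bias, RMSE, ARB) have standard definitions; the following Python sketch shows one way they might be computed across simulation replications (function and variable names are illustrative only and are not taken from the dissertation):

import numpy as np

def recovery_summary(estimates, true_values):
    """Summarize parameter recovery across replications.

    estimates:   (n_replications, n_parameters) array of point estimates,
                 e.g., posterior means taken from WinBUGS output.
    true_values: (n_parameters,) array of generating values.
    """
    err = estimates - true_values                # signed error in each replication
    bias = err.mean(axis=0)                      # mean signed error
    rmse = np.sqrt((err ** 2).mean(axis=0))      # root mean square error
    arb = np.abs(bias / true_values)             # one common definition of absolute relative bias
    return bias, rmse, arb

# Toy check with fabricated numbers, purely for illustration:
rng = np.random.default_rng(0)
true_a = np.array([0.8, 1.2, 1.5])
est_a = true_a + rng.normal(0.0, 0.05, size=(100, 3))
bias, rmse, arb = recovery_summary(est_a, true_a)
print(bias.round(3), rmse.round(3), (arb < .05).all())   # the .05 criterion cited in the abstract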
In the second study, the model comparison results matched expectations: the DIC favored the generating model regardless of whether the items were dichotomous or polytomous and whether the factor loadings were high or diverse, whereas the PsBF favored the generating model only under diverse factor loadings. Three real-data examples, involving both dichotomous and polytomous items, illustrated the application of the higher-order model and showed that the second-order latent trait of the higher-order model was more representative of examinees' true ability or performance.
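For reference, the two comparison indices can be sketched in their usual forms (following Spiegelhalter et al., 2002, and Geisser & Eddy, 1979, both listed in the references), with notation chosen here:

DIC = \bar{D} + p_D, \qquad p_D = \bar{D} - D(\bar{\theta}), \qquad D(\theta) = -2 \log L(\mathbf{y} \mid \theta),

where \bar{D} is the posterior mean deviance and smaller DIC indicates better fit after penalizing the effective complexity p_D. The pseudo-Bayes factor compares models 1 and 2 through their conditional predictive ordinates,

\mathrm{PsBF}_{12} = \prod_i \frac{\mathrm{CPO}^{(1)}_i}{\mathrm{CPO}^{(2)}_i}, \qquad \mathrm{CPO}_i = p(y_i \mid \mathbf{y}_{-i}),

with values greater than 1 favoring model 1.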
In the last study, the higher-order MCAT was implemented with four CAT algorithms, including direct and indirect approaches, to estimate all latent traits over the course of the CAT. The higher-order CAT approach performed best: it yielded the smallest RMSEs for all kinds of ability estimates and provided a conditional standard error for the general ability. Three item selection rules were then compared within the MCAT: the point Fisher information (PFI), progressive (PG), and alpha-stratified (AS) methods. The PG method reduced the item exposure rate and improved item pool usage, and this effect became larger as the acceleration parameter increased, regardless of the estimation method, whereas the AS method only slightly reduced the item exposure rate and improved item pool usage. Neither the PG nor the AS method guaranteed that the item exposure rate stayed below a pre-specified maximum of .2, but incorporating the Sympson and Hetter online freeze (SHOF) procedure into the item selection rules did. The content balancing and SHOF controls also significantly reduced the item exposure rate, the SHOF control in particular. For improving item pool usage, only the PG method combined with the SHOF procedure was effective.
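The progressive rule referred to above mixes a random component with Fisher information, with the weight on information growing as the test proceeds. The Python sketch below shows one common formulation (the weighting scheme and the role given to the acceleration parameter are assumptions made here for illustration, not necessarily the dissertation's exact implementation):

import numpy as np

def progressive_select(info, administered, position, test_length, accel=1.0, rng=None):
    """Select the next item under a progressive-style rule.

    info:         Fisher information of each pool item at the current ability estimate
    administered: boolean mask of items already given
    position:     1-based serial position of the item about to be selected
    accel:        acceleration parameter; larger values keep selection closer to
                  random for longer before information dominates
    """
    rng = rng or np.random.default_rng()
    w = ((position - 1) / (test_length - 1)) ** accel if test_length > 1 else 1.0
    available = ~administered
    noise = rng.uniform(0.0, info[available].max(), size=info.shape)
    criterion = (1.0 - w) * noise + w * info     # weighted mix of noise and information
    criterion[administered] = -np.inf            # never reselect an administered item
    return int(np.argmax(criterion))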
With regard to measurement precision and the test mean overlap rate, the overall RMSEs of the higher-order CAT approach were lower than those of the other two estimation approaches, while the test overlap rates were almost identical across the three approaches. Compared with the PFI item selection method, the PG method significantly reduced the test mean overlap rate regardless of pool size and test length, whereas the AS method reduced it only slightly. Both the PG and AS methods produced lower test mean overlap rates as the acceleration parameter and the number of strata increased. Incorporating the SHOF into the CAT decreased the test mean overlap rate under almost all conditions, while its effect on the overall RMSEs was much smaller. Relative to the conditions with uniformly high ability correlations, a diverse ability correlation structure increased the overall RMSEs but did not shift the test mean overlap rate much.
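The security indices discussed in this paragraph have simple operational definitions: an item's exposure rate is the proportion of examinees who received it, pool usage is the proportion of items administered at least once, and the test mean overlap rate is the average proportion of items shared between pairs of examinees. A minimal Python sketch (assuming fixed-length tests; names are illustrative) follows:

import numpy as np
from itertools import combinations

def security_indices(tests, pool_size):
    """tests: list of sets, each holding the item indices one examinee received."""
    n = len(tests)
    counts = np.zeros(pool_size)
    for t in tests:
        counts[list(t)] += 1
    exposure = counts / n                         # per-item exposure rates
    pool_usage = float((counts > 0).mean())       # proportion of the pool ever used
    overlap = np.mean([len(a & b) / len(a)        # shared items / (fixed) test length
                       for a, b in combinations(tests, 2)])
    return exposure, pool_usage, float(overlap)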
In summary, the higher-order CAT approach combined with the PG and SHOF procedures improved item bank security and provided test information throughout the CAT. Finally, the limitations of the study and suggestions for further research were discussed.
References
Ackerman, T. A. (1991). The use of unidimensional parameter estimates of multidimensional items in adaptive testing. Applied Psychological Measurement, 13, 113-127.
Ackerman, T. A., & Davey, T. C. (1991). Concurrent adaptive measurement of multiple abilities. Paper presented at the annual meeting of the American Educational Research Association, Chicago, IL.
Adams, R. J., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23.
Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In B. N. Petrov & F. Csáki (Eds.), Proceedings of the 2nd International Symposium on Information Theory (pp. 267-281). Budapest: Akadémiai Kiadó.
Al-Turkait, F. A., & Ohaeri, J. U. (2010). Dimensional and hierarchical models of depression using the Beck Depression Inventory-II in an Arab college student sample. BMC Psychiatry, electronic version from: http://www.biomedcentral.com/1471-244X/10/60
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
Andrich, D. (1996). A hyperbolic cosine latent trait model for unfolding polytomous responses: Reconciling Thurstone and Likert methodologies. British Journal of Mathematical and Statistical Psychology, 49, 347-365
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2008). Incorporating randomness in the Fisher information for improving item-exposure control in CATs. British Journal of Mathematical and Statistical Psychology, 61(2), 493-513.
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2009). Item bank disclosure in computerized adaptive testing: What makes an item selection rule safer? Manuscript submitted for publication.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores (pp. 397-479). Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431-444.
Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2001). A mixture item response model for multiple-choice data. Journal of Educational and Behavioral Statistics, 26, 381-409.
Bolt, D. M. & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 29, 395-414.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153-168.
Bryk, A. S., & Raudenbush, S. W. (1992). Hierarchical linear models: Application and data analysis methods. Newbury Park, CA: Sage.
Cao, J., & Stokes, S. L. (2008). Bayesian IRT guessing models for partial guessing behaviors. Psychometrika, 73, 209-230.
Chang, H.-H., & Ansley, T. N. (2003). A comparative study of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 40, 71-103.
Chang, H.-H., Qian, J., & Ying, Z. (2001). A-stratified multistage computerized adaptive testing with b blocking. Applied Psychological Measurement, 25, 333-341.
Chang, H.-H., & Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20, 213-229.
Chang, H. H., & Ying, Z. (1999). A-stratified multistage computerized adaptive testing. Applied Psychological Measurement, 23, 211-222.
Chen, S.-Y. (2004). Controlling item exposure on the Fly in Computerized Adaptive Testing. Paper presented at the Annual Meeting of the Taiwanese Psychological Association, Taipei, Taiwan.
Chen, S.-Y. (2005). Controlling item exposure and test overlap on the fly in computerized adaptive testing. Paper presented at the IMPS 2005 Annual Meeting of the Psychometric Society, Tilburg, Netherlands.
Chen, S.-Y., & Ankenmann, R. D. (2004). Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing. Journal of Educational Measurement, 41, 149-174.
Chen, S.-Y., Ankenmann, R. D., & Chang, H. -H. (2000). A comparison of item selection rules at the early stages of computerized adaptive testing. Applied Psychological Measurement, 24, 241-255.
Chen, S.-Y., Ankenmann, R. D., & Spray, J. A. (2003). The relationship between item exposure and test overlap in computerized adaptive testing. Journal of Educational Measurement, 40, 129-145.
Chen, S.-Y., & Lei, P.-W. (2005). Controlling item exposure and test overlap in computerized adaptive testing. Applied Psychological Measurement, 29, 204-217.
Chen, F. F., West, S. G., & Sousa, K. H. (2006). A comparison of bifactor and second-order models of quality of life. Multivariate Behavioral Research, 41, 189-225.
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. American Statistician, 49, 327-335.
Cohen, A. S., & Bolt, D. M. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42, 133-148.
Costa, P. T., Jr., & McCrae, R. R. (1992). NEO PI-R professional manual. Odessa, FL: Psychological Assessment Resources, Inc.
Davey, T., & Parshall, C. G. (1995). New algorithms for item selection and exposure control with computerized adaptive testing. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73(4), 533-559.
De Boeck, P., & Wilson, M. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer-Verlag.
de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait for cognitive diagnosis. Psychometrika, 69, 333-353.
de la Torre, J., & Hong, Y. (2010). Parameter estimation with small sample size: A higher-order IRT model approach. Applied Psychological Measurement, 34, 267-285.
de la Torre, J., & Patz, R. (2005). Making the most of what we have: A practical application of multidimensional item response theory in test scoring. Journal of Educational and Behavioral Statistics, 30, 295-311.
de la Torre, J., & Song, H. (2009). Simultaneous estimation of overall and domain abilities: A higher-order IRT model approach. Applied Psychological Measurement, 33(8), 620-639.
de la Torre, J., Song, H., & Hong, Y. (2011). A comparison of four methods of IRT subscoring. Applied Psychological Measurement, 35(4), 296-316.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Erlbaum.
Fischer, G. H. (1973). Linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.
Fischer, G. H. (1983). Logistic latent trait models with linear constraints. Psychometrika, 48, 3-26.
Fischer, G. H, & Ponocny, I. (1994). An extension of the partial credit model with an application to the measurement of change. Psychometrika, 59, 177-192.
Flaugher, R. (2000). Item pools. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 37-59). Mahwah, NJ: Lawrence Erlbaum Associates.
Flora, D. B., Finkel, E. J., & Foshee, V. A. (2003). Higher order factor structure of a self-control test: Evidence from confirmatory factor analysis with polychoric correlations. Educational and Psychological Measurement, 63(1), 112-127.
Geisser, S., & Eddy, W. (1979). A predictive approach to model selection. Journal of American Statistical Association, 74, 153-160.
Gelfand, A. E., Dey, D. K., & Chang, H. (1992). Model determination using predictive distributions with implementation via sampling-based methods. In J. M. Bernardo, J. O. Berger, A. P. Dawid, & A. F. M. Smith (Eds.), Bayesian statistics (Vol. 4, pp. 147-167). Oxford, UK: Oxford University Press.
Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (1996). Bayesian data analysis. London: Chapman & Hall.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12, 609-628.
Georgiadou, E., Triantafillou, E., & Economides, A. (2007). A review of item exposure control strategies for computerized adaptive testing developed from 1983 to 2005. Journal of Technology, Learning, and Assessment, 5(8). Retrieved February 17, 2009, from http://escholarship.bc.edu/jtla/vol5/8/.
Gilks, W. R., Richardson, S., & Spiegelhalter, D. J. (1998). Markov chain Monte Carlo in practice. London: Chapman & Hall.
Gignac, G. E. (2008). Higher-order models versus direct hierarchical models: g as superordinate or breadth factor? Psychology Science Quarterly, 50, 21-43.
Goegebeur, Y., De Boeck, P., Wollack, J., Cohen, A. (2008). A speeded item response model with gradual process change. Psychometrika, 73(1), 65-87.
Gottfredson, M. R., & Hirschi, T. (1990). A general theory of crime. Stanford, CA: Stanford University Press.
Grasmick, H. G., Tittle, C. R., Bursik, R. J., & Arneklev, B. J. (1993). Testing the core empirical implications of Gottfredson and Hirschi’s general theory of crime. Journal of Research in Crime and Delinquency, 30, 5-29.
Grilli, L., & Rampichini, C. (2007). Multilevel factor models for ordinal variables. Structural Equation Modeling, 14, 1-25.
Gustafsson, J., & Balke, G. (1993). General and specific abilities as predictors of school achievement. Multivariate Behavioral Research, 28, 407-434.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer-Nijhoff.
Hattie, J. (1981). Decision criteria for determining unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: The University of New England, Center for Behavioral Studies.
Hau, K. T., & Chang, H. H. (2001). Item selection in computerized adaptive testing: Should more discriminating items be used first. Journal of Educational Measurement, 38(3), 249-266.
Holzinger, K. J., & Swineford, F. (1937). The bi-factor method. Psychometrika, 2, 41-54.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling: An overview and a meta-analysis. Sociological Methods and Research, 26, 329-367.
Hsu, C.-L., & Chen, S.-Y. (2007). Controlling item exposure and test overlap in variable length computerized adaptive testing. Psychological Testing, 54(2), 403-428.
Huang, H.-Y. (2009). Hierarchical Structure Multidimensional IRT Models and its Application to Computerized Adaptive Testing. Unpublished doctoral thesis, National Taiwan Normal University, Taipei, Taiwan.
Ip, E. H.-S. (2000). Adjusting for information inflation due to local dependence in moderately large item clusters. Psychometrika, 65, 73-91.
Joe, S., Woolley, M.E., Brown, G.K., Ghahramanlou-Holloway, M., & Beck, A.T. (2008). Psychometric properties of the Beck Depression Inventory-II in low income African-American suicide attempters. Journal of Personality Assessment, 90(5), 521-523.
Johnson, D. E. (1998). Applied multivariate methods for data analysts. CA: Brooks/Cole Publishing Company.
Johnson, M. E., Neal, D. B., Brems, C., & Fisher, D. G. (2006). Depression as measured by the Beck Depression Inventory-II among injecting drug users. Assessment, 13(2) 168-177.
Ju, Y. (2005). Item exposure control in a-stratified computerized adaptive testing. Unpublished master’s thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Kahraman, N., & Kamata, A. (2004). Increasing the precision of subscore by using out-of-scale information. Applied Psychological Measurement, 28, 407-426.
Kang, T., & Cohen, A. S. (2007). IRT model selection methods for dichotomous items. Applied Psychological Measurement, 31, 331-358.
Kelderman, H. (1996). Multidimensional Rasch models for partial-credit scoring. Applied Psychological Measurement, 20, 155-168.
Kelloway, E. K. (1998). Using LISREL for structural equation modeling: A researcher’s guide. Thousand Oaks, CA: Sage Publications.
Kelly, T. L. (1927). The interpretation of educational measurement. New York: World Book.
Kingsbury, G. G., & Zara, A. R. (1989). Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education, 2, 359-375.
Kingsbury, G. G., & Zara, A. R. (1991). A comparison of procedures for content-sensitive item selection in computerized adaptive tests. Applied Measurement in Education, 4, 241-261.
Klein Entink, R. H., Fox, J.-P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74(1), 21-48.
Kumar, G., Rissmiller, D. J., Steer, R. A., & Beck, A. T. (2006). Mean Beck Depression Inventory-II total scores by type of bipolar episode. Psychological Reports, 98(3), 836-840.
Lee, S.-Y., Song, X.-Y., & Tang, N.-S. (2007). Bayesian methods for analyzing structural equation models with covariates, interaction, and quadratic latent variables. Structural Equation Modeling, 14, 404-434.
Li, Y., Bolt, D. M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30(1), 3-21.
Li, Y. H., & Schafer, W. D. (2005). Increasing the homogeneity of CAT's item-exposure rates by minimizing or maximizing varied target functions while assembling shadow tests. Journal of Educational Measurement, 42, 245-269.
Linacre, J. M. (1989). Many-faceted Rasch measurement. Chicago: MESA.
Liu, K.-S., Cheng, Y.-Y., & Wang, W.-C. (2007). Rasch analysis of the Beck Depression Inventory-II with Taiwan university students. Paper presented at 2007 Pacific Rim Objective Measurement Symposium. National College of Physical Education & Sports, Taoyuan, Taiwan.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillside, NJ: Lawrence Erlbaum.
Luo, G. (1998). A general formulation for unidimensional unfolding and pairwise preference models: Making explicit the latitude of acceptance. Journal of Mathematical Psychology, 42, 400-417.
Marsh, H. W., & Hocevar, D. (1985). Application of confirmatory factor analysis to the study of self-concept: First and higher order factor structures and their invariance across groups. Psychological Bulletin, 97, 562- 582.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
McKinley, R. L., & Reckase, M. D. (1982). The use of the general Rasch model with multidimensional item response data (Research Report ONR 82-1). Iowa City, IA: American College Testing.
McKinley, R. L., & Reckase, M. D. (1983). MAXLOG: A computer program for the estimation of the parameters of a multidimensional logistic model. Behavior Research Methods & Instrumentation, 15, 389-390.
Mislevy, R. J. (1987). Exploiting auxiliary information about examinees in the estimation of item parameters. Applied Psychological Measurement, 11, 81-91.
Mislevy, R. J., & Sheehan, K. M. (1989). The role of collateral information about examinees in item parameter estimation. Psychometrika, 54, 661-679.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159-176.
Muraki, E. (1993). Information functions of the generalized partial credit model. Applied Psychological Measurement, 17, 351-363.
Muthén, B. O., & Asparouhov, T. (2009). Multilevel regression mixture analysis. Journal of the Royal Statistical Society, series A , 172, 639-657.
Nering, M. L., Davey, T., & Thompson, T. (1998). A hybrid method for controlling item exposure in computerized adaptive testing. Paper presented at the annual meeting of the Psychometric Society, Urbana, IL.
O’Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society B, 57, 99-138.
Owen, R. J. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351-356.
Parshall, C. G., Davey, T., & Nering, M. L. (1998). Test development exposure control of adaptive testing. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Patz, R .J. & Junker, B. W. (1999). A straightforward approach to Markov Chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24(2), 146-178.
Raftery, A. E. (1996). Hypothesis testing and model selection. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 163-187). Washington, DC: Chapman & Hall.
Raftery, A. E., & Lewis, S. M. (1996). Implementing MCMC. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 115-130). London: Chapman & Hall.
Raîche, G., Blais, J.-G., & Magis, D. (2007). Adaptive estimators of trait level in adaptive testing: Some proposals. In D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Institute of Educational Research.
Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401-412.
Reckase, M. D. (1997). A linear logistic multidimensional model for dichotomous item response data. In W. J. van der Linden & R. K. Hambleton (Eds.), Handbook of modern item response theory. New York: Springer.
Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552-566.
Revuelta, J., & Ponsoda, V. (1998). A comparison of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 35, 311-327.
Roberts, J. S., Donoghue, J. R., & Laughlin, J. E. (2000). A general item response theory model for unfolding unidimensional polytomous responses. Applied Psychological Measurement, 24, 3-32.
Ryan, J. J., & Schnakenberg-Ott, S. D. (2003). Scoring reliability on the Wechsler Adult Intelligence Scale-Third Edition (WAIS-III). Assessment, 10, 151-159.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores (Psychometric Monograph No. 17). Richmond, VA: Psychometric Society. Retrieved from http://www.psychometrika.org/journal/online/MN17.pdf
San Martin, E., del Pino, G., & De Boeck, P. (2006). IRT models for ability-based guessing. Applied Psychological Measurement, 30, 193-203.
SAS Institute (1999). SAS online doc (version 8) (software manual on CD-Rom). Cary, NC: SAS Institute Inc.
Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22, 53-61.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331-354.
Segall, D. O. (2004). A sharing item response theory model for computerized adaptive testing. Journal of Educational and Behavioral Statistics, 29, 439–460.
Sheng, Y., & Wikle, C. K. (2008). Bayesian multidimensional IRT models with a hierarchical structure. Educational and Psychological Measurement, 68, 413-430.
Shih, C.-L. (2007). A Comparison of Item Selection Strategies in Computerized Adaptive Testing for Testlet-based Items and Multidimensional Items. Unpublished doctoral thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298-321.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal and structural equation models. Boca Raton, FL: Chapman and Hall/CRC Press.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society B, 64, 583-640.
Spiegelhalter, D., Thomas, A., & Best, N. (2003). WinBUGS version 1.4 [Computer program]. Cambridge, UK: MRC Biostatistics Unit, Institute of Public Health.
Stocking, M. L., & Lewis, C. (1995). A new method for controlling item exposure in computerized adaptive testing (ETS Research Report RR-95-25). Princeton, NJ: Educational Testing Service.
Stocking, M. L., & Lewis, C. (1998). Controlling item exposure conditional on ability in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 23, 57-75.
Stocking, M. L., & Swanson, L. (1993). A method for severely constrained item selection in adaptive testing. Applied Psychological Measurement, 17(3), 277-292.
Su, Y.-H. (2007). Simultaneous Control over Item Exposure and Test Overlap in Computerized Adaptive Testing for Testlet-based Items and Multidimensional Items. Unpublished doctoral thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Swanson, L., & Stocking, M. L. (1993). A model and heuristic for solving very large item selection problems. Applied Psychological Measurement, 17(2), 151-166.
Sympson, J. B., & Hetter, R. D. (1985). Controlling item-exposure rates in computerized adaptive testing. In Proceedings of the 27th annual meeting of the Military Testing Association (pp. 973-977). San Diego, CA: Navy Personnel Research and Development Center.
The WHOQOL Group (1998). The World Health Organization Quality of Life Assessment (WHOQOL): Development and general psychometric properties. Social Science and Medicine, 46, 1569-1585.
Thomasson, G. L. (1995). New item response control algorithms for computerized adaptive testing. Paper presented at the annual meeting of the Psychometric Society, Minneapolis, MN.
Tierney, L. (1994). Exploring posterior distributions with Markov Chains. Annals of Statistics, 22, 1701-1762.
van der Linden, W. J. (1998). Bayesian item-selection criteria for adaptive testing. Psychometrika, 63, 201-216.
van der Linden, W. J., & Glas, C. (Eds.). (2000). Computer adaptive testing: Theory and practice. Boston, MA: Kluwer Academic Publishers.
van der Linden, W. J., & Pashley, P. J. (2000). Item selection and ability estimation in adaptive testing. In W. J. van der Linden, & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 1-25). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Veerkamp, W. J. J., & Berger, M. P. F. (1997). Some new item selection criteria for adaptive testing. Journal of Educational and Behavioral Statistics, 22, 203-226.
Wainer, H., Bradlow, E. T., & Wang, X. (2007). Testlet response theory and its applications. NY: Cambridge University Press.
Wainer, H., & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15, 22-29.
Wainer, H., Vevea, J. L., Camacho, F., Reeve, B., Rosa, K., Nelson, L., Swygert, K., & Thissen, D. (2001). Augmented scores – “borrowing strength” to compute scores based on small number of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 343-387). Mahwah, NJ: Lawrence Erlbaum Associates.
Wainer, H., & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37, 203-220.
Wang, W.-C. (2004). Effects of anchor item methods on the detection of differential item functioning within the family of Rasch models. Journal of Experimental Education, 72, 221-261.
Wang, W. C., Chen, P. H., & Cheng, Y. Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9, 116-136.
Wang, W.-C., Cheng, Y.-Y., & Wilson, M. R. (2005). Local item dependency for items across tests connected by common stimuli. Educational and Psychological Measurement, 65, 5-27.
Wang, W.-C., & Liu, C.-Y. (2007). Formulation and application of the generalized multilevel facets model. Educational and Psychological Measurement, 67, 583-605.
Wang, W.-C., & Wilson, M. R. (2005a). Assessment of differential item functioning in testlet-based items using the Rasch testlet model. Educational and Psychological Measurement, 65, 549-576.
Wang, W.-C., & Wilson, M. R. (2005b). The Rasch testlet model. Applied Psychological Measurement, 29, 126-149.
Wang, W.-C., & Wilson, M. R. (2005c). Exploring local item dependence using a random-effects facet model. Applied Psychological Measurement, 29, 296-318.
Wang, W., Wilson, M., & Adams, R. J. (1997). Rasch models for multidimensionality between items and within items. In M. Wilson & G. Engelhard (Eds.), Objective measurement: Theory into practice (Vol. 4, pp. 139-155). Norwood, NJ: Ablex Publishing.
Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17, 17-27.
Wilson, M. (1992). The ordered partition model: an extension of the partial credit model. Applied Psychological Measurement, 16, 309-325.
Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest: Generalised item response modeling software. Melbourne, Australia: Australian Council for Educational Research.
Wu, M.-L., & Chen, S.-Y. (2008). Investigating item exposure control on the fly in computerized adaptive testing. Psychological Testing, 55(1), 1-32.
Yao, L., & Boughton, K. A. (2007). A multidimensional item response modeling approach for improving subscale proficiency estimation and classification. Applied Psychological Measurement, 31, 83-105.
Yen, W. M. (1987). A Bayesian / IRT index of objective performance. Paper presented at the annual meeting of the Psychometric Society, Montreal, Quebec, Canada, June 1-19.
Yung, Y.-F., Thissen, D., & McLeod, L. D. (1999). On the relationship between the higher-order factor model and the hierarchical factor model. Psychometrika, 64, 113-128.