Adams, R. J., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23.
Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An approach to error in variables regression. Journal of Educational and Behavioral Statistics, 22, 47-76.
Adams, R. J., & Wu, M. L. (Eds.). (2002). PISA 2000 technical report. Paris: OECD Publications.
Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational Statistics, 17, 251-269.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
Baker, F. B. (1998). An investigation of the item parameter recovery characteristics of a Gibbs sampling procedure. Applied Psychological Measurement, 22, 163-169.
Baker, F., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2008). Incorporating randomness in the Fisher information for improving item-exposure control in CATs. British Journal of Mathematical and Statistical Psychology, 61(2), 493-513.
Bayarri, S., & Berger, J. (2000). P-values for composite null models. Journal of the American Statistical Association, 95, 1127–1142.
Beguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–562.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431-444.
Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2001). A mixture model for multiple choice data. Journal of Educational and Behavioral Statistics, 26(4), 381-409.
Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2002). Item parameter estimation under conditions of test speededness: Application of a mixture Rasch model with ordinal constraints. Journal of Educational Measurement, 39(4), 331–348.
Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27, 395-414.
Bond, T., & Fox, C. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153-168.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296-322.
Cao, J., & Stokes, S. L. (2008). Bayesian IRT guessing models for partial guessing behaviors. Psychometrika, 73, 209-230.
Chang, S.-W., & Ansley, T. N. (2003). A comparative study of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 40, 71-103.
Chang, H.-H., & Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20, 213-229.
Chen, S.-Y. (2004). Controlling item exposure on the fly in computerized adaptive testing. Paper presented at the Annual Meeting of the Taiwanese Psychological Association, Taipei, Taiwan.
Chen, S.-Y. (2005). Controlling item exposure and test overlap on the fly in computerized adaptive testing. Paper presented at the IMPS 2005 Annual Meeting of the Psychometric Society, Tilburg, Netherlands.
Chen, S.-Y., & Ankenmann, R. D. (2004). Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing. Journal of Educational Measurement, 41, 149-174.
Chen, S.-Y., Ankenmann, R. D., & Chang, H.-H. (2000). A comparison of item selection rules at the early stages of computerized adaptive testing. Applied Psychological Measurement, 24, 241-255.
Chen, S.-Y., Ankenmann, R. D., & Spray, J. A. (2003). The relationship between item exposure and test overlap in computerized adaptive testing. Journal of Educational Measurement, 40, 129–145.
Chen, S.-Y., & Lei, P.-W. (2005). Controlling item exposure and test overlap in computerized adaptive testing. Applied Psychological Measurement, 29, 204–217.
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. American Statistician, 49, 327-335.
Cohen, A. S., & Bolt, D. M. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42, 133-148.
Davey, T., & Parshall, C. G. (1995). New algorithms for item selection and exposure control with computerized adaptive testing. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
De Boeck, P., & Wilson, M. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer-Verlag.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73(4), 533-559.
de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69, 333-353.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
Flaugher, R. (2000). Item pools. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed., pp. 37-59). Mahwah, NJ: Lawrence Erlbaum Associates.
Fischer, G. H. (1973). Linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.
Fox, J.-P., & Glas, C. A. W. (2003). Bayesian modeling of measurement error in predictor variables using item response theory. Psychometrika, 68, 169-191.
Gelfand, A. E. (1996). Model comparison using sampling-based methods. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 145-161). Washington, DC: Chapman & Hall.
Gelfand, A. E., & Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society, B, 56, 501-514.
Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Geisser, S., & Eddy, W. (1979). A predictive approach to model selection. Journal of the American Statistical Association, 74, 153-160.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian data analysis. New York: Chapman & Hall.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
Goegebeur, Y., De Boeck, P., Wollack, J. A., & Cohen, A. S. (2008). A speeded item response model with gradual process change. Psychometrika, 73, 65-87.
Gustafsson, J., & Balke, G. (1993). General and specific abilities as predictors of school achievement. Multivariate Behavioral Research, 28(4), 407-434.
Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society B, 29, 83-100.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Norwell, MA: Kluwer Academic Publishers.
Hattie, J. (1981). Decision criteria for determining unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: The University of New England, Center for Behavioral Studies.
Hoijtink, H., & Molenaar, I. W. (1997). A multidimensional item response model: Constrained latent class analysis using the Gibbs sampler and posterior predictive checks. Psychometrika, 62, 171-189.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling: An overview and a meta-analysis. Sociological Methods and Research, 26, 329–367.
Hsu, C.-L., & Chen, S.-Y. (2007). Controlling item exposure and test overlap in variable length computerized adaptive testing. Psychological Testing, 54(2), 403-428.
Ip, E. H.-S. (2000). Adjusting for information inflation due to local dependence in moderately large item clusters. Psychometrika, 65, 73-91.
Janssen, R., Tuerlinckx, F., Meulders, M., & De Boeck, P. (2000). A hierarchical IRT model for criterion-referenced measurement. Journal of Educational and Behavioral Statistics, 25(3), 285-306.
Johnson, D. E. (1998). Applied multivariate methods for data analysts. CA: Brooks/Cole Publishing Company.
Johnson, V. E., & Albert, J. H. (1999). Ordinal data modeling. New York: Springer-Verlag.
Jöreskog, K. G., & Sörbom, D. (2001). LISREL Version 8.51 [Computer software]. Chicago: Scientific Software International.
Ju, Y. (2005). Item exposure control in a-stratified computerized adaptive testing. Unpublished master’s thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Kang, T., & Cohen, A. S. (2007). IRT model selection methods for dichotomous items. Applied Psychological Measurement, 31, 331-358.
Kelloway, E. K. (1998). Using LISREL for structural equation modeling: A researcher’s guide. Thousand Oaks: Sage Publications.
Klein Entink, R. H., Fox, J.-P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74(1), 21-48.
Kingsbury, G. G., & Zara, A. R. (1989). Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education, 2, 359-375.
Kingsbury, G. G., & Zara, A. R. (1991). A comparison of procedures for content-sensitive item selection in computerized adaptive tests. Applied Measurement in Education, 4, 241-261.
Li, Y., Bolt, D. M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30(1), 3-21.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
McBride, J. R., & Martin, J. T. (1983). Reliability and validity of adaptive ability tests in a military setting. In D. J. Weiss (Ed.), New Horizons in Testing (pp. 223-226). New York, NY: Academic Press.
McCulloch, C. E., & Searle, S. R. (2001). Generalized, linear, and mixed models. New York: Wiley.
McKinley, R. L., & Reckase, M. D. (1982). The use of the general Rasch model with multidimensional item response data (Research Report ONR 82-1). Iowa City, IA: American College Testing.
Nering, M. L., Davey, T., & Thompson, T. (1998). A hybrid method for controlling item exposure in computerized adaptive testing. Paper presented at the annual meeting of the Psychometric Society, Urbana, IL.
Newton, M. A., & Raftery, A. E. (1994). Approximate Bayesian inference by the weighted likelihood bootstrap (with discussion). Journal of the Royal Statistical Society, Series B, 56, 3-48.
O’Hagan, A. (1991). Discussion on posterior Bayes factors (by M. Aitkin). Journal of the Royal Statistical Society, Series B, 53, 136.
O’Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society, Series B, 57, 99-138.
Owen, R. J. (1969). A Bayesian approach to tailored testing (Research Report 69-92). Princeton, NJ: Educational Testing Service.
Owen, R. J. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351–356.
Parshall, C. G., Spray, J. A., Kalohn, J. C., & Davey, T. (2002). Practical considerations in computer-based testing. New York: Springer-Verlag.
Patz, R., & Junker, B. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146–178.
Press, S. J. (2003). Subjective and objective Bayesian statistics: Principles, models, and applications (2nd ed.). Hoboken, NJ: John Wiley & Sons, Inc.
Ponsoda, V., & Olea, J. (2003). Adaptive and tailored testing. (Including IRT and Non IRT Application). In R. Fernandez-Ballesteros (Ed.), Encyclopaedia of psychological assessment (pp. 9–13). London: Sage Publications.
Raftery, A. E. (1996). Hypothesis testing and model selection. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 163-187). London: Chapman & Hall.
Raftery, A. E., & Lewis, S. M. (1996). Implementing MCMC. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 115-130). London: Chapman & Hall.
Raîche, G., Blais, J. G., & Magis, D. (2007). Adaptive estimators of trait level in adaptive testing: Some proposals. In D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing, June 7-8, 2007.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Institute of Educational Research. (Expanded edition, 1980. Chicago: The University of Chicago Press.)
Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401–412.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25-36.
Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552-566.
Revuelta, J., & Ponsoda, V. (1998). A comparison of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 35, 311–327.
Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8, 185-205.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics, 12, 1151-1172.
SAS Institute (1999). SAS online doc (version 8) (software manual on CD-Rom). Cary, NC: SAS Institute Inc.
San Martín, E., del Pino, G., & De Boeck, P. (2006). IRT models for ability-based guessing. Applied Psychological Measurement, 30(3), 193-203.
Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22, 53-61.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331-354.
Segall, D. O. (2004a). A sharing item response theory model for computerized adaptive testing. Journal of Educational and Behavioral Statistics, 29, 439–460.
Segall, D. O. (2004b). Computerized adaptive testing. In K. Kempf-Leonard (Ed.), The encyclopaedia of social measurement (pp. 429-438). San Diego, CA: Academic Press.
Segall, D. O., & Moreno, K. E. (1999). Development of the computerized adaptive testing version of the Armed Services Vocational Aptitude Battery. In F. Drasgow, & J. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 35-65). Mahwah, NJ: Lawrence Erlbaum.
Sheng, Y., & Wikle, C. K. (2008). Bayesian multidimensional IRT models with a hierarchical structure. Educational and Psychological Measurement, 68(3), 413-430.
Shih, C.-L. (2007). A Comparison of Item Selection Strategies in Computerized Adaptive Testing for Testlet-based Items and Multidimensional Items. Unpublished doctoral thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Sinharay, S. (2005). Assessing fit of unidimensional item response theory models using a Bayesian approach. Journal of Educational Measurement, 42(4), 375-394.
Sinharay, S., & Johnson, M. S. (2003). Simulation studies applying posterior predictive model checking for assessing fit of the common item response theory models. Manuscript in preparation. A preliminary version retrieved November 1, 2004, from http://www.ets.org/research/newpubs.html.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298-321.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal and structural equation models. Boca Raton, FL: Chapman and Hall/CRC Press.
Smith, L. L., & Reise, S. P. (1998). Gender differences on negative affectivity: An IRT study of differential item functioning on the multidimensional personality questionnaire stress reaction scale. Journal of Personality and Social Psychology, 75(5), 1350-1362.
Smith, E. V. Jr., & Smith, R. M. (Eds.). (2004). Introduction to Rasch measurement: Theory, models and applications. Maple Grove, MN: JAM Press.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 271-295.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, Methodological, 64, 583-616.
Spiegelhalter, D. J., Thomas, A., & Best, N. (2003). WinBUGS version 1.4 [Computer program]. Cambridge, UK: MRC Biostatistics Unit, Institute of Public Health.
Stocking, M. L., & Lewis, C. (1995). A new method for controlling item exposure in computerized adaptive testing (ETS Research Report RR-95-25). Princeton, NJ: Educational Testing Service.
Stocking, M. L., & Lewis, C. (1998). Controlling item exposure conditional on ability in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 23, 57-75.
Stocking, M. L., & Swanson, L. (1993). A method for severely constrained item selection in adaptive testing. Applied Psychological Measurement, 17(3), 277-292.
Su, Y.-H. (2007). Simultaneous Control over Item Exposure and Test Overlap in Computerized Adaptive Testing for Testlet-based Items and Multidimensional Items. Unpublished doctoral thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Swanson, L., & Stocking, M. L. (1993). A model and heuristic for solving very large item selection problems. Applied Psychological Measurement, 17(2), 151-166.
Sympson, J. B., & Hetter, R. D. (1985). Controlling item-exposure rates in computerized adaptive testing. In Proceedings of the 27th annual meeting of the Military Testing Association (pp. 973–977). San Diego, CA: Navy Personnel Research and Development Center.
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528-549.
Thomasson, G. L. (1995). New item response control algorithms for computerized adaptive testing. Paper presented at the annual meeting of the Psychometric Society, Minneapolis, MN.
van der Linden, W. J. (1998). Bayesian item-selection criteria for adaptive testing. Psychometrika, 63, 201-216.
van der Linden, W. J., & Glas, C. A. W. (Eds.). (2000). Computerized adaptive testing: Theory and practice. Boston, MA: Kluwer Academic Publishers.
van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York: Springer.
van der Linden, W. J., & Pashley, P. J. (2000). Item selection and ability estimation in adaptive testing. In W. J. van der Linden, & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 1-25). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Veerkamp, W. J. J., & Berger, M. P. F. (1997). Some new item selection criteria for adaptive testing. Journal of Educational and Behavioral Statistics, 22, 203-226.
Wainer, H. (Ed.) (1990). Computerized adaptive testing: A primer. Hillsdale, NJ: Lawrence Erlbaum Associates.
Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education, 8, 157-186.
Wainer, H., Bradlow, E. T., & Du, Z. (2000). Testlet response theory: An analog for the 3-PL useful in adaptive testing. In W. J. van der Linden, & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245-269). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Wainer, H., & Lukhele, R. (1997). How reliable are TOEFL scores? Educational and Psychological Measurement, 57, 741-758.
Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185-201.
Wainer, H., & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15, 22-29.
Wainer, H., & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37, 203-220.
Wang, W.-C. (2004). Effects of anchor item methods on the detection of differential item functioning within the family of Rasch models. Journal of Experimental Education, 72, 221-261.
Wang, W.-C., & Chen, P.-H. (2004). Implementation and measurement efficiency of multidimensional computerized adaptive testing. Applied Psychological Measurement, 28(5), 295-316.
Wang, W.-C., Cheng, Y.-Y., & Wilson, M. R. (2005). Local item dependency for items across tests connected by common stimuli. Educational and Psychological Measurement, 65, 5-27.
Wang, W.-C., & Liu, C.-Y. (2007). Formulation and application of the generalized multilevel facets model. Educational and Psychological Measurement, 67, 583-605.
Wang, W.-C., & Wilson, M. R. (2005a). Assessment of differential item functioning in testlet-based items using the Rasch testlet model. Educational and Psychological Measurement, 65, 549-576.
Wang, W.-C., & Wilson, M. R. (2005b). The Rasch testlet model. Applied Psychological Measurement, 29, 126-149.
Wang, W.-C., & Wilson, M. R. (2005c). Exploring local item dependence using a random-effects facet model. Applied Psychological Measurement, 29, 296-318.
Wang, W.-C., Wilson, M. R., & Adams, R. J. (1997). Rasch models for multidimensionality between items and within items. In M. Wilson, G. Engelhard & K. Draney (Eds.), Objective measurement: Theory into practice (Volume 4, pp. 139-155). Norwood, NJ: Ablex.
Wang, W.-C., Wilson, M. R., & Adams, R. J. (2000). Interpreting the parameters of a multidimensional Rasch model. In M. Wilson, & G. Engelhard (Eds.), Objective measurement: Theory into practice (Volume 5, pp. 219-242). Norwood, NJ: Ablex.
Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17, 17–27.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.
Wu, M.-L. (2006). Controlling Item Exposure on the Fly in Computerized Adaptive Testing. Unpublished master’s thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Wu, M.-L., & Chen, S.-Y. (2008). Investigating item exposure control on the fly in computerized adaptive testing. Psychological Testing, 55(1), 1-32.
Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest: Generalised item response modeling software. Melbourne, Australia: Australian Council for Educational Research.