Adams, R. J., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23.
Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An approach to error in variables regression. Journal of Educational and Behavioral Statistics, 22, 47-76.
Adams, R. J., & Wu, M. L. (Eds.). (2002). PISA 2000 technical report. Paris: OECD Publications.
Albert, J. H. (1992). Bayesian estimation of normal ogive item response curves using Gibbs sampling. Journal of Educational Statistics, 17, 251-269.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
Baker, F. B. (1998). An investigation of the item parameter recovery characteristics of a Gibbs sampling procedure. Applied Psychological Measurement, 22, 163-169.
Baker, F., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.
Barrada, J. R., Olea, J., Ponsoda, V., & Abad, F. J. (2008). Incorporating randomness in the Fisher information for improving item-exposure control in CATs. British Journal of Mathematical and Statistical Psychology, 61(2), 493-513.
Bayarri, S., & Berger, J. (2000). P-values for composite null models. Journal of the American Statistical Association, 95, 1127–1142.
Beguin, A. A., & Glas, C. A. W. (2001). MCMC estimation and some model-fit analysis of multidimensional IRT models. Psychometrika, 66(4), 541–562.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Bock, R. D., & Mislevy, R. J. (1982). Adaptive EAP estimation of ability in a microcomputer environment. Applied Psychological Measurement, 6, 431-444.
Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2001). A mixture model for multiple choice data. Journal of Educational and Behavioral Statistics, 26(4), 381-409.
Bolt, D. M., Cohen, A. S., & Wollack, J. A. (2002). Item parameter estimation under conditions of test speededness: Application of a mixture Rasch model with ordinal constraints. Journal of Educational Measurement, 39(4), 331–348.
Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27, 395-414.
Bond, T., & Fox, C. (2001). Applying the Rasch model: Fundamental measurement in the human sciences. Mahwah, NJ: Lawrence Erlbaum.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153-168.
Brown, W. (1910). Some experimental results in the correlation of mental abilities. British Journal of Psychology, 3, 296-322.
Cao, J., & Stokes, S. L. (2008). Bayesian IRT guessing models for partial guessing behaviors. Psychometrika, 73, 209-230.
Chang, S.-W., & Ansley, T. N. (2003). A comparative study of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 40, 71-103.
Chang, H.-H., & Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20, 213-229.
Chen, S.-Y. (2004). Controlling item exposure on the fly in computerized adaptive testing. Paper presented at the Annual Meeting of the Taiwanese Psychological Association, Taipei, Taiwan.
Chen, S.-Y. (2005). Controlling item exposure and test overlap on the fly in computerized adaptive testing. Paper presented at the IMPS 2005 Annual Meeting of the Psychometric Society, Tilburg, Netherlands.
Chen, S.-Y., & Ankenmann, R. D. (2004). Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing. Journal of Educational Measurement, 41, 149-174.
Chen, S.-Y., Ankenmann, R. D., & Chang, H.-H. (2000). A comparison of item selection rules at the early stages of computerized adaptive testing. Applied Psychological Measurement, 24, 241-255.
Chen, S.-Y., Ankenmann, R. D., & Spray, J. A. (2003). The relationship between item exposure and test overlap in computerized adaptive testing. Journal of Educational Measurement, 40, 129–145.
Chen, S.-Y., & Lei, P.-W. (2005). Controlling item exposure and test overlap in computerized adaptive testing. Applied Psychological Measurement, 29, 204–217.
Chib, S., & Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm. American Statistician, 49, 327-335.
Cohen, A. S., & Bolt, D. M. (2005). A mixture model analysis of differential item functioning. Journal of Educational Measurement, 42, 133-148.
Davey, T., & Parshall, C. G. (1995). New algorithms for item selection and exposure control with computerized adaptive testing. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
De Boeck, P., & Wilson, M. (Eds.). (2004). Explanatory item response models: A generalized linear and nonlinear approach. New York: Springer-Verlag.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73(4), 533-559.
de la Torre, J., & Douglas, J. A. (2004). Higher-order latent trait models for cognitive diagnosis. Psychometrika, 69, 333-353.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
Flaugher, R. (2000). Item pools. In H. Wainer (Ed.), Computerized adaptive testing: A primer (2nd ed.) (pp. 37-59). Mahwah, NJ: Lawrence Erlbaum Associates.
Fischer, G. H. (1973). Linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.
Fox, J.-P., & Glas, C. A. W. (2003). Bayesian modeling of measurement error in predictor variables using item response theory. Psychometrika, 68, 169-191.
Gelfand, A. E. (1996). Model comparison using sampling-based methods. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 145-161). London: Chapman & Hall.
Gelfand, A. E., & Dey, D. K. (1994). Bayesian model choice: Asymptotics and exact calculations. Journal of the Royal Statistical Society, B, 56, 501-514.
Gelfand, A. E., & Smith, A. F. M. (1990). Sampling-based approaches to calculating marginal densities. Journal of the American Statistical Association, 85, 398-409.
Geisser, S., & Eddy, W. (1979). A predictive approach to model selection. Journal of the American Statistical Association, 74, 153-160.
Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2003). Bayesian data analysis. New York: Chapman & Hall.
Geman, S., & Geman, D. (1984). Stochastic relaxation, Gibbs distributions and the Bayesian restoration of images. IEEE Transactions on Pattern Analysis and Machine Intelligence, 6, 721-741.
Goegebeur, Y., De Boeck, P., Wollack, J. A., & Cohen, A. S. (2008). A speeded item response model with gradual process change. Psychometrika, 73, 65-87.
Gustafsson, J., & Balke, G. (1993). General and specific abilities as predictors of school achievement. Multivariate Behavioral Research, 28(4), 407-434.
Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society B, 29, 83-100.
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Norwell, MA: Kluwer Academic Publishers.
Hattie, J. (1981). Decision criteria for determining unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: The University of New England, Center for Behavioral Studies.
Hoijtink, H., & Molenaar, I. W. (1997). A multidimensional item response model: Constrained latent class analysis using the Gibbs sampler and posterior predictive checks. Psychometrika, 62, 171-189.
Hoogland, J. J., & Boomsma, A. (1998). Robustness studies in covariance structure modeling: An overview and a meta-analysis. Sociological Methods and Research, 26, 329–367.
Hsu, C.-L., & Chen, S.-Y. (2007). Controlling item exposure and test overlap in variable length computerized adaptive testing. Psychological Testing, 54(2), 403-428.
Ip, E. H.-S. (2000). Adjusting for information inflation due to local dependence in moderately large item clusters. Psychometrika, 65, 73-91.
Janssen, R., Tuerlinckx, F., Meulders, M., & De Boeck, P. (2000). A hierarchical IRT model for criterion-referenced measurement. Journal of Educational and Behavioral Statistics, 25(3), 285-306.
Johnson, D. E. (1998). Applied multivariate methods for data analysts. CA: Brooks/Cole Publishing Company.
Johnson, V. E., & Albert, J. H. (1999). Ordinal data modeling. New York: Springer-Verlag.
Jöreskog, K. G., & Sörbom, D. (2001). LISREL Version 8.51 [Computer software]. Chicago: Scientific Software International.
Ju, Y. (2005). Item exposure control in a-stratified computerized adaptive testing. Unpublished master’s thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Kang, T., & Cohen, A. S. (2007). IRT model selection methods for dichotomous items. Applied Psychological Measurement, 31, 331-358.
Kelloway, E. K. (1998). Using LISREL for structural equation modeling: A researcher’s guide. Thousand Oaks, CA: Sage Publications.
Klein Entink, R. H., Fox, J.-P., & van der Linden, W. J. (2009). A multivariate multilevel approach to the modeling of accuracy and speed of test takers. Psychometrika, 74(1), 21-48.
Kingsbury, G. G., & Zara, A. R. (1989). Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education, 2, 359-375.
Kingsbury, G. G., & Zara, A. R. (1991). A comparison of procedures for content-sensitive item selection in computerized adaptive tests. Applied Measurement in Education, 4, 241-261.
Li, Y., Bolt, D. M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30(1), 3-21.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
McBride, J. R., & Martin, J. T. (1983). Reliability and validity of adaptive ability tests in a military setting. In D. J. Weiss (Ed.), New Horizons in Testing (pp. 223-226). New York, NY: Academic Press.
McCulloch, C. E., & Searle, S. R. (2001). Generalized, linear, and mixed models. New York: Wiley.
McKinley, R. L., & Reckase, M. D. (1982). The use of the general Rasch model with multidimensional item response data (Research Report ONR 82-1). Iowa City IA: American College Testing.
Nering, M. L., Davey, T., & Thompson, T. (1998). A hybrid method for controlling item exposure in computerized adaptive testing. Paper presented at the annual meeting of the Psychometric Society, Urbana, IL.
Newton, M. A., & Raftery, A. E. (1994). Approximate Bayesian inference by the weighted likelihood bootstrap (with discussion). Journal of the Royal Statistical Society, Series B, 56, 3-48.
O’Hagan, A. (1991). Discussion on posterior Bayes factors (by M. Aitkin). Journal of the Royal Statistical Society, Series B, 53, 136.
O’Hagan, A. (1995). Fractional Bayes factors for model comparison. Journal of the Royal Statistical Society, Series B, 57, 99-138.
Owen, R. J. (1969). A Bayesian approach to tailored testing (Research Report 69-92). Princeton, NJ: Educational Testing Service.
Owen, R. J. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351–356.
Parshall, C. G., Spray, J. A., Kalohn, J. C., & Davey, T. (2002). Practical considerations in computer-based testing. New York: Springer-Verlag.
Patz, R., & Junker, B. (1999). A straightforward approach to Markov chain Monte Carlo methods for item response models. Journal of Educational and Behavioral Statistics, 24, 146–178.
Press, S. J. (2003). Subjective and objective Bayesian statistics: Principles, models, and applications (2nd ed.). Hoboken, NJ: John Wiley & Sons, Inc.
Ponsoda, V., & Olea, J. (2003). Adaptive and tailored testing. (Including IRT and Non IRT Application). In R. Fernandez-Ballesteros (Ed.), Encyclopaedia of psychological assessment (pp. 9–13). London: Sage Publications.
Raftery, A. E. (1996). Hypothesis testing and model selection. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 163-187). London: Chapman & Hall.
Raftery, A. E., & Lewis, S. M. (1996). Implementing MCMC. In W. R. Gilks, S. Richardson, & D. J. Spiegelhalter (Eds.), Markov chain Monte Carlo in practice (pp. 115-130). London: Chapman & Hall.
Raîche, G., Blais, J. G., & Magis, D. (2007). Adaptive estimators of trait level in adaptive testing: Some proposals. In D. J. Weiss (Ed.), Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing, June 7-8, 2007.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Institute of Educational Research. (Expanded edition, 1980. Chicago: The University of Chicago Press.)
Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401–412.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25-36.
Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114, 552-566.
Revuelta, J., & Ponsoda, V. (1998). A comparison of item exposure control methods in computerized adaptive testing. Journal of Educational Measurement, 35, 311–327.
Rijmen, F., Tuerlinckx, F., De Boeck, P., & Kuppens, P. (2003). A nonlinear mixed model framework for item response theory. Psychological Methods, 8, 185-205.
Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics, 12, 1151-1172.
SAS Institute (1999). SAS online doc (version 8) (software manual on CD-Rom). Cary, NC: SAS Institute Inc.
San Martín, E., del Pino, G., & De Boeck, P. (2006). IRT models for ability-based guessing. Applied Psychological Measurement, 30(3), 193-203.
Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22, 53-61.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331-354.
Segall, D. O. (2004a). A sharing item response theory model for computerized adaptive testing. Journal of Educational and Behavioral Statistics, 29, 439–460.
Segall, D. O. (2004b). Computerized adaptive testing. In K. Kempf-Leonard (Ed.), The encyclopaedia of social measurement (pp. 429-438). San Diego, CA: Academic Press.
Segall, D. O., & Moreno, K. E. (1999). Development of the computerized adaptive testing version of the Armed Services Vocational Aptitude Battery. In F. Drasgow, & J. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 35-65). Mahwah, NJ: Lawrence Erlbaum.
Sheng, Y., & Wikle, C. K. (2008). Bayesian multidimensional IRT models with a hierarchical structure. Educational and Psychological Measurement, 68(3), 413-430.
Shih, C.-L. (2007). A Comparison of Item Selection Strategies in Computerized Adaptive Testing for Testlet-based Items and Multidimensional Items. Unpublished doctoral thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Sinharay, S. (2005). Assessing fit of unidimensional item response theory models using a Bayesian approach. Journal of Educational Measurement, 42(4), 375-394.
Sinharay, S., & Johnson, M. S. (2003). Simulation studies applying posterior predictive model checking for assessing fit of the common item response theory models. Manuscript in preparation. A preliminary version retrieved November 1, 2004, from http://www.ets.org/research/newpubs.html.
Sinharay, S., Johnson, M. S., & Stern, H. S. (2006). Posterior predictive assessment of item response theory models. Applied Psychological Measurement, 30(4), 298-321.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal and structural equation models. Boca Raton, FL: Chapman and Hall/CRC Press.
Smith, L. L., & Reise, S. P. (1998). Gender differences on negative affectivity: An IRT study of differential item functioning on the Multidimensional Personality Questionnaire Stress Reaction scale. Journal of Personality and Social Psychology, 75(5), 1350-1362.
Smith, E. V., Jr., & Smith, R. M. (Eds.). (2004). Introduction to Rasch measurement: Theory, models and applications. Maple Grove, MN: JAM Press.
Spearman, C. (1910). Correlation calculated from faulty data. British Journal of Psychology, 3, 271-295.
Spiegelhalter, D. J., Best, N. G., Carlin, B. P., & van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, Series B, Methodological, 64, 583-616.
Spiegelhalter, D. J., Thomas, A., & Best, N. (2003). WinBUGS version 1.4 [Computer program]. Cambridge, UK: MRC Biostatistics Unit, Institute of Public Health.
Stocking, M. L., & Lewis, C. (1995). A new method for controlling item exposure in computerized adaptive testing (ETS Research Report RR-95-25). Princeton, NJ: Educational Testing Service.
Stocking, M. L., & Lewis, C. (1998). Controlling item exposure conditional on ability in computerized adaptive testing. Journal of Educational and Behavioral Statistics, 23, 57-75.
Stocking, M. L., & Swanson, L. (1993). A method for severely constrained item selection in adaptive testing. Applied Psychological Measurement, 17(3), 277-292.
Su, Y.-H. (2007). Simultaneous Control over Item Exposure and Test Overlap in Computerized Adaptive Testing for Testlet-based Items and Multidimensional Items. Unpublished doctoral thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Swanson, L., & Stocking, M. L. (1993). A model and heuristic for solving very large item selection problems. Applied Psychological Measurement, 17(2), 151-166.
Sympson, J. B., & Hetter, R. D. (1985). Controlling item-exposure rates in computerized adaptive testing. In Proceedings of the 27th annual meeting of the Military Testing Association (pp. 973–977). San Diego, CA: Navy Personnel Research and Development Center.
Tanner, M. A., & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation. Journal of the American Statistical Association, 82, 528-549.
Thomasson, G. L. (1995). New item exposure control algorithms for computerized adaptive testing. Paper presented at the annual meeting of the Psychometric Society, Minneapolis, MN.
van der Linden, W. J. (1998). Bayesian item-selection criteria for adaptive testing. Psychometrika, 63, 201-216.
van der Linden, W. J., & Glas, C. A. W. (Eds.). (2000). Computerized adaptive testing: Theory and practice. Boston, MA: Kluwer Academic Publishers.
van der Linden, W. J., & Hambleton, R. K. (Eds.). (1997). Handbook of modern item response theory. New York: Springer.
van der Linden, W. J., & Pashley, P. J. (2000). Item selection and ability estimation in adaptive testing. In W. J. van der Linden, & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 1-25). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Veerkamp, W. J. J., & Berger, M. P. F. (1997). Some new item selection criteria for adaptive testing. Journal of Educational and Behavioral Statistics, 22, 203-226.
Wainer, H. (Ed.). (1990). Computerized adaptive testing: A primer. Hillsdale, NJ: Lawrence Erlbaum Associates.
Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education, 8, 157-186.
Wainer, H., Bradlow, E. T., & Du, Z. (2000). Testlet response theory: An analog for the 3-PL useful in adaptive testing. In W. J. van der Linden, & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245-269). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Wainer, H., & Lukhele, R. (1997). How reliable are TOEFL scores? Educational and Psychological Measurement, 57, 741-758.
Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185-201.
Wainer, H., & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15, 22-29.
Wainer, H., & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37, 203-220.
Wang, W.-C. (2004). Effects of anchor item methods on the detection of differential item functioning within the family of Rasch models. Journal of Experimental Education, 72, 221-261.
Wang, W.-C., & Chen, P.-H. (2004). Implementation and measurement efficiency of multidimensional computerized adaptive testing. Applied Psychological Measurement, 28(5), 295-316.
Wang, W.-C., Cheng, Y.-Y., & Wilson, M. R. (2005). Local item dependency for items across tests connected by common stimuli. Educational and Psychological Measurement, 65, 5-27.
Wang, W.-C., & Liu, C.-Y. (2007). Formulation and application of the generalized multilevel facets model. Educational and Psychological Measurement, 67, 583-605.
Wang, W.-C., & Wilson, M. R. (2005a). Assessment of differential item functioning in testlet-based items using the Rasch testlet model. Educational and Psychological Measurement, 65, 549-576.
Wang, W.-C., & Wilson, M. R. (2005b). The Rasch testlet model. Applied Psychological Measurement, 29, 126-149.
Wang, W.-C., & Wilson, M. R. (2005c). Exploring local item dependence using a random-effects facet model. Applied Psychological Measurement, 29, 296-318.
Wang, W.-C., Wilson, M. R., & Adams, R. J. (1997). Rasch models for multidimensionality between items and within items. In M. Wilson, G. Engelhard & K. Draney (Eds.), Objective measurement: Theory into practice (Volume 4, pp. 139-155). Norwood, NJ: Ablex.
Wang, W.-C., Wilson, M. R., & Adams, R. J. (2000). Interpreting the parameters of a multidimensional Rasch model. In M. Wilson, & G. Engelhard (Eds.), Objective measurement: Theory into practice (Volume 5, pp. 219-242). Norwood, NJ: Ablex.
Way, W. D. (1998). Protecting the integrity of computerized testing item pools. Educational Measurement: Issues and Practice, 17, 17–27.
Wright, B. D., & Masters, G. N. (1982). Rating scale analysis. Chicago: MESA Press.
Wu, M.-L. (2006). Controlling item exposure on the fly in computerized adaptive testing. Unpublished master’s thesis, National Chung Cheng University, Chia-Yi, Taiwan.
Wu, M.-L., & Chen, S.-Y. (2008). Investigating item exposure control on the fly in computerized adaptive testing. Psychological Testing, 55(1), 1-32.
Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest: Generalised item response modeling software. Melbourne, Australia: Australian Council for Educational Research.