Ackerman, T. A. (1996). Graphical representation of multidimensional item response theory analyses. Applied Psychological Measurement, 20, 311-329.
Adams, R. J., Wilson, M., & Wang, W.-C. (1997). The multidimensional random coefficients multinomial logit model. Applied Psychological Measurement, 21, 1-23.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
Bartolucci, F. (2007). A class of multidimensional IRT models for testing unidimensionality and clustering items. Psychometrika, 72, 141–157.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
Bloxom, B. M., & Vale, C. D. (1987, June). Multidimensional adaptive testing: A procedure for sequential estimation of the posterior centroid and dispersion of theta. Paper presented at the meeting of the Psychometric Society, Montreal.
Bock, R. D. (1972). Estimating item parameters and latent ability when responses are scored in two or more nominal categories. Psychometrika, 37, 29-51.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Bolt, D. M., & Lall, V. F. (2003). Estimation of compensatory and noncompensatory multidimensional item response models using Markov chain Monte Carlo. Applied Psychological Measurement, 27, 395-414.
Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153-168.
Breithaupt, K., Ariel, A., & Veldkamp, B. P. (2005). Automated simultaneous assembly for multistage testing. International Journal of Testing, 5, 319–330.
Breithaupt, K. & Hare, D. R. (2007). Automated simultaneous assembly of multistage testlets for a high-stakes licensing examination. Educational and Psychological Measurement, 67, 5-20.
Burden, R. L., & Faires, J. D. (1993). Numerical analysis (5th ed.). Boston: PWS Publishing Company.
Camilli, G., Wang, M. M., & Fesq, J. (1995). The effects of dimensionality on equating the Law School Admission Test. Journal of Educational Measurement, 32, 79-96.
Carlson, R., & Suen, H. K. (1996). A comparison of item selection strategies used in computer adaptive test of math ability. In G. Engelhard & M. Wilson (Eds.), Objective measurement: Theory into practice, Vol. 3 (pp. 335-347). Norwood, NJ: Ablex.
Chan, W.-H., Leu, Y.-C., & Chen, C.-M. (2007). Exploring group-wise conceptual deficiencies of fractions for fifth and sixth graders in Taiwan. The Journal of Experimental Education, 76, 26–57.
Chang, H.-H., & Ying, Z. (1996). A global information approach to computerized adaptive testing. Applied Psychological Measurement, 20, 213-229.
Chang, H.-H., & Ying, Z. (1999). a-Stratified multistage computerized adaptive testing. Applied Psychological Measurement, 23, 211-222.
Chen, S., & Ankenmann, R. D. (2004). Effects of practical constraints on item selection rules at the early stages of computerized adaptive testing. Journal of Educational Measurement, 41, 149-174.
Chen, S.-Y., Ankenmann, R. D., & Chang, H.-H. (2000). A comparison of item selection rules at the early stages of computerized adaptive testing. Applied Psychological Measurement, 24, 241-255.
Cheng, P. E., & Liou, M. (2003). Computerized adaptive testing using the nearest-neighbors criterion. Applied Psychological Measurement, 27, 204-216.
Christoffersson, A. (1975). Factor analysis of dichotomized variables. Psychometrika, 40, 5-32.
Clarkson, D. B. & Gonzalez, R. (2001). Random effects diagonal metric multidimensional scaling models. Psychometrika, 66, 25-43.
Collis, K. F. (1983). Development of a group test of mathematical understanding using superitems SOLO technique. Journal of Science and Mathematics Education in South East Asia, 6, 5-14.
Collis, K. F., & Davey, H. A. (1986). A technique for evaluating skills in high school science. Journal of Research in Science Teaching, 23, 651-663.
Cureton, E. E. (1965). Reliability and validity: Basic assumptions and experimental designs. Educational and Psychological Measurement, 25, 326-346.
Davis, L. L., & Dodd, B. G. (2003). Item exposure constraints for testlets in the Verbal Reasoning Section of the MCAT. Applied Psychological Measurement, 27, 335-356.
DeMars, C. E. (2007). "Guessing" parameter estimates for multidimensional item response theory models. Educational and Psychological Measurement, 67, 433-446.
Dodd, B. G. (1985). Attitude scaling: A comparison of the graded response and partial credit latent trait models. Doctoral dissertation, University of Texas at Austin.
Dodd, B. G., Koch, W. R., & De Ayala, R. J. (1989). Operational characteristics of adaptive testing procedures using the graded response model. Applied Psychological Measurement, 13, 129-143.
Dodd, B. G., De Ayala, R. J., & Koch, W. R. (1995). Computerized adaptive testing with polytomous items. Applied Psychological Measurement, 19, 5-22.
Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23, 249-261.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum.
Fan, M., & Hsu, Y. (1996, April). Utility of Fisher information, global information and different starting abilities in mini CAT. Paper presented at the Annual Meeting of the National Council on Measurement in Education, New York.
Ferdous, A. A. & Plake, B. S. (2007). Item selection strategy for reducing the number of items rated in an Angoff standard setting study. Educational and Psychological Measurement, 67, 193-206.
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational research. Acta Psychologica, 37, 359-374.
Folk, V. G., & Green, B. F. (1989). Adaptive estimation when the unidimensionality assumption of IRT is violated. Applied Psychological Measurement, 13, 373-389.
Fraser, C. (1988). NOHARM II: A FORTRAN program for fitting unidimensional and multidimensional normal ogive models of latent trait theory. Center for Behavioral Studies, the University of New England, Armidale, NSW, Australia.
Glöckner-Rist, A., & Hoijtink, H. (2003). The best of both worlds: Factor analysis of dichotomous data using item response theory and structural equation modeling. Structural Equation Modeling, 10, 544-565.
Hattie, J. (1981). Decision criteria for determining unidimensional and multidimensional normal ogive models of latent trait theory. Armidale, Australia: The University of New England, Center for Behavioral Studies.
Hau, K. T., & Chang, H. H. (2001). Item selection in computerized adaptive testing: Should more discriminating items be used first? Journal of Educational Measurement, 38, 249-266.
Horst, P. (1965). Factor analysis of data matrices. New York: Holt, Rinehart and Winston.
Kelderman, H. (1996). Multidimensional Rasch models for partial-credit scoring. Applied Psychological Measurement, 20, 155-168.
Kingsbury, G. G., & Zara, A. R. (1989). Procedures for selecting items for computerized adaptive tests. Applied Measurement in Education, 2, 359-375.
Koch, W. R. (1983). Likert scaling using the graded response latent trait model. Applied Psychological Measurement, 7, 15-32.
Koch, W. R., & Dodd, B. G. (1989). An investigation of procedures for computerized adaptive testing using partial credit scoring. Applied Measurement in Education, 2, 335-357.
Luecht, R., Brumfield, T., & Breithaupt, K. (2006). A testlet assembly design for adaptive multistage tests. Applied Measurement in Education, 19, 189–202.
Leung, C.-K., Chang, H.-H., & Hau, K.-T. (2002). Item selection in computerized adaptive testing: Improving the a-stratified design with the Sympson-Hetter Algorithm. Applied Psychological Measurement, 26, 376-392.
Li, Y. H., & Schafer, W. D. (2004). The context effects of multidimensional CAT on the accuracy of multidimensional abilities and item exposure rates. American Educational Research Association Convention, San Diego.
Li, Y. H., & Schafer, W. D. (2005). Trait parameter recovery using multidimensional computerized adaptive testing in reading and mathematics. Applied Psychological Measurement, 29, 3-25.
Linacre, J. M. (1989). Many-faceted Rasch measurement. Chicago: MESA Press.
Lord, F. M. (1977). A broad-range tailored test of verbal ability. Applied Psychological Measurement, 1, 95-100.
Lord, F. M. (1980). Application of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Luecht, R. M. (1996). Multidimensional computerized adaptive testing in a certification or licensure context. Applied Psychological Measurement, 20, 389-404.
Leung, C.-K., Chang, H.-H., & Hau, K.-T. (2005). Computerized adaptive testing: A mixture item selection approach for constrained situations. British Journal of Mathematical and Statistical Psychology, 58, 239-257.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
Maydeu-Olivares, A. (2001). Multidimensional item response theory modeling of binary data: large sample properties of NOHARM estimates. Journal of Educational and Behavioral Statistics, 26, 51-71.
Maydeu-Olivares, A., Hernandez, A., & McDonald, R. P. (2006). A multidimensional ideal point item response theory model for binary data. Multivariate Behavioral Research, 41, 445–471.
McDonald, R. P. (1967). Nonlinear factor analysis. Psychometric monograph, 15, 1-167.
McDonald, R. P. (2000). A basis for multidimensional item response theory. Applied Psychological Measurement, 24, 99-114.
McKeeman, W. M. (1962). Algorithm 145: Adaptive numerical integration by Simpson's rule. Communications of the ACM, 5(12), 604.
McKinley, R. L., & Reckase, M. D. (1982). The use of the general Rasch model with multidimensional item response data (Research Report ONR 82-1). Iowa City IA: American College Testing.
McKinley, R. L., & Reckase, M. D. (1983). MAXLOG: A computer program for the estimation of the parameters of a multidimensional logistic model. Behavior Research Methods and Instrumentation, 15, 389–390.
Mills, C. N. (1999). Development and introduction of a computer adaptive Graduate Record Examinations General Test. In F. Drasgow & J. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp. 117-135). Mahwah, NJ: Lawrence Erlbaum.
Mills, C. N., & Steffen, M. (2000). The GRE computer adaptive test: Operational issues. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 75-99). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Mulaik, S. A. (1972, March). A mathematical investigation of some multidimensional Rasch models for psychological tests. Paper presented at the annual meeting of the Psychometric Society, Princeton, NJ.
Muthén, B. (1978). Contributions to factor analysis of dichotomous variables. Psychometrika, 43, 551-560.
Owen, R. J. (1975). A Bayesian sequential procedure for quantal response in the context of adaptive mental testing. Journal of the American Statistical Association, 70, 351–356.
Passos, V. L., Berger, M. P. F., & Tan, F. E. (2007). Test design optimization in CAT early stage with the nominal response model. Applied Psychological Measurement, 31, 213-232.
Penfield, R. D. (2006). Applying Bayesian item selection approaches to adaptive tests using polytomous items. Applied Measurement in Education, 19, 1-20.
Petersen, M. A., Groenvold, M., Aaronson, N., Fayers, P., Sprangers, M., & Bjorner, J.B. (2006). Multidimensional computerized adaptive testing of the EORTC QLQ-C30: Basic developments and evaluations. Quality of Life Research, 15, 315-329.
Pomplun, M., & Ritchie, T. (2004). An investigation of context effects for item randomization within testlets. Journal of Educational Computing Research, 30, 243-254.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Institute of Educational Research. (Expanded edition, 1980. Chicago: The University of Chicago Press.)
Rasch, G. (1962). On general laws and the meaning of measurement in psychology. Proceedings of the Fourth Berkeley Symposium on Mathematical Statistics and Probability, 4, 321-334.
Reckase, M. D. (1972). Development and application of a multivariate logistic latent trait model. Unpublished doctoral dissertation, Syracuse University, Syracuse NY.
Reckase, M. D. (1973). An interactive computer program for tailored testing based on the one-parameter logistic model. Paper presented to the National Conference on the Use of On-Line Computers in Psychology, St. Louis, MO.
Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25-36.
Reckase, M. D. (1985). The difficulty of test items that measure more than one ability. Applied Psychological Measurement, 9, 401–412.
Reckase, M. D., Ackerman, T. A., & Carlson, J. E. (1988). Building a unidimensional test using multidimensional items. Journal of Educational Measurement, 25, 193–203.
Reckase, M. D., & McKinley, R. L. (1991). The discriminating power of items that measure more than one dimension. Applied Psychological Measurement, 15, 361–373.
Rijmen, F., & Briggs, D. (2004). Multiple person dimensions and latent item predictors. In P. De Boeck & M. Wilson (Eds.), Explanatory item response models: A generalized linear and nonlinear approach (pp. 247-265). New York: Springer-Verlag.
Rijmen, F. & De Boeck, P. (2005). A relation between a between-item multidimensional IRT model and the mixture-Rasch model. Psychometrika, 70, 481-496.
Romberg, T. A., Collis, K. F., Donovan, B. F., Buchanan, A. E., & Romberg, M. N. (1982). The development of mathematical problem solving superitems (Report of NIE/ECS Item Development Project). Madison, WI: Wisconsin Center for Education Research.
Romberg, T. A., Jurdak, M. E., Collis, K. F., & Buchanan, A. E. (1982). Construct validity of a set of mathematical superitems (Report of NIE/ECS Item Development Project). Madison, WI: Wisconsin Center for Education Research.
Rosenbaum, P. R. (1988). Item bundles. Psychometrika, 53, 349-359.
Rost, J., & Carstensen, C. H. (2002). Multidimensional Rasch measurement via item component models and faceted designs. Applied Psychological Measurement, 26, 42-56.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 17.
Samejima, F. (1974). Normal ogive model on the continuous response level in the multidimensional latent space. Psychometrika, 39, 111-121.
Sands, W. A., & Waters, B. K. (1997). Introduction to ASVAB and CAT. In W. A. Sands, B. K. Waters, & J. R. McBride (Eds.), Computerized adaptive testing: From inquiry to operation (pp. 3-9). Washington, DC: American Psychological Association.
Sands, W. A., Waters, B. K., & McBride, J. R. (1997). Computerized adaptive testing: From inquiry to operation. Washington, DC: American Psychological Association.
Segall, D. O. (1996). Multidimensional adaptive testing. Psychometrika, 61, 331-354.
Segall, D. O. (2000). Principles of multidimensional adaptive testing. In W. J. van der Linden, & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 53-73). Boston: Kluwer Academic.
Segall, D. O. (2001). General ability measurement: An application of multidimensional item response theory. Psychometrika, 66, 79-97.
Segall, D. O., & Moreno, K. E. (1999). Development of the computerized adaptive testing version of the Armed Services Vocational Aptitude Battery. In F. Drasgow, & J. Olson-Buchanan (Eds.), Innovations in computerized assessment (pp.35-65). Mahwah, NJ: Lawrence Erlbaum.
Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237-247.
Sympson, J. B. (1978). A model for testing with multidimensional items. In D. J. Weiss (Ed.), Proceedings of the computerized adaptive testing conference. Minneapolis: Department of Psychology, University of Minnesota.
Sympson, J. B., & Hetter, R. D. (1985). Controlling item-exposure rates in computerized adaptive testing. In Proceedings of the 27th annual meeting of the Military Testing Association (pp. 973–977). San Diego CA: Navy Personnel Research and Development Center.
Tam, S. S. (1992). A comparison of methods for adaptive estimation of a multidimensional trait. Unpublished doctoral dissertation, Columbia University, New York City, NY.
te Marvelde, J. M., Glas, C. A. W., Van Landeghem, G., & Van Damme, J. (2006). Application of multidimensional item response theory models to longitudinal data. Educational and Psychological Measurement, 66, 5-34.
Urry, V. W. (1970). A Monte Carlo investigation of logistic test models. Unpublished doctoral dissertation, Purdue University, West Lafayette, IN.
van der Linden, W. J. (1998). Bayesian item-selection criteria for adaptive testing. Psychometrika, 63, 201-216.
van der Linden, W. J. (1999). Multidimensional adaptive testing with a minimum error-variance criterion. Journal of Educational and Behavioral Statistics, 24, 398-412.
van der Linden, W. J. (2005). A comparison of item-selection methods for adaptive tests with content constraints. Journal of Educational Measurement, 42, 283-302.
van der Linden, W. J., & Glas, C. A. W. (Eds.). (2000). Computerized adaptive testing: Theory and practice. Boston, MA: Kluwer Academic Publishers.
van der Linden, W. J., & Pashley, P. J. (2000). Item selection and ability estimation in adaptive testing. In W. J. van der Linden, & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp.1-25). Dordrecht, The Netherlands: Kluwer Academic Publishers.
van Rijn, P. W., Eggen, T. J. H. M., Hemker, B. T., & Sanders, P. F. (2002). Evaluation of selection procedures for computerized adaptive testing with polytomous items. Applied Psychological Measurement, 26, 393-411.
Veerkamp, W. J. J., & Berger, M. P. F. (1997). Some new item selection criteria for adaptive testing. Journal of Educational and Behavioral Statistics, 22, 203-226.
Veldkamp, B. P., & van der Linden, W. J. (2002). Multidimensional adaptive testing with constraints on test content. Psychometrika, 67, 575-588.
Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education, 8, 157-186.
Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24, 185-201.
Wainer, H., & Lewis, C. (1990). Toward a psychometrics for testlets. Journal of Educational Measurement, 27, 1-14.
Wainer, H., & Lukhele, R. (1997). How reliable are TOEFL scores? Educational and Psychological Measurement, 57, 741-758.
Wainer, H., & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15, 22-29.
Wainer, H., Bradlow, E. T., & Du, Z. (2000). Testlet response theory: An analog for the 3-PL useful in adaptive testing. In W. J. van der Linden & C. A. W. Glas (Eds.), Computerized adaptive testing: Theory and practice (pp. 245-269). Dordrecht, The Netherlands: Kluwer Academic Publishers.
Wainer, H., Dorans, N. J., Flaugher, R., Mislevy, R. J., Thissen, D., Eignor, D., Green, B. F., & Steinberg, L. (2000). Computerized adaptive testing: A primer (2nd ed.). Mahwah, NJ: Lawrence Erlbaum.
Wainer, H., & Wang, X. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37, 203-220.
Walker, C. M. & Beretvas, S. N. (2003). Comparing multidimensional and unidimensional proficiency classifications: Multidimensional IRT as a diagnostic aid. Journal of Educational Measurement, 40, 255-275.
Wang, W. C. (2004). Direct estimation of correlation as a measure of association strength using multidimensional item response models. Educational and Psychological Measurement, 64, 937-955.
Wang, W.-C., & Chen, P.-H. (2004). Implementation and measurement efficiency of multidimensional computerized adaptive testing. Applied Psychological Measurement, 28, 295-316.
Wang, W.-C., Chen, P.-H., & Cheng, Y.-Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9, 116-136.
Wang, W.-C., & Wilson, M. R. (2005). The Rasch testlet model. Applied Psychological Measurement, 29, 126-149.
Wang, X., Bradlow, E. T., & Wainer, H. (2002). A general Bayesian model for testlets: Theory and applications (GRE Board Professional Report No. 98-01P). Princeton, NJ: Educational Testing Service.
Weiss, D. J. (1973). The stratified adaptive computerized ability test (Research Report 73-3). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program, Computerized Adaptive Testing Laboratory.
Weiss, D. J. (1974). Strategies of adaptive ability measurement (Research Report 74-5). Minneapolis: University of Minnesota, Department of Psychology, Psychometric Methods Program.
Weissman, A. (2006). A feedback control strategy for enhancing item selection efficiency in computerized adaptive testing. Applied Psychological Measurement, 30, 84-99.
Weissman, A. (2007). Mutual information item selection in adaptive classification testing. Educational and Psychological Measurement, 67, 41-58.
Whitely, S. E. (1980). Measuring aptitude processes with multicomponent latent trait models (Technical Report NIE-80-5). Lawrence: University of Kansas.
Wilson, D., Wood, R., & Gibbons, R. (1984). TESTFACT: Test scoring, item statistics, and item factor analysis [Computer software and manual]. Mooresville, IN: Scientific Software.
Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest: Generalised item response modeling software. Melbourne, Australia: Australian Council for Educational Research.
Yao, L. & Boughton, K. A. (2007). A multidimensional item response modeling approach for improving subscale proficiency estimation and classification. Applied Psychological Measurement, 31, 83–105.
Yao, L., & Schwarz, R. D. (2006). A multidimensional partial credit model with associated item and test statistics: An application to mixed-format tests. Applied Psychological Measurement, 30, 469-492.
Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187-213.