:::

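Title: A Comparison of Item Selection Strategies in Testlet-Based and Multidimensional Computerized Adaptive Testing (題組與多向度電腦適性測驗之選題策略的比較)
Author: 施慶麟 (Ching-Lin Shih)
Institution: National Chung Cheng University
Department: Graduate Institute of Psychology
Advisor: 王文中
Degree: Doctoral
Publication year: 2008
Keywords: computerized adaptive testing; testlet response theory; multidimensional item response theory; testlet; item selection strategy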
Adaptive testing increases measurement efficiency by giving each examinee items matched to his or her ability, so that no time is wasted on items that are too easy or too difficult. Because every examinee receives an individualized set of items, a precise ability estimate can be expected. Since the goal is for every administered item to lie as close as possible to the examinee's ability, finding the most appropriate item is time-consuming. Computer technology was introduced into testing to improve this efficiency, and computerized adaptive testing (CAT) has been widely studied over the past two decades. Most of that work, however, rests on unidimensional item response theory. As different item types (e.g., multidimensional items and testlet-based items) come into use in real testing situations and the corresponding item response models are developed, multidimensional computerized adaptive testing (MCAT) and testlet-based computerized adaptive testing (TCAT) are likely directions for future development. This study therefore compares five item selection strategies under MCAT and TCAT: Fisher information (FI), Fisher interval information (FII), Fisher information with a posterior distribution (FIP), Kullback-Leibler information (KL), and Kullback-Leibler information with a posterior distribution (KLP). The comparison uses simulated data and does not consider practical constraints such as exposure control and content balancing.
In the MCAT part, four independent variables were manipulated: (a) the MIRT model (M1PL and M3PL); (b) the size of the item bank; (c) the correlation between dimensions; and (d) the item selection strategy. The dependent variables were (a) the bias of the ability estimates; (b) the root mean square error (RMSE) of the ability estimates; (c) the rate of item bank usage; and (d) the average between-test overlap rate. Under the M1PL model, FI, KL, and KLP showed no appreciable difference in the precision of ability estimation, but because FI used a larger proportion of the item bank than KL and KLP, FI is recommended. Under the M3PL model, FII or FIP is recommended when the correlation between dimensions is low, since both balance ability estimation against bank usage, whereas FI is recommended when the correlation is high.
In the TCAT part, the study argues that the distribution of item difficulty within a testlet should affect the performance of the CAT system, so testlet structure was included as an independent variable. Five further independent variables were manipulated: (a) the TRT model (1PTL and 3PTL); (b) the size of the item bank; (c) the number of items within each testlet and the testlet test length; (d) the testlet effect; and (e) the item selection strategy. The dependent variables were the same as in the MCAT part. Under the 1PTL model, the study recommends KL or KLP in the early stage of the test and FI in the later stage, which yields slightly more accurate ability estimates. Under the 3PTL model, considering both the RMSE of the ability estimates and the rate of bank usage, FII or FIP is the better choice regardless of how item difficulty is distributed within testlets.
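As a rough illustration of the simplest strategy named in the abstract, the sketch below implements maximum Fisher information (FI) selection together with the bias and RMSE criteria. It is not the thesis's implementation. It assumes a unidimensional 3PL model with scaling constant 1, whereas the study itself works with multidimensional and testlet models, and the item bank and all function names are hypothetical.

```python
import math


def p_3pl(theta, a, b, c):
    """Probability of a correct response under a unidimensional 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def fisher_info(theta, a, b, c):
    """Fisher information of a single 3PL item at ability theta."""
    p = p_3pl(theta, a, b, c)
    return a * a * ((1.0 - p) / p) * ((p - c) / (1.0 - c)) ** 2


def select_item_fi(theta_hat, bank, used):
    """Return the index of the unused item with maximum information at theta_hat."""
    candidates = [i for i in range(len(bank)) if i not in used]
    return max(candidates, key=lambda i: fisher_info(theta_hat, *bank[i]))


def bias(estimates, truths):
    """Mean signed difference between ability estimates and true abilities."""
    return sum(e - t for e, t in zip(estimates, truths)) / len(truths)


def rmse(estimates, truths):
    """Root mean square error of the ability estimates."""
    squared = sum((e - t) ** 2 for e, t in zip(estimates, truths))
    return math.sqrt(squared / len(truths))


# Hypothetical three-item bank; each item is (a, b, c).
bank = [(1.0, -1.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0)]
next_item = select_item_fi(0.0, bank, used=set())
```

With equal discriminations and no guessing, the item whose difficulty is closest to the current ability estimate carries the most information, so the second item is chosen here. The other four strategies would replace `fisher_info` with a different criterion while the selection loop stays the same.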
:::