Title: 認知診斷模式之變動長度電腦適性測驗 (Variable-Length Computerized Adaptive Testing Based on Cognitive Diagnosis Models)
Author: 許嘉凌 (Chia-Ling Hsu)
Institution: 國立中正大學 (National Chung Cheng University)
Department: 心理學研究所 (Graduate Institute of Psychology)
Advisors: 陳淑英 (Shu-Ying Chen), 王文中 (Wen-Chung Wang)
Degree: Doctorate
Publication year: 2014
Keywords: computerized adaptive testing; cognitive diagnosis; variable-length; test security; content balancing; attribute balancing
With advances in computer technology, computerized adaptive testing (CAT) has been widely used in assessment. CAT is theoretically grounded in item response theory (IRT), which provides each examinee with an overall ability estimate on a continuous scale. High testing efficiency is one of CAT's main advantages: compared with traditional paper-and-pencil tests, CAT can accurately assess an examinee's ability level with fewer items. Since the United States passed the No Child Left Behind Act in 2001, cognitive diagnostic assessment has received growing attention in psychometrics and education. Its main purpose is to diagnose whether an examinee has mastered each skill or latent attribute, so that educators can provide appropriate remedial instruction based on the assessment results and thereby improve learning. Cognitive diagnosis models (CDMs), which yield such diagnostic results, have therefore also attracted considerable attention; the diagnostic results they provide tell educators whether an examinee has reached mastery of each skill or latent attribute.
To apply cognitive diagnostic assessment more effectively in educational testing, cognitive diagnosis computerized adaptive testing (CD-CAT), which combines CAT with CDMs, has emerged as a new mode of testing. Although CD-CAT enjoys the advantages of both CAT and CDMs, several problems remain to be solved before it can be applied in practice. For example, most CD-CAT programs adopt a fixed-length termination rule, which cannot provide the same level of precision for different examinees. In addition, most CD-CAT programs do not consider important practical issues such as content balancing or test security. Moreover, most current CD-CAT programs adopt commonly used CDMs; when the testing context is more complex, a more complex CDM would be a more appropriate measurement model.
To apply CD-CAT effectively in testing practice, this dissertation has three objectives: first, to develop variable-length CD-CAT, that is, CD-CAT with a fixed-precision termination rule; second, to develop variable-length CD-CAT that takes practical constraints into account; and third, to develop variable-length CD-CAT under non-standard CDMs. These three objectives were pursued through five simulation studies.
Simulation Study 1 addressed the first objective by adopting two fixed-precision termination rules in CD-CAT and evaluating their performance. The results showed that both rules achieved the pre-specified level of classification accuracy. As the precision target became stricter, the accuracy of the test results increased but testing efficiency decreased; that is, examinees needed more items to reach the pre-specified target. Studies 2 and 3 addressed the second objective. Study 2 considered two practical constraints, content balancing and test security, and Study 3 considered attribute balancing. The results of Study 2 showed that only with an exposure control method could item exposure rates and test overlap be controlled simultaneously; as the control of item exposure and test overlap became stricter, test security improved but testing efficiency decreased, meaning that examinees needed more items to reach the pre-specified level of test security. The results of Study 3 showed that taking attribute balancing into account affected testing efficiency: examinees could reach the pre-specified precision with fewer items. Studies 4 and 5, designed for the third objective, adopted more complex CDMs in CD-CAT. The CDMs used in both studies provide two kinds of latent characteristics for each examinee, a latent ability on a continuous scale and latent attributes on a discrete scale; by fixing the precision of both the latent ability and the latent attributes, examinees' latent characteristics all reached the targets set in this study.
Computerized adaptive testing (CAT) has been commonly used in educational assessment during the past decades. Compared with non-adaptive testing, CAT obtains a latent trait estimate for each examinee efficiently and accurately. CAT algorithms are usually built on item response theory (IRT) models, which yield a summary estimate of a continuous latent trait for each examinee. However, a summative score cannot offer sufficient feedback to each examinee. The No Child Left Behind Act, passed in 2001, mandated that diagnostic feedback be provided to teachers, students, and parents. Hence, cognitive diagnosis models (CDMs) have attracted much recent research attention. The aim of CDMs is to provide a profile (also referred to as a latent class) of diagnostic feedback; the profile consists of a set of binary latent attributes that represent an examinee's mastery status. Interest in developing CAT under CDMs, known as cognitive diagnosis computerized adaptive testing (CD-CAT), has increased recently because of the efficiency of CAT and the diagnostic feedback provided by CDMs.
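For concreteness, the following is a minimal sketch of how one widely used CDM, the DINA (deterministic inputs, noisy "and" gate) model, turns a binary attribute profile into a response probability. It is offered only as an illustration of the attribute-profile idea, not necessarily the model adopted in this dissertation; the function and variable names are hypothetical.

```python
import numpy as np

def dina_prob(alpha, q_row, slip, guess):
    """P(correct) under the DINA model for one examinee-item pair.

    alpha : binary vector giving the examinee's attribute mastery profile
    q_row : binary Q-matrix row listing the attributes the item requires
    slip, guess : the item's slip and guessing parameters
    """
    # eta = 1 only if every attribute the item requires has been mastered
    eta = int(np.all(alpha[q_row == 1] == 1))
    return (1 - slip) if eta else guess

# Example: an examinee who has mastered attributes 1 and 3 answers an item
# requiring attributes 1 and 2, so eta = 0 and the success probability is guess.
alpha = np.array([1, 0, 1])
q_row = np.array([1, 1, 0])
print(dina_prob(alpha, q_row, slip=0.1, guess=0.2))  # -> 0.2
```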
However, applying CD-CAT to real-world tests is still subject to several limitations in the current CD-CAT literature. For example, the development of most CD-CAT algorithms has focused on item selection procedures, particularly under a fixed-length termination rule, which frequently leads to different degrees of measurement precision or classification accuracy across examinees. In addition, several practical issues, such as content balancing and test security, are important in the implementation of CAT programs. Furthermore, a more complex, non-standard CDM is sometimes needed to account for the complexity of real-world data.
This dissertation has three objectives: 1) to implement a CD-CAT program with a fixed-precision termination rule (also referred to as variable-length CD-CAT); 2) to develop a variable-length CD-CAT program that takes practical issues into consideration; and 3) to build a variable-length CD-CAT program under a composite CDM, which provides information on both a latent class and a latent trait. Five simulation studies were conducted to achieve these three objectives.
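A fixed-precision (variable-length) rule of this kind can be sketched as follows: after each scored response, the posterior over all 2^K attribute profiles is updated, and the test stops once the most likely profile reaches a pre-specified posterior probability. The code below is a minimal illustration under that assumption; the exact criteria and thresholds used in the dissertation may differ, and all names are hypothetical.

```python
import numpy as np

def update_posterior(posterior, p_correct_by_profile, response):
    """One Bayes update of the posterior over all 2^K attribute profiles.

    posterior            : current posterior probabilities, one per profile
    p_correct_by_profile : P(correct | profile) for the administered item,
                           one value per profile, computed from the chosen CDM
    response             : scored response, 1 = correct, 0 = incorrect
    """
    likelihood = p_correct_by_profile if response == 1 else 1.0 - p_correct_by_profile
    posterior = posterior * likelihood
    return posterior / posterior.sum()

def reached_precision(posterior, target=0.90):
    """Fixed-precision check: stop once the most likely attribute profile
    has at least the pre-specified posterior probability."""
    return float(np.max(posterior)) >= target

# Example with K = 2 attributes (4 profiles), a flat prior, and one correct response
posterior = np.full(4, 0.25)
posterior = update_posterior(posterior, np.array([0.2, 0.2, 0.2, 0.9]), response=1)
print(reached_precision(posterior, target=0.90))  # False after a single item
```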
In Study 1, two fixed-precision termination criteria for the latent class were adopted to achieve the first objective. The results showed that both termination criteria achieved the pre-specified classification accuracy. The higher the required level of precision, the more items examinees needed to complete the CD-CAT. Studies 2 and 3 evaluated the performance of the CD-CAT program under practical constraints: content balancing and test security were investigated in Study 2, and attribute balancing was considered in Study 3. The results showed that only the exposure control method could keep the item exposure rates and test overlap below the pre-specified values. With stricter pre-specified maximum item exposure rates and test overlap, the mean test length and the proportion of examinees reaching the maximum test length increased. With respect to attribute balancing, test efficiency could be improved by controlling attribute balance; that is, the number of items required to reach a pre-specified classification accuracy decreased when attribute balancing was taken into account in item selection. Composite models that provide both latent class and latent trait information were used in Studies 4 and 5 to evaluate variable-length CD-CAT programs under a non-standard CDM. The results showed that adopting fixed-precision termination rules for both the latent class and the latent trait ensured the precision of both latent characteristics.
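To make the two practical constraints concrete, the sketch below shows one simple way an item-selection step could enforce a maximum item exposure rate and down-weight items that measure already well-covered attributes, in the spirit of priority-index methods. It is an assumption-laden illustration, not the specific exposure-control or attribute-balancing procedures evaluated in Studies 2 and 3; all names are hypothetical, the selection index is assumed to be non-negative, and all array arguments are assumed to be NumPy arrays with positive attribute targets.

```python
import numpy as np

def select_item(selection_index, exposure_counts, n_examinees_so_far,
                attr_targets, attr_counts, q_matrix, r_max=0.2):
    """Pick the next item while respecting two practical constraints.

    selection_index    : non-negative item-selection index for each pool item
                         (e.g., a posterior-weighted discrimination measure)
    exposure_counts    : number of previous examinees who saw each item
    n_examinees_so_far : number of examinees tested so far
    attr_targets       : desired number of administered items per attribute (> 0)
    attr_counts        : items already administered per attribute in this test
    q_matrix           : pool Q-matrix (items x attributes, 0/1)
    r_max              : maximum allowed item exposure rate
    """
    priority = np.asarray(selection_index, dtype=float).copy()

    # Attribute balancing: shrink the priority of items measuring attributes
    # whose coverage target has already been met (priority-index style).
    remaining = np.clip((attr_targets - attr_counts) / attr_targets, 0.0, 1.0)
    for j in range(priority.size):
        priority[j] *= np.prod(np.where(q_matrix[j] == 1, remaining, 1.0))

    # Test security: rule out items whose exposure rate already exceeds r_max.
    if n_examinees_so_far > 0:
        rates = exposure_counts / n_examinees_so_far
        priority[rates >= r_max] = -np.inf

    return int(np.argmax(priority))
```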
 
 
 
 