Title: Exploring the Sources of Differential Item Functioning with the Cross-Classification Multilevel Item Response Model
Author: 孫國瑋 (Guo-wei Sun)
Institution: National Sun Yat-sen University
Department: Graduate Institute of Education
Advisor: 施慶麟
Degree: Doctorate
Year: 2018
Keywords: cross-classification multilevel item response model; multilevel model; item response theory; sources of differential item functioning; differential facet functioning; differential item functioning
The assessment of differential item functioning (DIF) has become an indispensable part of educational and psychological measurement. Thinking about DIF research has evolved over time, and in recent years researchers have increasingly turned to investigating the sources of DIF (Zumbo, 2007). The cross-classification multilevel item response model (CCMIRT), a multilevel model, can detect differential facet functioning (DFF); moreover, because items are specified as random effects, researchers have proposed that DIF effects can likewise be specified as random (Van den Noortgate & De Boeck, 2005), which yields additional information in the analysis. Its greatest benefit is that the variation of the random DIF effects can be examined by adding fixed effects to the model.
When item characteristics are used to explore the sources of DIF, random DIF effects can be combined with DFF detection. Because in practice several item characteristics are often simultaneously related to item difficulty, this study argues that the main effects of several item characteristics should be considered jointly in the analysis model. This study therefore proposes a model that combines the CCMIRT with random DIF effects to analyze several item characteristics at once, and manipulates conditions relevant to DIF detection in order to understand how effectively the variables in this framework explain the sources of DIF and what effects they may produce.
Differential item functioning (DIF) analyses are important in terms of test fairness and test validity. As Zumbo (2007) states, "Third Generation DIF" is best characterized by a subtle but extremely important change in how we think of DIF; the desire to know why DIF occurs is an early sign of this third generation.
To detect differential facet functioning (DFF) in a multilevel setting, the cross-classification multilevel item response model (CCMIRT) can be adopted. When the group main effect and item-by-group interaction effects are included in the CCMIRT, the random effects of group over items represent the DIF residual. The CCMIRT can be further extended by adding item-characteristic predictors to explain the DIF.
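In one common notation (the symbols below are generic placeholders, not necessarily the dissertation's own), the random-DIF extension of the CCMIRT described above can be sketched as:

```latex
% Rasch-type CCMIRT with a random item-by-group (DIF) effect.
% G_p: group indicator (0 = reference, 1 = focal); X_{ik}: item
% characteristics entered as fixed effects to explain the DIF variance.
\operatorname{logit}\Pr(Y_{pi}=1) = \theta_p + \beta_i + \gamma\,G_p + \delta_i\,G_p,
\qquad
\theta_p \sim N(0,\sigma_\theta^2),\quad
\beta_i \sim N(\mu_\beta,\sigma_\beta^2),\quad
\delta_i \sim N\!\Bigl(\sum\nolimits_k \gamma_k X_{ik},\ \sigma_\delta^2\Bigr).
```

A nonzero residual variance σ²_δ after the X_ik terms are included indicates DIF that the item characteristics do not explain.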
The purpose of this study is to investigate the sources of DIF by using the CCMIRT combined with DFF. In the simulation study, variables related to the DIF effect were manipulated to better understand the performance of the CCMIRT. To reflect realistic testing situations, a model including multiple item properties is proposed in this study.
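The simulation logic can be illustrated with a minimal sketch of random DIF explained by an item property. All parameter values, the single binary property `x`, and the descriptive DIF check below are hypothetical illustrations, not the study's actual design:

```python
import numpy as np

# Rasch-like responses with a random item-by-group (DIF) effect whose
# mean depends on a binary item property x (hypothetical values throughout).
rng = np.random.default_rng(0)

n_persons, n_items = 2000, 20
theta = rng.normal(0.0, 1.0, n_persons)    # person abilities
group = rng.integers(0, 2, n_persons)      # 0 = reference, 1 = focal
x = rng.integers(0, 2, n_items)            # binary item characteristic
beta = rng.normal(0.0, 1.0, n_items)       # item easiness (random items)

gamma = 0.5                                # fixed effect of x on DIF size
delta = gamma * x + rng.normal(0.0, 0.2, n_items)  # random DIF effects

# logit P(Y_pi = 1) = theta_p + beta_i + delta_i * G_p
eta = theta[:, None] + beta[None, :] + delta[None, :] * group[:, None]
p = 1.0 / (1.0 + np.exp(-eta))
y = (rng.random((n_persons, n_items)) < p).astype(int)

def item_logit(rows):
    """Per-item marginal log-odds of a correct response for a subgroup."""
    m = y[rows].mean(axis=0).clip(1e-6, 1 - 1e-6)
    return np.log(m / (1.0 - m))

# Crude check: the focal-minus-reference logit gap should be larger,
# on average, for items with x = 1 than for items with x = 0.
diff = item_logit(group == 1) - item_logit(group == 0)
contrast = float(diff[x == 1].mean() - diff[x == 0].mean())
print(round(contrast, 2))
```

In the dissertation's framework this contrast corresponds to the fixed effect of the item property on the random DIF effects; a full analysis would estimate it with a cross-classified mixed model rather than raw logit differences.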
Abbott, M. L. (2007). A confirmatory approach to differential item functioning on an
ESL reading assessment. Language Testing, 24, 7–36.
Abedi, J. (2002). Standardized achievement tests and English language learners:
Psychometric issues. Educational Assessment, 8, 231-257.
Abedi, J., Bailey, A., Butler, F., Castellon-Wellington, M., Leon, S., & Mirocha, J.
(2005). The Validity of Administering Large-Scale Content Assessments to English Language Learners: An Investigation from Three Perspectives. CSE Report 663. National Center for Research on Evaluation, Standards, and Student Testing (CRESST).
Abedi, J., Lord, C., & Plummer, J. R. (1997). Final report of language background as a
variable in NAEP mathematics performance. Center for the Study of Evaluation, National Center for Research on Evaluation, Standards, and Student Testing, Graduate School of Education & Information Studies, University of California, Los Angeles.
Adams, R. J., & Wilson, M. (1996). Formulating the Rasch model as a mixed
coefficients multinomial logit. In G. Engelhard & M. Wilson (Eds.), Objective measurement: Theory into practice (Vol. 3, pp. 143-166). Norwood, NJ: Ablex.
Adams, R. J., Wilson, M., & Wu, M. (1997). Multilevel item response models: An
approach to errors in variables regression. Journal of Educational and Behavioral
Statistics, 22, 47–76.
Albano, A. D., & Rodriguez, M. C. (2013). Examining differential math performance
by gender and opportunity to learn. Educational and Psychological Measurement,
73, 836–856.
American Educational Research Association, American Psychological Association, &
National Council on Measurement in Education. (1999). Standards for educational
and psychological testing. Washington, DC: American Psychological Association.
Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P.
W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 3-23). Hillside,
NJ: Lawrence Erlbaum.
Banks, K. (2009). Using DDF in a post hoc analysis to understand sources of DIF.
Educational Assessment, 14, 103-118.
Bates, D. M. (2010). lme4: Mixed-effects modeling with R. http://lme4.r-forge.r-project.org/book
Bates, D., Maechler, M., Bolker, B., & Walker, S. (2015). lme4: Linear mixed-effects models using Eigen and S4. R package version 1.1-7. http://CRAN.R-project.org/package=lme4
Beretvas, S. N., Cawthon, S. W., Lockhart, L. L., & Kaye, A. D. (2012). Assessing impact, DIF, and DFF in accommodated item scores: A comparison of multilevel measurement model parameterizations. Educational and Psychological Measurement, 72(5), 754-773.
Beretvas, S. N., & Walker, C. M. (2012). Distinguishing differential testlet functioning
from differential bundle functioning using the multilevel measurement model. Educational and Psychological Measurement, 72(2), 200-2
Beretvas, S. N., & Williams, N. J. (2004). The use of HGLM as an item dimensionality
assessment. Journal of Educational Measurement, 41, 379-395.
Bolt, D. (2002). Studying the potential of nuisance dimensions using bundle DIF and
multidimensional IRT analyses. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Cai, L. (2015). Examining sources of gender DIF using cross-classification multilevel
IRT models. Unpublished master's thesis, University of Nebraska-Lincoln.
Camilli, G.L., & Shepard, L.A. (1994). Methods for identifying biased test items.
Thousand Oaks, CA: Sage.
Cheong, Y. F. (2001). Detecting ethnic differences in externalizing problem behavior
items via multilevel and multidimensional Rasch models. Paper presented at the annual meeting of the American Educational Research Association, Seattle, WA.
Cheong, Y. F., & Raudenbush, S. W. (2000). Measurement and structural models for
children’s problem behaviors. Psychological Methods, 5, 477-495.
Chu, K. L., & Kamata, A. (2004). Test equating in the presence of DIF items. Journal of
Applied Measurement, 6(3), 342-354.
Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify
differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-44.
Cohen, A. S., & Bolt, D. M. (2005). A mixture model analysis of differential item
functioning. Journal of Educational Measurement, 42(2), 133-148.
De Ayala, R. J., Kim, S. H., Stapleton, L. M., & Dayton, C. M. (2002). Differential item
functioning: A mixture distribution conceptualization. International Journal of Testing, 2(3-4), 243-276.
De Boeck, P. (2008). Random item IRT models. Psychometrika, 73, 533-559.
De Boeck, P., Cho, S.-J., & Wilson, M. (2011). Explanatory secondary dimension
modeling of latent differential item functioning. Applied Psychological
Measurement, 38, 583–603.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-
Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential
item functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum Associates.
Dorans, N. J., & Kulick, E. (1986). Demonstrating the utility of the standardization
approach to assessing unexpected differential item performance on the scholastic
aptitude test. Journal of Educational Measurement, 23, 355–368.
Douglas, J. A., Roussos, L. A., & Stout, W. (1996). Item-bundle DIF hypothesis testing: Identifying suspect bundles and assessing their differential functioning. Journal of Educational Measurement, 33(4), 465-484.
Engelhard, G. (1992). The measurement of writing ability with a many-faceted Rasch
model. Applied Measurement in Education, 5, 171-191.
Ercikan, K. (1998). Translation effects in international assessments. International
Journal of Educational Research, 29(6), 543-553.
Ercikan, K. (2002). Disentangling sources of differential item functioning in
multilanguage assessments. International Journal of Testing, 2(3-4), 199-215.
Ercikan, K., Arim, R. G., Law, D. M., Lacroix, S., Gagnon, F., & Domene, J. F. (2010).
Application of think-aloud protocols in examining sources of differential item functioning. Educational Measurement: Issues and Practice, 29(2), 24–35.
Ercikan, K., Gierl, M. J., McCreith, T., Puhan, G., & Koh, K. (2004). Comparability of
bilingual versions of assessments: Sources of incomparability of English and French versions of Canada's national achievement tests. Applied Measurement in Education, 17, 301-321.
Ercikan, K., & Lyons-Thomas, J. (2013). Adapting tests for use in other languages and
cultures. In K. Geisinger (Ed.), APA handbook of testing and assessment in psychology (Vol. 3, pp. 545-569). Washington, DC: American Psychological Association.
Fischer, G. H. (1973). The linear logistic test model as an instrument in educational
research. Acta Psychologica, 37, 359-374.
Fox, J. P. & Glas, C. A. W. (1998). Multi-level IRT with measurement error in the
predictor variables. Research Report 98-16, University of Twente: The Netherlands.
Fox, J.-P., & Glas, C.A.W. (2001). Bayesian estimation of a multilevel IRT model using
Gibbs sampling. Psychometrika, 66, 269–286.
Fox, J.-P., & Glas, C.A.W. (2003). Bayesian modeling of measurement error in
predictor variables using item response theory. Psychometrika, 68, 169–191.
Geranpayeh, A., & Kunnan, A. J. (2007). Differential item functioning in terms of age in the Certificate in Advanced English examination. Language Assessment Quarterly, 4(2), 190-222.
Gierl, M. J., Bisanz, J., Bisanz, G. L., & Boughton, K. A. (2003). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the multidimensionality-based DIF analysis paradigm. Journal of Educational Measurement, 40(4), 281-306.
Gierl, M. J., & Bolt, D. M. (2001). Illustrating the use of nonparametric regression to
assess differential item and bundle functioning among multiple groups. International Journal of Testing, 1(3-4), 249-270.
Gierl, M. J., & Khaliq, S. N. (2001). Identifying sources of differential item and bundle
functioning on translated achievement tests: A confirmatory analysis. Journal of Educational Measurement, 38(2), 164-187.
Goldstein, H. (1987). Multilevel models in educational and social research. London:
Griffin.
Green, B. F., Crone, C. R., & Folk, V. G. (1989). A method for studying differential
distractor functioning. Journal of Educational Measurement, 26(2), 147-160.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.
Kamata, A. (1998). Some generalizations of the Rasch model: an application of the
hierarchical generalized linear model. Unpublished doctoral dissertation. Michigan State University.
Kamata, A. (2001). Item analysis by the hierarchical generalized linear model. Journal
of Educational Measurement, 38, 79-93.
Lepik, M. (1990). Algebraic word problems: Role of linguistic and structural variables.
Educational Studies in Mathematics, 21(1), 83-90.
Lord, F. M. (1980). Applications of item response theory to practical testing problems.
Routledge.
Luppescu, S. (2002). DIF detection in HLM item analysis. Paper presented at the
annual meeting of the American Educational Research Association, New Orleans, LA.
Mazor, K. M., Clauser, B. E., & Hambleton, R. K. (1992). The effect of sample size on
the functioning of the Mantel-Haenszel statistic. Educational and Psychological Measurement, 52(2), 443-451.
Mendes-Barnett, S., & Ercikan, K. (2006). Examining sources of gender DIF in
mathematics assessments using a confirmatory multidimensional model approach. Applied Measurement in Education, 19(4), 289-304.
Meulders, M., & Xie, Y. (2004). Person-by-item predictors. In Explanatory item
response models (pp. 213-240). New York, NY: Springer.
Nandakumar, R. (1993). Simultaneous DIF amplification and cancellation: Shealy-
Stout's test for DIF. Journal of Educational Measurement, 30(4), 293-311.
Navas-Ara, M. J., & Gómez-Benito, J. (2002). Effects of ability scale purification on identification of DIF. European Journal of Psychological Assessment, 18, 9-15.
Oliveri, M. E., & Ercikan, K. (2011). Do different approaches to examining construct
comparability lead to similar conclusions? Applied Measurement in Education, 24, 1–18.
Oliveri, M. E., Ercikan, K., & Zumbo, B. (2013). Analysis of sources of latent class
differential item functioning in international assessments. International Journal of
Testing, 13(3), 272-293.
Pae, T. I. (2004). DIF for examinees with different academic backgrounds. Language
Testing, 21(1), 53-73.
Plake, B.S. (1981). An ANOVA methodology to identify biased test items that takes
instructional level into account. Educational and Psychological Measurement, 41, 365-368.
R Core Team (2015). R: A language and environment for statistical computing. R
Foundation for Statistical Computing, Vienna, Austria. URL https://www.R-project.org/.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53,
495–502.
Raudenbush, S. W. (1993). A crossed random effects model for unbalanced data with
applications in cross-sectional and longitudinal research. Journal of Educational and Behavioral Statistics, 18(4), 321-349.
Ravand, H. (2015). Assessing testlet effect, impact, differential testlet, and item functioning using cross-classified multilevel measurement modeling. SAGE Open, 5(2), 2158244015585607.
Roth, W. M., Ercikan, K., Simon, M., & Fola, R. (2015). The assessment of
mathematical literacy of linguistic minority students: Results of a multi-method investigation. The Journal of Mathematical Behavior, 40, 88-105.
Roth, W.-M., Oliveri, M. E., Sandilands, D., Lyons-Thomas, J., & Ercikan, K. (2013).
Investigating sources of differential item functioning using expert think-aloud protocols. International Journal of Science Education, 35, 546–576.
Roussos, L., & Stout, W. (1996). A multidimensionality-based DIF analysis paradigm.
Applied Psychological Measurement, 20, 355-371.
Scheuneman, J. (1979). A method of assessing bias in test items. Journal of Educational
Measurement, 16(3), 143-152.
Shealy, R., & Stout, W.F. (1993a). An item response theory for test bias. In P.W.
Holland & H. Wainer (Eds.), Differential item functioning (pp. 197-239). Hillsdale,
NJ:Erlbaum.
Shealy, R., & Stout, W. F. (1993b). A model-based standardization approach that
separates true bias/DIF from group differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159-194.
Sireci, S. G., Fitzgerald, C., & Xing, D. (1998). Adapting credentialing examinations
for international uses. Laboratory of Psychometric and Evaluative Research Report No. 329. Amherst: University of Massachusetts, School of Education.
Skrondal, A., & Rabe-Hesketh, S. (2004). Generalized latent variable modeling: Multilevel, longitudinal, and structural equation models. CRC Press.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using
logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.
Swanson, D. B., Clauser, B. E., Case, S. M., Nungester, R. J., & Featherman, C.
(2002). Analysis of differential item functioning using hierarchical logistic regression models. Journal of Educational and Behavioral Statistics, 27, 53–75.
Thissen, D, Steinberg, L., & Wainer, H. (1988). Use of item response theory in the
study of group differences in trace lines. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 147-169). Hillsdale, NJ: Lawrence Erlbaum.
Van den Noortgate, W., & De Boeck, P. (2005). Assessing and explaining differential
item functioning using logistic mixed models. Journal of Educational and Behavioral Statistics, 30, 443–464.
Van den Noortgate, W., De Boeck, P., & Meulders, M. (2003). Cross-classification
multilevel logistic models in psychometrics. Journal of Educational and Behavioral Statistics, 28(4), 369-386.
Walker, C. M. (2011). Why the DIF? Why differential item functioning analyses are an important part of instrument development and validation. Journal of Psychoeducational Assessment, 29, 364-376.
Wang, W. C., & Su, Y.-H. (2004). Effect of average signed area between two item characteristic curves and test purification procedures on the DIF detection via the Mantel-Haenszel method. Applied Measurement in Education, 17(2), 113-144.
Williams, N. J., & Beretvas, N. S. (2006). DIF identification using HGLM for
polytomous items. Applied Psychological Measurement, 30, 22–42.
Xie, Y., & Wilson, M. (2008). Investigating DIF and extensions using an LLTM
approach and also an individual differences approach: an international testing context. Psychology Science, 50(3), 403.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item
functioning (DIF). Ottawa, Ontario, Canada: Directorate of Human Resources Research and Evaluation, Department of National Defense.
Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4, 223-233.
Zumbo, B. D., & Gelin, M. N. (2005). A Matter of Test Bias in Educational Policy
Research: Bringing the Context into Picture by Investigating Sociological/Community Moderated (or Mediated) Test and Item Bias. Journal of Educational Research & Policy Studies, 5(1), 1-23.
Zumbo, B. D., Liu, Y., Wu, A. D., Shear, B. R., Olvera Astivia, O. L., & Ark, T. K.
(2015). A methodology for Zumbo's third generation DIF analyses and the ecology of item responding. Language Assessment Quarterly, 12(1), 136-151.