|
|