Chinese-Language References
王嘉寧(2006)。影響試題差異功能的試題特徵探討以90-95年國中基本學力測驗地理科試題為例。未出版之碩士論文,國立台灣師範大學教育心理與輔導學系研究所,台北。
王寶墉(1995)。現代測驗理論。臺北:心理出版社。
台灣師大心測中心(2008)。「2008年國中基測研發成果」媒體交流茶會:「95至97年國中基測DIF分析研究」。臺北:國立臺灣師範大學心理與教育測驗研究發展中心。線上檢索日期:2009年6月20日。網址: http://www.bctest.ntnu.edu.tw/flying/flying51-60/NO55-002-008.pdf
余民寧(2009)。試題反應理論(IRT)及其應用。臺北:心理出版社。
余民寧、謝進昌(2005)。以最大測驗訊息量決定通過分數之研究。測驗學刊,52(2),149-176。
余民寧、謝進昌(2006)。國中基本學力測驗之DIF的實徵分析:以91年度兩次測驗為例。國立高雄教育大學教育學刊,26,241-276。
李信宏(1999)。傳統檢定試題偏誤(DIF)方法的改良與分析。(國科會專題研究計畫成果報告編號:NSC89-2118-M-018-003)。台北:中華民國行政院國家科學委員會。
李茂能(2001)。四分相關矩陣計算軟體(A Simple Program for Computing Tetrachoric Correlations):TETRAEXE。線上檢索日期:2009年6月20日。網址:http://web.ncyu.edu.tw/~fredli/。
李茂能(2006)。結構方程模式軟體Amos之簡介及其在測驗編製上之應用:Graphics & Basic。臺北:心理。
林坤昌(1998)。DIF檢定方法之探討與比較。未出版之碩士論文,國立台中師範學院國民教育研究所,台中。
林碧珍、蔡文煥(2005)。TIMSS 2003 台灣國小四年級學生數學成就及其相關因素之探討。科學教育月刊,285,2-38。
吳冠瑩(2003)。三種多元化計分之試題差異功能診斷法的比較。未出版之碩士論文,國立台灣師範大學數學研究所,台北。
邱皓政(2005)。量化研究法一:研究設計與資料處理。台北市:雙葉。
邱皓政(2006)。結構方程模式:LISREL的理論、技術與應用。臺北:雙葉。
凃柏原(2007)。國中生基本學力測驗量尺分數轉換。教育研究學報,41(1),61-77。
洪碧霞(1991)。大學入學考試題目分析時IRT模式選擇之初探。國立台南師院。
黃財尉(1998)。從多點計分方式探討國中數學成就測驗之DIF。未出版之碩士論文,國立彰化師範大學教育研究所,彰化。
黃財尉、李信宏(1999)。國中數學成就測驗性別DIF之探討。測驗年刊,46,45-60。
黃瓅瑩(2008)。HGLM分析DIF之比較與應用。未出版之碩士論文,國立台南大學測驗統計研究所,台南。
許雪立(1998)。項目功能差異的分析與應用。敬賀張厚架教授從教50周年學生論文選承集。北京:北京師範大學出版社。
國民中學學生基本學力測驗推動工作委員會(2008)。九十七年國民中學學生基本學力測驗問與答。線上檢索日期:2008年12月31日。網址:http://www.bctest.ntnu.edu.tw/97bctest_q&a.htm
張厚粲、曹亦薇(1999)。漢語辭彙測驗中的項目功能差異初探。心理學報,31(4),460-467。
葉雅俐(2000)。線上題庫與適性測驗整合系統之發展研究。未出版之碩士論文,國立政治大學教育研究所,台北。
曾秀芹、孟慶茂(1999)。項目功能差異及其檢測法。心理學動態,7(2),47-57。
曾建銘(2004)。Gender Differences in Performance and Differential Item Functioning on the Basic Mathematics Competence Test for Junior High School Students in Taiwan。中學教育學報,11,331-354。
曾建銘(2005)。93年第一次國中基本學力測驗數學科區域試題差別功能的探討與研究。(教育部台灣省中等學校教師研習會九十四年度研究計畫編號:94105)。台中:教育部臺灣省中等學校教師研習會。
楊佩馨(2008)。DFIT與Poly-SIBTEST在DIF與DBF分析之比較研究。未出版之碩士論文,國立台南大學測驗統計研究所,台南。
蔡良庭、楊志堅、王文中、施慶麟(2008)。應用MIMIC模式評估方法以檢定試題差異性之研究。測驗學刊,55(2),287-312。
盧雪梅(2000)。Mantel-Haenszel DIF程序之第一類錯誤率和DIF嚴重度分類結果研究。中國測驗學會測驗年刊,47(1),57-71。
盧雪梅(2007)。國民中學學生基本學力測驗國文科和英語科成就性別差異和性別差別試題功能(DIF)分析。教育研究與發展期刊,3(4),79-111。
盧雪梅、毛國楠(2008)。國中基本學力測驗數學科之性別差異與差別試題功能(DIF)分析。教育研究與實踐,21(2),95-126。
蘇旭琳(2006)。DIF分析在小樣本情境中的效果。未出版之碩士論文,國立台灣師範大學數學研究所,台北。
蘇雅蕙(2001)。多分題差異試題功能之檢定。未出版之碩士論文,國立中正大學心理學研究所,嘉義縣。
蘇雅蕙、王文中(2001)。三種Mantel-Haenszel DIF 檢驗程序的效果。發表於第五屆華人社會心理與教育測驗學術研討會。
Western-Language References
Abbott, M. (2007). Gender equity in Alberta’s Social Studies 30 diploma examinations. Alberta Journal of Educational Research. Revised manuscript resubmitted for review.
Ackerman, T.A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29, 67-91.
Ackerman, T. A., Gierl, M. J., & Walker, C. M. (2003). Using multidimensional item response theory to evaluate educational and psychological tests. Educational Measurement: Issues and Practice, 22(3), 37-51.
Meade, A. W., Lautenschlager, G. J., & Johnson, E. C. (2006, April). Alternate Cut off Values and DFIT Tests of Measurement Invariance. Paper presented at the 21st Annual Conference of the Society for Industrial and Organizational Psychology, Dallas, TX.
Anderson, W. & Krathwohl, D. R. (Eds.) (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s educational objectives. NY: Longman.
Andrich, D. (1978). A rating formulation for ordered response categories. Psychometrika, 43, 561-573.
Angoff, W. H. (1993). Perspectives on differential item functioning methodology. In P.W. Holland & H. Wainer (Eds.), Differential item functioning, 3-23. Hillsdale, NJ: Lawrence Erlbaum.
Bandalos, D. L., Finney, S. J. & Geske, J. A. (2003). A model of statistics performance based on achievement goal theory. Journal of Educational Psychology, 95, 604-616.
Barton, K. & Finch, H. (2004). Using DIF analyses to examine assumptions of unidimensionality across groups of students with disabilities and with accommodations. In Detecting Item Bias for Students with Disabilities and English Language Learners. Symposium conducted at the meeting of the National Council on Measurement in Education, San Diego, CA.
Berberoglu, G. (1995). Differential item functioning (DIF) analysis of computation, word problem and geometry questions across gender and SES group. Studies in Educational Evaluation, 21, 439-456.
Bielinski, J. & Davison, M. L. (2001). A Sex Difference by Item Difficulty Interaction in Multiple-Choice Mathematics Items Administered to National Probability Samples. Journal of Educational Measurement, 38(1), 51-77.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee’s ability. In F. M. Lord & M. R. Novick, Statistical theories of mental test scores (chapters 17-20). Reading, MA: Addison-Wesley.
Bloom, B. S. (1956). Taxonomy of educational objectives: Handbook of cognitive domain. New York: McKay.
Bock, R. D., Gibbons, R., & Muraki, E. (1988). Full information item factor analysis. Applied Psychological Measurement, 12, 261-280.
Bollen, K. A. (1989). Structural equation modeling with latent variables. New York: John Wiley.
Bolt, D. M. (2002). A Monte Carlo comparison of parametric and nonparametric polytomous DIF detection methods. Applied Measurement in Education, 15(2), 113-141.
Bolt, D., & Stout, W. (1996). Differential item functioning: Its multidimensional model and resulting SIBTEST detection procedure. Behaviormetrika, 23, 67-95.
Braddy, P. W., Meade, A. W., & Johnson, E. C. (2006). Practical Implications of Using Different Tests of Measurement Invariance for Polytomous Measures. Paper presented at the 21st Annual Conference of the Society for Industrial and Organizational Psychology, Dallas, TX.
Burton, N. (1995, April). How have the changes in the SAT affected women’s mathematics performance? Paper presented at the Annual Meeting of the American Research Association, San Francisco.
Byrne, B. M., Shavelson, R. J. & Muthén, B. (1989). Testing for the equivalence of factor covariance and mean structures: the issue of partial measurement invariance. Psychological Bulletin, 105(4), 456-466.
Camilli, G., & Shepard, L. A. (1994). Methods for identifying biased test items. Thousand Oaks, CA: Sage.
Candell, G. L., & Drasgow, F. (1988). An iterative procedure for linking metrics and assessing item bias in item response theory. Applied Psychological Measurement, 12, 253-260.
Cardall, C., & Coffman, W. E. (1964). A method for comparing the performance of different groups on the item in a test. (College Entrance Examination Board Research and Development Report 64-5, No. 9; ETS Research Bulletin 64-61). Princeton, NJ: Educational Testing Service.
Chan, D. (2000). Detection of differential item functioning on the Kirton Adaptation-Innovation Inventory using multiple-group mean and covariance structure analyses. Multivariate Behavioral Research, 35, 169-199.
Chang, H. H., Mazzeo, J., & Roussos, L. (1996). Detecting DIF for polytomously scored items: An adaptation of the SIBTEST procedure. Journal of Educational Measurement, 33, 333-353.
Cheung, G. W., & Rensvold, R. B. (2002). Evaluating Goodness-of-Fit Indexes for Testing Measurement Invariance. Structural Equation Modeling, 9(2), 233-255.
Clauser, B. E., Mazor, K., & Hambleton, R. K. (1991). The influence of the criterion variable on the identification of differentially functioning items using the Mantel-Haenszel statistic. Applied Psychological Measurement, 15, 353-359.
Cohen, A. S., & Kim, S. (1993). A comparison of Lord's χ² and Raju's area measures on detection of DIF. Applied Psychological Measurement, 17, 39-52.
Cohen, A. S. & Kim, S. (1998). An Investigation of Linking Methods Under the Graded Response Model. Applied Psychological Measurement, 22(2), 116-130.
Cohen, A. S., Kim, S. H., & Wollack, J. A. (1996). An investigation of the likelihood ratio test for detection of differential item functioning. Applied Psychological Measurement, 20, 15-26.
Collins, W. C., Raju, N. S., & Edwards, J. E. (2000). Assessing differential functioning in a satisfaction scale. Journal of Applied Psychology, 85, 451-461.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52, 281–302.
Dodd, B. G., Koch, W. R., & De Ayala, R. J. (1989). Operational characteristics of adaptive testing procedures using the graded response model. Applied Psychological Measurement, 13, 129-143.
Doolittle, A. E. (1984). Interpretation of Differential Item Performance Accompanied by Gender Differences in Academic Background. Paper presented at the Annual Meeting of the American Educational Research Association (68th, New Orleans, LA, April 23-27, 1984).
Doolittle, A. E., & Cleary, T. A. (1987). Gender-based differential item performance in mathematics achievement items. Journal of Educational Measurement, 24, 157-166.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning, 35-66. Hillsdale, NJ: Lawrence Erlbaum.
Dorans, N. J., & Potenza, M. T. (1994). Equity assessment for polytomously scored items: A taxonomy of procedures for assessing differential item functioning (RR No. 94-49). Princeton, NJ: Educational Testing Service.
Douglas, J., Kim, H. R., Roussos, L., Stout, W., & Zhang, J. (1999). LSAT dimensionality analysis for the December 1991, June 1992, and October 1992 administrations. (Statistical Report No. 95-05). Newtown, PA: Law School Admission Council.
Douglas, J., Roussos, L., & Stout, W. (1996). Item bundle DIF hypothesis testing: Identifying suspect bundles and assessing their DIF. Journal of Educational Measurement, 33, 465-484.
Ellis, B. B. (1989). Differential item functioning: Implications for test translation. Journal of Applied Psychology, 74, 912-921.
Engelhard, G., Anderson, D., & Gabrielson, S. (1990). An empirical comparison of Mantel-Haenszel procedure and Rasch procedure for studying DIF on teacher certification tests. Journal of Research and Development in Education, 23(4), 173-179.
Facteau, J. D., & Craig, S. B. (2001). Are performance appraisal ratings obtained from different rating sources comparable? Journal of Applied Psychology, 86(2), 215-227.
Ferrando, P.J. (1996). Calibration of invariant item parameters in a continuous item response model using the extended Lisrel measurement submodel. Multivariate Behavioral Research, 31, 419-439.
Fidalgo, A. M., Mellenbergh, G. J., & Muniz, J. (2000). Effects of amount of DIF, test length, and purification type on robustness and power of Mantel-Haenszel procedures. Methods of Psychological Research, 5(3), 43-53.
Finch, H. (2005). The MIMIC model as a method for detecting DIF: Comparison with Mantel-Haenszel, SIBTEST, and the IRT likelihood ratio. Applied Psychological Measurement, 29, 278-295.
Fleishman, J. A., Spector, W. D., & Altman, B. M. (2002). Impact of Differential Item Functioning on Age and Gender Differences in Functional Disability. Journal of Gerontology: Social Sciences, 57(5), 275-284.
Flowers, C. P., Raju, N. S., & Oshima, T. C. (1999). A description and demonstration of the polytomous-DFIT framework. Applied Psychological Measurement, 23, 309-326.
Flowers, C. P., Raju, N. S., & Oshima, T. C. (2002). Measurement Equivalence Methods: A Comparison of Measurement Equivalence Methods Based on Confirmatory Factor Analysis and Item Response Theory. Paper presented at NCME Annual Meeting, New Orleans, Louisiana, USA.
French, A. W., & Miller, T. R. (1996). Logistic regression and its use in detecting differential item functioning in polytomous items. Journal of Educational Measurement, 33, 315-332.
French, B. F., & Maller, S. J. (2007). Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement, 67, 373-393.
Gabriel, L. R., Stephen, S., & Oleksandr, S. C. (2008). The Effects of Referent Item Parameters on Differential Item Functioning Detection Using the Free Baseline Likelihood Ratio Test. Applied Psychological Measurement, 33, 251-265.
Gallagher, A. M., & De Lisi, R. (1994). Gender differences in Scholastic Aptitude Test: Mathematics problem solving among high-ability students. Journal of Educational Psychology, 86(2), 204-211.
Gallagher, A. M., De Lisi, R., Holst, P. C., McGillicuddy-De Lisi, A. V., Morley, M., & Cahalan, C. (2000). Gender differences in advanced mathematical problem solving. Journal of Experimental Child Psychology, 75, 165-190.
Gierl, M. J. (2005). Using dimensionality-based DIF analyses to identify and interpret constructs that elicit group differences. Educational Measurement: Issues and Practice, 24(1), 3-14.
Gierl, M. J., Bisanz, J., Bisanz, G., Boughton, K., & Khaliq, S. (2001). Illustrating the utility of differential bundle functioning analyses to identify and interpret group differences on achievement tests. Educational Measurement: Issues and Practice, 20, 26-36.
Gierl, M. J., & Bolt, D. (2003, April). Implications of the multidimensionality-based DIF analysis framework for selecting a matching and studied subtest. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Gierl, M. J., Bisanz, J., Bisanz, G., & Boughton, K. (2003). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the DIF analysis paradigm. Journal of Educational Measurement, 40, 281-306.
Gierl, M. J., Jodoin, M. G., & Ackerman, T. A. (2000). Performance of Mantel-Haenszel, Simultaneous Item Bias Test, and Logistic Regression When the Proportion of DIF Items is Large. Paper Presented at the Annual Meeting of the American Educational Research Association (AERA), New Orleans, Louisiana, USA.
Gierl, M. J., Rogers, W. T., & Klinger, D. (1999). Using statistical and judgmental reviews to identify and interpret translation DIF. Paper presented at the meeting of the National Council on Measurement in Education, Montreal, Canada.
Glöckner-Rist, A. & Hoijtink, H. (2003). The Best of Both Worlds: Factor Analysis of Dichotomous Data Using Item Response Theory and Structural Equation Modeling. Structural Equation Modeling, 10(4), 544-565.
Hair, J. F., Anderson, R. E., Tatham, R. L., & Black, W. C. (1992). Multivariate Data Analysis with Readings (3rd ed.). New York: Macmillan.
Haley, D. C. (1952). Estimation of the dosage mortality relationship when the dose is subject to error. Technical Report No. 15, Stanford, Calif: Stanford University, Applied Mathematics and Statistics Laboratory.
Hambleton, R. K. (1989). Principles and selected applications of item response theory. In R. L. Linn (Ed.), Educational Measurement, 147-200. New York: Macmillan.
Hambleton, R. K. & Rogers, H. (1988). Detecting Biased Test Items: Comparison of the IRT Area and Mantel-Haenszel Methods. (ERIC Document Reproduction Service No. ED 300398).
Hambleton, R. K., & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston, MA: Kluwer-Nijhoff.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Hamilton, L., Nussbaum, M., Kupermintz, H., Kerkhoven, J., & Snow, R. (1995). Enhancing the validity and usefulness of large-scale educational assessments: NELS:88 science achievement. American Educational Research Journal, 32, 555-581.
Han, K. T. (2007). IRTEQ: Windows application that implements IRT scaling and equating [computer program]. Applied Psychological Measurement, 33(6), 491-493.
Harris, A. M., & Carlton, S. T. (1993). Patterns of gender differences on mathematics items on the SAT. Applied Measurement in Education, 6, 137-151.
Hattie, J. A. (1985). Methodological review: Assessing unidimensionality of tests and items. Applied Psychological Measurement, 9, 139-164.
Hayduk, L. A. (1987). Structural Equation Modeling with LISREL: Essentials and Advances. Baltimore: Johns Hopkins University Press.
Hidalgo-Montesinos, M. D., & Gómez-Benito, J. (2003). Test Purification and the Evaluation of Differential Item Functioning with Multinomial Logistic Regression. European Journal of Psychological Assessment, 19(1), 1-11.
Hoelter, J. W. (1983). The analysis of covariance structures: Goodness-of-fit indices. Sociological Methods and Research, 11, 325-344.
Holland, P. W. (1985). On the study of differential item performance without IRT. Proceedings of the Military Testing Association, I, 282-287.
Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity, 129-145. Hillsdale, NJ: Lawrence Erlbaum.
Hoyle, R. H., & Panter, A. T. (1995). Writing about structural equation models. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications, 158-176. Thousand Oaks, CA: Sage.
Huang, C. D., Church, A. T., & Katigbak, M. S. (1997). Identifying cultural differences in items and traits: Differential item functioning in the NEO personality inventory. Journal of Cross-Cultural Psychology, 28, 192-218.
Huang, P.-R., Sun, G.-W., & Shih, C.-L. (2009). Assessing differential item functioning using Mantel-Haenszel method with DIF-free-then-DIF strategy. Pacific Rim Objective Measurement Symposium Hong Kong (PROMS HK 2009). July 28-30, 2009, Hong Kong.
Jackson, C., & Braswell, J. (1992, April). An analysis of factors causing differential item functioning on SAT-Mathematics items. Paper presented at the Annual Meeting of the American Research Association, San Francisco.
Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329-349.
Jöreskog, K. G., & Moustaki, I. (2001). Factor analysis of ordinal variables: A comparison of three approaches. Multivariate Behavioral Research, 36, 347–387.
Jöreskog, K. G., & Sörbom, D. (1993). LISREL 8: Structural equation modeling with the SIMPLIS command language. Chicago: Scientific Software International.
Judd, C. M., & McClelland, G. H. (1989). Data analysis: A model comparison approach. San Diego, CA: Harcourt, Brace, Jovanovich.
Kamata, A. & Bauer, D. J. (2008). A note on the relationship between factor analytic and item response theory models. Structural Equation Modeling, 15, 136-153.
Kim, W. (2003). Development of a differential item functioning (DIF) procedure using the hierarchical generalized linear model: A comparison study with logistic regression procedure. Unpublished doctoral dissertation, University of Pennsylvania.
Kim, S. H., & Cohen, A. S. (1992). Effects of linking methods on detection of DIF. Journal of Educational Measurement, 29, 51-66.
Kim, S. H., & Cohen, A. S. (1998). Detection of differential item functioning under the graded response model with the likelihood ratio test. Applied Psychological Measurement, 22, 345–355.
Knol, D. L., & Berger, M. P. F. (1991). Empirical comparison between factor analysis and multidimensional item response models. Multivariate Behavioral Research, 26(3), 457-477.
Kupermintz, H., Ennis, M. M., Hamilton, L. S., Talbert, J. E., & Snow, R. E. (1995). Enhancing the Validity and Usefulness of Large-scale Educational Assessments: I. NELS:88 Mathematics Achievement. American Educational Research Journal, 32(3), 525-554.
Lane, S., Wang, N. & Magone, M. (1996). Gender-related differential item functioning on a middle-school mathematics performance assessment. Educational Measurement: Issues and Practice, 15(4), 121-127.
Li, H. & Stout, W. F. (1996). A new procedure for detection of crossing DIF. Psychometrika, 61, 647-677.
Linacre, J. M. & Wright, B. D. (1997). A user’s guide to BIGSTEPS: Rasch model computer program. [Computer program]. Version 2.8. Chicago: MESA Press.
Linn, R. L., Levine, M. V., Hastings, C. N., & Wardrop, J. L. (1981). Item bias in a test of reading comprehension. Applied Psychological Measurement, 5, 159-173.
Lisa, S. (2002). An examination of measurement equivalence in survey administration methods. Unpublished doctoral dissertation, the Graduate College of the Illinois Institute of Technology, U.S.A.
Lord, F. M. (1952). A theory of test scores. Psychometric Monograph, 7. New York: Psychometric Society.
Lord, F. M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Lord, F. M., & Novick, M. R. (1968). Statistical theories of mental test scores. Reading, MA: Addison-Wesley.
MacIntosh, R. & Hashim, S. (2003). Converting MIMIC Model Parameters to IRT Parameters: A Comparison Of Variance Estimation Methods. Applied Psychological Measurement, 27, 372-379.
Mardia, K. V. (1970). Measures of multivariate skewness and kurtosis with applications. Biometrika, 57, 519–530.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
Maurer, T. J., Raju, N. S., & Collins, W. C. (1998). Peer and subordinate performance measurement equivalence. Journal of Applied Psychology, 83, 693-702.
McClelland, G. H., & Judd, C. M. (1989). Data analysis: A model comparison approach. San Diego, CA: Harcourt Brace Jovanovich.
McDonald, R. P. (1967). Nonlinear factor analysis. Psychometric Monographs, No. 15.
McDonald, R. P. (1999). Test theory: A unified treatment. Mahwah, NJ: Erlbaum.
McDonald, R. P., & Ho, M. H. R. (2002). Principles and practices in reporting structural equation analysis. Psychological Methods, 7, 64-82.
Meade, A. W., & Lautenschlager, G. J. (2004). A Comparison of Item Response Theory and Confirmatory Factor Analytic Methodologies for Establishing Measurement Equivalence/Invariance. Organizational Research Methods, 7(4), 361-388.
Mellenbergh, G. J. (1982). Contingency table models for assessing item bias. Journal of Educational Statistics, 7(2), 105-118.
Mellenbergh, G.J. (1994). A unidimensional latent trait model for continuous item responses. Multivariate Behavioral Research, 29, 223-236.
Mehta, P. D., & Taylor, W. P. (2006, June). On the relationship between item response theory and factor analysis of ordinal variables: Multiple group case. Paper presented at the 71st annual meeting of the Psychometric Society, HEC Montreal, Canada.
Miller, M. D., & Oshima, T. C. (1992). Effect of sample size, number of biased items, and magnitude of bias on a two-stage item bias estimation method. Applied Psychological Measurement, 16, 381-388.
Mislevy, R. J. (1986). Bayes modal estimation in item response models. Psychometrika, 51, 177-195.
Moustaki, I., Jöreskog, K. G., & Mavridis, D. (2004). Factor models for ordinal variables with covariate effects on the manifest and latent variables: A comparison of LISREL and IRT approaches. Structural Equation Modeling, 11, 487-513.
Mueller, R. O. (1996). Basic principles of structural equation modeling: An introduction to LISREL and EQS. New York: Springer.
Muthén, B. O. (1993). Goodness of fit with categorical and other nonnormal variables. In K. A. Bollen & J. S. Long (Eds.), Testing structural equation models, 205–234. Newbury Park, CA: Sage.
Muthén, B., Kao, C. F., & Burstein, L. (1991). Instructional sensitivity in mathematics achievement test items: Applications of a new IRT-based detection technique. Journal of Educational Measurement, 28, 1-22.
Muthén, L. K., & Muthén, B. O. (2004). Mplus user’s guide, version 3. Los Angeles: Author.
Muthén, B., du Toit, S.H.C., & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Psychometrika. in press.
Nandakumar, R. (1993). Assessing essential unidimensionality of real data. Applied Psychological Measurement, 17(1), 29-38.
Narayanan, P., & Swaminathan, H. (1994). Performance of the Mantel-Haenszel and simultaneous item bias procedures for detecting differential item functioning. Applied Psychological Measurement, 18, 315-328.
Narayanan, P., & Swaminathan, H. (1996). Identification of items that show nonuniform DIF. Applied Psychological Measurement, 20, 257-274.
Navas-Ara, M. J., & Gomez-Benito, J. (2002). Effects of ability scale purification on the identification of DIF. European Journal of Psychological Assessment, 18, 9-15.
Nohoon, K., Davenport, E. C., & Davison, M. L. (1998, April). A comparative study of observed score approaches and purification procedures for detecting differential item functioning. Paper presented at the annual meeting of the National Council on Measurement in Education, San Diego, CA.
Norberto, V. (2003). An empirical comparison of measurement equivalence methods based on confirmatory factor analysis (with mean and covariance structures analysis) and item response theory. Unpublished doctoral dissertation, the Graduate College of the Illinois Institute of Technology, U.S.A.
OECD (2004). Learning for tomorrow’s world: First results from PISA 2003. Paris: Author.
O’Neill, K. A., Wild, C. L., & McPeek, W. M. (1989). Gender-related differential item performance on graduate admissions tests. Paper presented at the annual meeting of the American Educational Research Association, San Francisco, CA.
O'Neill, K. A., & McPeek, W. M. (1993). Item and test characteristics that are associated with differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning, 255-276. Hillsdale, NJ: Lawrence Erlbaum Associates.
Oort, F. J. (1998). Simulation study of item bias detection with restricted factor analysis. Structural Equation Modeling, 5, 107-124.
Oshima, T. C., & Miller, M. D. (1992). Multidimensionality and item bias in item response theory. Applied Psychological Measurement, 16, 237-248.
Oshima, T. C., Kushubar, S., Scott, J. C., & Raju, N. S. (2009). DFIT for Windows User's Manual: Differential functioning of items and tests. St. Paul, MN: Assessment Systems Corporation.
Oshima, T. C., Raju, N. S., Flowers, C. P., & Slinde, J. A. (1998). Differential bundle functioning using the DFIT framework: Procedures for identifying possible sources of differential functioning. Applied Measurement in Education, 11, 353-369.
Oshima, T. C., Raju, N. S., & Nanda, A. O. (2006). A new method for assessing the statistical significance in the differential functioning of items and tests (DFIT) framework. Journal of Educational Measurement, 43(1), 1-17.
Park, D. G., & Lautenschlager, G. J. (1990). Improving IRT item bias detection with iterative linking and ability scale purification. Applied Psychological Measurement, 14, 163-173.
Pedhazur, E. J., & Pedhazur, S. L. (1991). Measurement, Design, and Analysis: an Integrated Approach. Hillsdale, NJ: Lawrence Erlbaum Associates.
Raju, N. S. (1988). The area between two item characteristic curves. Psychometrika, 53, 495-502.
Raju, N. S. (1999). DFIT5P: A Fortran program for calculating DIF/DTF [Computer program]. Chicago: Illinois Institute of Technology.
Raju, N. S. (1990). Determining the significance of estimated signed and unsigned areas between two item response functions. Applied Psychological Measurement, 14, 197-207.
Raju, N. S., Laffitte, L. J., & Byrne, B. M. (2002). Measurement equivalence: A comparison of methods based on confirmatory factor analysis and item response theory. Journal of Applied Psychology, 87(3), 517-529.
Raju, N. S., van der Linden, W. J., & Fleer, P. F. (1995). IRT-based internal measures of differential functioning of items and tests. Applied Psychological Measurement, 19(4), 353-368.
Ramsey, P. A. (1993). Sensitivity review: The ETS experience as a case study. In P. W. Holland & H. Wainer (Eds.), Differential item functioning, 367-388. Hillsdale, NJ: Lawrence Erlbaum Associates.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press.
Raykov, T., & Marcoulides, G. A. (2000). A Method for Comparing Completely Standardized Solutions in Multiple Groups. Structural Equation Modeling, 7(2), 292-308.
Reckase, M. D. (1979). Unifactor latent trait models applied to multifactor tests: Results and implication. Journal of Educational Statistics, 4, 207-230.
Reckase, M. D. (1996). Test construction in the 1990’s: Recent approaches every psychologist should know. Psychological Assessment, 8, 354-359.
Reise, S.P., & Yu, J. (1990). Parameter recovery in the graded response model using MULTILOG. Journal of Educational Measurement, 27, 133-144.
Reise, S. P., Widaman, K. F., & Pugh, R. H. (1993). Confirmatory factor analysis and item response theory: Two approaches for exploring measurement invariance. Psychological Bulletin, 114(3), 552-566.
Rogers, H. J., & Swaminathan, H. (1993). A comparison of logistic regression and Mantel-Haenszel procedures for detecting differential item functioning. Applied Psychological Measurement, 17, 105-116.
Roussos, L. A., & Stout, W. F. (1996). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel type I error performance. Journal of Educational Measurement, 33(2), 215-230.
Ryan, K. E., & Fan, M. (1996). Examining gender DIF on a multiple-choice test of mathematics: A confirmatory approach. Educational Measurement: Issues and Practice, 15(4), 21-27.
Samejima, F. (1969). Estimation of latent ability using a response pattern of graded scores. Psychometrika Monograph, 17.
Schermelleh-Engel, K., Moosbrugger, H., & Muller, H. (2003). Evaluating the Fit of Structural Equation Models: Tests of Significance and Descriptive Goodness-of-Fit Measures. Methods of Psychological Research Online, 8(3), 23-74.
Schumacker, R. & Lomax, R. (1996). A Beginner’s Guide to Structural Equation Modeling. Mahwah, NJ: Lawrence Erlbaum Associates.
Shealy, R. T., & Stout, W. F. (1993). An item response theory model for test bias and differential test functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning, 197-239. Hillsdale, NJ: Erlbaum.
Shepard, L. A., Camilli, G., & Williams, D. (1985). Validity of approximation techniques for detecting item bias. Journal of Educational Measurement, 22, 77-105.
Shih, C.-L. & Wang, W.-C. (2005). Locating DIF-Free Items as Anchors Using the Iterative Constant-Item Method. The 14th International Meeting and the 70th Annual Meeting of the Psychometric Society, July 5-9, 2005, Tilburg, The Netherlands.
Sireci, S. G., & Berberoglu, G. (2000). Using bilingual respondents to evaluate translated-adapted items. Applied Measurement in Education, 13, 229-248.
Sireci, S. G., Fitzgerald, C., & Xing, D. (1998). Adapting credentialing examinations for international uses. Laboratory of Psychometric and Evaluative Research Report 329. Amherst: University of Massachusetts, School of Education.
Sireci, S. G., Foster, D., Olsen, J. B., & Robin, F. (1997, March). Comparing dual-language versions of international computerized certification exams. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago, IL.
Sörbom, D. (1982). Structural equation models with structured means. In K. G. Jöreskog & H. Wold (Eds.), Systems under indirect observation, 183-195. Amsterdam: North Holland.
Stanley, F. & Chris, W. (2007, August). The Implications and Detection of Differential Item Functioning in Survey Analysis. Paper presented at the annual conference of the American Political Science Association.
Stark, S., Chernyshenko, O. S., & Drasgow, F. (2006). Detecting differential item functioning with confirmatory factor analysis and item response theory: Toward a unified strategy. Journal of Applied Psychology, 91(6), 1292-1306.
Steenkamp, J. B., & Baumgartner, H. (1998). Assessing Measurement Invariance in Cross-National Consumer Research. Journal of Consumer Research, 25(1), 78-90.
Su, Y. H., & Wang, W. C. (2005). Efficiency of the Mantel, generalized Mantel-Haenszel, and logistic discriminant function analysis methods for detecting differential item functioning for polytomous items. Applied Measurement in Education, 18, 313-350.
Su, Y. H., Shih, C. L., & Wang, W. C.(2006). Locating DIF-Free Items to Serve as Anchors for Detection of Differential Item Functioning. Chinese Association of Psychological Testing Annual meeting in 2006. Taipei, Taiwan.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27, 361-370.
Tabachnick, B., & Fidell, L. (1996). Use of item response theory and the testlet concept in the measurement of psychopathology. Psychological Methods, 1(1), 81-97.
Takane, Y. & de Leeuw, J. (1987). On the relationship between item response theory and factor analysis of discretized variables. Psychometrika, 52, 393-408.
Thissen, D., Chen, W. H., & Bock, D. (2003). MULTILOG for Windows (Version 7.0) [computer program]. Lincolnwood, IL: Scientific Software International.
Thissen, D. & Steinberg, L. (1986). A taxonomy of item response models. Psychometrika, 51, 567-577.
Thissen, D., Steinberg, L. & Mooney, J. A. (1989). Trace lines for testlets: A use of multiple-categorical-response models. Journal of Educational Measurement, 26, 247-260.
Thissen, D. & Wainer, H. (Eds.) (2001). Test Scoring. Hillsdale, NJ: Lawrence Erlbaum Associates.
Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3(1), 4–69.
Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 law school admissions test as an example. Applied Measurement in Education, 8, 157-186.
Wainer, H., & Lukhele, R. (1997). How reliable are TOEFL scores? Educational and Psychological Measurement, 57(5), 741-759.
Wainer, H., Sireci, S. G., & Thissen, D. (1991). Differential testlet functioning: Definitions and detection. Journal of Educational Measurement, 28, 197-219.
Wang, W. C., & Su, Y. H. (2004). Factors Influencing the Mantel and generalized Mantel-Haenszel methods for the assessment of differential item functioning in polytomous items. Applied Psychological Measurement, 28, 450-480.
Wang, W. C., & Yeh, Y. L. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27, 479-498.
Waternaux, C. M. (1976). Asymptotic distribution of the sample roots for a nonnormal population. Biometrika, 63, 639-645.
Weiss, D. J. & McBride, J. R. (1984). Bias and information of Bayesian adaptive testing. Applied Psychological Measurement, 8, 273-285.
West, S. G., Finch, J. F., & Curran, P. J. (1995). Structural equation models with nonnormal variables: Problems and remedies. In R. H. Hoyle (Ed.), Structural equation modeling: Concepts, issues, and applications, 56−75. Thousand Oaks, CA: Sage.
William, S. (1999). DIFPACK: Dimensionality-based DIF/DBF package (SIBTEST, POLY-SIBTEST, CROSSING SIBTEST, DIFSIM, DIFCOMP) [Computer software]. Urbana-Champaign, IL: Author.
Wilson, D. T., Wood, R. & Gibbons, R. (1984). TESTFACT. Chicago: Scientific Software, Inc.
Wirth, R. J. & Edwards, M. C. (2007). Item factor analysis: Current approaches and future directions. Psychological Methods, 12(1), 58-79.
Wright, B. D. & Linacre, J. M. (1997). User's Guide to BIGSTEPS: Rasch-Model Computer Program. Chicago: MESA Press.
Zhang, J. & Stout, W. (1999). The theoretical DETECT index of dimensionality and its application to approximate simple structure. Psychometrika, 64, 213–249.
Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning, 337-347. Hillsdale, NJ: Erlbaum.
Zumbo, B. D. & Koh, K. H. (2005). Manifestation of Differences in Item-Level Characteristics in Scale-Level Measurement Invariance Tests of Multi-Group Confirmatory Factor Analyses. Journal of Modern Applied Statistical Methods, 4, 275-282.
Zwick, R., Donoghue, J. R., & Grima, A. (1993). Assessment of differential item functioning for performance tasks. Journal of Educational Measurement, 30, 233-251.