Title: Performance Analysis of Indices for Detecting Simulated and Empirical Differential Item Functioning: A Comparison of the IRT and CFA Approaches
Author: 何宗岳 (Zong-Yue Her)
Institution: National Chiayi University
Department: Department of Education (Graduate Institute)
Advisor: 李茂能
Degree: Doctoral
Year of publication: 2010
Keywords: Item Response Theory (IRT); Confirmatory Factor Analysis (CFA); Differential Item Functioning (DIF)
This study compared the performance of IRT-based and CFA-based procedures for detecting differential item functioning (DIF) with both simulated and empirical data. The design had two parts. Study 1 was a Monte Carlo experiment comparing six indices (CFA-λ, MACS, MIMIC, LRT, SIBTEST, and DFIT) in their ability to detect DIF, together with three flagging procedures, while varying the number of response categories, sample size, proportion of DIF items, DIF magnitude, and DIF type. Study 2 was an empirical gender-DIF analysis of the mathematics section of the first 2008 Basic Competency Test for Junior High School Students (BCTEST; ROC year 97), using the indices that performed well in the simulation.
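The paragraph above describes a simulation in which each index is judged by its empirical power (flag rate on true-DIF items) and Type I error rate (flag rate on DIF-free items). The sketch below illustrates that evaluation loop under stated assumptions; it is not the dissertation's code, and it uses the Mantel-Haenszel statistic as a simple stand-in detector because the six indices actually studied require specialized software. The item parameters, sample sizes, replication count, and the 0.5 difficulty shift used as uniform DIF are all illustrative choices.

```python
"""Minimal Monte Carlo sketch of the power / Type I error tally, under assumed
settings (Mantel-Haenszel as a stand-in detector; illustrative parameters)."""
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(2010)

def simulate_2pl(n, a, b, shift=None):
    """Dichotomous responses under a 2PL model; `shift` adds uniform DIF to b."""
    theta = rng.normal(0.0, 1.0, size=n)[:, None]           # latent ability
    b_eff = b if shift is None else b + shift                # focal-group difficulty shift
    p = 1.0 / (1.0 + np.exp(-1.7 * a * (theta - b_eff)))     # response probabilities
    return (rng.random(p.shape) < p).astype(int)

def mantel_haenszel_flags(ref, foc, alpha=0.05):
    """Flag items whose continuity-corrected MH chi-square exceeds the critical value.
    Examinees are matched on the rest score (total score excluding the studied item)."""
    n_items = ref.shape[1]
    flags = np.zeros(n_items, dtype=bool)
    for j in range(n_items):
        rest = np.concatenate([np.delete(ref, j, 1).sum(1), np.delete(foc, j, 1).sum(1)])
        group = np.concatenate([np.zeros(len(ref)), np.ones(len(foc))])
        item = np.concatenate([ref[:, j], foc[:, j]])
        obs_a = exp_a = var_a = 0.0
        for s in np.unique(rest):
            m = rest == s
            n_r = np.sum(m & (group == 0))
            n_f = np.sum(m & (group == 1))
            n_t = n_r + n_f
            if n_r == 0 or n_f == 0 or n_t < 2:
                continue
            r1 = np.sum(m & (item == 1))                     # total correct in the stratum
            obs_a += np.sum(m & (group == 0) & (item == 1))  # reference-group correct
            exp_a += n_r * r1 / n_t
            var_a += n_r * n_f * r1 * (n_t - r1) / (n_t ** 2 * (n_t - 1))
        if var_a > 0:
            mh = (abs(obs_a - exp_a) - 0.5) ** 2 / var_a
            flags[j] = mh > chi2.ppf(1 - alpha, df=1)
    return flags

# Illustrative design: 20 items, 20% with uniform DIF (focal-group b shifted by 0.5).
n_items, n_per_group, n_reps = 20, 500, 50
a = rng.uniform(0.8, 1.6, n_items)
b = rng.uniform(-1.5, 1.5, n_items)
dif_shift = np.zeros(n_items)
dif_shift[: int(0.2 * n_items)] = 0.5

power, type1 = [], []
for _ in range(n_reps):
    ref = simulate_2pl(n_per_group, a, b)                    # reference group: no DIF
    foc = simulate_2pl(n_per_group, a, b, shift=dif_shift)   # focal group: DIF on first items
    flags = mantel_haenszel_flags(ref, foc)
    power.append(flags[dif_shift > 0].mean())                # empirical power
    type1.append(flags[dif_shift == 0].mean())               # empirical Type I error

print(f"empirical power ≈ {np.mean(power):.2f}, Type I error ≈ {np.mean(type1):.2f}")
```

Because the matching score in a design like this is contaminated by the DIF items themselves, the false-alarm rate tends to climb as the proportion of DIF items grows, which is consistent with conclusions 2 and 4 below.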
The main conclusions are summarized as follows:
1. IRT-based procedures were more powerful for detecting DIF in dichotomously scored items, whereas CFA-based procedures performed better as the number of response categories increased.
2. For both IRT-based and CFA-based procedures, statistical power and the Type I error rate increased with sample size.
3. With dichotomous items, the CFA-based CFA-λ index was relatively insensitive to uniform and mixed DIF, while the CFA-based MIMIC and the IRT-based SIBTEST procedures were relatively insensitive to nonuniform and mixed DIF. With polytomous items, CFA-λ remained insensitive to uniform and mixed DIF, while the CFA-based MACS and MIMIC and the IRT-based SIBTEST and DFIT procedures were relatively insensitive to nonuniform DIF.
4. For both IRT-based and CFA-based procedures, statistical power and the Type I error rate increased with the proportion of DIF items in the test.
5. With dichotomous items, the general procedure with the uncorrected critical α value detected DIF better than the general or iterative procedures with the corrected critical value when samples were small and the proportion of DIF items was low; with large samples and a high proportion of DIF items, the iterative procedure with the corrected critical value performed best. With polytomous items, the iterative procedure with the corrected critical value performed best under all conditions (a control-flow sketch of the three procedures follows this list).
6. In the gender-DIF analysis of the mathematics section of the first 2008 BCTEST, several items were flagged as DIF in the statistical analyses, but the substantive review of item content revealed no bias.
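Conclusion 5 contrasts three flagging procedures: a single pass at the nominal critical value, a single pass at a corrected (Bonferroni-type) critical value, and an iterative scale-purification pass at the corrected value, in which flagged items are removed from the anchor before re-testing. The skeleton below is a hedged sketch of that control flow, not the dissertation's implementation; `p_values` is a hypothetical interface standing in for any of the six indices, and its stub body exists only so the loop runs.

```python
"""Sketch of the three flagging procedures contrasted in conclusion 5."""
import numpy as np

def p_values(responses, groups, anchor):
    """Hypothetical detector interface: one DIF p-value per item, with the
    matching/anchor score restricted to the items in `anchor`. The stub below
    returns random values so the control flow runs; swap in a real index
    (e.g., the Mantel-Haenszel sketch above, an IRT likelihood-ratio test, SIBTEST)."""
    rng = np.random.default_rng(0)            # fixed seed: same p-values on every call
    return rng.uniform(size=responses.shape[1])

def general_procedure(responses, groups, alpha=0.05, corrected=False):
    """One pass over all items; optionally Bonferroni-correct the critical value."""
    k = responses.shape[1]
    crit = alpha / k if corrected else alpha
    anchor = np.arange(k)                      # every item stays in the matching criterion
    return np.flatnonzero(p_values(responses, groups, anchor) < crit)

def iterative_procedure(responses, groups, alpha=0.05, max_iter=10):
    """Scale purification: drop flagged items from the anchor and re-test until
    the flagged set stabilizes, always using the corrected critical value."""
    k = responses.shape[1]
    crit = alpha / k
    flagged = np.array([], dtype=int)
    for _ in range(max_iter):
        anchor = np.setdiff1d(np.arange(k), flagged)
        new_flags = np.flatnonzero(p_values(responses, groups, anchor) < crit)
        if np.array_equal(new_flags, flagged): # flagged set is stable -> stop
            break
        flagged = new_flags
    return flagged

# Tiny illustrative run on random data (500 + 500 examinees, 20 dichotomous items).
rng = np.random.default_rng(1)
responses = rng.integers(0, 2, size=(1000, 20))
groups = np.repeat([0, 1], 500)
print("general/uncorrected:", general_procedure(responses, groups))
print("general/corrected:  ", general_procedure(responses, groups, corrected=True))
print("iterative/corrected:", iterative_procedure(responses, groups))
```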