Journal articles
1. Wolfe, Edward W.(2004)。Identifying rater effects using latent trait models。Psychology Science,46(1),35-51。
2. Smith, R. M.(2000)。Fit analysis in latent trait measurement models。Journal of Applied Measurement,1(2),199-218。
3. Longford, N. T.(1994)。Reliability of essay rating and score adjustment。Journal of Educational and Behavioral Statistics,19(3),171-200。
4. Raymond, M. R.、Viswesvaran, C.(1993)。Least squares models to correct for rater effects in performance assessment。Journal of Educational Measurement,30,253-268。
5. Kuo, C.-Y.、Wu, H.-K.、Jen, T.-H.、Hsu, Y.-S.(2015)。Development and validation of a multimedia-based assessment of scientific inquiry abilities。International Journal of Science Education,37(14),2326-2357。
6. 吳慧珉、郭伯臣、許天維、陳婉寧(2015/06)。以可能值方法為基礎之多向度能力值垂直等化探究。測驗學刊,62(2),95-126。
7. 謝名娟(2013/06)。以多層面Rasch分析的角度來評估標準設定之變異性。教育心理學報,44(4),793-811。
8. Aryadoust, V.(2015)。Self- and peer-assessments of the oral presentations of first-year science students。Educational Assessment,20(3),199-225。
9. Elder, C.、Knoch, U.、Barkhuizen, G.、von Randow, J.(2005)。Individual Feedback to Enhance Rater Training: Does It Work?。Language Assessment Quarterly,2(3),175-196。
10. Harasym, P. H.、Woloschuk, W.、Cunning, L.(2008)。Undesired variance due to examiner stringency/leniency effect in communication skill scores assessed in OSCEs。Advances in Health Sciences Education: Theory and Practice,13(5),617-632。
11. Smith, E. V.(2002)。Detecting and evaluating the impact of multidimensionality using item fit statistics and principal component analysis of residuals。Journal of Applied Measurement,3,205-231。
12. 林小慧、曾玉村(2017/12)。科學多重文本閱讀理解評量之建構與信效度分析--以氣候變遷與三峽大壩之間的關係題本為例。教育心理學報,49(2),215-241。
13. Park, Taejoon(2004)。An Investigation of an ESL Placement Test of Writing Using Many-facet Rasch Measurement。Teachers College, Columbia University Working Papers in TESOL & Applied Linguistics,4(1),1-21。
14. Raymond, M. R.、Webb, L. C.、Houston, W. M.(1991)。Correcting Performance-rating errors in oral examinations。Evaluation and the Health Professions,14(1),100-122。
15. 林小慧、林世華、吳心楷(2018/03)。科學能力的建構反應評量之發展與信效度分析:以自然科光學為例。教育科學研究期刊,63(1),173-205。
16. 王暄博、郭伯臣、呂玉如(2013/09)。大型測驗等化群體不變性之探究:以2007年臺灣學生學習成就評量資料庫國中二年級數學科為例。測驗學刊,60(3),489-518。
17. 謝名娟(2017/06)。誰是好的演講者?以多層面Rasch來分析校長三分鐘即席演講的能力。教育心理學報,48(4),551-566。
18. Braun, H. I.(1988)。Understanding scoring reliability: Experiments in calibrating essay readers。Journal of Educational Statistics,13,1-18。
19. Engelhard, G. Jr.(1997)。Constructing rater and task banks for performance assessments。Journal of Outcome Measurement,1,19-33。
20. Lance, C. E.、LaPointe, J. A.、Stewart, A. M.(1994)。A test of the context dependency of three causal models of halo rater error。Journal of Applied Psychology,79,332-340。
21. Linacre, J. M.(1998)。Structure in Rasch residuals: Why principal components analysis?。Rasch Measurement Transactions,12(2)。
22. Linacre, J. M.(2010)。Predicting responses from Rasch measures。Journal of Applied Measurement,11(1),1-10。
23. Lunz, M.、Suanthong, S.(2011)。Equating of multi-facet tests across administrations。Journal of Applied Measurement,12,124-134。
24. Palermo, C.、Bunch, M. B.、Ridge, K.(2019)。Scoring stability in a large-scale assessment program: A longitudinal analysis of leniency/severity effects。Journal of Educational Measurement,56(3),626-652。
25. Tseng, W.-T.、Su, T.-Y.、Nix, J.-M. L.(2019)。Validating translation test items via the many-facet Rasch model。Psychological Reports,122(2),748-772。
26. Wilson, H. G.(1988)。Parameter estimation for peer grading under incomplete design。Educational and Psychological Measurement,48,69-81。
27. Wind, S. A.、Engelhard, G. Jr.、Wesolowski, B.(2016)。Exploring the effects of rater linking designs and rater fit on achievement estimates within the context of music performance assessments。Educational Assessment,21(4),278-299。
28. 趙子揚、黃嘉莉、宋曜廷、郭蕙寧、許明輝(2016/06)。教師情境判斷測驗之編製。教育科學研究期刊,61(2),85-117。
29. 張新立、吳舜丞(2008/04)。多層面Rasch模式於學術研討會論文評分之應用。測驗學刊,55(1),105-128。
30. 謝如山、謝名娟(2013/09)。多層面Rasch模式在數學實作評量的應用。教育心理學報,45(1),1-16+18。
31. 姚漢禱、姚偉哲(2008/04)。Application of Many-Facet Rasch Model to Analyze the Skills of Elite Athletes in Double Trap。測驗學刊,55(1),89-103。
32. Tennant, A.、Pallant, J. F.(2006)。Unidimensionality matters! (A tale of two Smiths?)。Rasch Measurement Transactions,20(1),1048-1051。

Conference papers
1. Breton, G.、Lepage, S.、North, B.(2008)。Cross-language benchmarking seminar to calibrate examples of spoken production in English, French, German, Italian and Spanish with regard to the six levels of the Common European Framework of Reference for Languages (CEFR)。The Séminaire Interlangues Cross Language Benchmarking Seminar,(conference dates: June 23-25, 2008)。Sèvres:CIEP。
2. Campbell, E. H.(1993)。Fifteen raters rating: An analysis of selected conversation during a placement rating session。The 44th Annual Meeting of the Conference on College Composition and Communication,(conference dates: March 31-April 3, 1993)。San Diego, CA。

Research reports
1. 林信志、謝名娟(2016)。中小學候用校長培訓混成課程模式與情境評量之發展與研究 (project no.: 105-2410-H-656-002-MY2)。
2. Longford, N. T.(1993)。Reliability of essay rating and score adjustment。
3. Myford, C. M.、Mislevy, R. J.(1995)。Monitoring and improving a portfolio assessment system。

Books
1. Linacre, J. M.(1994)。Many-Facet Rasch Measurement。Chicago:MESA Press。
2. North, B.(2000)。The Development of a Common Framework Scale of Language Proficiency。Peter Lang。
3. Council of Europe(2001)。Common European Framework of Reference for Languages: Learning, teaching, assessment。Cambridge University Press。
4. Bond, T. G.、Fox, C. M.(2015)。Applying the Rasch model: Fundamental measurement in the human sciences。London:Erlbaum。
5. Eckes, Thomas(2011)。Introduction to many-facet Rasch measurement: Analyzing and evaluating rater-mediated assessments。Peter Lang。
6. Engelhard, G. Jr.(2012)。Invariant measurement: Using Rasch models in the social, behavioral, and health sciences。New York, NY:Routledge。
7. Johnson, R. L.、Penny, J. A.、Gordon, B.(2009)。Assessing performance: Designing, scoring, and validating performance tasks。Guilford Press。
8. Kolen, M. J.、Brennan, R. L.(2014)。Test equating, scaling, and linking: Methods and practices。Springer。
9. Linacre, J. M.(2005)。A User's Guide to WINSTEPS/MINISTEP Rasch-Model Computer Programs。MESA Press。
10. Linacre, J. M.(1989)。Many-facet Rasch measurement。Chicago:MESA Press。
11. McNamara, Timothy Francis(1996)。Measuring second language performance。London:Longman。
12. Mislevy, R. J.(1992)。Linking educational assessments: Concepts, issues, methods, and prospects。Princeton, NJ:Educational Testing Service, Policy Information Center。

Other
1. 大考中心(2018)。107學年度指考非選擇題評分標準說明,https://www.ceec.edu.tw/xcepaper/cont?xsmsid=0J066588036013658199&sid=0J149511655048005583。
2. North, B.、Jones, N.(2009)。Further material on maintaining standards across languages, contexts and administrations by exploiting teacher judgment and IRT scaling,https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680459fa0。

Book chapters
1. Stemler, S. E.、Tsai, J.(2008)。Best practices in interrater reliability: Three common approaches。Best practices in quantitative methods。Sage。
2. O'Neill, T. R.、Lunz, M. E.(2000)。A method to study rater severity across several administrations。Objective measurement: Theory into practice。Stamford, CT:Ablex。
3. Wolfe, E. W.、Dobria, L.(2008)。Applications of the multifaceted Rasch model。Best practices in quantitative methods。Los Angeles:Sage。