:::

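Title: Objectivity of Ratings and Objectivity of Ability Estimates: A Comparison of Traditional and Item Response Theory Approaches (original title: 評分客觀性與能力估計客觀性:傳統作法與試題反應理論作法之比較)
Journal: 測驗年刊
Author: 王文中 (Wang, Wen-chung)
Publication date: 1997
Volume/Issue: 44:1
Pages: 29-52
Keywords: Rasch model; Rater severity; Objectivity of ratings; Objectivity of ability estimates; Item response theory; Constructed-response items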
Related counts:
  • Times cited: journal articles (5), doctoral dissertations (1), books (0), book chapters (0)
  • Excluding self-citations: 5
  • Co-citations: 20
  • Views: 59
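     Objectivity of ratings refers to the degree of discrepancy between the score an examinee actually receives and the score the examinee deserves. Objectivity of ability estimates means that the estimate of an examinee's ability is unaffected by item difficulty or rater severity. Traditionally, the objectivity of ratings for constructed-response items has mostly been judged by the consistency of ratings, using, for example, product-moment correlations, the proportion of score discrepancies, or generalizability theory. In fact, if an examinee's paper is marked by two very severe raters, the scores they give may be consistently low, which clearly underestimates the examinee's ability. Rater consistency therefore does not guarantee objectivity of ratings. In addition, the traditional approaches are built on classical test theory, so ability estimates are confounded with item difficulty and the objectivity of ability estimates is lost as well. Item response theory successfully overcomes this drawback: if the data fit the model, the estimate of an examinee's ability is unaffected by item difficulty. Rater severity is itself a kind of item difficulty, so differences in severity among raters do not hinder the estimation of examinee ability. Earlier applications of item response theory to rater severity mostly imposed the restriction that item difficulty and rater severity cannot interact. This study removes that restriction, which yields a variety of severity models, greatly broadens the applicability of the models, and improves the objectivity of ability estimates. The study critiques the shortcomings of traditional analyses, uses item response theory to construct several rater severity models, and explains their meaning. Finally, data from constructed-response items in the biology subject of the Joint College Entrance Examination are analyzed to compare the traditional methods with the item response theory methods.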
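The contrast between the restricted and the relaxed severity models can be sketched in conventional many-facet Rasch notation. This is an illustrative reading of the abstract, not the paper's own parameterization: the symbols $\theta_n$ (ability of examinee $n$), $\delta_i$ (difficulty of item $i$), $\lambda_r$ (severity of rater $r$) and $\lambda_{ir}$ (severity of rater $r$ on item $i$) are assumptions, and a dichotomous score is used only to keep the expressions short.

```latex
% Restricted form: one severity per rater, no item-by-rater interaction
\log \frac{P(X_{nir} = 1)}{P(X_{nir} = 0)} = \theta_n - \delta_i - \lambda_r

% Relaxed form: severity is allowed to vary across items
\log \frac{P(X_{nir} = 1)}{P(X_{nir} = 0)} = \theta_n - \delta_i - \lambda_{ir}
```

In both forms the rater term enters the logit in the same way as item difficulty, which is the sense in which rater severity can be treated as a kind of item difficulty.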
     Objectivity of ratings refers to the degree of agreement between given scores and deserved scores. Objectivity of ability estimates is achieved if the estimates are free from the particular items tested. Traditionally, objectivity of ratings is usually assessed in terms of consistency between ratings, such as correlations and percentages of agreement. However, if an examinee is judged by two severe raters, the scores given may be consistently low. Consequently, consistency of ratings does not necessarily imply objectivity of ratings. Moreover, these traditional approaches are based on classical test theory, which confounds ability estimates with difficulty estimates, so objectivity of ability estimates is lost. Item response theory successfully overcomes this drawback. If the data fit the model, ability estimates and difficulty estimates are mutually independent, and objectivity of ability estimates is thus possible. For constructed-response items, where raters are involved, item difficulties can be partitioned into genuine difficulties and rater severities. If the data fit the model, the ability estimates are independent of raters, meaning that ability will be objectively estimated even if raters differ in severity. However, in earlier work the parameterization of rater severities is too rigid to fit the complexity of testing situations. In this study, unnecessary constraints are relaxed and several more general models are proposed. In so doing, both the applicability of the models and the objectivity of ability estimates are increased. A real data set from the biology subject of the 1995 Joint College Entrance Examination was analyzed to demonstrate the advantages of item response modeling over traditional approaches.
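The point that consistent ratings need not be objective can be illustrated with a small numerical sketch. It assumes a dichotomous Rasch-type model with an additive rater term, works with expected scores instead of simulated responses, and uses made-up ability, difficulty, and severity values; none of these numbers or function names come from the paper.

```python
import math

def p_success(ability, difficulty, severity):
    """Rasch-type success probability with an additive rater term (logit scale)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty - severity)))

def expected_score(ability, difficulties, severity):
    """Expected total score over all items when marked by one rater."""
    return sum(p_success(ability, d, severity) for d in difficulties)

def pearson(xs, ys):
    """Plain Pearson product-moment correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values chosen only for illustration.
abilities = [-2.0 + 0.5 * k for k in range(9)]   # nine examinees
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]       # five items

# "Deserved" score: expected score under a rater with zero severity.
deserved = [expected_score(a, difficulties, 0.0) for a in abilities]
# Two severe raters with similar (assumed) severities.
rater_a = [expected_score(a, difficulties, 1.0) for a in abilities]
rater_b = [expected_score(a, difficulties, 1.2) for a in abilities]

agreement = pearson(rater_a, rater_b)
mean_gap = sum(s - d for s, d in zip(rater_a, deserved)) / len(abilities)

print(f"correlation between the two severe raters: {agreement:.3f}")
print(f"mean (given - deserved) score for rater A: {mean_gap:.3f}")
```

The two severe raters agree almost perfectly, yet every examinee's given score falls below the deserved score, so a consistency index alone would not reveal the bias.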
 
 
 
 
:::