Objectivity of Ratings and Objectivity of Ability Estimates: A Comparison of Traditional and Item Response Theory Approaches

:::

Title:	Objectivity of Ratings and Objectivity of Ability Estimates: A Comparison of Traditional and Item Response Theory Approaches
Journal:	測驗年刊
Author:	王文中
Author (romanized):	Wang, Wen-chung
Publication date:	1997
Volume and issue:	44:1
Pages:	29-52
Keywords:	Rasch model; Rater severity; Objectivity of ratings; Objectivity of ability estimates; Item response theory; Constructed-response items
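Citation counts:	Cited by: journal articles (5), doctoral dissertations (1), books (0), book chapters (0); excluding self-citations: 5; co-citations: 20; views: 60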

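Objectivity of ratings refers to the degree of discrepancy between the score an examinee actually receives and the score the examinee deserves. Objectivity of ability estimates means that estimates of examinee ability are unaffected by item difficulty or rater severity. Traditionally, the objectivity of ratings on constructed-response items has been judged by rating consistency, for example product-moment correlation, the proportion of score discrepancies, and generalizability theory. In fact, if an examinee's paper is marked by two very severe raters, the scores they give may be consistently low, which clearly underestimates the examinee's ability. Consistency among raters therefore does not guarantee objectivity of ratings. Moreover, traditional approaches are built on classical test theory, so estimates of examinee ability are confounded with item difficulty, and objectivity of ability estimates is lost as well. Item response theory overcomes this drawback: if the data fit the model, estimates of examinee ability are unaffected by item difficulty. Rater severity is also a kind of item difficulty, so even when raters differ in severity, the estimation of examinee ability is not hindered. Most earlier item response theory analyses of rater severity required that item difficulty and rater severity not interact. This study removes that restriction and thereby forms several severity models, which greatly widens the applicability of the models and improves the objectivity of ability estimates. The study critiques the shortcomings of traditional analyses, uses item response theory to construct various models of rater severity, and explains their meaning. Finally, it compares traditional and item response theory methods through an analysis of constructed-response items from the biology test of the Joint College Entrance Examination.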
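The point about two severe raters can be made concrete with a small numerical sketch. The scores below are hypothetical values invented for illustration, not data from the study, and `pearson` is a helper written here for the example. It shows that two raters can agree almost perfectly with each other while both scoring well below the deserved scores.

```python
from statistics import mean

def pearson(x, y):
    """Product-moment correlation between two equal-length score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical deserved scores for five examinees (illustrative only).
deserved = [4, 6, 7, 9, 10]
# Two severe raters who each mark roughly three points too low.
rater_a = [1, 3, 4, 6, 7]
rater_b = [1, 3, 5, 6, 7]

# Consistency: how closely the two raters agree with each other.
agreement = pearson(rater_a, rater_b)
# Objectivity: how far each rater's scores sit from the deserved scores.
bias_a = mean([g - d for g, d in zip(rater_a, deserved)])
bias_b = mean([g - d for g, d in zip(rater_b, deserved)])

print(f"inter-rater correlation: {agreement:.2f}")
print(f"mean error, rater A: {bias_a:.1f}")
print(f"mean error, rater B: {bias_b:.1f}")
```

A high correlation alongside a large negative mean error is exactly the case in which a consistency index would report good ratings even though every examinee is underscored.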


Objectivity of ratings refers to the degree of agreement between the scores given and the scores deserved. Objectivity of ability estimates is achieved if the estimates do not depend on the particular items administered. Traditionally, objectivity of ratings is assessed in terms of consistency between ratings, using indices such as correlations and percentages of agreement. However, if an examinee is judged by two severe raters, the scores given may be consistently low. Consequently, consistency of ratings does not necessarily imply objectivity of ratings. Moreover, these traditional approaches are based on classical test theory, which confounds ability estimates with difficulty estimates, so objectivity of ability estimates is lost. Item response theory overcomes this drawback. If the data fit the model, ability estimates and difficulty estimates are mutually independent, and objectivity of ability estimates becomes possible. For constructed-response items, where raters are involved, item difficulties can be partitioned into genuine difficulties and rater severities. If the data fit the model, the ability estimates are independent of raters, meaning that ability is estimated objectively even if raters differ in severity. In earlier work, however, the parameterization of rater severity is too rigid to fit the complexity of testing situations. In this study, unnecessary constraints are relaxed and several more general models are proposed. In so doing, both the applicability of the models and the objectivity of ability estimates are increased. A real data set from the biology subject of the 1995 Joint College Entrance Examination was analyzed to demonstrate the advantages of item response modeling over traditional approaches.
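The partition of item difficulty into genuine difficulty and rater severity can be written out for a dichotomously scored item. The following is a sketch in common many-facet Rasch notation, not necessarily the paper's own parameterization, which the abstract does not give. The symbols are assumptions introduced for illustration: $\theta_n$ is the ability of examinee $n$, $\delta_i$ the genuine difficulty of item $i$, $\lambda_j$ the severity of rater $j$, $\gamma_{ij}$ a hypothetical item-by-rater term, and $P_{nij}$ the probability that examinee $n$ receives credit on item $i$ from rater $j$.

```latex
% Additive form: each rater shifts every item by the same amount.
\[
\log\frac{P_{nij}}{1 - P_{nij}} = \theta_n - (\delta_i + \lambda_j)
\]
% Relaxed form: an item-by-rater term lets severity vary across items.
\[
\log\frac{P_{nij}}{1 - P_{nij}} = \theta_n - (\delta_i + \lambda_j + \gamma_{ij})
\]
```

In the additive form a rater is equally severe on every item. Adding a term such as $\gamma_{ij}$ is one way the rigid parameterization could be relaxed so that a rater may be harsher on some items than on others.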


Journal articles
1.	Fischer, G. H.(1973)。The Linear Logistic Test Model as Instrument in Educational Research。Acta Psychologica，37，359-374。
2.	Engelhard, G. Jr.(1994)。Examining rater errors in the assessment of written composition with a many-faceted Rasch model。Journal of Educational Measurement，31，93-112。
3.	Engelhard, G. Jr.(1996)。Evaluating rater accuracy in performance assessments。Journal of Educational Measurement，33，56-70。
4.	王文中(1996)。幾個有關Rasch測量模式的爭議。教育與心理研究，19，1-25。
5.	Engelhard, G. Jr.(1992)。The measurement of writing ability with a many-faceted Rasch model。Applied Measurement in Education，5(3)，171-191。
6.	Lunz, M. E.、Wright, B. D.、Linacre, J. M.(1990)。Measuring the impact of judge severity on examination scores。Applied Measurement in Education，3(4)，331-345。
7.	Lord, F. M.(1952)。The relationship of test score to trait underlying the test。Educational and Psychological Measurement，13，517-548。
8.	Andrich, D.(1978)。A Rating Formulation for Ordered Response Categories。Psychometrika，43(4)，561-573。
9.	Wilson, M. R.(1992)。The ordered partition model: An extension of the partial credit model。Applied Psychological Measurement，16，309-325。
10.	Abedi, J.、Baker, E. L.(1995)。A latent-variable modeling approach to assessing interrater reliability, topic generalizability, and validity of a content assessment scoring rubric。Educational and Psychological Measurement，55，701-715。
11.	Lavingueur, S.、Tremblay, R. E.、Saucier, J. F.(1993)。Can spouse support be accurately and reliably rated? A generalizability study of families with disruptive boys。Journal of Child Psychology and Psychiatry and Allied Disciplines，34，689-714。
12.	Longford, N. T.(1994)。Reliability of essay rating and score adjustment。Journal of Educational and Behavioral Statistics，19(3)，171-200。
13.	Lunz, M. E.、Stahl, J. A.(1990)。The effect of rater severity on person ability measure: A Rasch model analysis。American Journal of Occupational Therapy，47，311-317。
14.	Marcoulides, G. A.(1994)。Selecting weighting schemes in multivariate generalizability studies。Educational and Psychological Measurement，54，3-7。
15.	McWilliam, R. A.、Ware, W. B.(1994)。The reliability of observations of young children's engagement: An application of generalizability theory。Journal of Early Intervention，18，34-47。
16.	Rost, J.(1988)。Measuring attitudes with a threshold model drawing on a traditional scaling concept。Applied Psychological Measurement，12，397-409。
17.	王文中(1993)。以項目反應理論來探討評分者的評分標準與嚴苛度。教育與心理研究，16，83-106。
18.	Bock, R. D.、Aitkin, M.(1981)。Marginal maximum likelihood estimation of item parameters: application of an EM algorithm。Psychometrika，46(4)，443-459。
19.	Masters, G. N.(1982)。A Rasch model for partial credit scoring。Psychometrika，47(2)，149-174。
20.	Adams, Raymond J.、Wilson, Mark R.、Wang, Wen-chung(1997)。The multidimensional random coefficients multinomial logit model。Applied Psychological Measurement，21(1)，1-23。
21.	Wilson, M. R.、王文中(1995)。Complex composites: Issues that arise in combining different modes of assessment。Applied Psychological Measurement，19，51-72。

Conference papers
1.	Lunz, M. E.、Wright, B. D.、Stahl, J. A.、Linacre, J. M.(1989)。Equating practical examinations。the annual meeting of the National Council on Measurement in Education。San Francisco, CA。
2.	Lunz, M. E.、Stahl, J. A.(1990)。Severity of grading across time periods。Annual Meeting of the American Educational Research Association。Boston。

Theses and dissertations
1.	Wang, W.(1994)。Implementation and application of the multidimensional random coefficients logit model(Doctoral dissertation)。University of California，Berkeley。

Books
1.	Hambleton, R. K.、Swaminathan, H.、Rogers, H. J.(1991)。Fundamentals of item response theory。Newbury Park, California：Sage Publications。
2.	Linacre, J. M.(1989)。Many-faceted Rasch measurement。Chicago, IL：MESA Press。
3.	王寶墉(1993)。現代測驗理論。台北市：心理出版社。
4.	Rasch, G.(1980)。Probabilistic models for some intelligence and attainment tests。Copenhagen：Danmarks Paedagogiske Institut。
5.	許擇基、劉長萱(1991)。試題作答理論簡介。
6.	Wright, B. D.、Masters, G. N.(1982)。Rating Scale Analysis。Chicago, IL：MESA Press。
7.	Lord, Frederic M.(1980)。Applications of Item Response Theory to Practical Testing Problems。Lawrence Erlbaum Associates, Inc.。

Book chapters
1.	Adams, R. J.、Wilson, M. R.(1996)。Formulating the Rasch model as a mixed coefficients multinomial logit。Objective measurement: Theory into practice。Norwood, NJ：Ablex。
2.	Wang, W. C.、Wilson, M. R.、Adams, R. J.(1997)。Rasch models for multidimensionality between items and within items。Objective measurement: Theory into practice。Norwood, NJ：Ablex。
3.	余民寧(1993)。測驗理論的發展趨勢。心理測驗的發展與應用。臺北市：心理出版社。
4.	Akaike, H.(1977)。On entropy maximization principle。Applications of statistics。New York：North Holland。
5.	Birnbaum, A. L.(1968)。Some latent trait models and their use in inferring an examinee's ability。Statistical theories of mental test scores。Addison-Wesley Publishing Company。
6.	王文中、Wilson, M. R.(1996)。Comparing multiple-choice-items and performance-based items using item response modeling。Objective measurement: Theory into practice。Norwood, NJ：Ablex。


:::

Source: 臺灣人文及社會科學引文索引資料庫系統 (Taiwan Citation Index - Humanities and Social Sciences)
