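An Investigation of Population Invariance of Equating in Large-Scale Assessments: The Case of Grade 8 Mathematics in the 2007 Taiwan Assessment of Student Achievement Database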

:::

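Title:	大型測驗等化群體不變性之探究：以2007年臺灣學生學習成就評量資料庫國中二年級數學科為例 (An Investigation of Population Invariance of Equating in Large-Scale Assessments: The Case of Grade 8 Mathematics in the 2007 Taiwan Assessment of Student Achievement Database)
Journal:	測驗學刊 (Psychological Testing)
Authors:	王暄博／郭伯臣／呂玉如
Authors (romanized):	Wang, Hsuan-po／Kuo, Bor-chen／Lu, Yu-ju
Publication date:	2013
Volume/issue:	60:3
Pages:	489-518
Keywords:	IRT true score equating；IRT observed score equating；Population invariance；Scale transformation method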
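Citation counts:	Times cited: journal articles (2), doctoral dissertations (0), books (0), book chapters (0); excluding self-citations: 2; co-citations: 9; views: 29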

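Abstract (translated from Chinese):	Using test data from the grade 8 mathematics assessment of the 2007 Taiwan Assessment of Student Achievement (TASA) as an example, this study examines whether TASA test scores satisfy population invariance of equating after the scaling procedure has been applied. Examinees were grouped by gender to investigate whether different equating methods preserve population invariance across the gender subgroups. Four scale transformation methods (mean/sigma, mean/mean, item characteristic curve, and test characteristic curve) were combined with item response theory (IRT) true score equating and IRT observed score equating, for a total of eight equating methods. Three indices were used to evaluate population invariance after subgroup equating: the root mean square difference (RMSD) and root expected mean square difference (REMSD) proposed by Dorans and Holland (2000), and the root expected square difference (RESD) proposed by Yang (2004), with the SDTM as the evaluation criterion. The results show that, in the 2007 TASA mathematics data, all booklets satisfy population invariance of equating except booklet 7, in which some score points exceed the SDTM criterion.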
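The mean/sigma and mean/mean transformations named in the abstract can be illustrated with a minimal sketch. It assumes the usual linear linking of ability scales, theta_ref = A * theta_new + B, estimated from the common items of two forms. The function names, variable names, and item parameter values below are hypothetical; they are not taken from the article or from the ST program.

```python
from statistics import mean, pstdev


def mean_sigma(b_new, b_ref):
    """Mean/sigma coefficients from common-item difficulty (b) estimates."""
    # Slope: ratio of the standard deviations of the difficulties.
    A = pstdev(b_ref) / pstdev(b_new)
    # Intercept: aligns the mean difficulties after applying the slope.
    B = mean(b_ref) - A * mean(b_new)
    return A, B


def mean_mean(a_new, a_ref, b_new, b_ref):
    """Mean/mean coefficients from common-item a and b estimates."""
    # Slope: ratio of the mean discriminations (new scale over reference scale).
    A = mean(a_new) / mean(a_ref)
    B = mean(b_ref) - A * mean(b_new)
    return A, B


def rescale(a, b, A, B):
    """Place one item's (a, b) from the new scale onto the reference scale."""
    return a / A, A * b + B


# Hypothetical common-item estimates, built so that b_ref = 2 * b_new + 1.
b_new = [-1.0, 0.0, 1.0]
b_ref = [-1.0, 1.0, 3.0]
a_new = [1.0, 1.2, 0.8]
a_ref = [0.5, 0.6, 0.4]

print(mean_sigma(b_new, b_ref))
print(mean_mean(a_new, a_ref, b_new, b_ref))
```

Because both forms contribute the same number of common items, using the population or the sample standard deviation gives the same slope. The Haebara and Stocking-Lord methods instead minimize differences between characteristic curves and need a numerical optimizer, so they are not sketched here.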


Abstract (authors' English):	This study uses test data from the Taiwan Assessment of Student Achievement (TASA) database to examine whether TASA test scores satisfy population invariance of equating. Using the 2007 TASA eighth-grade mathematics data, the researchers compared eight equating methods to assess whether invariance held across gender subgroups. The eight methods combine two equating approaches, item response theory (IRT) true score equating and IRT observed score equating, with four scale transformation methods: mean/mean, mean/sigma, Haebara, and Stocking-Lord. Dorans and Holland's (2000) RMSD and REMSD indices, together with Yang's (2004) RESD index, were used to evaluate population invariance after subpopulation equating, with the SDTM as the evaluation criterion. The results showed that the TASA mathematics data satisfied population invariance, except for booklet 7, where a few score points exceeded the SDTM criterion.
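The RMSD and REMSD indices mentioned in the abstract can be sketched as follows. This is a minimal illustration, assuming the usual definitions: a weighted root mean square of the differences between subgroup and total-group equated scores, divided by the standard deviation of the target-form scores, computed either at each score point (RMSD) or averaged over the score distribution (REMSD). The SDTM value is assumed here to be half a score unit divided by that same standard deviation. All names and numbers are hypothetical and do not come from the article.

```python
from math import sqrt


def rmsd(sub_equated, total_equated, weights, sd_y):
    """RMSD at every raw-score point.

    sub_equated: one list of equated scores per subgroup.
    total_equated: equated scores from the total-group equating.
    weights: subgroup proportions (assumed to sum to 1).
    sd_y: standard deviation of scores on the target form.
    """
    values = []
    for x, e_total in enumerate(total_equated):
        ss = sum(w * (e[x] - e_total) ** 2 for w, e in zip(weights, sub_equated))
        values.append(sqrt(ss) / sd_y)
    return values


def remsd(sub_equated, total_equated, weights, score_probs, sd_y):
    """REMSD: the squared differences averaged over the score distribution."""
    total = 0.0
    for x, p in enumerate(score_probs):
        e_total = total_equated[x]
        total += p * sum(w * (e[x] - e_total) ** 2 for w, e in zip(weights, sub_equated))
    return sqrt(total) / sd_y


def points_over_sdtm(rmsd_values, sdtm):
    """Indices of the score points whose RMSD exceeds the SDTM criterion."""
    return [x for x, v in enumerate(rmsd_values) if v > sdtm]


# Hypothetical two-subgroup example with two score points.
sub = [[9.0, 20.0], [11.0, 20.0]]   # equated scores per subgroup
total = [10.0, 20.0]                # total-group equated scores
w = [0.5, 0.5]                      # subgroup proportions
sd_y = 2.0                          # assumed target-form standard deviation
sdtm = 0.5 / sd_y                   # assumed criterion: half a score unit

per_point = rmsd(sub, total, w, sd_y)
print(per_point, points_over_sdtm(per_point, sdtm))
print(remsd(sub, total, w, [0.5, 0.5], sd_y))
```

Reporting the per-point RMSD alongside the single REMSD value makes it possible to see which score points exceed the criterion, which is the kind of booklet-level finding the abstract reports.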


Journal articles
1.	Dorans, N. J.、Liu, J.、Hammond, S.(2008)。Anchor test type and population invariance: An exploration across subpopulations and test administrations。Applied Psychological Measurement，32(1)，81-97。
2.	Liu, M.、Holland, P. W.(2008)。Exploring population sensitivity of linking functions across three law school admission test administrations。Applied Psychological Measurement，32，27-44。
3.	Yang, W. L.(2004)。Sensitivity of linkings between AP multiple-choice scores and composite scores to geographical region: An illustration of checking for population invariance。Journal of Educational Measurement，41，33-41。
4.	Yang, W.-L.、Gao, R.(2008)。Invariance of score linkings across gender groups for forms of a testlet-based college-level examination program examination。Applied Psychological Measurement，32，45-61。
5.	Brennan, R. L.、Kolen, M. J.(1987)。Some practical issues in equating。Applied Psychological Measurement，11，279-290。
6.	Cook, L. L.、Petersen, N. S.(1987)。Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances。Applied Psychological Measurement: Issues and Practice，10，37-45。
7.	Dorans, N. J.、Holland, P. W.(2000)。Population invariance and equatability of tests: Basic theory and the linear case。Journal of Educational Measurement，37，281-306。
8.	Harris, D. J.、Crouse, J. D.(1993)。A study of criteria used in equating。Applied Measurement in Education，6，195-240。
9.	Lord, F. M.、Wingersky, M. S.(1984)。Comparing IRT true-score and equipercentile observed score "equatings"。Applied Psychological Measurement，8，452-461。
10.	Loyd, B. H.、Hoover, H. D.(1980)。Vertical equating using the Rasch model。Journal of Educational Measurement，4，11-22。
11.	Petersen, Nancy S.、Cook, Linda L.、Stocking, Martha L.(1983)。IRT versus conventional equating methods: A comparative study of scale stability。Journal of Educational Statistics，8(2)，135-156。
12.	von Davier, A. A.、Wilson, C.(2008)。Investigating the population sensitivity assumption of item response theory true-score equating across two subgroups of examinees and two test formats。Applied Psychological Measurement，32，11-26。
13.	Yi, Q.、Harris, D. J.、Gao, X.(2008)。Invariance of equating functions across different subgroups of examinees taking a science achievement test。Applied Psychological Measurement，32，62-80。
14.	Skaggs, G.、Lissitz, R. W.(1986)。IRT test equating: Relevant issues and a review of recent research。Review of Educational Research，56(4)，495-529。
15.	Haebara, T.(1980)。Equating logistic ability scales by a weighted least squares method。Japanese Psychological Research，22(3)，144-149。
16.	Hanson, B. A.、Béguin, A. A.(2002)。Obtaining a Common Scale for Item Response Theory Item Parameters Using Separate versus Concurrent Estimation in the Common-item Equating Design。Applied Psychological Measurement，26(1)，3-24。
17.	Marco, G. L.(1977)。Item characteristic curve solutions to three intractable testing problems。Journal of Educational Measurement，14，139-160。
18.	Stocking, M. L.、Lord, F. M.(1983)。Developing a common metric in item response theory。Applied Psychological Measurement，7(2)，201-210。
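19.	Kuo, Bor-chen、Wang, Hsuan-po(2008)。A study of the effects of conducting vertical and horizontal equating simultaneously in large-scale assessments [大型測驗中同時進行垂直與水平等化效果之探討]。教育研究與發展期刊 (Journal of Educational Research and Development)，4(4)，87-119。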

Conference papers
1.	Dorans, N. J.、Holland, P. W.、Thayer, D. T.、Tateneni, K.(2002)。Invariance of score linking across gender groups for three Advanced Placement Program exams。The annual meeting of the National Council on Measurement in Education。New Orleans, LA。
2.	Harris, D. J.(1993)。Practical issues in equating。The annual meeting of the American Educational Research Association。Atlanta, GA。
3.	Marco, G.、Petersen, N.、Stewart, E.(1979)。A test of the adequacy of curvilinear score equating models。The Computerized Adaptive Testing Conference。Minneapolis, MN。
4.	Skaggs, G.(1990)。Assessing the utility of item response theory models for test equating。The annual meeting of the National Council on Measurement in Education。Boston, MA。
5.	Yang, W.-L.(2002)。Sample selection effect on AP multiple-choice score to composite score scaling。The annual meeting of the National Council on Measurement in Education。New Orleans, LA。

Research reports
1.	(2010)。Statistics on junior high school students, teachers, and staff, ROC academic year 99 [99學年度國中學生、教職員統計]。

Books
1.	Braun, H. I.(1982)。Observed score test equating: A mathematical analysis of some ETS equating procedures。Test equating。New York, NY：Academic Press。
2.	Gulliksen, H.(1950)。Theory of mental tests。New York：John Wiley & Sons。
3.	Zimowski, M. F.、Muraki, E.、Mislevy, R. J.、Bock, R. D.(2003)。BILOG-MG: Multiple-group IRT analysis and test maintenance for binary items。Mooresville, IN：Scientific Software。
4.	Lord, F. M.(1980)。Applications of item response theory to practical testing problems。Hillsdale, NJ：Lawrence Erlbaum Associates。
5.	Petersen, N. S.、Marco, G. L.、Stewart, E. B.(1982)。A test of the adequacy of linear score equating models。Test equating。New York, NY：Academic Press。
6.	Crocker, L.、Algina, J.(1986)。Introduction to Classical and Modern Test Theory。Holt, Rinehart & Winston。
7.	Kolen, M. J.、Brennan, R. L.(2004)。Test equating, scaling, and linking: Methods and practices。New York, NY：Springer Science+Business Media：Springer-Verlag。
8.	Hambleton, R. K.、Swaminathan, H.(1985)。Item Response Theory: Principles and Applications。Boston, Massachusetts：Kluwer-Nijhoff。

Other
1.	Taiwan Assessment of Student Achievement (TASA)(2011)。Taiwan Assessment of Student Achievement database [臺灣學生學習成就評量資料庫]，http://tasa.naer.edu.tw/brief.htm， 20110420。
2.	Hanson, B. A.，Zeng, L.，Chien, Y.(2004)。ST: A computer program for IRT scale transformation [Computer software]，http://www.education.uiowa.edu/casma， 20110310。
3.	Hanson, B. A.，Zeng, L.，Chien, Y.(2004)。PIE: IRT true and observed score equating for dichotomously scored tests [Computer software]，http://www.education.uiowa.edu/casma， 20110310。


:::
