Can We Rely Too Much on Testlets? The Influence of the Number of Testlet Items on Parameter Estimation

:::


Title:	Can We Rely Too Much on Testlets? The Influence of the Number of Testlet Items on Parameter Estimation
Journal:	測驗學刊 (Psychological Testing)
Authors:	林奕宏 (Lin, Yi-hung)／施慶麟 (Shih, Ching-lin)
Publication date:	2013
Volume/issue:	60:4
Pages:	649-680
Keywords:	Parameter estimation; Rasch testlet model; Test design; Testlet effect; Testlet
Citation counts:	Times cited: journal articles (1), doctoral dissertations (0), books (0), book chapters (0); excluding self-citations: 1; co-citations: 0; views: 23

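Testlet items are widely used across many testing contexts; however, research has found that testlet effects influence test results to some degree. This study further examines the relationship between the number of testlet items and overall test results, focusing on changes in parameter estimates and test reliability. The study comprises an empirical analysis and a simulation study. The empirical analysis, which used the English test of Taiwan's 2007 college entrance examination as an example, found significant testlet effects in the data; the parameter values obtained from it then served as the basis for the simulation study. The simulation results show that the number of testlet items in a test has a significant effect on examinees' ability estimates: as the number of testlet items decreases, the bias, standard error, mean square error, and mean absolute error associated with the ability estimates also decrease, and test reliability increases, whereas item difficulty is affected relatively little. In other words, if the purpose of a test is to obtain precise ability estimates, as in admissions testing, the use of testlet items, and especially their number, must be carefully controlled.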
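
For orientation, the Rasch testlet model named in the keywords is commonly written in the form below. This is a sketch of the standard formulation, not a reproduction of the article's own notation: the symbols $\theta_n$ (ability of person $n$), $b_i$ (difficulty of item $i$), $\gamma_{n d(i)}$ (random effect of person $n$ on the testlet $d(i)$ that contains item $i$), and $\sigma^2_{\gamma_d}$ (testlet-effect variance) are assumptions chosen for illustration.

$$
\log\frac{P(X_{ni}=1)}{P(X_{ni}=0)} = \theta_n - b_i + \gamma_{n d(i)},
\qquad \gamma_{n d} \sim N\!\left(0,\ \sigma^2_{\gamma_d}\right)
$$

For an item that belongs to no testlet, the term $\gamma_{n d(i)}$ is dropped and the expression reduces to the ordinary Rasch model. A larger $\sigma^2_{\gamma_d}$ corresponds to a stronger testlet effect, that is, stronger local dependence among the items within testlet $d$.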


Testlet items are commonly used in testing situations. However, studies have found that testlet effects have some impact on test results. This study further investigates how the number of testlet items influences the test as a whole, observing changes in the parameter estimates and in test reliability. It consists of an empirical analysis and two simulation studies. The empirical analysis, which uses the English test in Taiwan's 2007 College Entrance Examination as an example, demonstrates non-ignorable testlet effects in the dataset. The parameter estimates obtained from the empirical analysis are then used in the simulation studies. The simulation studies reveal that the total number of testlet items has a significant impact on the person ability estimates: bias, standard error, mean square error, and mean absolute error drop, and EAP test reliability rises, when fewer testlet items are included in the test. For the item difficulty estimates the impact is relatively small: only the standard error increases consistently as the number of testlet items increases, whereas bias, mean square error, and mean absolute error show no consistent pattern. In sum, testlet effects are not beneficial to ability estimation, and this influence undermines test fairness. Other suggestions for test design are provided in the conclusion.
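
The four recovery criteria named in the abstract (bias, standard error, mean square error, mean absolute error) can be illustrated with a minimal sketch. The function name `recovery_metrics`, the data layout (a list of replications, each holding one estimate per parameter), and the choice to average each criterion over parameters are assumptions made here for illustration; the record does not state how the article computed or aggregated them.

```python
from math import sqrt


def recovery_metrics(estimates, truth):
    """Summarise parameter recovery across simulation replications.

    estimates: list of replications; each replication is a list with one
               estimate per parameter (hypothetical layout).
    truth:     list of generating values, one per parameter.
    Returns each criterion averaged over parameters.
    """
    n_reps = len(estimates)
    n_params = len(truth)
    bias = se = mse = mae = 0.0
    for j in range(n_params):
        draws = [estimates[r][j] for r in range(n_reps)]
        mean_j = sum(draws) / n_reps
        errors = [d - truth[j] for d in draws]
        # Bias: mean estimate minus generating value.
        bias += mean_j - truth[j]
        # Standard error: spread of the estimates around their own mean.
        se += sqrt(sum((d - mean_j) ** 2 for d in draws) / n_reps)
        # Mean square error: average squared distance from the generating value.
        mse += sum(e * e for e in errors) / n_reps
        # Mean absolute error: average absolute distance from the generating value.
        mae += sum(abs(e) for e in errors) / n_reps
    return {
        "bias": bias / n_params,
        "se": se / n_params,
        "mse": mse / n_params,
        "mae": mae / n_params,
    }


# Toy input: two replications, two parameters (made-up numbers, not study data).
truth = [0.0, 1.0]
estimates = [[0.5, 1.0], [-0.5, 2.0]]
metrics = recovery_metrics(estimates, truth)
print(metrics)
```

With this definition, each parameter's mean square error equals its squared bias plus its squared standard error, so an estimator can have low bias yet a high mean square error when its estimates vary widely across replications.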




:::


Taiwan Citation Index - Humanities and Social Sciences (臺灣人文及社會科學引文索引資料庫系統)
