:::

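Title: Hierarchical Item Response Theory Models and Their Equating Estimation Methods (階層式試題反應理論模式及其等化估計方法)
Author: Hsieh, Tien-Yu (謝典佑)
Institution: National Taichung University of Education (國立臺中教育大學)
Department: Graduate Institute of Educational Measurement and Statistics (教育測驗統計研究所)
Advisor: Kuo, Bor-Chen (郭伯臣)
Degree: Doctoral
Publication date: 2011
Keywords: hierarchical item response theory model; nonparametric estimation; equating concurrent estimation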
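Building on the hierarchical item response theory model, this study proposes nonparametric and equating concurrent estimation methods for that model. Through simulations under varying conditions (number of examinees, number of items, ability distribution, and test structure) and an empirical analysis of data from the Taiwan Assessment of Student Achievement database, it examines how accurately the proposed methods estimate scale scores, regression parameters, and item parameters.
  The simulation study shows that:
1. For test data with a hierarchical structure, the parametric, nonparametric, and equating concurrent estimation methods of the hierarchical item response theory model estimate scale scores, regression parameters, and item parameters with higher accuracy.
2. When the scale scores violate the normality assumption, the nonparametric estimation method of the hierarchical item response theory model estimates scale scores, regression parameters, and item parameters more accurately than the parametric estimation method.
3. As the number of examinees, the number of items, and the correlations among the domain scales increase, the parametric and nonparametric estimation methods of the hierarchical item response theory model estimate scale scores, regression parameters, and item parameters more accurately.
4. As the proportion of anchor items increases, the equating concurrent estimation method of the hierarchical item response theory model estimates scale scores, regression parameters, and item parameters more accurately.
  In the empirical analysis, checks of model fit and of the standard errors of estimation indicate that the hierarchical item response theory model is the more appropriate choice for estimating scale scores, regression parameters, and item parameters from the Taiwan Assessment of Student Achievement data.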
 This study proposes nonparametric and equating concurrent estimation methods for hierarchical item response theory models. Simulation experiments were conducted to evaluate the feasibility of the proposed methods and to examine how the estimates of ability, regression, and item parameters are affected by factors such as sample size, test length, ability distribution, and model specification, in comparison with unidimensional and multidimensional item response theory models. An analysis of Taiwan Assessment of Student Achievement data is provided as an example.
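To make the simulation design more concrete, the following is a minimal sketch of how response data with a hierarchical structure might be generated. It assumes a higher-order formulation in which each domain ability is a linear regression on the overall ability plus a normal residual, combined with a two-parameter logistic item model. The function name `simulate_hirt`, the parameter ranges, and this particular parameterization are illustrative assumptions, not the specification used in the thesis.

```python
import math
import random


def simulate_hirt(n_persons, items_per_domain, loadings, seed=0):
    """Simulate dichotomous responses under an assumed higher-order IRT model.

    loadings[d] is the regression coefficient of domain d on the overall
    ability; with unit-variance abilities it equals their correlation.
    """
    rng = random.Random(seed)
    # Hypothetical item parameters: (discrimination a, difficulty b) per item.
    items = [
        [(rng.uniform(0.8, 2.0), rng.gauss(0.0, 1.0)) for _ in range(items_per_domain)]
        for _ in loadings
    ]
    overall, domain, responses = [], [], []
    for _ in range(n_persons):
        theta = rng.gauss(0.0, 1.0)  # overall ability
        overall.append(theta)
        thetas_d, row = [], []
        for d, lam in enumerate(loadings):
            # Domain ability = lam * overall + residual, scaled to unit variance.
            theta_d = lam * theta + math.sqrt(1.0 - lam ** 2) * rng.gauss(0.0, 1.0)
            thetas_d.append(theta_d)
            for a, b in items[d]:
                p = 1.0 / (1.0 + math.exp(-a * (theta_d - b)))  # 2PL probability
                row.append(1 if rng.random() < p else 0)
        domain.append(thetas_d)
        responses.append(row)
    return overall, domain, items, responses


overall, domain, items, responses = simulate_hirt(
    n_persons=500, items_per_domain=10, loadings=[0.9, 0.7, 0.5]
)
```

Under these assumptions, varying `n_persons`, `items_per_domain`, and `loadings` corresponds to the sample size, test length, and overall-domain correlation factors; a non-normal ability condition could be sketched by replacing the `rng.gauss` draw for `theta` with a skewed or bimodal generator.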
The simulation results show that:
1. For data with a hierarchical structure, the parametric, nonparametric, and equating concurrent estimation methods based on hierarchical item response theory models estimate ability, regression, and item parameters more accurately than methods based on unidimensional and multidimensional item response theory models.
2. When the prior distribution departs from the assumed normal distribution, nonparametric estimation based on hierarchical item response theory models estimates ability, regression, and item parameters more accurately than parametric estimation.
3. The accuracy of the ability, regression, and item parameter estimates based on hierarchical item response theory models increases as the sample size, the test length, and the correlations between the overall and domain abilities increase.
4. The accuracy of the equating concurrent estimation of ability, regression, and item parameters based on hierarchical item response theory models increases as the sample size, the test length, the correlations between the overall and domain abilities, and the percentage of anchor items increase.
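As a rough illustration of the data layout behind concurrent estimation in a common-item design, the sketch below stacks the responses from two forms into one matrix so that all items can be calibrated in a single run, with items a group did not take coded as missing. The function names, the column ordering of the anchor items, and the use of `None` for missing-by-design responses are assumptions made for this example, not details taken from the thesis.

```python
def build_concurrent_matrix(form_a, form_b, n_anchor):
    """Stack two forms' response matrices for a single calibration run.

    Assumes the last n_anchor columns of form_a and the first n_anchor
    columns of form_b are the same anchor items. Items a group did not
    take are coded None (missing by design).
    """
    unique_a = len(form_a[0]) - n_anchor
    unique_b = len(form_b[0]) - n_anchor
    combined = []
    for row in form_a:
        combined.append(list(row) + [None] * unique_b)
    for row in form_b:
        combined.append([None] * unique_a + list(row))
    return combined


def anchor_ratio(n_anchor, form_length):
    """Proportion of a form made up of anchor items."""
    return n_anchor / form_length


form_a = [[1, 0, 1, 1], [0, 0, 1, 0]]  # items 1-4; items 3-4 are anchors
form_b = [[1, 1, 0, 0], [0, 1, 1, 1]]  # items 3-6; items 3-4 are anchors
combined = build_concurrent_matrix(form_a, form_b, n_anchor=2)
```

In this toy layout each form has four items, two of which are anchors, so the anchor ratio is 0.5; raising that ratio gives the calibration more shared columns linking the two groups.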
  Based on the model selection indices and standard errors, a hierarchical item response theory model is suitable for analyzing Taiwan Assessment of Student Achievement data to estimate ability, regression, and item parameters.

 
 
 
 