Chinese-language references
中國新聞網(2007)。美國中文考試眾多,孩子考完中文AP又要備戰HSK。檢索日期:2009年9月17日。網址:http://news.xinhuanet.com/overseas/2007-05/11/content_6084716.htm
中國漢語水平考試(2012)。中國漢語水平考試。檢索日期:2012年5月19日。網址: http://www.hsk.org.cn/index.aspx
白樂桑、張麗(2008)。《歐洲語言共同參考框架》新理念對漢語教學的啟示與推動:處於抉擇關頭的漢語教學。世界漢語教學,3,58-74。
多媒體英語學會(2007)。歐洲共同語文參考架構(中譯)。高雄:和遠。
何榮桂(2006)。國際電腦化測驗發展趨勢之研究。電腦測驗發展趨勢與國家考試電腦化測驗研討會,2006年5月29日,臺北市。
余慕薌(2008)。APEC第二外語標準及其評價:趨勢、機會及意涵(下)。APEC通訊,103,15-16。
李坤崇(2006)。中小學一貫課程體系參考指引之建議。教育研究月刊,150,119-135。
周中天、張莉萍(2007)。華語文能力分級指標之建立。「東亞教育評鑑論壇:新興議題及挑戰」國際會議,2007年10月20-21日,臺灣師範大學。
孟慶明(2007)。在美國非華語環境下中文教學策略之行動研究。國立臺灣師範大學華語文教學研究所碩士論文,未出版,台北市。
柯華葳(2004)。華語文能力測驗編製- 研究與實務。台北:遠流出版公司。
國家華語測驗推動工作委員會(2012a)。華語文能力測驗。檢索日期:2012年5月17日。網址:http://www.sc-top.org.tw/
國家華語測驗推動工作委員會(2012b)。兒童華語文能力測驗。檢索日期:2012年5月17日。網址:http://www.sc-top.org.tw/cccc/ch/taker.html
張莉萍(2007)。對外漢語字集。2008年臺灣華語文教學年會暨研討會,2008年11月1-2 日,花蓮慈濟大學。
郭伯臣、王暄博(2008)。大型測驗中同時進行垂直與水平等化效果之探討。教育研究與發展期刊,4(4),87-120。
郭珠美(2009)。日漸升溫的中文熱與應對。2009 第二屆華語文教學國際研討會暨工作坊,2009年3月13-14日,私立銘傳大學。
陳柏熹(2006)。能力估計方法對多向度電腦化適性測驗測量精準度的影響。教育心理學報,38(2),195-211。
陳浩然、謝妙玲、周中天(2009)。歐洲共同語文參考架構(CEFR)於華語書面教材中之應用-以《華語你我他》為例。2009 第二屆華語文教學國際研討會暨工作坊,2009年3月13-14日,銘傳大學。
曾玉琳、王暄博、郭伯臣、許天維(2006)。不同BIB 設計對測驗等化的影響。測驗統計年刊,13(2),209-229。台中市:國立台中教育大學。
曾建銘、陳清溪(2008)。2006年臺灣學生學習成就評量結果之分析。教育研究與發展期刊,4(4),41-86。
黃珮璇、林婉星、郭伯臣、劉湘川(2007)。BIB、PBIB與NEAT設計於多元計分測驗之連結效果比較。2007年中國測驗學會教育測驗學術研討會,2007年11月3日,國立臺灣師範大學。
新華每日電訊(2007)。列為必修課,漢語普通話將成為英國“明日語言”?檢索日期:2010年5月17日。網址:http://big5.xinhuanet.com/gate/big5/news.xinhuanet.com/mrdx/2007-05/25/content_6150448.htm
楊孟麗、譚康榮、黃敏雄(2003)。臺灣教育長期追蹤資料庫:心理計量報告:TEPS 2001分析能力測驗(第一版)。中央研究院調查研究專題中心(管理、釋出單位)。
楊振升、洪淑萍(2002)。基本能力指標與轉化-以語文學習領域為例。教育研究月刊,96,23-33。
實用漢語水平認定考試(2012)。實用漢語水平認定考試。檢索日期:2012年5月19日。網址:http://www.c-test.org.cn/index.asp
蔡雅薰(2009)。華語文教材分級研制原理之建構。臺北縣:正中。
錢永財(2006)。以a-鄰近法為選題策略之電腦化適性測驗模擬研究。國立臺中教育大學教育測驗統計研究所碩士論文,未出版,台中市。
錢永財、劉家惠、郭伯臣(2005)。a-鄰近法選題對電腦適性測驗試題曝光率之比較。2005年教育與心理測驗學術研討會,台北:國立政治大學。
藍珮君(2007)。基礎華語文能力測驗與歐洲共同架構的對應關係。第三屆華文教學國際論壇,2007年12月1-2日,國立臺灣師範大學。
籃玉如(2009)。資訊融入華語教學設計理念與實踐。第六屆全球華文網路教育研討會,2009年6月19-21日,台北市。
English-language references
Ackerman, T. A. (1991). The use of unidimensional parameter estimates of multidimensional items in adaptive testing. Applied Psychological Measurement, 13, 113-127.
Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6), 716-723.
Allen, N. L., Donoghue, J. R., & Schoeps, T. L. (2001). The NAEP 1998 technical report. Washington, DC: National Center for Education Statistics.
Andersen, E. B. (1973). A goodness of fit test for the Rasch model. Psychometrika, 38, 123-140.
Baker, F. B. (1992). Item Response Theory: Parameter Estimation Techniques. New York: Marcel Dekker.
Baker, F. B., & Kim, S.-H. (2004). Item response theory: Parameter estimation techniques (2nd ed.). New York: Marcel Dekker.
Birnbaum, A. (1968). Some latent trait models and their use in inferring an examinee's ability. In F. M. Lord & M. R. Novick, Statistical theories of mental test scores, 17-20. Reading, MA: Addison-Wesley.
Boar, B. H. (1984). Application prototyping: A requirements definition strategy for the '80s. New York: John Wiley & Sons.
Bock, R. D. & Mislevy, R. J. (1982). Adaptive EAP Estimation of Ability in A Microcomputer Environment. Applied Psychological Measurement, 6, 431-444.
Bock, R. D., & Aitkin, M. (1981). Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 46, 443-459.
Bose, R. C. & Nair, K. R. (1939). Partially balanced incomplete block designs. Sankhya, 4, 337–372.
Brennan, R. L., & Kolen, M. J. (1987). Some practical issues in equating. Applied Psychological Measurement, 11, 279-290.
Brennan, R. L. (2008). A Discussion of Population Invariance. Applied Psychological Measurement, 32(1), 102-114.
Chang, H., Qian, J., & Ying, Z. (2001). a-Stratified Multistage Computerized Adaptive Testing with b-Blocking. Applied Psychological Measurement, 25, 333-341.
College Board (2012a). Chinese with Listening. Retrieved May 20, 2012, from http://www.collegeboard.com/student/testing/sat/lc_two/chinese/chinese.html?chinese
College Board (2012b). Chinese language and culture. Retrieved May 7, 2012, from http://www.collegeboard.com/student/testing/ap/sub_chineselang.html
Cook, L. L., & Petersen, N. S. (1987). Problems related to the use of conventional and item response theory equating methods in less than optimal circumstances. Applied Psychological Measurement: Issues and Practice 10, 37-45.
Council of Europe. (2001). Common European Framework of Reference for Languages: Learning, Teaching, Assessment. Cambridge, UK: Cambridge University Press.
Dorans, N. J. & Holland, P. W. (2000). Linking Scores from Multiple Instruments. Evaluation of National and State Assessments of Evaluation. Board on Educational Testing and Assessment. Washington, DC: National Academy Press.
Dorans, N. J. & Liu, J. (2008). Anchor Test Type and Population Invariance: An Exploration Across Subpopulations and Test Administrations. Applied Psychological Measurement, 32(1), 81-97.
Haebara, T. (1980). Equating Logistic Ability Scales by a Weighted Least Squares Method. Japanese Psychological Research, 22, 144-149.
Hambleton, R. K., & Swaminathan, H. (1985). Item Response Theory: Principles and Applications. Boston, MA: Kluwer-Nijhoff.
Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Newbury Park, CA: Sage.
Hung, P. H. (1988). Application of Computerized Adaptive Testing to The University Entrance Exam of Taiwan, R. O. C. Unpublished doctoral dissertation, University of Minnesota, Minnesota.
Kang, T., & Cohen, A. S. (2007). IRT Model Selection Methods for Dichotomous Items. Applied Psychological Measurement, 31(4), 331-358.
Kao, C. W., Kim, S., & Hatrak, N. (2005). Scale drift study for a large-scale English proficiency test. Paper presented at the annual meeting of the Northeastern Educational Research Association (NERA) held between October 19 and 21, 2005 in Kerhonkson, N.Y.
Kecker, G., & Eckes, T. (2007). Linking the TestDaF to the CEFR: The case of writing proficiency. Paper presented at the Fourth Annual Conference of EALTA. Retrieved August 4, 2009, from http://www.ealta.eu.org/conference/2007/docs/pres_sunday/Kecker&Eckes.pdf
Klein, L. W., & Jarjoura, D. (1985). The importance of content representation for common-item equating with non-random groups. Journal of Educational Measurement, 22, 197-206.
Kolen, M. J. & Brennan, R. L. (1995). Test Equating: Methods and Practices. New York: Springer-Verlag.
Kolen, M. J. & Brennan, R. L. (2004). Test equating, scaling, and linking: methods and practices (2nd ed.). New York: Springer-Verlag.
Kuo, B.-C., Tseng, H.-C., & Shih, S.-C. (2013). A Computerized Adaptive Testing System for Undergraduate Level Chinese Reading Proficiency. Turkish Online Journal of Educational Technology. (Accepted).
Leonard, T., & Hsu, J. S. J. (1999). Bayesian methods. New York: Cambridge University Press.
Li, F., Cohen, A. S., Kim, S-H., & Cho, S-J. (2009). Model Selection Methods for Mixture Dichotomous IRT Models. Applied Psychological Measurement, 33(5), 353-373.
Lord, F. M. (1977). Practical Applications of Item Characteristic Curve Theory. Journal of Educational Measurement, 14, 117-138.
Lord, F. M. (1980). Applications of Item Response Theory to Practical Testing Problems. Hillsdale, NJ: Lawrence Erlbaum Associates.
Lord, F. M., & Wingersky, M. S. (1984). Comparing IRT true-score and equipercentile observed-score "equatings." Applied Psychological Measurement, 8, 452-461.
Marco, G., Petersen, N., & Stewart, E. (1979). A test of the adequacy of curvilinear score equating models. Paper presented at the Computerized Adaptive Testing Conference, Minneapolis, MN.
Martin, M. O., Mullis, I. V. S., & Chrostowski, S. J. (Eds.). (2004). TIMSS 2003 Technical Report. Chestnut Hill, MA: Boston College, Center for the Study of Testing, Evaluation, and Educational Policy.
Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47, 149-174.
McBride, J. R. & Martin, J. T. (1983). Reliability and Validity of Adaptive Ability Tests in a Military Setting. In D. J. Weiss (Ed.), New Horizons in Testing: Latent Trait Test Theory and Computerized Adaptive Testing (pp. 223-236). New York: Academic Press.
Mislevy, R. J. & Bock, R. D. (1990). PC-BILOG-Item analysis and test scoring with binary logistic models [Computer software]. Mooresville, IN: Scientific Software.
Mislevy, R. J. & Sheehan, K. M. (1987). Marginal estimation procedures. In A. E. Beaton (Ed.), The NAEP 1983-1984 Technical Report (Report No. 15-TR-20). Princeton, NJ: Educational Testing Service.
Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16(2), 159-176.
Muraki, E., & Bock, R. D. (1991). PARSCALE: Parameter scaling of rating data [Computer software]. Chicago: Scientific Software International, Inc.
Petersen, N. S. (2008). A Discussion of Population Invariance of Equating. Applied Psychological Measurement, 32(1), 98-101.
NCACLS (2012). National Preparation Test for SAT Subject Test in Chinese with Listening. Retrieved May 7, 2012, from http://www.scccs.net/events/event34/SATII/2010SATII.pdf
Nemhauser, G. L., & Wolsey, L. A. (1999). Integer and combinatorial optimization. New York: John Wiley.
OECD (2005). PISA 2003 Technical Report. Paris: OECD.
OECD (2009). PISA 2006 Technical Report. Paris: OECD.
Petersen, N. S., Cook, L. L., & Stocking, M. L. (1983). IRT versus conventional equating methods: A comparative study of scale stability. Journal of Educational Statistics, 8(2), 135-156.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, Norming, and Equating. In R.L. Linn (Ed.), Educational measurement (3rd ed., pp. 221-262). New York: Macmillan.
Puhan, G. (2007). Scale drift in equating on a test that employs cut scores. RR-07-34, Educational Testing Service, Princeton, New Jersey.
Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Chicago: University of Chicago Press.
Rust, K. F., & Johnson, E. G. (1992). Sampling and weighting the national assessment. Journal of Educational Statistics, Special Issue: National Assessment of Educational Progress, 17(2), 111-129.
Schwarz, G. (1978). Estimating the dimension of a model. Annals of Statistics, 6, 461-464.
Stocking, M. L. & Lord, F. M. (1983). Developing a Common Metric in Item Response Theory. Applied Psychological Measurement, 7(2), 201-211.
Stocking, M. L. (1994). Three Practical Issues for Modern Adaptive Testing Item Pools. Educational Testing Service, Princeton, N. J.
Tannenbaum, R. J., & Wylie, E. C. (2005). Mapping English proficiency test scores onto the Common European Framework (TOEFL Research Rep. No. RR-80). Retrieved August 4, 2012, from http://www.ets.org/Media/Research/pdf/RR-05-18.pdf
Taylor, L. (2004). IELTS, Cambridge ESOL examination and the Common European Framework. Research Notes, 18, 2-3. University of Cambridge, ESOL Examinations.
Wang, T. (2005). An Alternative Continuization Method to the Kernel Method in von Davier, Holland and Thayer's (2004) Test Equating Framework.
Tozer, M. (1987). The joy of strength and movement: A centennial appreciation of Edward Thring. Physical Education Review, 10(1), 58-63.
U.S. Department of State (2006). National security language initiative. Retrieved May 21, 2009, from http://merln.ndu.edu/archivepdf/nss/state/58733.pdf
van der Linden, W. J., Veldkamp, B. P., & Carlson, J. E. (2004). Optimizing balanced incomplete block designs for educational assessments. Applied Psychological Measurement, 28, 317-331.
von Davier, A. A., & Liu, M. (2008). Population Invariance of Test Equating and Linking: Theory Extension and Applications Across Exams. Applied Psychological Measurement, 32(1), 9-10.
von Davier, A. A., & Wilson, C. (2008). Investigating the population sensitivity assumption of Item Response Theory true-score equating across two subgroups of examinees and two test formats. Applied Psychological Measurement, 32(1), 11-26.
von Davier, A. A., Holland, P. W., & Thayer, D. T. (2004). The kernel method of test equating. New York: Springer.
Wang, H.-P., Kuo, B.-C., Tsai, Y.-H., & Liao, C.-H. (2012). A CEFR-based Computerized Adaptive Testing System for Chinese Proficiency. Turkish Online Journal of Educational Technology, 11(4), 1-12.
Wang, T., & Vispoel, W. P. (1998). Properties of Ability Estimation Methods in Computerized Adaptive Testing. Journal of Educational Measurement, 35, 109-135.
Wright, B. D. (1999). Fundamental measurement for psychology. In S. E. Embretson & S. L. Hershberger (Eds.), The new rules of measurement. Mahwah, NJ: Lawrence Erlbaum Associates.
Wu, M. L., Adams, R. J., & Wilson, M. R. (1998). ACER ConQuest. Melbourne, Victoria, Australia: Australian Council for Educational Research Press.
Yang, W.-L., & Gao, R. (2008). Invariance of Score Linkings Across Gender Groups for Forms of a Testlet-Based College-Level Examination Program Examination. Applied Psychological Measurement, 32, 45-61.
Yates, F. (1936). A new method of arranging variety trials involving a large number of varieties. Journal of Agricultural Science, 26, 424-455.
Yi, Q., Harris, D. J., & Gao, X. (2008). Invariance of Equating Functions Across Different Subgroups of Examinees Taking a Science Achievement Test. Applied Psychological Measurement, 32(1), 62-80.
Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (2003). BILOG-MG. Scientific Software International.