:::

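Title: Development of CEFR Basic-Level Chinese Listening and Reading Comprehension Proficiency Tests and Construction of a Computerized Adaptive Assessment System
Author: 王暄博
Author (romanized): Hsuan-Po Wang
Institution: National Taichung University of Education
Department: Graduate Institute of Educational Measurement and Statistics
Advisor: 郭伯臣
Degree: Doctoral
Publication date: 2013
Keywords: Chinese Proficiency Test (華語文能力測驗); Common European Framework of Reference (歐洲共同語文參考架構); Computerized Adaptive Testing (電腦化適性測驗)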
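In the era of globalization, multilingual ability is increasingly important, and a wave of Chinese language learning has arisen around the world. In recent years, as demand for Chinese language learning has grown, Chinese proficiency tests for non-native speakers have drawn increasing international attention, including China's Hanyu Shuiping Kaoshi (HSK), Taiwan's Test of Chinese as a Foreign Language (TOCFL), and the Chinese tests developed under the Scholastic Assessment Test (SAT) and Advanced Placement (AP) programs in the United States. However, most Chinese proficiency tests are still administered in the traditional paper-and-pencil format. Although many studies have applied computers or multimedia to develop tools that support Chinese learning, literature on constructing computerized adaptive tests of Chinese proficiency remains scarce. This study therefore examines the strengths and limitations of the tests above and develops a computerized adaptive test of Chinese proficiency suitable for domestic use. The research aims are: to adopt the Common European Framework of Reference for Languages (CEFR) as the guideline for test development; to establish the test's scale scores using item response theory (IRT) models; and to construct a computerized adaptive testing (CAT) system.
In summary, this study uses the CEFR as the basis for developing basic-level Chinese listening and reading comprehension tests, and applies modern test theory and test equating techniques to establish a reliable and valid basic-level Chinese listening and reading comprehension test together with a computerized adaptive assessment system. The sample consisted of students in grades 5 to 10 at Grace Christian College (靈惠中文學院) in the Philippines, tested from August 2 to 12, 2010. The test data were analyzed with the IRT three-parameter logistic (3PL) model, and test equating was used to establish the scale for the basic-level listening and reading comprehension tests. In addition, the researcher used the empirical data to simulate the CAT process and compared three ability estimation methods, maximum likelihood estimation (MLE), expected a posteriori (EAP), and maximum a posteriori (MAP), in order to build the CAT system. The results indicate that, after pretesting and item review and revision, the items of the basic-level listening and reading comprehension tests are of adequate quality. For the computerized adaptive test, EAP estimation is recommended.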
In the era of globalization, learning Chinese as a foreign language (CFL) has become increasingly popular worldwide. The growing demand for CFL learning has raised the profile of Chinese proficiency tests (CPTs). Four major CPTs serve this need: Hanyu Shuiping Kaoshi (HSK), the Test of Chinese as a Foreign Language (TOCFL), the Scholastic Assessment Test (SAT) Subject Test in Chinese with Listening, and the Advanced Placement (AP) Chinese Language and Culture exam. However, most of these tests are administered in the traditional paper-and-pencil format. Although many studies have developed tools for learning CFL, work on constructing computerized adaptive tests for CFL is scarce in the literature. The aims of the present study are: to adopt the Common European Framework of Reference (CEFR) for item development; to provide a scoring framework based on item response theory (IRT); and to construct a computerized adaptive testing (CAT) system.
This study analyzes the limitations of current CPTs and uses the Common European Framework of Reference (CEFR) for language learning, teaching, and assessment to develop reliable and valid A-level listening and reading CPTs and a CAT system. The data were analyzed with the IRT three-parameter logistic (3PL) model. A total of 1,576 participants recruited from Grace Christian College in the Philippines took the Chinese listening and reading tests via computer-based testing (CBT) in September 2010. In addition, the study compares the effectiveness of three ability estimation methods in CAT: maximum likelihood estimation (MLE), expected a posteriori (EAP), and maximum a posteriori (MAP).
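The 3PL model and EAP estimation named in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names, the item parameters, the standard normal prior, the evenly spaced grid of ability values, and the omission of the conventional 1.7 scaling constant are all assumptions made for illustration.

```python
import math

def p_3pl(theta, a, b, c):
    """3PL probability of a correct response.

    a: discrimination, b: difficulty, c: pseudo-guessing (lower asymptote).
    The 1.7 scaling constant is omitted here (an assumption).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))

def eap_estimate(responses, items, n_points=61, lo=-4.0, hi=4.0):
    """EAP ability estimate: posterior mean of theta on a fixed grid.

    responses: list of 0/1 scores; items: list of (a, b, c) tuples.
    Uses an unnormalized standard normal prior (assumed, not from the thesis).
    """
    step = (hi - lo) / (n_points - 1)
    numerator = 0.0
    denominator = 0.0
    for k in range(n_points):
        theta = lo + k * step
        prior = math.exp(-0.5 * theta * theta)
        likelihood = 1.0
        for u, (a, b, c) in zip(responses, items):
            p = p_3pl(theta, a, b, c)
            likelihood *= p if u == 1 else (1.0 - p)
        numerator += theta * likelihood * prior
        denominator += likelihood * prior
    return numerator / denominator

# Hypothetical item parameters, for illustration only.
example_items = [(1.0, -1.0, 0.2), (1.2, 0.0, 0.2), (0.8, 1.0, 0.25)]
theta_high = eap_estimate([1, 1, 1], example_items)
theta_low = eap_estimate([0, 0, 0], example_items)
```

Unlike MLE, the EAP estimate stays finite when an examinee answers every item correctly or every item incorrectly, which is one commonly cited reason for preferring it early in an adaptive test.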

 
 
 
 
:::