Title: 從難易度鑑別度看近11年大學指考英語試題 [Examining the English Test Items of the JCEE over the Past 11 Years through Item Difficulty and Discrimination]
Author: 熊文龍 (Wenlong Hsiung)
Institution: National Kaohsiung Normal University
Department: Department of English
Advisor: Prof. 林秀春
Degree: Doctorate
Date of publication: 2014
Keywords: 難易度 (Item Difficulty); 鑑別度 (Item Discrimination); 試題分析 (Item Analysis)
This study presents an item analysis of the multiple-choice English items of the Department Required Test for college entrance (大學指定科目考試) administered over the past 11 years (2003-2013). The method first computes the difficulty and discrimination of every item in the test's five major item sections; it then singles out the items whose discrimination falls below 0.30 and tallies their distribution and their share of the total multiple-choice score; it next examines the content of those items and the factors that could compromise test fairness; finally, using the statistical software SPSS, it runs a paired-sample t-test on the sampled examinees' English scores from the General Scholastic Ability Test (學科能力測驗) and the Department Required Test.
The results and findings are as follows:
(1) In terms of difficulty, the mean item difficulty across the 11 years was 0.52 and the mean discrimination was 0.51; on these figures, the difficulty and discrimination of the English items were well controlled over the period.
(2) Year by year, 59 items were found whose discrimination fell below 0.30. By item type, Section Two, the Multiple-Choice Cloze, produced the most poorly discriminating items, as many as 30; by year of administration, 2004 and 2007 (ROC years 93 and 96) had the most items of questionable discrimination, and those items carried as much as 21% of the total multiple-choice score.
(3) Item analysis of the poorly discriminating items revealed the following features: (a) the items were so difficult that examinee performance could not be clearly differentiated, raising concerns about test fairness; (b) the items confounded the trait to be tested or observed with other characteristics, so that uneven measurement undermined scoring fairness; (c) the content and materials of the items touched on the fairness issues that item analysis weighs, including the subject matter of the test materials, the topics the items discuss or cover, and examinees' socioeconomic backgrounds and personal background knowledge, all of which can affect test fairness.
(4) For the admission-score samples obtained for academic years 95 through 102 (2006-2013), paired-sample tests of the GSAT and DRT English scores showed significant relations in all eight years, indicating that throughout this period the item work of the College Entrance Examination Center, even through adjustments to item weighting and item counts and amid parents' worries that differing textbook choices made the examinations unfair, was in every year borne out by a statistically significant correlation.
Finally, in light of the findings on the research questions, the study offers four pedagogical implications and makes suggestions for future related research.
This study aims to investigate discrimination and bias in the English test items of the JCEE (Joint College Entrance Examination) over the last 11 years (2003-2013). It first studies difficulty levels (P-values) and discrimination powers (D-values) item by item and section by section. Items whose D-values fall below 0.30 are then marked and tallied by item type, year of occurrence, and share of the total score. Those items are further explored through qualitative item analyses to identify which fairness concerns have been violated. Finally, subjects' scores on the two JCEE tests are compared with a paired-sample t-test, computed in the statistical software SPSS, to see whether there is a significant correlation.
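The abstract does not restate the formulas behind these two indices. Under the classical test-theory definitions that P- and D-values conventionally denote (an assumption here; the thesis may define them with minor variations), they are

    P = \frac{R}{N}, \qquad D = P_H - P_L,

where R is the number of examinees answering an item correctly, N is the total number of examinees, and P_H and P_L are the proportions correct within the high- and low-scoring groups, conventionally the top and bottom 27% of examinees by total score.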
Findings are summarized as follows:
(1) A chronological survey of the English items in the JCEE shows moderate overall difficulty, with a mean P-value of 0.52 and a mean D-value of 0.51. This implies that the items in this period were properly designed, hitting an acceptable difficulty index while retaining adequate discriminating power.
(2) There are 59 items whose D-values fall below 0.30. Among the sections, Section Two, the Multiple-Choice Cloze, has the highest count, with 30 of the 59 flawed items, while Section Four has the lowest, with only 2. Among the years, 2004 and 2007 have the most such items, with the flagged items in each of those years carrying a 21% share of the test score (a computational sketch of this flag-and-tally procedure appears after this list).
(3) The items above share the following features: (a) They are excessively difficult and thus misgauge test takers' performance, which raises test-fairness concerns. (b) They confound the targeted construct with other traits, so that test takers must draw on mixed abilities to answer correctly. (c) Their topics or testing materials involve fairness-sensitive factors: socioeconomic status, topical knowledge, experiential activities, and other elements of a student's background knowledge can all make a test unfair.
(4) The statistical test of the correlation compared subjects' scores on JCEE Test 1 and Test 2. The results are statistically significant, meaning that subjects' scores on the two tests are closely related (a SciPy sketch reproducing this computation follows the closing paragraph below).
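For concreteness, the flag-and-tally procedure behind findings (1) and (2) can be expressed as a short computation. The following is a minimal Python/NumPy sketch, not the author's code: the 0/1 answer matrix is randomly generated here, the 27% grouping is the conventional assumption noted above, and equal item weighting is assumed.

    import numpy as np

    # Hypothetical scored answer matrix: one row per examinee, one column per
    # item, 1 = correct, 0 = wrong (the thesis used real JCEE answer records).
    rng = np.random.default_rng(0)
    responses = rng.integers(0, 2, size=(1000, 20))

    totals = responses.sum(axis=1)             # each examinee's raw score
    order = np.argsort(totals)
    k = int(round(0.27 * len(totals)))         # conventional 27% group size
    low_group = responses[order[:k]]           # lowest-scoring 27%
    high_group = responses[order[-k:]]         # highest-scoring 27%

    p_values = responses.mean(axis=0)          # item difficulty, P = R / N
    d_values = high_group.mean(axis=0) - low_group.mean(axis=0)  # D = P_H - P_L

    flagged = np.flatnonzero(d_values < 0.30)  # items of questionable discrimination
    share = flagged.size / responses.shape[1]  # share of total score (equal weights)
    print(f"flagged items: {flagged.tolist()}; share of total score: {share:.0%}")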
From the findings above, four pedagogical implications were derived, and suggestions for further research are provided at the end. It is hoped that, informed by such item analyses, measures that ensure and protect fairness can be adopted from the beginning of item design, through the selection of test materials, to the final interpretation of test results.
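The thesis ran the paired-sample comparison of finding (4) in SPSS. The equivalent computation can be reproduced with SciPy, as in the sketch below; the two score arrays are placeholders, not data from the study.

    import numpy as np
    from scipy import stats

    # Placeholder paired English scores for the same examinees on the two
    # entrance tests (the study drew on sampled admission records).
    test1 = np.array([72.0, 65.5, 88.0, 54.0, 91.5, 60.0, 77.0, 83.5])
    test2 = np.array([70.0, 68.0, 85.5, 50.0, 93.0, 58.5, 75.0, 80.0])

    t_stat, p_val = stats.ttest_rel(test1, test2)  # paired-sample t-test
    r, p_r = stats.pearsonr(test1, test2)          # Pearson correlation
    print(f"paired t = {t_stat:.3f} (p = {p_val:.3f}); r = {r:.3f} (p = {p_r:.4f})")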