REFERENCES
American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In R. L. Thorndike (Ed.), Educational measurement (2nd ed., pp. 508-600). Washington, DC: American Council on Education.
Bond, T. G., & Fox, C. M. (2001). Applying the Rasch Model: Fundamental Measurement in Human Sciences. Mahwah, NJ: Erlbaum.
Brandon, P. R. (2004). Conclusions about frequently studied modified Angoff standard-setting topics. Applied Measurement in Education, 17(1), 59–88.
Buckendahl, C. W., Smith, R. W., Impara, J. C., & Plake, B. S. (2002). A comparison of Angoff and Bookmark standard setting methods. Journal of Educational Measurement, 39(3), 253-263.
Chi, M. T. H., Glaser, R., & Farr, M. J. (Eds.). (1988). The Nature of Expertise. Hillsdale, NJ: Erlbaum.
Cizek, G. J. (1996). An NCME instructional module on setting passing scores. Educational Measurement: Issues and Practice, 15(2), 20-31.
Cizek, G. J. (2001). Conjectures on the rise and call of standard setting: An introduction to context and practice. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 3-17). Mahwah, NJ: Erlbaum.
Cizek, G. J. (Ed.). (2012). Setting performance standards: Foundations, methods, and innovations (2nd ed.). New York, NY: Routledge.
Cizek, G. J. (2012a). The forms and functions of evaluations in the standard setting process. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 165-178). New York, NY: Routledge.
Cizek, G. J., & Bunch, M. B. (2007). Standard Setting: A Guide to Establishing and Evaluating Performance Standards on Tests. Thousand Oaks, CA: Sage.
Cizek, G.J., Bunch, M.B., & Koons, H. (2004). Setting performance standards: Contemporary methods. Educational Measurement: Issues and Practice, 23(4), 31–50.
Clauser, J. C., Margolis, M. J., & Clauser, B. E. (2014). An examination of the replicability of Angoff standard setting results within a generalizability theory framework. Journal of Educational Measurement, 51(2), 127-140.
Clauser, B. E., Mee, J., Baldwin, S. G., Margolis, M. J., & Dillon, G. F. (2009). Judges' use of examinee performance data in an Angoff standard‐setting exercise for a medical licensing examination: An experimental study. Journal of Educational Measurement, 46(4), 390-407.
Clauser, B. E., Mee, J., & Margolis, M. J. (2013). The effect of data format on integration of performance data into Angoff judgments. International Journal of Testing, 13(1), 65-85.
Clauser, B. E., Swanson, D. B., & Harik, P. (2002). A multivariate generalizability analysis of the impact of training and examinee performance information on judgments made in an Angoff-style standard-setting procedure. Journal of Educational Measurement, 39(4), 269–290.
Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
Council of Europe. (2009). Manual for relating language examinations to the Common European Framework of Reference for Languages: Learning, teaching, assessment. Cambridge: Cambridge University Press.
Crocker, L., & Zieky, M. (1994). Joint conference on standard setting for large-scale assessments. Washington, DC: National Assessment Governing Board.
Cronbach, L. J. (1988). Five perspectives on validation argument. In H. Wainer & H. Braun (Eds.), Test Validity (pp. 3–17). Hillsdale, NJ: Erlbaum.
Cronbach, L. J., & Meehl, P. E. (1955). Construct validity in psychological tests. Psychological Bulletin, 52(4), 281–302.
Cross, L. H., Impara, J. C., Frary, R. B., & Jaeger, R. M. (1984). A comparison of three methods for establishing minimum standards on the National Teacher Examinations. Journal of Educational Measurement, 21(2), 113-129.
Egan, S. J., Dick, M., & Allen, P. J. (2012). An experimental investigation of standard setting in clinical perfectionism. Behaviour Change, 29(3), 183-195.
Elman, B. A. (2000). A cultural history of civil examinations in late imperial China. University of California Press.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists. Mahwah, NJ: Lawrence Erlbaum Associates.
Engelhard, G. (2007). Evaluating bookmark judgments. Rasch Measurement Transactions, 21, 1097-1098.
Engelhard, G., & Anderson, D. W. (1998). A binomial trials model for examining the ratings of standard setting judges. Applied Measurement in Education, 11(3), 209-230.
Fitzpatrick, A. R. (1989). Social influences in standard setting: The effects of social interaction on group judgments. Review of Educational Research, 59(3), 315-328.
George, S., Haque, M. S., & Oyebode, F. (2006). Standard setting: Comparison of two methods. BMC Medical Education, 6(1), 46.
Glaser, R. (1963). Instructional technology and the measurement of learning outcomes. American Psychologist, 18(8), 519–522.
Goodwin, L.D. (1999). Relations between observed item difficulty levels and Angoff minimum passing levels for a group of minimally competent examinees. Applied Measurement in Education, 12(1), 13-28.
Green, D. R., Trimble, C. S., & Lewis, D. M. (2003). Interpreting the results of three different
standard setting procedures. Educational Measurement: Issues and Practice, 22(1), 22–32.
Halpin, G., Sigmon, G., & Halpin, G. (1983). Minimum competency standards set by three divergent groups of raters using three judgmental procedures. Educational and Psychological Measurement, 47(1), 977-983.
Hambleton, R. K. (1980). Test score validity and standard-setting methods. In R. A. Berk (Ed.), Criterion-referenced measurement: The state of the art (pp. 80-123). Baltimore, MD: Johns Hopkins University Press.
Hambleton, R. K. (2001). Setting performance standards on educational assessments and criteria for evaluating the process. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 89-116). Mahwah, NJ: Erlbaum.
Hambleton, R. K., Pitoniak, M. J., & Copella, J. M. (2012). Essential steps in setting performance standards on educational tests and strategies for assessing the reliability of results. In Cizek G. J. (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 47–76). New York, NY: Routledge.
Hambleton, R. K., & Pitoniak, M. J. (2006). Setting performance standards. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 433-470). Westport, CT: Praeger.
Hertz, N. R., & Chinn, R. N. (2002, April). The role of deliberation style in standard setting for licensing and certification examinations. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Holden, R. (2010). Face validity. In I. B. Weiner & W. E. Craighead (Eds.), The Corsini encyclopedia of psychology (4th ed., pp. 637-638). Hoboken, NJ: Wiley.
Hurtz, G. M., & Auerbach, M. A. (2003). A meta-analysis of the effects of modifications to the Angoff method on cutoff scores and judgment consensus. Educational and Psychological Measurement, 63(4), 584–601.
Huynh, H., & Schneider, C. (2005). Vertically moderated standards: Background, assumptions, and practices. Applied Measurement in Education, 18(1), 99-113.
Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22(4), 679-688.
Impara, J. C., & Plake, B. S. (1998). Teachers’ ability to estimate item difficulty: A test of the assumptions in the Angoff standard-setting method. Journal of Educational Measurement, 35(1), 69-81.
Jaeger, R. M. (1991). Selection of judges for standard‐setting. Educational Measurement: Issues and Practice, 10(2), 3-14.
Johnson, E. J. (1988). Expertise and decision under uncertainty: Performance and process. In M. Chi, R. Glaser, & M. J. Farr (Eds.), The Nature of Expertise. (pp. 209-228). Hillsdale, NJ: Lawrence Erlbaum Associates.
Kaftandjieva, F. (2010). Methods for setting cut scores in criterion-referenced achievement tests: A comparative analysis of six recent methods with an application to tests of reading in EFL. EALTA publication. Retrieved March 25, 2013, from http://www.ealta.eu.org/documents/resources/FK_second_doctorate.pdf
Kane, M. T. (1992). An argument-based approach to validity. Psychological Bulletin, 112(3), 527-535.
Kane, M. T. (2001). So much remains the same: Conception and status of validation in setting standards. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods and perspectives (pp. 19–51). Mahwah, NJ: Lawrence Erlbaum Associates.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Westport, CT: Praeger.
Larkin, J. H., McDermott, J., Simon, D. P., & Simon, H. A. (1980). Expert and novice performance in solving physics problems. Science, 208, 1335-1342.
Lavallee, J. (2012). Validation Issues in an Angoff Standard Setting: A Facets-based investigation. Unpublished PhD Dissertation, Department of Counseling and Educational Psychology, National Taiwan Normal University, Taipei, Taiwan.
Linn, R. L. (2003). Accountability: Responsibility and reasonable expectations. Educational Researcher, 32, 3-13.
Linn, R. L., Baker, E. L., & Betebenner, D. W. (2002). Accountability systems: Implications of requirements of the No Child Left Behind Act of 2001. Educational Researcher, 31, 3–16.
Linn, R. L., & Shepard, L. A. (1997). Item-by-item standard setting: Misinterpretations of judges’ intentions due to less than perfect item inter-correlations. Paper presented at the Council of Chief State School Officers National Conference on Large Scale Assessment, Colorado Springs, CO.
Lissitz, R. W., & Huynh, H. (2003). Vertical equating for state assessments: Issues and solutions in determination of adequate yearly progress and school accountability. Practical Assessment, Research & Evaluation, 8(10). Retrieved March 25, 2012, from http://pareonline.net/getvn.asp?v=8&n=10
Lissitz, R. W., & Wei, H. (2008). Consistency of standard setting in an augmented state testing system. Educational Measurement: Issues and Practice, 27(2), 46-56.
Loevinger, J. (1957). Objective tests as instruments of psychological theory. Psychological Reports, 3(3), 635–694.
Loomis, S. C. (2012). Selecting and training standard setting participants. In G. J. Cizek (Ed.), Setting performance standards: Foundations, methods, and innovations (2nd ed., pp. 107-134). New York, NY: Routledge.
Lorge, I., & Kruglov, L. (1953). A suggested technique for the improvement of difficulty prediction of test items. Educational and Psychological Measurement, 12(4), 554-561.
Margolis, M. J., & Clauser, B. E. (2014). The impact of examinee performance information on judges’ cut scores in modified Angoff standard-setting exercises. Educational Measurement: Issues and Practice, 33(1), 15-22.
McGinty, D. (2005). Illuminating the “Black Box” of standard setting: An exploratory qualitative study. Applied Measurement in Education, 18(3), 269–287.
Mee, J., Clauser, B. E., & Margolis, M. J. (2013). The impact of process instructions on judges’ use of examinee performance data in Angoff standard setting exercises. Educational Measurement: Issues and Practice, 32(3), 27-35.
Messick, S. (1981). Constructs and their vicissitudes in educational and psychological measurement. Psychological Bulletin, 89(3), 575–588.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13–103). Washington, DC: American Council on Education and National Council on Measurement in Education.
Messick, S. (1998). Test validity: A matter of consequence. Social Indicators Research, 45(1-3),
35–44.
Michigan State Department of Education. (2007, February). Retrieved from http://www.michigan.gov/documents/mde/MI-ELPA_Tech_Report_final_199596_7.pdf
Mitzel, H. C., Lewis, D. M., Patz, R. J., & Green, D. R. (2001). The Bookmark procedure: Psychological perspectives. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 249-281). Mahwah, NJ: Erlbaum.
National Council on Measurement in Education. (2015). Retrieved from http://www.ncme.org/ncme/NCME/Resource_Center/Glossary/NCME/Resource_Center/Glossary1.aspx?hkey=4bb87415-44dc-4088-9ed9-e8515326a061#anchorV
Nedelsky, L. (1954). Absolute grading standards for objective tests. Educational and Psychological Measurement, 14(2), 3-19.
Nelson, D. S. (1994). Job analysis for licensure and certification exams: Science or politics? Educational Measurement: Issues and Practice, 13(3), 29-35.
Norcini, J., Lipner, R., Langdon, L., & Strecker, C. (1987). A comparison of three variations on a standard-setting method. Journal of Educational Measurement, 24(1), 56-64.
Norcini, J. J., & Shea, J. A. (1997). The credibility and comparability of standards. Applied Measurement in Education, 10(1), 39–59.
Plake, B., & Giraud, G. (1998). Effect of a modified Angoff strategy for obtaining item performance estimates in a standard setting study. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
Plake, B. S., Melican, G. J., & Mills, C. N. (1991). Factors influencing intrajudge consistency during standard setting. Educational Measurement: Issues and Practice, 10(2), 15-16.
Raymond, M. R., & Reid, J. B. (2001). Who made thee a judge? Selecting and training participants for standard setting. In G. J. Cizek (Ed.), Setting performance standards: Concepts, methods, and perspectives (pp. 119-157). Mahwah, NJ: Erlbaum.
Reckase, M. D. (2000). The ACT/NAGB standard setting process: How "modified" does it have to be before it is no longer a modified-Angoff process? Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.
Reckase, M. D. (2006). Rejoinder: Evaluating standard setting methods using error models proposed by Schulz. Educational Measurement: Issues and Practice, 25(3), 14-17.
Roach, A. T., McGrath, D., Wixon, C., & Talapatra, D. (2010). Aligning an early childhood assessment to state kindergarten content standards: Application of a nationally recognized alignment framework. Educational Measurement: Issues and Practice, 29(1), 25-37.
Saal, F. E., Downey, R. G., & Lahey, M. A. (1980). Rating the ratings: Assessing the psychometric quality of rating data. Psychological Bulletin, 88(2), 413-428.
Schafer, W. D. (2005). Criteria for standard setting from the sponsor’s perspective. Applied Measurement in Education, 18(1), 61-81.
Schoonheim‐Klein, M., Muijtjens, A., Habets, L., Manogue, M., Van Der Vleuten, C., & Van der Velden, U. (2009). Who will pass the dental OSCE? Comparison of the Angoff and
the borderline regression standard setting methods. European Journal of Dental Education, 13(3), 162-171.
Shepard, L.A. (1980). Standard setting issues and methods. Applied Psychological Measurement, 4(4), 447-467.
Shepard, L. A. (1994). Implications for standard setting of the National Academy of Education evaluation of the National Assessment of Educational Progress achievement levels. In Proceedings of the joint conference on standard setting for large-scale assessments of the National Assessment Governing Board and the National Center for Education Statistics (pp. 143–159). Washington, DC: U.S. Government Printing Office.
Smith, R. L., & Smith, J. S. (1988). Differential use of item information by judges using Angoff and Nedelsky procedures. Journal of Educational Measurement, 25(4), 259-274.
Taube, K.T. (1997). The incorporation of empirical item difficulty data in the Angoff standard-setting procedure. Evaluation and the Health Professions, 20(4), 479-498.
Taylor, J. (2014, July 17). Difference between within-subject and between-subject [Blog post]. Retrieved from http://www.statsmakemecry.com/smmctheblog/within-subject-and-between-subject-effects-wanting-ice-cream.html
van de Watering, G., & van der Rijt, J. (2006). Teachers’ and students’ perceptions of assessments: A review and a study into the ability and accuracy of estimating the
difficulty levels of assessment items. Educational Research Review, 1(2), 133-147.
Verhoeven, B. H., Verwijnen, G. M., Muijtjens, A. M. M., Scherpbier, A. J. J. A., & Van der Vleuten, C. P. M. (2002). Panel expertise for an Angoff standard setting procedure in progress testing: item writers compared to recently graduated students. Medical Education, 36(9), 860-867.
Wessen, C. (2010). Analysis of Pre- and Post-Discussion Angoff ratings for evidence of social influence effects. Unpublished MA Dissertation, Department of Psychology, University of California, Sacramento.
Wiley, A., & Guille, R. (2002). The occasion effect for “at-home” Angoff ratings. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
Yin, P., & Schulz, E. M. (2005). A comparison of cut scores and cut score variability from Angoff-based and Bookmark-based procedures in standard setting. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada.