Investigación en Educación Médica

ISSN 2007-5057 (Print)

2020, Number 34


Inv Ed Med 2020; 9 (34)

Threats to validity in assessment: implications in medical education

Carrillo ABA, Sánchez MM, Leenen I

Language: Spanish
References: 39
Pages: 100-107
PDF size: 555.09 Kb.


Key words:

Validity, validity threats, learning assessment, medical education, Mexico.

ABSTRACT

Validity threats in educational assessment are elements that interfere with the proposed interpretation of a test score. They can occur in written tests as well as in performance and clinical competency assessments. They are usually grouped into two major categories: construct underrepresentation and construct-irrelevant variance. The former refers to tests with too few items, cases, or observations to generalize properly to the full domain to be assessed. The latter concerns systematic biases that distort the interpretation of a test score, such as poor item quality and raters’ systematic errors, among other factors that may affect the obtained score. In this paper we describe the characteristics of some of these threats, their importance, and recommendations to avoid them when developing assessment instruments in health sciences education. These insights can help devise tests and assessment instruments that support more valid inferences about students’ knowledge and abilities.


REFERENCES

  1. Cronbach LJ. Five perspectives on validity argument. In: Wainer H, Braun HI, editors. Test validity [Internet]. New York: Routledge; 1988. p. 3-17. Available at: https://doi.org/10.4324/9780203056905

  2. Downing SM, Haladyna TM. Validity threats: Overcoming interference with proposed interpretations of assessment data. Med Educ. 2004;38(3):327-33.

  3. Downing SM, Yudkowski R, editors. Assessment in health professions education. New York and London: Routledge; 2009. 317 p.

  4. Carrillo BA, Sánchez M, Leenen I. El concepto moderno de validez y su uso en educación médica. Inv Ed Med. 2020; 9(33):98-106.

  5. Norman G, van der Vleuten C, Newble D, editors. International Handbook of Research in Medical Education. Springer; 2002. 1106 p.

  6. Jozefowicz RF, Koeppen BM, Case S, Galbraith R, Swanson D, Glew RH. The quality of in-house medical school examinations. Acad Med. 2002;77(2):156-61.

  7. Ware J, Vik T. Quality assurance of item writing: During the introduction of multiple choice questions in medicine for high stakes examinations. Med Teach. 2009;31(3):238-43.

  8. Tarrant M, Knierim A, Hayes SK, Ware J. The frequency of item writing flaws in multiple-choice questions used in high stakes nursing assessments. Nurse Educ Today. 2006; 26(8):662-71.

  9. Downing SM. Threats to the validity of locally developed multiple-choice tests in medical education: Construct-irrelevant variance and construct underrepresentation. Adv Health Sci Educ. 2002;7(3):235-41.

  10. Crooks TJ, Kane MT, Cohen AS. Threats to the valid use of assessments. Assess Educ Princ Policy Pract. 1996;3(3):265-85.

  11. Messick S. Validity. In: Linn RL, editor. Educational Measurement [Internet]. New York: Macmillan; 1989. p. 13-103. Available at: https://onlinelibrary.wiley.com/doi/abs/10.1002/j.2330-8516.1987.tb00244.x

  12. Schuwirth LWT, Van Der Vleuten CPM. General overview of the theories used in assessment: AMEE Guide No. 57. Med Teach. 2011;33(10):783-97.

  13. De Champlain AF. A primer on classical test theory and item response theory for assessments in medical education. Med Educ. 2010;44(1):109-17.

  14. Haladyna TM, Downing SM. Construct-Irrelevant Variance in High-Stakes Testing. Educ Meas Issues Pract [Internet]. 2004;23(1):17-27. Available at: https://onlinelibrary.wiley.com/doi/abs/10.1111/j.1745-3992.2004.tb00149.x

  15. Leenen I. Virtudes y limitaciones de la teoría de respuesta al ítem para la evaluación educativa en las ciencias médicas. Inv Ed Med. 2014;3(9):40-55.

  16. Downing SM. Reliability: On the reproducibility of assessment data. Med Educ. 2004;38:1006-12.

  17. Miller GE. The assessment of clinical skills/competence/performance. Acad Med. 1990;65(9):S63-7.

  18. Hawkins RE, Margolis MJ, Durning SJ, Norcini JJ. Constructing a validity argument for the mini-clinical evaluation exercise: A review of the research. Acad Med. 2010;85(9):1453-61.

  19. Moore K, Dailey A, Agur A. Anatomía con orientación clínica. 7th ed. Philadelphia: Wolters Kluwer Health, Lippincott Williams & Wilkins; 2013.

  20. National Board of Medical Examiners. Cómo elaborar preguntas para evaluaciones escritas en el área de ciencias básicas y clínicas. 4th ed. Paniagua MA, Swygert KA, editors. Philadelphia, PA: National Board of Medical Examiners; 2016. 100 p.

  21. Moreno R, Martínez RJ, Muñiz J. Directrices para la construcción de ítems de elección múltiple. Psicothema [Internet]. 2004;16(3):490-7. Available at: https://www.redalyc.org/articulo.oa?id=72716324

  22. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 2014. 243 p.

  23. Williams BW, Byrne PD, Welindt D, Williams MV. Miller’s pyramid and core competency assessment: A study in relationship construct validity. J Contin Educ Health Prof. 2016;36(4):295-9.

  24. Pangaro L, Ten Cate O. Frameworks for learner assessment in medicine: AMEE Guide No. 78. Med Teach. 2013;35:e1197-e1210.

  25. Hadie SNH. The Application of Learning Taxonomy in Anatomy Assessment in Medical School. Educ Med J. 2018;10(1):13-23.

  26. Haladyna TM, Downing SM, Rodriguez MC. A Review of Multiple-Choice Item-Writing Guidelines for Classroom Assessment. Appl Meas Educ. 2002;15(3):309-34.

  27. Downing SM. Construct-irrelevant variance and flawed test questions: Do multiple-choice item-writing principles make any difference? Acad Med. 2002;77(10 Suppl):103-4.

  28. Downing SM. The effects of violating standard item writing principles on tests and students: The consequences of using flawed test items on achievement examinations in medical education. Adv Health Sci Educ. 2005;10(2):133-43.

  29. Abad FJ, Olea J, Ponsoda V. Analysis of the optimum number of alternatives from the Item Response Theory. Psicothema. 2001;13(1):152-8.

  30. Rodriguez MC. Three options are optimal for multiple-choice items: A meta-analysis of 80 years of research. Educ Meas Issues Pract. 2005;24(2):3-13.

  31. Haladyna TM, Rodriguez MC, Stevens C. Are Multiple-choice Items Too Fat? Appl Meas Educ [Internet]. 2019;32(4):350-64. Available at: https://doi.org/10.1080/08957347.2019.1660348

  32. Hicks NA. Guidelines for identifying and revising culturally biased multiple-choice nursing examination items. Nurse Educ. 2011;36(6):266-70.

  33. Chiavaroli N. Negatively-worded multiple choice questions: An avoidable threat to validity. Pract Assess Res Eval. 2017;22(3):1-14.

  34. Gómez-Benito J, Sireci S, Padilla JL, Dolores Hidalgo M, Benítez I. Differential item functioning: Beyond validity evidence based on internal structure. Psicothema. 2018;30(1):104-9.

  35. Young JW. Ensuring valid content tests for English Language Learners. Educational Testing Service. 2008.

  36. Wong S, Yang L, Riecke B, Cramer E, Neustaedter C. Assessing the usability of smartwatches for academic cheating during exams. In: Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services, MobileHCI 2017. Association for Computing Machinery; 2017.

  37. Bond L. Teaching to the Test: Coaching or Corruption. New Educ. 2008;4(3):216-23.

  38. Lane S, Raymond M, Haladyna T, editors. Handbook of Test Development [Internet]. 2nd ed. New York: Routledge; 2016. 676 p. Available at: http://www.tandfonline.com/doi/abs/10.1080/15305050701813433

  39. Jurado A, Leenen I. Reflexiones sobre adivinar en preguntas de opción múltiple y cómo afecta el resultado del examen. Inv Ed Med. 2016;5(17):55-63.



