Implementation and validation of a Computerized Adaptative Testing for general surgery certification by the Mexican Board of General Surgery: from the proof of concept to its definitive validation

Eduardo Prado; Rafael Humberto Pérez-Soto; Karina Sánchez-Reyes; Elena López-Gavito; Armando Hernández-Cendejas; Jorge Kobeh; David Velázquez-Fernández

2026, Number 1


Cir Gen 2026; 48 (1)



DOI: 10.35366/123064

Language: Spanish
References: 15
Page: 7-17
PDF size: 1627.90 Kb.

ABSTRACT

Introduction: board certification in general surgery requires full completion of three phases (pilot test, implementation, validation) designed to guarantee the quality standards of professional competence in our country. Computerized adaptive testing (CAT) is a tool that administers items dynamically, based on the examinee's real-time performance. This optimizes testing time, number of items, computer equipment, and human resources, and reduces the fatigue associated with longer examinations.

Objective: to implement CAT and analyze its efficiency in determining the academic proficiency of general surgeons applying for certification by the CMCG (Consejo Mexicano de Cirugía General, the Mexican Board of General Surgery).

Material and methods: we used a three-phase methodology: a pilot test, test implementation, and a final validation, each in an independent cohort. The pilot included 322 examinees. The implementation phase included 569 independent applicants, in whom results from our conventional test were compared with those from the CAT. In the validation phase, 1,194 applicants were analyzed with two different CATs built from items drawn at random from a pool of 1,200 items of varying difficulty; these results were compared with each examinee's conventional test in paired and global analyses. Statistical analysis was performed with IBM SPSS® Statistics v26; any p value < 0.05 was considered statistically significant for a two-tailed hypothesis test.

Results: RACs were similar between the conventional test and the CAT (30.32 ± 3.89 vs 30.67 ± 6.83, respectively), with a delta of 0.35 ± 7.26 (p = 0.38; SEM = 0.40). In the implementation phase, a shorter testing time was documented (117.2 ± 12.6 vs 72.05 ± 18.2 minutes, respectively), along with a lower average number of answered items (100.56 ± 15.64 vs 193.69 ± 15.78) and a similar average score (50.84 ± 7.9 vs 54.53 ± 7.2), while the test remained able to distinguish AAA from BAA applicants (51.34 ± 7.5 vs 36.42 ± 5.5). In the validation phase, there was no difference in time, number of items answered, or difficulty level, but the ability to discriminate between AAA and BAA applicants was maintained (p < 0.0001).

Conclusions: CAT proved to be not only feasible but also valid, and more efficient than conventional tests: it achieved comparable results with a significant reduction in time and number of items, while maintaining its ability to quickly identify applicants with greater academic assertiveness.
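
To make the idea of administering items "dynamically, based on real-time performance" concrete, the following is a minimal sketch of an adaptive testing loop. It is not the CMCG's algorithm, which the abstract does not describe. Everything specific here is an assumption made for illustration: the Rasch (one-parameter) response model, the rule of choosing the unused item whose difficulty is closest to the current ability estimate, the grid-search ability estimate with a standard normal prior, the stopping rule (`se_target`, `max_items`), and the simulated item difficulties.

```python
import math
import random


def p_correct(theta, b):
    """Rasch (1PL) probability of a correct answer; an assumed model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def estimate_theta(responses, lo=-4.0, hi=4.0, steps=161):
    """Posterior-mode ability estimate by grid search (standard normal prior).

    responses is a list of (item_difficulty, answered_correctly) pairs.
    """
    best_theta, best_logpost = 0.0, -math.inf
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        logpost = -0.5 * theta * theta  # log of the N(0, 1) prior, up to a constant
        for b, correct in responses:
            p = p_correct(theta, b)
            logpost += math.log(p if correct else 1.0 - p)
        if logpost > best_logpost:
            best_theta, best_logpost = theta, logpost
    return best_theta


def standard_error(theta, responses):
    """Standard error of theta from the Rasch test information."""
    info = 0.0
    for b, _ in responses:
        p = p_correct(theta, b)
        info += p * (1.0 - p)
    return 1.0 / math.sqrt(info) if info > 0 else math.inf


def run_cat(pool, answer, max_items=100, se_target=0.3):
    """Administer items adaptively until the estimate is precise enough.

    pool is a list of item difficulties; answer(b) returns True or False.
    max_items and se_target are hypothetical stopping parameters.
    """
    remaining = list(pool)
    responses = []
    theta = 0.0
    while remaining and len(responses) < max_items:
        # Under the Rasch model the most informative item is the one
        # whose difficulty is closest to the current ability estimate.
        b = min(remaining, key=lambda d: abs(d - theta))
        remaining.remove(b)
        responses.append((b, answer(b)))
        theta = estimate_theta(responses)
        if standard_error(theta, responses) <= se_target:
            break
    return theta, len(responses)


if __name__ == "__main__":
    rng = random.Random(0)
    # Simulated pool of 1,200 items; the difficulties are invented for the demo.
    pool = [rng.uniform(-3.0, 3.0) for _ in range(1200)]
    true_theta = 1.0  # hypothetical examinee ability
    theta_hat, n_used = run_cat(pool, lambda b: rng.random() < p_correct(true_theta, b))
    print(f"estimated ability {theta_hat:.2f} after {n_used} items")
```

Because each answer moves the ability estimate and the next item is chosen near that estimate, the loop can stop once the standard error falls below the target rather than administering a fixed-length form. This is the general mechanism by which a CAT can use fewer items than a conventional test.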
