Table 3: Games-Howell comparisons between AI models and resident subgroups (R1-R4).
| (I) Group | (J) Group | Mean difference (I-J) | SE | p | 95% CI |
|---|---|---|---|---|---|
| ChatGPT 4 | Gemini 2.5 | **-5.66667** | 0.86781 | < 0.001 | -8.6384 to -2.6949 |
| ChatGPT 4 | Claude 3.7 | -0.66633 | 0.66667 | 0.963 | -3.2270 to 1.8943 |
| ChatGPT 4 | DeepSeek R1 | 2.00000 | 1.27657 | 0.762 | -2.4626 to 6.4626 |
| ChatGPT 4 | R1 | **35.04897** | 2.62252 | < 0.001 | 25.7565 to 44.3415 |
| ChatGPT 4 | R2 | **34.11467** | 3.95452 | < 0.001 | 19.1239 to 49.1055 |
| ChatGPT 4 | R3 | **32.21444** | 4.34475 | < 0.001 | 15.2329 to 49.1960 |
| ChatGPT 4 | R4 | 23.03333 | 5.85874 | 0.091 | -3.9323 to 49.9989 |
| Gemini 2.5 | ChatGPT 4 | **5.66667** | 0.86781 | < 0.001 | 2.6949 to 8.6384 |
| Gemini 2.5 | Claude 3.7 | **5.00033** | 0.55556 | < 0.001 | 2.8665 to 7.1342 |
| Gemini 2.5 | DeepSeek R1 | **7.66667** | 1.22222 | < 0.001 | 3.3238 to 12.0095 |
| Gemini 2.5 | R1 | **40.71564** | 2.59650 | < 0.001 | 31.4615 to 49.9698 |
| Gemini 2.5 | R2 | **39.78133** | 3.93731 | < 0.001 | 24.7981 to 54.7645 |
| Gemini 2.5 | R3 | **37.88111** | 4.32909 | < 0.001 | 20.8989 to 54.8633 |
| Gemini 2.5 | R4 | **28.70000** | 5.84714 | 0.039 | 1.6972 to 55.7028 |
| Claude 3.7 | ChatGPT 4 | 0.66633 | 0.66667 | 0.963 | -1.8943 to 3.2270 |
| Claude 3.7 | Gemini 2.5 | **-5.00033** | 0.55556 | < 0.001 | -7.1342 to -2.8665 |
| Claude 3.7 | DeepSeek R1 | 2.66633 | 1.08866 | 0.321 | -1.5152 to 6.8478 |
| Claude 3.7 | R1 | **35.71531** | 2.53637 | < 0.001 | 26.5351 to 44.8955 |
| Claude 3.7 | R2 | **34.78100** | 3.89792 | < 0.001 | 19.8093 to 49.7527 |
| Claude 3.7 | R3 | **32.88078** | 4.29330 | < 0.001 | 15.8918 to 49.8698 |
| Claude 3.7 | R4 | 23.69967 | 5.82068 | 0.083 | -3.3921 to 50.7914 |
| DeepSeek R1 | ChatGPT 4 | -2.00000 | 1.27657 | 0.762 | -6.4626 to 2.4626 |
| DeepSeek R1 | Gemini 2.5 | **-7.66667** | 1.22222 | < 0.001 | -12.0095 to -3.3238 |
| DeepSeek R1 | Claude 3.7 | -2.66633 | 1.08866 | 0.321 | -6.8478 to 1.5152 |
| DeepSeek R1 | R1 | **33.04897** | 2.76014 | < 0.001 | 23.5009 to 42.5971 |
| DeepSeek R1 | R2 | **32.11467** | 4.04709 | < 0.001 | 17.0584 to 47.1709 |
| DeepSeek R1 | R3 | **30.21444** | 4.42918 | 0.001 | 13.2151 to 47.2138 |
| DeepSeek R1 | R4 | 21.03333 | 5.92162 | 0.124 | -5.7488 to 47.8154 |
| R1 | ChatGPT 4 | **-35.04897** | 2.62252 | < 0.001 | -44.3415 to -25.7565 |
| R1 | Gemini 2.5 | **-40.71564** | 2.59650 | < 0.001 | -49.9698 to -31.4615 |
| R1 | Claude 3.7 | **-35.71531** | 2.53637 | < 0.001 | -44.8955 to -26.5351 |
| R1 | DeepSeek R1 | **-33.04897** | 2.76014 | < 0.001 | -42.5971 to -23.5009 |
| R1 | R2 | -0.93431 | 4.65048 | 1.000 | -17.0252 to 15.1565 |
| R1 | R3 | -2.83453 | 4.98654 | 0.999 | -20.5372 to 14.8681 |
| R1 | R4 | -12.01564 | 6.34930 | 0.592 | -38.1539 to 14.1226 |
| R2 | ChatGPT 4 | **-34.11467** | 3.95452 | < 0.001 | -49.1055 to -19.1239 |
| R2 | Gemini 2.5 | **-39.78133** | 3.93731 | < 0.001 | -54.7645 to -24.7981 |
| R2 | Claude 3.7 | **-34.78100** | 3.89792 | < 0.001 | -49.7527 to -19.8093 |
| R2 | DeepSeek R1 | **-32.11467** | 4.04709 | < 0.001 | -47.1709 to -17.0584 |
| R2 | R1 | 0.93431 | 4.65048 | 1.000 | -15.1565 to 17.0252 |
| R2 | R3 | -1.90022 | 5.79881 | 1.000 | -21.8803 to 18.0799 |
| R2 | R4 | -11.08133 | 7.00530 | 0.751 | -37.6942 to 15.5315 |
| R3 | ChatGPT 4 | **-32.21444** | 4.34475 | < 0.001 | -49.1960 to -15.2329 |
| R3 | Gemini 2.5 | **-37.88111** | 4.32909 | < 0.001 | -54.8633 to -20.8989 |
| R3 | Claude 3.7 | **-32.88078** | 4.29330 | < 0.001 | -49.8698 to -15.8918 |
| R3 | DeepSeek R1 | **-30.21444** | 4.42918 | 0.001 | -47.2138 to -13.2151 |
| R3 | R1 | 2.83453 | 4.98654 | 0.999 | -14.8681 to 20.5372 |
| R3 | R2 | 1.90022 | 5.79881 | 1.000 | -18.0799 to 21.8803 |
| R3 | R4 | -9.18111 | 7.23276 | 0.891 | -36.2745 to 17.9122 |
| R4 | ChatGPT 4 | -23.03333 | 5.85874 | 0.091 | -49.9989 to 3.9323 |
| R4 | Gemini 2.5 | **-28.70000** | 5.84714 | 0.039 | -55.7028 to -1.6972 |
| R4 | Claude 3.7 | -23.69967 | 5.82068 | 0.083 | -50.7914 to 3.3921 |
| R4 | DeepSeek R1 | -21.03333 | 5.92162 | 0.124 | -47.8154 to 5.7488 |
| R4 | R1 | 12.01564 | 6.34930 | 0.592 | -14.1226 to 38.1539 |
| R4 | R2 | 11.08133 | 7.00530 | 0.751 | -15.5315 to 37.6942 |
| R4 | R3 | 9.18111 | 7.23276 | 0.891 | -17.9122 to 36.2745 |

Mean differences, standard errors (SE), significance (p), and 95% confidence intervals (95% CI) for pairwise comparisons between AI models and resident subgroups (R1-R4) on the ABIM-style (American Board of Internal Medicine) exam. Statistically significant differences (p < 0.05) are shown in bold.
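As a reading aid, the relationship between the reported columns can be checked with a little arithmetic. The sketch below is illustrative only and assumes each interval is symmetric about the mean difference, so that the half-width divided by the SE gives the critical multiplier implied for that pair. The helper name `implied_ci_summary` is hypothetical, and the input values are copied from the Gemini 2.5 rows of the table.

```python
def implied_ci_summary(lower: float, upper: float, se: float) -> tuple[float, float]:
    """Return (midpoint, multiplier) for an interval assumed to be symmetric.

    midpoint   : should reproduce the reported mean difference (I-J)
    multiplier : half-width divided by SE, i.e. the implied critical value
    """
    midpoint = (lower + upper) / 2
    half_width = (upper - lower) / 2
    return midpoint, half_width / se


# Gemini 2.5 vs ChatGPT 4: reported difference 5.66667, SE 0.86781
midpoint, multiplier = implied_ci_summary(2.6949, 8.6384, 0.86781)
print(f"Gemini 2.5 vs ChatGPT 4: midpoint={midpoint:.4f}, multiplier={multiplier:.3f}")

# Gemini 2.5 vs R4: reported difference 28.70000, SE 5.84714
midpoint_r4, multiplier_r4 = implied_ci_summary(1.6972, 55.7028, 5.84714)
print(f"Gemini 2.5 vs R4: midpoint={midpoint_r4:.4f}, multiplier={multiplier_r4:.3f}")
```

The multiplier differs from pair to pair because Games-Howell derives its critical value from pair-specific (Welch-Satterthwaite) degrees of freedom rather than a single pooled value. This is also why a comparison with a large mean difference, such as those involving R4, can still have an interval that includes zero.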