2025, Number 4
Limitations and concerns of the MATTERHORN: implications for clinical practice
Language: English
References: 37
Page: 120-126
ABSTRACT
The MATTERHORN trial, named after the iconic Alpine peak, aimed to establish a new therapeutic standard in functional mitral regurgitation (FMR) by comparing surgical mitral valve (MV) intervention with transcatheter edge-to-edge repair (TEER). However, its design introduced key limitations that challenge the real-world relevance of its findings. Sponsored by the MitraClip manufacturer, the trial excluded coronary artery bypass grafting (CABG) from the surgical arm, even though CABG is a guideline-supported therapy in ischemic FMR, and thus compared TEER against a non-standard surgical strategy. A limited 12-month follow-up, reliance on a non-inferiority design with unclear margins, and an endpoint heavily influenced by non-cardiac rehospitalizations further weakened its statistical robustness. Patient selection skewed toward low-risk profiles with non-severe or atrial-type FMR, guideline-directed medical therapy was suboptimal, and echocardiographic data showed significant gaps. Crucial hemodynamic markers and structural durability outcomes were also omitted. Together, these flaws make the trial's claims of therapeutic equipoise questionable. Rather than establishing a new benchmark, MATTERHORN underscores the urgent need for rigorously designed studies capable of providing definitive, guideline-relevant evidence for managing FMR.
ABBREVIATIONS:
- ACC/AHA = American College of Cardiology/American Heart Association
- ARNi = angiotensin receptor neprilysin inhibitors
- CABG = coronary artery bypass grafting
- EROA = effective regurgitant orifice area
- ESC/EACTS = European Society of Cardiology/European Association for Cardio-Thoracic Surgery
- FMR = functional mitral regurgitation
- GDMT = guideline-directed medical therapy
- HF = heart failure
- HFrEF = heart failure with reduced ejection fraction
- ITT = intention-to-treat
- LV = left ventricular
- LVAD = left ventricular assist device
- MV = mitral valve
- SGLT2i = sodium-glucose co-transporter 2 inhibitors
- STS-PROM = Society of Thoracic Surgeons Predicted Risk of Mortality
- TEER = transcatheter edge-to-edge repair
Named after one of the most emblematic peaks in the Alps, the MATTERHORN trial sought to attain similar symbolic prominence in the therapeutic landscape of functional mitral regurgitation (FMR). By comparing surgical mitral valve (MV) intervention with transcatheter edge-to-edge repair (TEER), the study attempted to provide clarity on a domain long mired in clinical equipoise. However, rather than offering solid ground, the trial's design and execution introduce critical uncertainties that dilute the applicability of its findings to real-world decision-making.
The trial, formally registered as A Multicenter, Randomized, Controlled Study to Assess Mitral Valve Reconstruction for Advanced Insufficiency of Functional or Ischemic Origin (NCT02371512), was industry-sponsored by Abbott Vascular, the manufacturer of the MitraClip system. A total of 210 patients with symptomatic FMR despite guideline-directed medical therapy (GDMT), and explicitly without an indication for coronary artery bypass grafting (CABG), were randomized to either surgical MV intervention or TEER. The primary efficacy outcome was a composite of death, hospitalization for heart failure (HF), MV reintervention, left ventricular assist device (LVAD) implantation, or stroke at 12 months. The primary safety endpoint comprised major adverse events within 30 days of the procedure.1
Yet the comparative framework raises substantial methodological and conceptual red flags. First, the decision to exclude CABG, a class I indication in patients with ischemic FMR and multivessel disease, artificially limits the surgical arm and diminishes the generalizability of the results.2,3 Moreover, comparing TEER (a class IIa therapy) with standalone MV surgery (typically class IIb in the absence of CABG) is not a comparison between two equally endorsed strategies; rather, it subjects a surgical procedure with limited guideline support to unfair scrutiny.4 This imbalance in recommendation classes undermines the study's clinical relevance, especially considering that CABG remains the only intervention in this setting with robust evidence of improved survival.5,6
Additionally, concerns arise from the trial's sponsorship and design, both of which introduce the potential for bias in endpoint selection, interpretation, and dissemination. Industry funding in cardiovascular trials has historically been associated with more favorable outcomes for the sponsor's product.7 Given the commercial interest in expanding the indications for MitraClip, scrutiny of both trial conduct and conclusions is essential.
In addition, the brevity of the follow-up period in MATTERHORN (12 months) limits the scope of interpretation. Given the chronic course of HF, 1-year data are insufficient for a definitive assessment of TEER efficacy. The COAPT trial illustrates this starkly: in the TEER-treated cohort, the cumulative incidence of the composite endpoint of all-cause death or HF hospitalization rose from 33.9% at 1 year to a striking 73.6% at five years,8 underscoring the need for extended observation periods to evaluate long-term therapeutic impact accurately.
METHODOLOGICAL AND STATISTICAL ISSUES
A fundamental limitation of the MATTERHORN trial lies in its exclusive reliance on an intention-to-treat (ITT) analysis. While ITT is the gold standard in superiority trials, because it preserves randomization and avoids attrition bias, its utility in non-inferiority designs is more contentious. In such settings, ITT can obscure meaningful differences between interventions, particularly when crossovers, withdrawals, or protocol deviations occur, all of which are common in interventional cardiology trials.9 A per-protocol or as-treated analysis, ideally presented alongside ITT, would have provided complementary insight and might have altered the interpretation of efficacy.
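A minimal sketch of this dilution mechanism, in Python, under two simplifying assumptions that are not taken from MATTERHORN: each patient's event risk depends only on the treatment actually received, and crossover occurs at random. Under these assumptions the ITT difference shrinks in proportion to the total crossover fraction, which in a non-inferiority comparison pushes the result toward "no difference". The function name and the 30% versus 20% event risks are hypothetical.

```python
# Hypothetical model of crossover dilution under intention-to-treat (ITT).
# Assumptions (not from MATTERHORN): risk depends only on the treatment actually
# received, and crossover happens at random, independent of prognosis.


def itt_risk_difference(risk_exp: float, risk_ctl: float,
                        cross_from_exp: float, cross_from_ctl: float) -> float:
    """Expected ITT risk difference (experimental arm minus control arm)."""
    # Randomized to experimental: a fraction actually receives the control therapy.
    arm_exp = (1 - cross_from_exp) * risk_exp + cross_from_exp * risk_ctl
    # Randomized to control: a fraction actually receives the experimental therapy.
    arm_ctl = (1 - cross_from_ctl) * risk_ctl + cross_from_ctl * risk_exp
    # Algebraically equal to (risk_exp - risk_ctl) * (1 - cross_from_exp - cross_from_ctl).
    return arm_exp - arm_ctl


if __name__ == "__main__":
    true_difference = 0.30 - 0.20  # hypothetical as-treated difference: 10 points
    for crossover in (0.0, 0.1, 0.2):
        itt = itt_risk_difference(0.30, 0.20, crossover, crossover)
        print(f"crossover {crossover:.0%} per arm: ITT difference {itt:+.3f} "
              f"(true {true_difference:+.3f})")
```

With 10% or 20% crossover in each arm, a true 10-point difference appears as 8 or 6 points under ITT. Under the stated assumptions an as-treated analysis would not be diluted in this way, although it carries selection biases of its own, which is why presenting both analyses is preferable.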
The reported composite event rates, 18.2% in the TEER group and 25% in the surgical group, yielded a non-significant p-value of 0.234. Although statistically neutral, this result is prone to misinterpretation in the context of non-inferiority trials, where absence of a difference is not synonymous with equivalence.10 Furthermore, consistent with regulatory standards, the trial should have adopted a one-sided alpha level of 0.025 (the equivalent of a conventional two-sided level of 0.05), rather than the 0.05 cited by the authors. This issue is compounded by the small sample size and modest event frequency: low statistical power under such conditions heightens the risk of type II error, potentially overlooking clinically meaningful differences.11
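The arithmetic behind these thresholds can be sketched with a simple normal-approximation (Wald) comparison of two proportions. The sketch assumes, hypothetically, equal allocation of the 210 patients (105 per arm) and ignores any time-to-event methods the trial may actually have used; the function name is illustrative. Under those assumptions, the 6.8-point difference has a two-sided p of about 0.23, close to the reported 0.234, and the upper confidence bound that would be compared with a non-inferiority margin lies at roughly +2.5 points with a one-sided alpha of 0.05 versus roughly +4.3 points with 0.025.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def risk_difference_summary(p_exp: float, p_ctl: float, n_exp: int, n_ctl: int,
                            one_sided_alpha: float) -> dict:
    """Wald summary of an experimental-minus-control difference in event rates."""
    diff = p_exp - p_ctl
    se = sqrt(p_exp * (1 - p_exp) / n_exp + p_ctl * (1 - p_ctl) / n_ctl)
    # Two-sided p-value for the null hypothesis of no difference.
    p_two_sided = 2 * (1 - STD_NORMAL.cdf(abs(diff) / se))
    # One-sided critical value: 0.025 gives about 1.96, 0.05 gives about 1.645.
    z_crit = STD_NORMAL.inv_cdf(1 - one_sided_alpha)
    # Upper bound to compare against a non-inferiority margin (higher = worse).
    upper_bound = diff + z_crit * se
    return {"diff": diff, "se": se, "p_two_sided": p_two_sided,
            "z_crit": z_crit, "upper_bound": upper_bound}


if __name__ == "__main__":
    # Event rates from the text; 105 patients per arm is a hypothetical split.
    for alpha in (0.025, 0.05):
        s = risk_difference_summary(0.182, 0.25, 105, 105, alpha)
        print(f"one-sided alpha {alpha}: difference {s['diff']:+.3f}, "
              f"upper bound {s['upper_bound']:+.3f}, "
              f"two-sided p {s['p_two_sided']:.3f}")
```

The stricter alpha widens the confidence interval, so whether non-inferiority is declared can hinge on which alpha is used once the margin is fixed.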
Perhaps the most contentious aspect is the prespecified assumption of a 35% event rate in the control (surgical) group, which was used to calculate the non-inferiority margin. This figure appears inflated when contrasted with real-world data. For instance, contemporary analyses of patients with STS-PROM scores above 2% report composite adverse event rates closer to 19.2%, especially in experienced surgical centers.12 Overestimating the control event rate artificially widens the margin for declaring non-inferiority, making it easier for the experimental arm to appear comparable even when a clinically relevant difference may exist.13 This design flaw is not unique to MATTERHORN; similar criticisms have been raised about other device trials seeking regulatory approval under lenient non-inferiority frameworks.14
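The effect of an inflated control rate can be illustrated with simple arithmetic. Suppose, purely for illustration, that the margin is fixed as an absolute difference of 10 percentage points (a hypothetical value, not the MATTERHORN margin). Against the assumed 35% control rate, that margin tolerates an experimental event rate of 45%, a relative risk of about 1.29; against a control rate near the 19.2% reported in contemporary surgical series, the same margin tolerates 29.2%, a relative risk of about 1.52. The function name below is illustrative.

```python
def tolerated_relative_risk(control_rate: float, absolute_margin: float) -> float:
    """Largest experimental/control risk ratio that still falls inside an
    absolute non-inferiority margin, given the true control event rate."""
    return (control_rate + absolute_margin) / control_rate


if __name__ == "__main__":
    margin = 0.10  # hypothetical absolute margin of 10 percentage points
    scenarios = (("assumed control rate", 0.35),
                 ("contemporary surgical series", 0.192))
    for label, rate in scenarios:
        rr = tolerated_relative_risk(rate, margin)
        print(f"{label}: control {rate:.1%} -> experimental up to "
              f"{rate + margin:.1%} (relative risk {rr:.2f})")
```

The lower the true control rate, the larger the relative excess risk that a fixed absolute margin allows the experimental arm to carry.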
Moreover, a composite endpoint increases event counts but risks diluting the relevance of hard outcomes such as mortality or stroke, especially when it is driven by softer events like rehospitalization. Although statistically efficient, this strategy may not align with clinical priorities and complicates interpretation when the components differ substantially in clinical weight.15
ENDPOINTS AND EVENT INTERPRETATION
The primary efficacy outcome of MATTERHORN was a composite of death, hospitalization for HF, MV reintervention, LVAD implantation, or stroke at 12 months. While composite endpoints can enhance statistical efficiency by increasing event rates, they often conflate outcomes of disparate clinical relevance, potentially skewing interpretation.16 In MATTERHORN, the component with the strongest statistical signal was rehospitalization, specifically all-cause hospital readmission, rather than cardiovascular-specific events, which dilutes the primary endpoint's capacity to reflect true therapeutic benefit.
This is particularly problematic when the signal is driven by non-cardiac hospitalizations. In the trial, all-cause rehospitalization favored TEER (24.7%) over surgery (39.0%), yet when rehospitalizations were stratified by cause, the difference in cardiac rehospitalizations was not statistically significant. Only non-cardiac rehospitalizations separated the groups statistically, suggesting that this result was incidental rather than causally linked to the MV intervention.17 Including such events in the composite endpoint undermines its specificity and inflates the perceived benefit.18
Further inconsistencies emerged regarding MV reinterventions. At the 30-day follow-up, surgical patients had more reinterventions reported. However, at 1 year, the trend reversed: five TEER patients versus two surgical patients required MV reintervention. This shift not only contradicts earlier data but also raises concerns about the adjudication of such events. The lack of clarity around what constituted a "reintervention" compromises transparency.
Lastly, the trial did not perform a hierarchical testing procedure to control for multiplicity. As such, the isolated significance of a single component (rehospitalization) cannot be interpreted independently without adjustment for multiple comparisons. This is a major shortcoming in trial methodology, as it inflates the chance of type I error and presents a misleading narrative of clinical benefit.19
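Two quick calculations illustrate the multiplicity problem. First, if each of the five components of the composite were tested separately at an alpha of 0.05 and the tests were independent (an assumption made only for illustration, since real components are correlated), the chance of at least one false-positive finding would be 1 - 0.95^5, about 22.6%. Second, in a fixed-sequence (hierarchical) procedure, each hypothesis is tested at the full alpha in a prespecified order, and testing stops at the first non-significant result. The ordering below and the component p-value of 0.01 are hypothetical; only the composite p-value of 0.234 comes from the trial report. Function names are illustrative.

```python
from typing import List


def family_wise_error(n_tests: int, alpha: float = 0.05) -> float:
    """Probability of at least one false positive among independent tests."""
    return 1 - (1 - alpha) ** n_tests


def fixed_sequence_test(ordered_p_values: List[float], alpha: float = 0.05) -> List[bool]:
    """Fixed-sequence (hierarchical) testing: test in the prespecified order at
    the full alpha; once a hypothesis is not rejected, no later one can be."""
    decisions: List[bool] = []
    for p in ordered_p_values:
        if decisions and not decisions[-1]:
            decisions.append(False)  # an earlier step failed, so testing has stopped
        else:
            decisions.append(p < alpha)
    return decisions


if __name__ == "__main__":
    print(f"Five unadjusted tests at 0.05: FWER = {family_wise_error(5):.3f}")
    # Hypothetical hierarchy: composite superiority first, then one component.
    hierarchy = [("composite, superiority (reported p)", 0.234),
                 ("rehospitalization component (hypothetical p)", 0.01)]
    decisions = fixed_sequence_test([p for _, p in hierarchy])
    for (name, p), rejected in zip(hierarchy, decisions):
        verdict = "significant" if rejected else "no confirmatory claim"
        print(f"{name}: p = {p} -> {verdict}")
```

Under such a hierarchy, a nominally significant component that follows a non-significant composite cannot support a confirmatory claim, however small its own p-value.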
CLINICAL RELEVANCE OF THE SURGICAL COMPARATOR
A critical flaw in the MATTERHORN trial lies in the choice of surgical comparator. All patients randomized to surgery underwent isolated MV intervention, explicitly excluding CABG, even though coronary artery disease was present in 43.7% of the cohort. This is a striking deviation from established guidelines. Both the American College of Cardiology/American Heart Association (ACC/AHA) and the European Society of Cardiology/European Association for Cardio-Thoracic Surgery (ESC/EACTS) confer a class I recommendation on MV surgery in patients with FMR and suitable coronary anatomy only when it is performed concomitantly with CABG.2,3 When surgery is performed in isolation, as in MATTERHORN, the recommendation is downgraded to class IIb, reflecting a weaker evidence base and uncertain clinical benefit.
This context renders the trial's comparison asymmetrical: TEER, a guideline-supported intervention with a class IIa recommendation, was pitted against an isolated surgical approach with only weak (class IIb) guideline support. Consequently, any equivalence or non-inferiority observed is of limited relevance, as it does not reflect standard-of-care surgical management in this population. The trial therefore risks generating conclusions that are misaligned with clinical practice and unsuitable for guideline formulation.
Further compromising the comparator is the unexpected distribution of surgical techniques: 28% of the surgical group underwent MV replacement rather than repair, and the trial offers no prespecified criteria to justify this choice. MV repair and replacement are not equivalent procedures. Repair is associated with a lower thromboembolic risk, whereas replacement is typically associated with lower reoperation rates for recurrent mitral regurgitation; survival, however, does not differ significantly between the two at 1 year (hazard ratio: 0.79, 95% CI: 0.42-1.47; p = 0.45) or at 2 years of follow-up (hazard ratio: 0.79, 95% CI: 0.46-1.35; p = 0.39).20,21
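Because a 95% confidence interval for a hazard ratio that includes 1 corresponds to p > 0.05, the figures quoted above can be cross-checked with the usual log-scale normal approximation: the standard error of log(HR) is recovered from the width of the interval, and a z statistic follows from log(HR). The sketch below is an approximation that assumes symmetric confidence limits on the log scale; the function name is illustrative.

```python
from math import log
from statistics import NormalDist


def p_from_hr_ci(hr: float, lower: float, upper: float, level: float = 0.95) -> float:
    """Approximate two-sided p-value for HR = 1 from a hazard ratio and its CI,
    assuming log(HR) is normally distributed with symmetric log-scale limits."""
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% CI
    se_log_hr = (log(upper) - log(lower)) / (2 * z_crit)
    z = log(hr) / se_log_hr
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    estimates = (("1 year", 0.79, 0.42, 1.47), ("2 years", 0.79, 0.46, 1.35))
    for label, hr, lower, upper in estimates:
        includes_one = lower < 1 < upper  # a CI spanning 1 is consistent with p > 0.05
        p = p_from_hr_ci(hr, lower, upper)
        print(f"{label}: HR {hr} (95% CI {lower}-{upper}), "
              f"CI includes 1: {includes_one}, approximate p = {p:.2f}")
```

This returns approximately 0.46 and 0.39 for the 1-year and 2-year estimates, in line with the reported p-values of 0.45 and 0.39; the small discrepancy at 1 year plausibly reflects rounding of the published limits.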
Standardization of surgical technique is crucial in comparative trials. Yet MATTERHORN described no centralized surgical protocol, intraoperative echocardiographic standards, or independent surgical adjudication. Without these controls, inter-operator variability can significantly influence outcomes, introducing noise and confounding the treatment effect.22,23 For instance, leaflet tethering, annular size, or papillary muscle displacement may have influenced the decision to replace rather than repair, but these anatomical factors were neither quantified nor reported. By including a mixed cohort treated with both procedures, without stratifying outcomes accordingly, the trial fails to account for this fundamental heterogeneity in surgical risk and prognosis.
PATIENT SELECTION AND POPULATION RISK PROFILE
According to the authors, the MATTERHORN trial sought to investigate whether TEER is non-inferior to MV surgery in patients with FMR who are ostensibly at high surgical risk. However, this premise appears fundamentally flawed: the enrolled population's average Society of Thoracic Surgeons Predicted Risk of Mortality (STS-PROM) score was a mere 2.2%, placing most patients in the low surgical risk category. This is particularly problematic given that current clinical guidelines for FMR do not treat surgical risk as a determining factor in decision-making.2,3 Consequently, the trial's design and findings may be incongruous with real-world clinical practice, where treatment decisions are guided by established guideline criteria rather than surgical risk profiles. This incongruity calls into question the external validity of the trial's findings and their generalizability to real-world FMR populations.
Moreover, over half of the enrolled patients were presumed to have atrial-type FMR (Carpentier type I), a variant characterized by mitral annular dilatation in the setting of preserved left ventricular (LV) function and atrial enlargement, often seen in long-standing atrial fibrillation. Atrial-type FMR has a more favorable natural history and better surgical outcomes than ventricular-type FMR (Carpentier type IIIb), which is driven by LV remodeling, leaflet tethering, and poor systolic function.24-27 This distinction is crucial because ventricular-type FMR represents the canonical phenotype for which both surgical and transcatheter interventions are intended. Yet MATTERHORN does not stratify outcomes by FMR subtype, thereby introducing a substantial source of biological heterogeneity into the results. Even within ventricular-type FMR, the study does not report the morphology of leaflet tethering, specifically whether it was symmetric or asymmetric. This is a critical omission, as symmetric tethering is typically associated with global LV remodeling, whereas asymmetric tethering reflects localized remodeling, often of the posterobasal or inferior LV wall. These distinct patterns not only signal different pathophysiological processes but also imply divergent surgical strategies and prognostic expectations.4,28 The lack of such granularity in MATTERHORN reflects an oversimplified echocardiographic assessment of the mechanism of mitral regurgitation and undermines both the internal and external validity of the trial.
Another critical point is the echocardiographic definition of FMR severity. According to both American and European guidelines, an effective regurgitant orifice area (EROA) ≥ 40 mm² is required to define severe FMR in most cases.29,30 The median EROA in MATTERHORN was only 20 mm² (± 10 mm²), with most patients falling below the conventional threshold for severe FMR. Indeed, Wang et al. have emphasized that approximately 60% of the patients in MATTERHORN had non-severe FMR.31 The inclusion of patients with non-severe FMR, such as moderate (grade 2+) or moderate-to-severe (grade 3+) mitral regurgitation, weakens the rationale for intervention and blunts any treatment effect, particularly on hard outcomes such as survival or rehospitalization.
Taken together, the selection of low-risk patients, the inclusion of predominantly atrial-type FMR, the inadequate characterization of tethering morphology, and questionable severity thresholds suggest a trial population in whom the potential benefit of any intervention, surgical or transcatheter, would be inherently limited. These design choices bias the study toward a neutral outcome and favor non-inferiority conclusions, which may not hold in patients with truly severe, symptomatic, ventricular-type FMR, the population most in need of guideline-informed therapy.2,3
INCOMPLETE DATA AND INADEQUATE MEDICAL THERAPY
A notable limitation of the MATTERHORN trial is the substantial amount of missing echocardiographic data during follow-up, with over 60% of patients lacking complete MV parameters at one year.32 This high attrition rate in imaging undermines the reliability of longitudinal valve assessment and hampers robust evaluation of procedural durability and functional outcomes. Echocardiographic follow-up is essential to quantify residual mitral regurgitation, ventricular remodeling, and leaflet motion post-intervention, factors directly impacting clinical prognosis.
Moreover, the trial reveals a concerning underutilization of GDMT, a cornerstone in managing patients with FMR and HF with reduced ejection fraction (HFrEF). Only 10.5% of surgical patients were discharged on triple therapy (beta-blocker, renin-angiotensin system inhibitor, and mineralocorticoid receptor antagonist), and there was negligible use of contemporary agents such as sodium-glucose co-transporter 2 inhibitors (SGLT2i) and angiotensin receptor neprilysin inhibitors (ARNi).33 Given the demonstrated survival and morbidity benefits of comprehensive GDMT in HFrEF,34 the failure to optimize medical treatment confounds interpretation of procedural efficacy and limits the generalizability of outcomes. In addition, Adamo et al.35 showed that GDMT uptitration after TEER was associated with a lower risk of the composite endpoint of mortality and HF hospitalization at three-year follow-up (hazard ratio: 0.54, 95% CI: 0.38-0.76), a significant prognostic benefit in favor of uptitration. Inadequate medical optimization may therefore have artificially elevated event rates and masked the potential incremental benefit of either surgical or transcatheter intervention. The absence of standardized GDMT protocols and inconsistent implementation between study arms further complicate direct comparison. These deficiencies highlight the imperative for future trials to rigorously enforce GDMT adherence in order to isolate the true impact of device-based therapies in FMR.
UNREPORTED HEMODYNAMIC OUTCOMES AND STRUCTURAL DURABILITY
A critical shortcoming of the MATTERHORN trial is the omission of key hemodynamic parameters following TEER, specifically the residual trans-mitral gradient and MV area. These markers are fundamental to evaluating procedural success, as elevated post-procedural gradients may reflect iatrogenic mitral stenosis and predispose patients to adverse clinical outcomes. Up to 26.4% of patients have been shown to exhibit a mean trans-mitral gradient > 5 mmHg after TEER, and a mean gradient ≥ 5 mmHg has been independently associated with an increased risk of all-cause mortality (hazard ratio: 1.38, 95% CI: 1.08-1.76; p = 0.009). The absence of these data limits the ability to judge the functional quality of the valve repair, a vital consideration given the known trade-off between mitral regurgitation reduction and iatrogenic mitral stenosis inherent to the TEER technique.36
In addition, the trial did not systematically assess or report structural MV repair failure rates, an endpoint increasingly recognized as more objective and clinically meaningful than reoperation rates alone. Structural durability reflects the intrinsic longevity and performance of the repair device or surgical intervention and has significant implications for patient prognosis and subsequent management. Reoperation rate after MV procedures is potentially confounded by numerous factors, including medical indications, patient choices, and clinician decisions. Therefore, quantifying failure rate based on the presence of 3+ or 4+ mitral regurgitation may offer a more robust and reliable measure of procedural efficacy.37 Without such data, the long-term efficacy and safety profile of TEER versus surgical MV repair remain inadequately characterized.
Finally, a striking disparity exists between the MATTERHORN trial's official registry record (NCT02371512) and the published results.1 Although the trial was initiated in 2015 with an anticipated completion date in 2019, the registry record has remained dormant since 2017, contravening established protocols. The published report's assertion that patients were enrolled until 2022 starkly contrasts with this inactivity on the official registry website, casting doubt on the trial's conduct. Moreover, a follow-up limited to only one year appears anomalous given the nine-year interval between the start of enrollment and publication in 2024, underscoring the need for more protracted follow-up, at least five years or even longer, to yield meaningful conclusions.
CONCLUSION
The MATTERHORN trial's lofty ambitions to redefine the treatment paradigm for FMR are starkly at odds with its findings, which are irreparably compromised by egregious methodological flaws. Despite its namesake's majestic peak, the trial's results are decidedly underwhelming, offering little more than a faint glimpse of potential equipoise between TEER and MV surgery in a narrowly defined, low-risk cohort. The study's glaring shortcomings, including a woefully inadequate comparator arm and haphazard endpoint reporting, render its conclusions tenuous at best. In light of these profound limitations, the trial's findings must be viewed with extreme skepticism. To truly illuminate the optimal treatment strategies for FMR, future studies must prioritize rigorous methodological design, standardized surgical techniques, and comprehensive endpoint definitions that capture the full spectrum of clinical, hemodynamic, and structural outcomes. Only then can the field move beyond the MATTERHORN trial's meager contributions and toward a more nuanced understanding of TEER and surgery in managing FMR.
REFERENCES
Zoghbi WA, Adams D, Bonow RO, et al. Recommendations for noninvasive evaluation of native valvular regurgitation: a report from the American Society of Echocardiography developed in collaboration with the Society for Cardiovascular Magnetic Resonance. J Am Soc Echocardiogr. 2017;30(4):303-371. doi: 10.1016/j.echo.2017.01.007.
Lancellotti P, Tribouilloy C, Hagendorff A, et al; Scientific Document Committee of the European Association of Cardiovascular Imaging. Recommendations for the echocardiographic assessment of native valvular regurgitation: an executive summary from the European Association of Cardiovascular Imaging. Eur Heart J Cardiovasc Imaging. 2013;14(7):611-644. doi: 10.1093/ehjci/jet105.
Heidenreich PA, Bozkurt B, Aguilar D, et al; ACC/AHA Joint Committee Members. 2022 AHA/ACC/HFSA Guideline for the Management of Heart Failure: A Report of the American College of Cardiology/American Heart Association Joint Committee on Clinical Practice Guidelines. Circulation. 2022;145(18):e895-e1032. doi: 10.1161/CIR.0000000000001063.
AFFILIATIONS
1Mexican College of Cardiovascular and Thoracic Surgery. Mexico City. Mexico.
Funding: none.
Disclosures: the author has no conflict of interest to disclose.
CORRESPONDENCE
Dr. Ovidio A. García Villarreal. E-mail: ovidiocardiotor@gmail.com
Received: 07-16-2025. Accepted: 07-17-2025.