Efficacy and safety of balneotherapy in rheumatology: a systematic review and meta-analysis

Methods

The review methods (the review question, the search strategy, the inclusion/exclusion criteria, the tool for ROB assessment, the synthesis plan and the plan for investigating causes of heterogeneity) were established prior to conducting the review. The completed Preferred Reporting Items for Systematic Reviews and Meta-Analyses checklist17 is available in online supplemental material S1. We followed international guidance on conducting evidence synthesis.18

Patient and public involvement

Patients or the public were not involved in the design, or conduct, or reporting, or dissemination plans of our research.

Eligibility criteria

The PICO(S) was:

Patient: Adults only, with a rheumatological indication for balneotherapy. The two previous Cochrane reviews were each limited to fewer than 10 trials.1 5 Thus, we aimed to assess the intervention in a broader population, to increase the power of the analysis. Moreover, such a broad scope matches the less precise terminology used by health insurance systems (eg, in France).

Intervention: Any balneotherapy undertaken in Europe, of any duration >10 days, based on any natural mineral water, mud, steam and any adjuvant treatment (including adjuvant physiotherapy). We followed the broad definition of balneotherapy, in line with a previous Cochrane review1 and a previously published definition of balneotherapy.19 Balneotherapy undertaken in Europe was first defined as interventions conducted in countries represented in the European spa association (ESPA);20 if the intervention was conducted in a European country that was not represented in the ESPA, the decision to include the study was made by consensus on a case-by-case basis. We limited our review to Europe because many geographical factors could affect the effect of balneotherapy.19 The larger the geographical area considered, therefore, the greater the risk of heterogeneity. We believe that the European area provides a balance between sufficient power and limited heterogeneity. Moreover, the term ‘balneotherapy’ is typically used in European countries.1 Finally, balneotherapy was mostly developed in Europe.2

Control: Any control (standard of care (SOC) without balneotherapy, ‘pseudo-balneotherapy’ <11 days, no treatment, etc), as in the previous Cochrane reviews that compared the balneotherapy with ‘another intervention or with no intervention’.1 5

Outcome: All clinical outcomes that were the primary outcome of the trial, including clinical scales and QoL, validated at least by a national learned society; trials assessing balneotherapy on non-clinical outcomes were excluded.

Study type: Randomised controlled trials (RCT), assessing superiority or non-inferiority, and of multi or single-centre design. The review was limited to randomised trials to limit the ROB.

The other eligibility criteria were: (1) time frame/years considered: no time restriction; (2) language: English reports only (including non-English reports would have required a substantial additional workforce for little to no impact on the treatment estimate);21 and (3) publication status: any.

Information sources

We undertook a comprehensive literature search using the main electronic databases: PubMed (including MEDLINE), Embase (Elsevier) and Cochrane Library (Wiley). These three databases were searched on 24 July 2023. Alerts were set up for all the databases queried and were stopped on 28 November 2023. We also searched for unpublished studies, reports and grey literature in reference lists, previous reviews on the same topic (review register: PROSPERO), a trial register (clinicaltrials.gov) and congress proceedings (International Society of Medical Hydrology and Climatology, World Federation of Hydrotherapy and Climatotherapy), and by asking medical experts.

Search strategy

We used a combination of free-text and thesaurus terms for the concepts relevant to the topic. Searches were limited to documents published in English; no date restrictions were applied. The algorithms were developed with an information specialist (CG) and are available (online supplemental material S2).

Data management

We used the Covidence platform22 for bibliographic records management and data extraction.

Selection process

Two reviewers (IA and GG) conducted the selection process using a standardised template implemented in the Covidence platform at each step (screening on title and abstract, then selection on full text). This was done independently, in duplicate. Disagreements were resolved by consensus through discussion among the authors.

Data collection process

Two reviewers (IA and GG) extracted the data using a standardised template implemented in the Covidence platform. Data extraction was checked. Disagreements were resolved by consensus through discussion among the authors. Study characteristics, population and setting characteristics, and outcome measures were extracted.

Data items

For the treatment effect on continuous outcomes, the point estimate according to the available data was extracted: postintervention mean-value and mean-change versus baseline, with its standard deviation (SD, calculated from the CI if this was reported instead of the SD), at the available time points. If the data were not available in a table but in a figure, we extracted the estimate from the figure. If a trial reported data at 6 months and 9 months but not 12 months after intervention, the 9-month data were used as proxy for 12 months. For the treatment effect on dichotomous outcomes, the number of events and the number of randomised participants in each arm were extracted.
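The back-calculation of an SD from a reported CI can be sketched as follows. This is a minimal illustration, not the authors' extraction procedure: it assumes that the CI is a normal-approximation interval around a single group mean (mean ± z × SD/√n), and the function name `sd_from_ci` is hypothetical. For small samples, a t-distribution quantile would be more appropriate than z = 1.96.

```python
import math


def sd_from_ci(lower, upper, n, z=1.96):
    """Recover the SD of a group mean from its confidence interval.

    Assumes the interval was built as mean +/- z * SD / sqrt(n),
    i.e. a normal approximation (an assumption of this sketch).
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if upper < lower:
        raise ValueError("upper bound must not be below lower bound")
    standard_error = (upper - lower) / (2.0 * z)  # half-width divided by z
    return standard_error * math.sqrt(n)          # SD = SE * sqrt(n)


# Illustrative numbers only: a 95% CI of (4.02; 5.98) with n = 100.
print(round(sd_from_ci(4.02, 5.98, 100), 3))
```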

Outcomes and prioritisation

Following the white paper of the French Society of Pharmacology and Therapeutics (Société Française de Pharmacologie et de Thérapeutique), the present meta-analysis should be considered a retrospective study.23 Therefore, the analyses and results are exploratory only. As we expected heterogeneity in the outcomes reported in RCTs, the review focused on outcomes that are both (1) clinically relevant to the patient and (2) expected to be usually available. Moreover, the review included patients suffering from different diseases. It was therefore important that the outcomes allow a treatment effect to be estimated independently of the underlying pathophysiology. With this in mind, the two primary efficacy outcomes were pain intensity and QoL. We used the pain assessment scale as reported in the RCT (10 points or 100 points). The scale reported in the included trial to assess QoL was used, and when several QoL measurements were reported, the less disease-specific measures, such as the generic Medical Outcome Study Short Form-36,24 were prioritised in order to limit potential heterogeneity due to the underlying disease. A 3-month follow-up was defined a priori as the primary time point for both pain and QoL.

The secondary efficacy outcomes were pain intensity and QoL at 6 months, 12 months and after the intervention (defined as: ‘immediately’ after or ‘shortly’ after (≤1 month) or ‘during’ or ‘at the time’ of the intervention, according to the available data).

The safety outcomes were withdrawal (due to adverse events (AEs) or serious AEs (SAEs) or for any reason according to the available data), SAEs and AEs.
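For dichotomous outcomes such as these, a per-trial risk ratio and its CI can be derived from the number of events and the number of randomised participants in each arm. The sketch below uses the standard log-scale approximation; it is an illustration under stated assumptions, not the authors' code. The function name `risk_ratio` is hypothetical, the counts in the usage line are invented for illustration, and trials with zero events in an arm are rejected rather than handled with a continuity correction.

```python
import math


def risk_ratio(events_trt, n_trt, events_ctl, n_ctl, z=1.96):
    """Risk ratio (treatment vs control) with a log-scale Wald CI.

    Assumption of this sketch: both arms have at least one event,
    so no continuity correction is needed.
    """
    if events_trt <= 0 or events_ctl <= 0:
        raise ValueError("zero-event arms are not handled in this sketch")
    rr = (events_trt / n_trt) / (events_ctl / n_ctl)
    # Standard error of log(RR).
    se_log = math.sqrt(1 / events_trt - 1 / n_trt + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Invented counts: 10/100 events under treatment, 20/100 under control.
rr, lower, upper = risk_ratio(10, 100, 20, 100)
print(round(rr, 2), round(lower, 2), round(upper, 2))
```

A CI that spans 1, as in this invented example, corresponds to what the review calls an inconclusive estimate.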

ROB in individual studies

We used the ROB 2 tool25 to assess the ROB of included studies. The ROB of each trial was assessed independently in duplicate by two reviewers (among IA, BK and GG). Disagreements were resolved by consensus through discussion among the authors. The ROB was assessed for the two primary outcomes, both of which are subjective, continuous variables with potentially missing data. The ROB assessment was conducted to assess the effect of assignment to the interventions at baseline (intention-to-treat analysis).

Data synthesis

The review is limited to aggregated data. For the pain and the QoL outcomes, different scales were available. All pain measures point in the same direction: the lower the score, the better for the patient (ie, less pain). For QoL, some scales indicate a better health status with a higher score, while others indicate a better health status with a lower score. We multiplied the postintervention mean value of the lower-is-better QoL measures by −1 to ensure that all the QoL measures point in the same direction.26 In the meta-analysis, lower is therefore better for pain outcomes and higher is better for QoL outcomes. Because of the different scales, the standardised mean difference (SMD) was used for pooling the estimates. As the mean change and postintervention mean value should not be combined when using SMD,27 we provide distinct syntheses for mean change on the one hand and for postintervention mean value on the other. However, for exploratory purposes only, we also report a combined estimate, as it has been reported that combining these measures might not change the results.28
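The direction alignment and the SMD computation described above can be sketched as follows. The text does not state which SMD variant was used, so this sketch assumes the small-sample bias-corrected form (Hedges' g); the function name `hedges_g` and the `lower_is_better` flag are hypothetical, and the numbers in the usage line are invented.

```python
import math


def hedges_g(mean_trt, sd_trt, n_trt, mean_ctl, sd_ctl, n_ctl,
             lower_is_better=False):
    """Bias-corrected SMD (Hedges' g) and its variance for one trial.

    If lower_is_better is True, both means are multiplied by -1 so that
    a positive SMD always favours the treatment arm.
    """
    sign = -1.0 if lower_is_better else 1.0
    mean_trt, mean_ctl = sign * mean_trt, sign * mean_ctl

    df = n_trt + n_ctl - 2
    # Pooled within-group standard deviation.
    sd_pooled = math.sqrt(
        ((n_trt - 1) * sd_trt ** 2 + (n_ctl - 1) * sd_ctl ** 2) / df
    )
    d = (mean_trt - mean_ctl) / sd_pooled
    var_d = (n_trt + n_ctl) / (n_trt * n_ctl) + d ** 2 / (2 * (n_trt + n_ctl))

    # Small-sample correction factor.
    j = 1.0 - 3.0 / (4.0 * df - 1.0)
    return j * d, j ** 2 * var_d


# Invented lower-is-better QoL scores: 30 (treatment) vs 40 (control).
g, var_g = hedges_g(30, 10, 50, 40, 10, 50, lower_is_better=True)
print(round(g, 3), round(var_g, 4))
```

Flipping the sign of the means leaves the variance unchanged and only reverses the sign of the SMD, which is why the alignment can be done before pooling.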

Summary measure

The inverse-variance weighting method was used to provide a pooled estimate of the balneotherapy effect (point estimate and its 95% CI) for each outcome. A random-effects model was used to allow the true population effect size to differ among studies. A restricted maximum likelihood estimator for τ2 was used. To compute the CI of the summary effect, both the conventional method for random effects and the Hartung-Knapp modification29 were used, and we kept the most conservative method (ie, the method that produced the widest CI).30 I2 with its 95% CI was used to assess the heterogeneity of effect sizes. Statistical analysis was conducted using the R software31 (V.4.3.1), in particular the package meta32 (V.6.5-0). The statistical analysis was conducted in December 2023, that is, after the protocol registration and its amendments.
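The pooling steps can be illustrated with a short sketch. It is not a reproduction of the analysis, which was run with the R package meta. Two simplifications are assumed here: τ2 is estimated with the closed-form DerSimonian-Laird estimator rather than restricted maximum likelihood, and the t quantile needed for the Hartung-Knapp interval is supplied by the caller through the hypothetical parameter `t_crit`. The function name `pool_random_effects` and the input values are also illustrative.

```python
import math
from statistics import NormalDist


def pool_random_effects(effects, variances, t_crit=None, level=0.95):
    """Inverse-variance random-effects pooling of per-trial estimates.

    Assumption: tau^2 uses the DerSimonian-Laird estimator, not REML.
    t_crit is the t quantile with k - 1 degrees of freedom; when given,
    the Hartung-Knapp interval is computed and kept if it is wider.
    """
    k = len(effects)
    if k < 2 or k != len(variances):
        raise ValueError("need at least two effects with matching variances")

    # Fixed-effect weights and Cochran's Q.
    w = [1.0 / v for v in variances]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))

    # Between-trial variance and I^2 (percent), both truncated at zero.
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    i2 = 100.0 * max(0.0, (q - (k - 1)) / q) if q > 0 else 0.0

    # Random-effects weights and pooled estimate.
    w_re = [1.0 / (v + tau2) for v in variances]
    sw_re = sum(w_re)
    mu = sum(wi * yi for wi, yi in zip(w_re, effects)) / sw_re

    # Conventional (Wald-type) interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    se = math.sqrt(1.0 / sw_re)
    ci, method = (mu - z * se, mu + z * se), "conventional"

    # Hartung-Knapp interval, kept only if it is the wider of the two.
    if t_crit is not None:
        se_hk = math.sqrt(
            sum(wi * (yi - mu) ** 2 for wi, yi in zip(w_re, effects))
            / ((k - 1) * sw_re)
        )
        ci_hk = (mu - t_crit * se_hk, mu + t_crit * se_hk)
        if ci_hk[1] - ci_hk[0] > ci[1] - ci[0]:
            ci, method = ci_hk, "hartung-knapp"

    return {"estimate": mu, "ci": ci, "method": method, "tau2": tau2, "i2": i2}


# Invented SMDs and variances for three trials; 4.303 is the 97.5% t
# quantile with 2 degrees of freedom.
print(pool_random_effects([-0.2, -0.5, -0.8], [0.04, 0.04, 0.04], t_crit=4.303))
```

Keeping the wider of the two intervals mirrors the conservative choice described above; with few trials, the Hartung-Knapp interval is usually the wider one because of the t quantile.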

Additional analyses

A sensitivity analysis and subgroup analyses for the primary outcomes were planned a priori. We amended the protocol as described in the section ‘Registration and protocol’.

Sensitivity analysis

Differences in comparators might increase heterogeneity. To assess the robustness of the results to this risk, we conducted two sensitivity analyses: one restricted to placebo-like trials and one restricted to the most frequent type of control arm.

Subgroup analyses

Subgroup analyses were conducted to explore potential heterogeneity in the treatment effect: (1) the potential impact of the type of balneotherapy, classified as bath, mud pack, bath plus mud pack and other, adapted from a previous definition19 and (2) the potential impact of the underlying disease, classified by the main indication as mechanic for mechanical disorders, inflammatory for inflammatory and autoimmune diseases, and fibromyalgia for fibromyalgia. We did not use narrower disease categories to maintain the number of trials, and therefore, power, for subgroup analyses.

Reporting bias

Reporting bias was investigated using standard analyses (funnel plot,33 Egger’s34 and Begg’s35 tests) for the two primary outcomes (pain and QoL at 3 months), and for the two safety outcomes that have been assessed using the Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach36 (see below).
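As an illustration of one of these analyses, Egger's test regresses the standardised effect (effect divided by its standard error) on precision (the inverse of the standard error); an intercept far from zero suggests funnel plot asymmetry. The sketch below computes only the intercept, its standard error and the degrees of freedom, leaving the p value to a t-distribution routine. The function name `egger_intercept` and the input values are hypothetical, and the authors' software may implement the test differently.

```python
import math


def egger_intercept(effects, std_errors):
    """Intercept of Egger's regression, its standard error and the df.

    Ordinary least squares of (effect / SE) on (1 / SE).
    """
    k = len(effects)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three trials with matching SEs")

    x = [1.0 / se for se in std_errors]                  # precision
    y = [e / se for e, se in zip(effects, std_errors)]   # standardised effect
    mean_x, mean_y = sum(x) / k, sum(y) / k

    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("standard errors must not all be equal")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Residual variance and standard error of the intercept.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se_intercept = math.sqrt(rss / (k - 2) * (1.0 / k + mean_x ** 2 / sxx))
    return intercept, se_intercept, k - 2


# Invented data in which less precise trials report larger benefits.
print(egger_intercept([-1.2, -0.9, -0.6, -0.3], [0.5, 0.4, 0.3, 0.2]))
```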

Confidence in cumulative evidence

Estimates of the strength of the evidence were provided following the GRADE approach36 for the two primary outcomes (pain and QoL at 3 months) and for two safety outcomes (withdrawal and AE).

Results

Study selection

We identified 2395 records from the bibliographic search. After removing duplicates and screening on title and abstract, 104 full-text records were excluded, mostly because of the intervention. We were unable to include three records because of a lack of information regarding their design despite contacting or trying to contact the authors.37–39 In total, 42 studies14–16 40–78 were included in the review (figure 1).

Figure 1

Flow chart of the systematic review, following PRISMA guidance (extracted from Covidence). PRISMA, Preferred Reporting Items for Systematic Reviews and Meta-Analyses.

Among the included trials, seven were not two-arm parallel design. Among these, (1) four trials were three-arm parallel design, for two of which one arm was not included in the review (a short wave diathermy arm43 and a mud arm that was not a thermal mud),77 and for the other two, two arms were combined in one arm (same balneotherapy but at 36°C for one arm and at 38°C for the second intervention arm,63 and two balneotherapy arms but in two different centres);76 (2) one trial was a four-arm parallel design, of which two arms were excluded because they were neither balneotherapy nor control arms65 and (3) two trials were cross-over trials, of which one trial reported only the first period72 and one trial reported usable data for the first period and non-usable data of the second period.73 All included trials were analysed as a two-arm parallel design.

Study characteristics

The oldest trial was published in 198974 and the most recent in 2023.15 The trials were most often conducted in Italy and Hungary (number of trials: Italy, k=13; Hungary, k=12; France, k=6; Germany, k=2; Spain, k=2; Portugal, k=2; Germany and Austria, k=1; Austria, k=1; Lithuania, Romania and The Netherlands, k=1 each). The main indications for balneotherapy were mechanical disorders (k=29), inflammatory diseases (k=9) and fibromyalgia (k=4). The intervention type was classified as: bath (k=21), bath plus mud pack (k=13), mud pack (k=3) and other (k=5) (classification: see table 1; details: see online supplemental material S3). The control arms were classified as: SOC (k=17), placebo-like (k=13), other (k=9) and waiting list (k=3). The duration of follow-up ranged from 1.5 to 52 weeks, the proportion of women from 6.7% to 98.0%, age from 40.6 to 75.5 years and body mass index from 24.8 to 29.9 kg/m2 (table 1).

Table 1

Characteristics of included trials

Results of individual studies

The point estimate (expressed as an SMD) of individual studies ranged from −2.43 to 0.22 for pain at 3 months (figure 2); and from 2.32 to 0.03 for QoL at 3 months (figure 3).

Figure 2

Forest plot of the effect of balneotherapy on pain, at 3 months. The data and the synthesis are provided for (1) the subset of trials that reported value as mean change (‘SUBGROUP=meanchange’, synthesis in bold grey), (2) the subset of trials that reported the end value (‘SUBGROUP=endvalue’, synthesis in bold grey), (3) overall (synthesis in bold black). SMD, standardised mean difference.

Figure 3

Forest plot of the effect of balneotherapy on quality of life, at 3 months. The data and the synthesis are provided for (1) the subset of trials that reported value as mean change (‘SUBGROUP=meanchange’, synthesis in bold grey), (2) the subset of trials that reported the end value (‘SUBGROUP=endvalue’, synthesis in bold grey), (3) overall (synthesis in bold black). SMD, standardised mean difference.

Results of syntheses

Primary outcomes

Pain at 3 months

Among the 21 trials reporting this outcome (2163 patients), 16 were at high ROB, 5 with some concerns. Regarding pain reported as the mean change from baseline, the intervention was associated with an SMD of −0.31 (95% CI (−0.54; −0.08)), with substantial heterogeneity (I2=54%). Regarding pain reported as postintervention mean value, the intervention was associated with an SMD of −0.89 (95% CI (−1.25; −0.53)), with considerable heterogeneity (I2=85%; figure 2).

QoL at 3 months

Among the 18 trials reporting this outcome (1194 patients), 15 were at high ROB, 3 with some concerns. Regarding QoL reported as mean change from baseline, the intervention was associated with an SMD of 0.33 (95% CI (0.09; 0.56)), with low heterogeneity (I2=12%). Regarding QoL reported as postintervention mean value, the intervention was associated with an SMD of 0.64 (95% CI (0.39; 0.88)), with substantial heterogeneity (I2=62%; figure 3).

Secondary outcomes

Efficacy outcomes

Pain at 6 months was reported in 12 trials (2340 patients). The SMD was −0.26 (95% CI (−0.37; −0.16); I2=0%) when reported as mean change from baseline, and −0.52 (95% CI (−0.71; −0.34); I2=17%) when reported as postintervention mean value (online supplemental material S5).

Pain at 12 months was reported in 5 trials (1086 patients). The SMD was −0.11 (95% CI (−0.24; 0.02); I2=0%) when reported as mean change from baseline and −0.23 (95% CI (−0.55; 0.09); I2=0%) when reported as postintervention mean value (online supplemental material S6).

Pain after the intervention was reported in 26 trials (2567 patients). The SMD was −0.14 (95% CI (−0.77; 0.49); I2=91%) when reported as mean change from baseline, and −0.62 (95% CI (−0.84; −0.40); I2=69%) when reported as postintervention mean value (online supplemental material S7).

QoL at 6 months was reported in 11 trials (1659 patients). The SMD was 0.38 (95% CI (0.25; 0.51); I2=0%) when reported as mean change from baseline, and 0.46 (95% CI (0.24; 0.68); I2=48%) when reported as postintervention mean value (online supplemental material S8).

QoL at 12 months was reported in three trials (314 patients). The SMD was 0.07 (95% CI (−0.18; 0.33); I2=0%) when reported as mean change from baseline, and 0.47 (95% CI (−0.05; 0.98); only one trial) when reported as postintervention mean value (online supplemental material S9).

QoL after the intervention was reported in 21 trials (1631 patients). The SMD was 0.34 (95% CI (0.09; 0.59); I2=54%) when reported as mean change from baseline, and 0.34 (95% CI (−0.06; 0.73); I2=86%) when reported as postintervention mean value (online supplemental material S10).

Safety outcomes

Among the 13 trials (2062 patients, 123 events) that reported the risk of withdrawal, 10 were at high ROB and 3 had some concerns. The risk ratio (RR) of withdrawal was inconclusive (0.75, 95% CI (0.46; 1.20); I2=12%; online supplemental material S11).

The risk of SAE was reported in two trials (406 patients, 29 events). The RR was inconclusive (1.01, 95% CI (0.36; 2.85); I2=21%; online supplemental material S12).

Among the 5 trials (1123 patients, 87 events) that reported the risk of AE, 2 were at high ROB and 3 had some concerns. The RR was inconclusive (0.80, 95% CI (0.43; 1.50); I2=44%; online supplemental material S13).

Sensitivity analyses of the primary outcomes

Restricted to placebo-like design

Regarding pain at 3 months (9 trials, 1264 patients), the SMD was −0.21 (95% CI (−0.41; −0.01); I2=45%) when reported as mean change from baseline, and −0.49 (95% CI (−0.86; −0.11); I2=61%) when reported as postintervention mean value (online supplemental material S14).

Regarding QoL at 3 months (7 trials, 525 patients), the SMD was 0.28 (95% CI (−0.18; 0.73); I2=62%) when reported as mean change from baseline, and 0.47 (95% CI (0.24; 0.69); I2=0%) when reported as postintervention mean value (online supplemental material S15).

Restricted to SOC design

Regarding the pain at 3 months (6 trials, 360 patients), the SMD was −1.50 (95% CI (−2.07; −0.92); I2=82%) when reported as postintervention mean value; no data were available for mean change from baseline (online supplemental material S16).

Regarding QoL at 3 months (6 trials, 350 patients), the SMD was 0.89 (95% CI (0.34; 1.44); I2=78%) when reported as postintervention mean value; no data were usable for mean change from baseline (online supplemental material S17).

Subgroup analyses of the primary outcomes

Subgroups for pain at 3 months

The subgroups of interest were reported for 21 trials (2163 patients).

The intervention types bath, mud pack, bath plus mud pack and other were associated with an SMD of −0.59 (95% CI (−0.89; −0.30)), −0.16 (95% CI (−0.92; 0.60)), −1.32 (95% CI (−1.90; −0.73)) and −0.25 (95% CI (−0.47; −0.03)), respectively. The p value of the test for this subgroup effect was <0.01 (online supplemental material S18).

The intervention in mechanical disorder, inflammatory and fibromyalgia indications was associated with an SMD of −0.73 (95% CI (−1.02; −0.43)), −0.25 (95% CI (−0.55; 0.05)) and −1.11 (95% CI (−2.41; 0.18)), respectively. The p value of the test for this subgroup effect was 0.06 (online supplemental material S19).

Subgroups for QoL at 3 months

The subgroups of interest were reported for 18 trials (1194 patients).

The intervention types bath, mud pack, bath plus mud pack and other were associated with an SMD of 0.46 (95% CI (0.27; 0.66)), 0.17 (95% CI (−0.23; 0.56)), 0.88 (95% CI (0.33; 1.43)) and 0.43 (95% CI (0.04; 0.81)), respectively. The p value of the test for this subgroup effect was 0.22 (online supplemental material S20).

The intervention in mechanical disorder, inflammatory and fibromyalgia indications was associated with an SMD of 0.57 (95% CI (0.30; 0.85)), 0.23 (95% CI (−0.05; 0.50)) and 0.83 (95% CI (0.56; 1.10)), respectively. The p value of the test for this subgroup effect was <0.01 (online supplemental material S21).

Certainty of evidence

Regarding the pain at 3 months, the inconsistency, indirectness and imprecision were assessed as ‘very serious’, ‘not serious’ and ‘not serious’, respectively. The overall certainty in the estimate was ‘very low’. Regarding the QoL at 3 months, the inconsistency, indirectness and imprecision were assessed as ‘serious’, ‘not serious’ and ‘not serious’, respectively. The overall certainty in the estimate was ‘very low’. For clarity, we reported the combined estimates of mean change and postintervention mean values for pain and QoL. Regarding the risk of withdrawal, the inconsistency, indirectness and imprecision were assessed as ‘not serious’, ‘not serious’ and ‘serious’, respectively. The overall certainty in the estimate was ‘very low’. Regarding the risk of AE, the inconsistency, indirectness and imprecision were assessed as ‘serious’, ‘not serious’ and ‘very serious’, respectively. The overall certainty in the estimate was ‘very low’ (table 2).

Table 2

Summary of finding including GRADE assessment

Discussion

The results of the present study meet the aim of the study. They indicate that most of the available trials assessing the effect of balneotherapy in rheumatology are at high ROB. Overall, the suggested decrease in pain and the suggested increase in QoL appeared to be of very low level of certainty, that is, the review does not support a benefit of balneotherapy in rheumatology. In addition, the assessment of the safety of balneotherapy was inconclusive, and therefore, there is no reliable evidence of a favourable risk-benefit ratio of balneotherapy in rheumatology.

General interpretation of the results in the context of other evidence

Previous reviews suggested a lack of evidence but were limited to specific indications and included fewer than 10 trials.1 5 8 Our study confirms the lack of evidence across a broader landscape in rheumatology, based on a much larger number of trials (42 included trials). The previous large review, which included 26 studies, did not report pooled estimates of the treatment effect and, unlike the present review, was not restricted to randomised trials.6 It is also of note that, although Fernandez-Gonzalez et al 7 concluded that balneotherapy had positive effects, they included only 7 studies and did not report an assessment of the certainty of the evidence, whereas the present review included 42 studies and provided a certainty assessment following GRADE. Finally, a further strength of our review is that it also provides pooled estimates and certainty assessments for safety outcomes, which were not provided in the previous reviews.1 5–8 The present review also found no study at low ROB, which might be related to the specific challenges of assessing such complex interventions.79 The findings also provide evidence of potential publication bias in the field of balneotherapy, which could not be assessed in the two previous Cochrane reviews because of the low number of trials included.1 5 The results of the present study also underscore the difficulty of accurately estimating the treatment effect of complex interventions such as balneotherapy, which ranges from the supposed specific effect of thermal water/mud to the non-specific effect of adjuvant care such as resting and massage. This is supported by the sensitivity analyses, which found that the effect estimate was smaller when balneotherapy was compared with a placebo-like control rather than with SOC. This smaller effect with a stronger comparator highlights the potential impact of the adjuvant care associated with balneotherapy and of a potential placebo effect.

Limitations of the evidence included in the review

The included trials were mostly at high ROB. Moreover, the outcome measures were substantially heterogeneous in their scales, time points and reporting, which limited the amount of data available for synthesis: some data were not available1 5 14 70 and others were reported in a way that precluded their use in the synthesis.42 78 Finally, most of the comparators were non-active interventions (SOC or placebo-like comparators), limiting the comparison of balneotherapy with other specific interventions.

Limitations of the review processes

For pooling effect estimates, we used the SMD at the end of follow-up or mean change from baseline and end-intervention between groups. The translation of these changes in SMDs to clinical practice seems difficult. Moreover, we combined different indications (mechanical disorders, inflammatory, fibromyalgia) of balneotherapy in rheumatology. This approach aligns with the prevailing categorisations in the current funding of balneotherapy by the national health insurance in France, which is based on broad medical orientations such as ‘rheumatology’, ‘phlebology’, ‘respiratory tract’, etc. It is noteworthy that this amalgamation introduces heterogeneity, particularly since the underlying indication appears to impact the treatment effect, as evidenced in the subgroup analyses. The review was also limited by the strong publication bias and limited to direct comparisons. Finally, exploring the cost-effectiveness of balneotherapy was beyond the scope of this paper.

Implications of the results for clinical practice, policy and future research

Additional randomised trials at low ROB are essential to provide dependable estimates of the effect of balneotherapy in rheumatology. The use of tap water as a control demonstrated that randomised, double-blinded, placebo-controlled trials are feasible, albeit complex. Furthermore, spin, misleading reporting and inadequate interpretation of the results were common. Registering protocols and statistical analysis plans, and following reporting guidelines,80 including enhanced reporting of safety outcomes, will improve the quality of evidence in this field.
