A scalable mental health intervention for depressive symptoms: evidence from a randomized controlled trial and large-scale real-world studies
Study 1: participants
Given that this study is part of a larger study (see Fig. 1) and only the intervention process of Group 5 was identical to the intervention used in the two real-world studies (i.e., Study 2 & 3), we included two out of the five groups: the wait-list group (Group 1, n = 94), with a mean age of 27.85 years (SD = 5.79), and the “mindfulness practice + writing practice + human support” intervention group (Group 5, n = 94). One participant from the intervention group withdrew from the study due to other therapeutic needs identified after completing the pre-intervention assessments, leaving 93 individuals with a mean age of 27.85 years (SD = 7.47) for the final analysis in the intervention group. There were no significant differences between participants in the two groups in terms of gender, age, pre-intervention depressive symptoms, and pre-intervention well-being (ps > 0.129).

This study only included the wait-list group (Group 1) and the “mindfulness practice + writing practice + human support” group (Group 5).
Among the 187 participants, ages ranged from 18 to 70 years (M = 27.85, SD = 6.67), and 85.03% were female. A total of 151 (80.75%) were successfully followed up and completed the post-intervention assessment. Dropouts and non-dropouts did not differ significantly in gender, age, pre-intervention depressive symptoms, or pre-intervention well-being (ps > 0.174).
Study 1: preliminary analyses
Before conducting the primary analyses, we examined the assumptions of homogeneity of variance for ANCOVA using Levene’s test. For depressive symptoms, the homogeneity assumption was satisfied at both pre-intervention, F(1, 185) = 0.15, p = 0.699, and post-intervention assessments, F(1, 185) = 0.43, p = 0.554. Similarly, for well-being, the homogeneity assumption was met at pre-intervention, F(1, 185) = 0.27, p = 0.604, and post-intervention assessments, F(1, 185) = 3.50, p = 0.077.
Study 1: efficacy of the randomized controlled trial
The Group (intervention vs wait-list group) × Time (pre-intervention vs post-intervention assessment) interaction was significant for the depressive symptoms model, F(1, 183) = 6.58, p = 0.013, ηp2 = 0.035. Specifically, both groups showed a decrease in depressive symptoms from pre- to post-intervention; however, the decrease was significantly larger for individuals in the intervention group (ΔMestimate = −3.92), t(183) = −6.06, p < 0.001, d = −0.75, 95% CI of d [−0.99, −0.51], compared with those in the wait-list group (ΔMestimate = −1.94), t(183) = −2.83, p = 0.007, d = −0.37, 95% CI of d [−0.63, −0.11].
A similar pattern emerged for the well-being model, where the Group × Time interaction was significant, F(1, 183) = 10.96, p = 0.003, ηp2 = 0.056. The intervention group demonstrated a significant increase in well-being from pre- to post-intervention (ΔMestimate = 0.45, SE = 0.09), t(183) = 4.80, p < 0.001, d = 0.60, 95% CI of d [0.35, 0.84], whereas the wait-list group showed no significant change (ΔMestimate = 0.08, SE = 0.10), t(183) = 0.83, p = 0.427, d = 0.11, 95% CI of d [−0.15, 0.37] (see Supplementary Table 3 for details).
As illustrated in Fig. 2, these results indicate that the intervention group experienced greater improvements in both depressive symptoms and well-being compared to the wait-list group, highlighting the efficacy of the intervention. More details of the analyses are presented in Supplementary Section 1.1.
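As a reading aid, the partial eta-squared values reported above can be recovered from the F statistics and their degrees of freedom through the standard identity ηp2 = (F × df_effect) / (F × df_effect + df_error). The following Python sketch is illustrative only: the function name is hypothetical and is not taken from the analysis code of this study.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    ss_ratio = f_value * df_effect  # proportional to SS_effect / MS_error
    return ss_ratio / (ss_ratio + df_error)


# Group x Time interactions reported for Study 1 (values copied from the text).
eta_depression = partial_eta_squared(6.58, 1, 183)
eta_wellbeing = partial_eta_squared(10.96, 1, 183)

print(f"depressive symptoms: {eta_depression:.4f}")
print(f"well-being:          {eta_wellbeing:.4f}")
```

Both values agree with the reported effect sizes to within the rounding of the published F statistics.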

A Changes in depressive symptoms in Study 1; B changes in well-being in Study 1. Boxes display the interquartile range (IQR), whiskers display the 1.5 IQR values of the distribution of depressive symptoms and well-being. Results about mean scores are summarized in Supplementary Table 3.
Study 2: user characteristics
Participants (N = 11,554; age range = 18–81 years, M = 33.24, SD = 6.89; 82.40% female) were first-time users of the intervention between January 6, 2020, and February 19, 2021. Additional demographic and usage details are provided in Supplementary Table 4.
Comparisons of user characteristics indicated that females (82.40%) accounted for a higher percentage than males (17.52%), χ²(1) = 4868.34, p < 0.001, Phi(φ) = 0.649. Significant differences emerged in the proportions across age groups, χ²(54) = 17599.98, p < 0.001, Phi(φ) = 1.236. The primary age group was 18–35 years old (69.09%). Only 2.25% of the users were older than 50 years. There were significant differences in the frequencies of individuals with different severities of depressive symptoms, χ²(4) = 3141.92, p < 0.001, Phi(φ) = 0.521. Most users had mild (33.81%) to moderate (30.19%) depressive symptoms. About a quarter of the users had moderately severe (17.72%) or severe (7.08%) symptoms. The remaining 11.58% of users had minimal depressive symptoms. Supplementary Tables 4 and 5 present more details of the user characteristics.
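The gender and symptom-severity effect sizes above are numerically consistent with the common definition φ = sqrt(χ² / N), with N = 11,554. The Python sketch below illustrates that arithmetic; it assumes this definition was the one used, which the text does not state explicitly, and the function name is hypothetical.

```python
import math


def phi_from_chi_square(chi_square: float, n: int) -> float:
    """Effect size phi computed as the square root of chi-square over sample size."""
    return math.sqrt(chi_square / n)


N_STUDY_2 = 11_554  # sample size reported for Study 2

# Chi-square statistics copied from the text.
phi_gender = phi_from_chi_square(4868.34, N_STUDY_2)
phi_severity = phi_from_chi_square(3141.92, N_STUDY_2)

print(round(phi_gender, 3))    # reported as 0.649
print(round(phi_severity, 3))  # reported as 0.521
```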
Study 2: user engagement
Completion of the e-mental health intervention required users to finish launch day tasks (Day 0), all components of the 21 days of practice, and both pre- and post-intervention assessments within 25 days of starting. Among all users, 59.80% met these completion criteria, which represents a strong engagement rate for an e-mental health intervention in a real-world setting (vs. typically below 5%)30,35. On average, users completed 17.79 days (SD = 6.72) out of the 22 possible days (80.86%), indicating high adherence. Supplementary Section 2.1 presents more details of the user engagement. Supplementary Table 6 presents completion rates by gender, age group, and symptom severity; Supplementary Table 7 details the number of days completed; Supplementary Table 8 presents days completed by gender, age group, and symptom severity.
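The completion rule described above can be expressed as a simple check. The Python sketch below is a minimal illustration under stated assumptions: the record fields are hypothetical (the data schema is not described in the text), and "within 25 days" is read as "on or before day 25 after starting".

```python
from dataclasses import dataclass
from typing import Optional

PRACTICE_DAYS = 21  # Days 1-21 of practice
TOTAL_DAYS = 22     # launch day (Day 0) plus the 21 practice days
WINDOW_DAYS = 25    # assumed inclusive upper bound, in days since starting


@dataclass
class UserRecord:
    # All field names are hypothetical and exist only for this illustration.
    launch_day_done: bool
    practice_days_done: int             # practice days with all components finished
    pre_assessment_day: Optional[int]   # days since start; None if not completed
    post_assessment_day: Optional[int]  # days since start; None if not completed
    last_task_day: int                  # days since start when the last task was finished


def is_completer(user: UserRecord) -> bool:
    """Return True only if every completion requirement is met within the window."""
    if user.pre_assessment_day is None or user.post_assessment_day is None:
        return False
    finished_on_time = max(user.last_task_day, user.post_assessment_day) <= WINDOW_DAYS
    return (
        user.launch_day_done
        and user.practice_days_done == PRACTICE_DAYS
        and finished_on_time
    )


# Average adherence reported for Study 2: 17.79 of 22 possible days.
adherence_percent = 100 * 17.79 / TOTAL_DAYS
print(round(adherence_percent, 2))
```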
Study 2: effectiveness of the e-mental health intervention
We conducted a series of one-way repeated measures analyses of covariance (re-ANCOVA) to examine the changes in depressive symptoms and well-being from pre- to post-intervention. As shown in Fig. 3A, users’ depressive symptoms showed a significant 34.37% reduction (ΔMestimate = −3.67), t(6,885) = −50.12, p < 0.001, d = −0.82, 95% CI of d [−0.85, −0.78]. Users’ well-being also increased by 14.87% (see Fig. 3C; ΔMestimate = 0.50), t(6,667) = 37.44, p < 0.001, d = 0.64, 95% CI of d [0.60, 0.67]; for more details, see Supplementary Section 2.2 & Supplementary Table 10.

A Changes in depressive symptoms in Study 2; B changes in depressive symptoms in Study 3; C changes in well-being in Study 2; D changes in well-being in Study 3. Boxes display the interquartile ranges and whiskers display the 1.5 IQR values of the distribution of depressive symptoms and well-being. Results about mean scores are summarized in Supplementary Tables 10 and 18.
Study 2: heterogeneity of the effectiveness by symptom severity categories
We investigated whether users with different categories of depressive symptom severity experienced different changes in depressive symptoms and well-being after completing the intervention. The two-way mixed-design ANCOVA indicated heterogeneity in changes of depressive symptoms across severity categories, F(4, 6,881) = 1037.57, p < 0.001, ηp2 = 0.38. As shown in Fig. 4A, users who initially had severe pre-intervention depressive symptoms exhibited the most substantial reduction in their depressive symptoms after the intervention (ΔMestimate = −9.40), t(6,881) = −53.70, p < 0.001, d = −2.63, 95% CI of d [−2.73, −2.54], followed by users with moderately severe symptoms (ΔMestimate = −7.25), t(6,881) = −65.27, p < 0.001, d = −2.04, 95% CI of d [−2.10, −1.98], moderate symptoms (ΔMestimate = −4.35), t(6,881) = −49.56, p < 0.001, d = −1.22, 95% CI of d [−1.27, −1.18], and mild symptoms (ΔMestimate = −1.67), t(6,881) = −20.29, p < 0.001, d = −0.47, 95% CI of d [−0.51, −0.42] (for more details, see Supplementary Section 2.3 & Supplementary Table 11).

A Changes in depressive symptoms in Study 2 by severity categories; B changes in depressive symptoms in Study 3 by severity categories; C changes in well-being in Study 2 by severity categories; D changes in well-being in Study 3 by severity categories. Boxes display the interquartile ranges and whiskers display the 1.5 IQR values of the distribution of depressive symptoms and well-being. Results about mean scores are summarized in Supplementary Tables 11 and 19.
Similarly, users with higher symptom severity also had larger increases in well-being after the intervention, as indicated by the significant “Severity” × “Time” interaction, F(4, 6,663) = 64.42, p < 0.001, ηp2 = 0.04. As shown in Fig. 4C, users with severe (ΔMestimate = 0.69), t(6,663) = 17.81, p < 0.001, d = 0.89, 95% CI of d [0.79, 0.99] or moderately severe symptoms (ΔMestimate = 0.68), t(6,663) = 27.88, p < 0.001, d = 0.89, 95% CI of d [0.82, 0.95] showed greater increases in well-being, followed by users with moderate symptoms (ΔMestimate = 0.56), t(6,663) = 28.62, p < 0.001, d = 0.72, 95% CI of d [0.67, 0.77] and mild symptoms (ΔMestimate = 0.42), t(6,663) = 22.91, p < 0.001, d = 0.54, 95% CI of d [0.50, 0.59]. Notably, even for individuals with minimal symptoms before the intervention, where a ceiling effect on well-being might be expected, their well-being increased significantly (ΔMestimate = 0.15), t(6,663) = 5.01, p < 0.001, d = 0.20, 95% CI of d [0.12, 0.28] (for more details, see Supplementary Section 2.3 & Supplementary Table 11).
To test the robustness of the effectiveness, we conducted intention-to-treat analysis with a multiple imputation approach, and all the patterns of the results were replicated (see Supplementary Section 2.7).
Study 3: user characteristics
Study 3 aimed to replicate the findings from Study 2 using a larger sample. Study 3 also included naturalistic quasi-comparison groups: individuals who completed the post-intervention measurements but did not complete all components of the intervention. The intervention methods and assessments of depressive symptoms and well-being remained nearly identical, with some changes to several well-being measures (see Supplementary Tables 9 & 17 for details).
Participants (N = 44,018; age range = 18–80 years, M = 31.51, SD = 7.50; 83.60% female) were first-time users of the intervention between November 1, 2021, and September 22, 2023. Additional demographic and usage details are provided in Supplementary Table 12.
Females (83.60%) accounted for a higher percentage than males (16.38%), χ²(1) = 19903.70, p < 0.001, Phi(φ) = 0.672. User ages ranged from 18 to 80 years old (M = 31.51 years old, SD = 7.50 years) with significant differences in the proportions across age groups, χ²(6) = 22,688.37, p < 0.001, Phi(φ) = 0.718. The primary age group was 18–35 years old (74.78%). Only 3.02% of the users were older than 50 years old. There were significant differences in frequencies of individuals with different severities of depressive symptoms, χ²(4) = 12,551.41, p < 0.001, Phi(φ) = 0.534. Most users had mild (34.28%) to moderate (30.45%) depressive symptoms. About a quarter of the users had moderately severe (16.95%) or severe (6.64%) symptoms. The remaining 11.68% of users had minimal depressive symptoms. Supplementary Table 12 presents more details of the user characteristics.
Study 3: user engagement
Applying the same completion criteria as in Study 2, 60.53% of users in Study 3 met the requirements, demonstrating similarly high engagement. Users completed an average of 17.81 days (SD = 6.60) out of the possible 22 days (80.95%), indicating high adherence. Supplementary Section 3.1 and Supplementary Tables 14–16 present more details of the user engagement.
Study 3: effectiveness of the e-mental health intervention
Results from the one-way repeated measures ANCOVA in Study 3 replicated those of Study 2: users’ depressive symptoms, measured by the PHQ-9, decreased by 36.00% from pre- to post-intervention (ΔMestimate = −3.74), t(26,636) = −100.50, p < 0.001, d = −0.83, 95% CI of d [−0.85, −0.82]; see Fig. 3B. Additionally, users’ well-being increased by 16.31% from pre- to post-intervention (ΔMestimate = 0.53), t(26,631) = 90.07, p < 0.001, d = 0.75, 95% CI of d [0.73, 0.76]; see Fig. 3D. Supplementary Section 3.2 and Supplementary Table 18 present more details of the averaged changes in depressive symptoms and well-being after the intervention.
Study 3: heterogeneity of the effectiveness by symptom severity categories
Replicating the results in Study 2, the two-way mixed-design ANCOVA indicated heterogeneity in changes of depressive symptoms for users with differential severities of symptoms, F(4, 26,632) = 3957.21, p < 0.001, ηp2 = 0.37. As shown in Fig. 4B, users who initially had severe pre-intervention depressive symptoms exhibited the most substantial reduction (ΔMestimate = −9.95), t(26,632) = −106.89, p < 0.001, d = −2.80, 95% CI of d [−2.85, −2.75], followed by users with moderately severe symptoms (ΔMestimate = −7.32), t(26,632) = −125.73, p < 0.001, d = −2.06, 95% CI of d [−2.09, −2.03], moderate symptoms (ΔMestimate = −4.52), t(26,632) = −101.63, p < 0.001, d = −1.27, 95% CI of d [−1.30, −1.25], and mild symptoms (ΔMestimate = −1.84), t(26,632) = −44.33, p < 0.001, d = −0.52, 95% CI of d [−0.54, −0.50] (see Supplementary Table 19 for details).
Consistent with the results for depressive symptoms, users with higher symptom severity also had larger increases in well-being after the intervention, as indicated by the significant “Severity” × “Time” interaction, F(4, 26,627) = 138.10, p < 0.001, ηp2 = 0.02. As shown in Fig. 4D, users with severe (ΔMestimate = 0.73), t(26,627) = 40.38, p < 0.001, d = 1.06, 95% CI of d [1.01, 1.11] or moderately severe symptoms (ΔMestimate = 0.62), t(26,627) = 54.27, p < 0.001, d = 0.89, 95% CI of d [0.86, 0.92] had more prominent increases in well-being, followed by users with moderate symptoms (ΔMestimate = 0.58), t(26,627) = 66.01, p < 0.001, d = 0.83, 95% CI of d [0.80, 0.85] and mild symptoms (ΔMestimate = 0.47), t(26,627) = 57.95, p < 0.001, d = 0.68, 95% CI of d [0.65, 0.70]. Notably, even for individuals with minimal symptoms before the intervention, their well-being increased significantly following the intervention (ΔMestimate = 0.33), t(26,627) = 25.77, p < 0.001, d = 0.47, 95% CI of d [0.44, 0.51]. Supplementary Section 3.3 presents more details of the heterogeneous changes in depressive symptoms and well-being after intervention.
The intention-to-treat analysis with a multiple imputation approach replicated the per-protocol results for both averaged and heterogeneous effectiveness (see Supplementary Sections 3.7 and 3.8), supporting the robustness of the findings.
Study 3: secondary analyses with naturalistic quasi-comparison groups
In secondary analyses, we examined whether the effectiveness of the intervention varied across users with different levels of completion. The e-mental health intervention used in the present study could be segmented into three stages: Stage 1 (Days 1–8; “Opening the space of awareness”), Stage 2 (Days 9–14; “Understanding emotions from new perspectives”), and Stage 3 (Days 15–21; “Acting autonomously and effectively”). We then differentiated the non-completers of the intervention into three naturalistic quasi-comparison groups: non-completers of Stage 1 (users who practiced the intervention for fewer than 8 days), non-completers of Stage 2 (users who practiced the intervention for 8 to 14 days), and non-completers of Stage 3 (users who practiced the intervention for 15–20 days).
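The grouping rule described above can be sketched as a small classification function. This Python example is illustrative only: the function name and labels are hypothetical, the day thresholds are taken directly from the text, and any non-completer whose day count falls outside the three stated ranges is left unclassified rather than assigned by guesswork.

```python
from typing import Optional


def completion_group(days_practiced: int, met_completion_criteria: bool) -> Optional[str]:
    """Assign a user to the completer group or to a quasi-comparison group."""
    if met_completion_criteria:
        return "completer"
    if days_practiced < 0:
        return None  # invalid input
    if days_practiced < 8:
        return "non-completer of Stage 1"  # fewer than 8 days
    if days_practiced <= 14:
        return "non-completer of Stage 2"  # 8 to 14 days
    if days_practiced <= 20:
        return "non-completer of Stage 3"  # 15 to 20 days
    return None  # range not covered by the description in the text


for days in (3, 8, 14, 15, 20):
    print(days, completion_group(days, met_completion_criteria=False))
```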
The two-way mixed-design ANCOVA between “Completion stages” and “Time” showed that the interaction was significant in the model of depressive symptoms, F(3, 27,733) = 5.37, p < 0.001, ηp2 = 0.001. The intervention had large effect sizes in reducing depressive symptoms for both completers (ΔMestimate = −3.74, SE = 0.04), t(27,733) = −101.27, p < 0.001, d = −0.83, 95% CI of d [−0.85, −0.81] and non-completers of Stage 3 (ΔMestimate = −3.74, SE = 0.15), t(27,733) = −25.14, p < 0.001, d = −0.83, 95% CI of d [−0.89, −0.77]; however, the effect size was smaller for non-completers of Stage 2 (ΔMestimate = −2.79, SE = 0.44), t(27,733) = −6.31, p < 0.001, d = −0.62, 95% CI of d [−0.81, −0.43] and non-completers of Stage 1 (ΔMestimate = −1.62, SE = 0.63), t(27,733) = −2.58, p = 0.010, d = −0.36, 95% CI of d [−0.63, −0.09].
The “Completion stages” by “Time” interaction was also significant in the model of well-being, F(3, 28,007) = 8.12, p < 0.001, ηp2 = 0.001. The intervention had moderate effect sizes in enhancing well-being for both completers (ΔMestimate = 0.53, SE = 0.01), t(28,007) = 91.51, p < 0.001, d = 0.75, 95% CI of d [0.73, 0.76] and non-completers of Stage 3 (ΔMestimate = 0.51, SE = 0.02), t(28,007) = 22.78, p < 0.001, d = 0.72, 95% CI of d [0.66, 0.78]; however, the effect size in enhancing well-being was smaller for non-completers of Stage 2 (ΔMestimate = 0.32, SE = 0.07), t(28,007) = 4.64, p < 0.001, d = 0.45, 95% CI of d [0.26, 0.64] and non-completers of Stage 1 (ΔMestimate = 0.18, SE = 0.09), t(28,007) = 1.89, p = 0.059, d = 0.25, 95% CI of d [−0.01, 0.51].
