Use of automated conversational agents in improving young population mental health: a scoping review

Study selection

The systematic search of databases and external sources returned 9905 articles. After duplicate removal, 6874 articles were screened by title and abstract, and 6719 studies were excluded. Of the remaining 155 studies, we retrieved full-text copies of 152 articles, which were screened in full. This resulted in a total of 25 studies included in the current scoping review. The study selection is detailed in the PRISMA flowchart (Fig. 1).
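The selection flow can be checked with simple subtraction. The sketch below uses only the counts reported in the paragraph above; the derived quantities (duplicates removed, reports not retrieved, full-text exclusions) are not stated in the text and are inferred here under the assumption that each stage is a strict subset of the previous one. The variable names are illustrative.

```python
# Counts reported in the study selection paragraph.
identified = 9905             # records returned by the search
screened = 6874               # records screened after duplicate removal
excluded_at_screening = 6719  # excluded on title and abstract
full_text_retrieved = 152     # full-text copies retrieved and screened
included = 25                 # studies included in the review

# Derived counts (inferred by subtraction, not stated in the text).
duplicates_removed = identified - screened
remaining_after_screening = screened - excluded_at_screening
not_retrieved = remaining_after_screening - full_text_retrieved
excluded_at_full_text = full_text_retrieved - included

print("duplicates removed:", duplicates_removed)
print("remaining after screening:", remaining_after_screening)
print("full texts not retrieved:", not_retrieved)
print("excluded at full-text stage:", excluded_at_full_text)
```

The value computed for the remaining records matches the 155 studies reported in the text.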

A detailed overview of the characteristics of the included studies is provided in Supplementary Tables 1 and 2.

Of the 25 studies, 19 were recently published (between 2020 and 2023)^{8,9,10,12,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40}. Studies were conducted predominantly in the US (n = 12)^{11,12,27,28,33,34,38,41,42,43,44,45}, followed by Europe (n = 5)^{10,26,29,32,35}, Asia (n = 4)^{9,31,36,39}, New Zealand (n = 2)^{37,40}, and Australia (n = 1)³⁰.

Technological characteristics

The summative results for the technological characteristics of automated CAs are presented in Table 1. In total, 21 different agents were described in the included studies. Only 3 of the CAs were the focus of more than one study: Paro^{11,41,45}, Nao^{8,38}, and Woebot^{12,42}. These automated CAs were predominantly disembodied chatbots (n = 15)^{10,26,28,29,30,31,32,35,36,37,40,42,43,44}, followed by robots (n = 7)^{8,9,33,34,38,41,45}. Automated CAs with a virtual representation were the focus of 2 studies^{11,27}. In addition, one application consisted of a chatbot with avatar features³⁹.

Table 1 Summative results per technological characteristics

Regarding the dialog system underlying the conversation, almost half of the automated CAs (n = 12) employed natural language processing and machine learning to carry on an interaction^{8,9,12,27,30,31,34,38,39,41,43,45}. Predefined dialogs or interactions, assembled and matched dynamically to the user input, were used in 10 studies^{10,11,26,28,29,32,33,35,40,44}, while 3 used a mixed dialog system^{36,37,42}. These agents communicated through text (n = 13)^{10,12,26,28,29,30,32,35,37,40,42,43,44}, speech (n = 2)^{8,38}, and non-verbal cues (n = 4)^{9,34,41,45}, while multimodal communication was employed in 5 studies^{11,31,33,36,39}. For one study, no information was provided on the modality of communication²⁷. Among the automated CAs investigated in the included studies, 17 are available for purchase or for free use^{8,9,10,12,27,31,33,34,36,37,38,40,41,42,43,44,45}.

Characteristics of interventions

Characteristics of mental health interventions using automated CAs are detailed in Table 2.

Table 2 Summative results per characteristics of interventions

Anxiety was the emotional component of mental health most frequently targeted by automated CAs (n = 12)^{8,10,12,27,33,35,37,38,41,42,43,45}. Depression was the second most targeted emotional component (n = 8)^{12,28,31,35,36,38,42,43}, followed by psychological well-being (n = 5)^{26,29,30,33,44}, general distress (n = 5)^{9,10,34,39,40}, and mood (n = 2)^{33,41}. One intervention targeted mental health problems as a broad construct³².

With respect to the scope of interventions, most of the studies labeled the CA applications as interventions. In practice, these were designed and tested with a mainly preventive scope, since the research was conducted with general or at-risk populations^{8,9,10,11,26,28,29,30,32,33,34,35,37,38,43,44,45}. Only 8 studies were conducted on samples of youths screened as having detectable mental health problems, mainly based on youth or parent report^{12,27,31,36,40,41,42}.

The duration of interventions was reported by 19 studies. Most interventions lasted between 2 and 4 weeks (n = 8)^{8,10,29,38,39,40,42,43,44}, followed by interventions with a duration of 1 day or less (n = 5)^{9,28,34,41,45}, and interventions of 2 to 7 days (n = 3)^{26,31,33}. Only 3 studies investigated interventions longer than 4 weeks^{12,35,36}. Only 8 studies provided information on session frequency, which included daily sessions^{26,33,43}, bi-weekly sessions^{10,29,43}, sessions once a week³⁹, or sessions 3 times per week³⁵.

Of the 25 included studies, only 5 focused on automated CAs as components embedded in other types of technologies or in services for mental health problems^{11,12,27,35,39}. The remaining 20 studies designed or evaluated automated CAs as standalone psychological interventions. Automated CAs that were not independent interventions were either integrated components of web-based interventions with additional enabling features, such as videoconferencing or serious games^{11,27,35,39}, or an additive component to primary care management¹².

A theoretical framework for the automated CA interventions was reported by 17 studies. Cognitive behavioral theory (CBT) principles were applied to derive the content of most interventions. More specifically, CBT was mentioned as a theoretical framework for 14 automated CA applications^{8,10,11,12,26,28,31,35,36,37,42,43,44}. Among CBT-based interventions, 2 applications mentioned relying exclusively on principles from the third wave of CBT, namely acceptance and commitment therapy (ACT)^{26,35}. The second most reported theoretical framework was positive psychology, with 5 automated CA applications mentioning it as the guiding theory for the content of the intervention^{29,33,37,40,44}. Other theoretical frameworks were Interpersonal Theory¹², Person Centered Theory³⁹, Metacognitive Intervention of Narrative Imagery³⁸, Motivational Interview⁴³, Transtheoretical Approach⁴³, Emotion Focused Theory⁴³, and Dialectical Behavioral Theory¹². The number of theoretical approaches guiding one intervention ranged from 1 to 4 (median 2.5).

Characteristics of peer-reviewed research

Summative results for the characteristics of peer-reviewed research are presented in Table 3.

Table 3 Summative results per characteristics of peer reviewed research

Participants were predominantly recruited from educational settings (n = 10)^{10,11,31,33,35,36,39,40,42,43}, followed by community settings (n = 6)^{26,28,34,37,41,44} and hospital/healthcare settings (n = 6)^{8,9,12,27,38,45}. Sample sizes ranged between 8 and 234 participants, with 9 studies conducted on samples of fewer than 50 participants^{12,26,28,29,33,38,39,44,45}, 8 studies on samples between 50 and 100 participants^{9,10,27,34,36,41,42,43}, and 6 studies on samples above 100 participants^{8,11,31,35,37,40}. The presence of emotional problems at a certain level was required by 7 studies^{12,31,36,39,40,41,42}, whereas 4 studies used a physical health condition as a selection criterion^{8,38,44,45}. Additionally, undergoing a medical procedure, irrespective of health condition, was a selection criterion for 2 studies^{9,27}. The mean age of participants was 16.64 years. Females represented 58.14% of the total sample.

With respect to the stage of research, most studies combined several research stages: 12 studies on feasibility/usability and evaluation^{10,12,26,27,31,33,36,38,40,42,43,44}, 1 on development and feasibility/usability²⁹, 1 on design and evaluation³⁹, and 1 on design, feasibility/usability, and evaluation³⁷.

Among the 23 feasibility/usability and/or evaluation studies, more than half were controlled studies (n = 14)^{8,9,11,12,27,31,34,35,36,41,42,43,44,45}. Controlled studies predominantly employed an active control group (n = 11)^{8,9,27,30,31,35,36,41,42,43,45}. Among the studies reporting on the design and development of automated CAs, 3 used co-participatory and iterative designs, involving the young end users in different stages of development^{30,32,37}. One study reporting on development relied only on input from mental health specialists and researchers in the design³⁹. The methodological approaches most frequently employed were mixed methods (n = 15)^{10,12,26,27,28,29,31,33,36,37,39,40,42,43,44} and quantitative methods (n = 8)^{8,9,11,34,35,38,41,45}.

Feasibility/usability outcomes were reported in 15 studies and included parameters such as engagement, retention/adherence rate, acceptability, user satisfaction, usability of the system, safety, and functionality^{10,12,26,27,31,33,36,38,40,42,43,44}. Overall, the feasibility and usability parameters were reported to be relatively high across studies. However, a few exceptions are worth mentioning. Safety issues were reported in 2 studies^{12,26}. More than half of the participants reported at least one negative effect of the intervention delivered through the SISU chatbot²⁶. A serious adverse event also occurred, with 1 participant reporting suicidal tendency for the first time after the intervention²⁶. One study reported that during study participation, 4 (24%) participants had one alert for suicidal ideation, 4 participants had 3, and 2 participants had 6. One parent from the intervention group reported in week 12 that his child was seen in an emergency department and discharged home¹². With respect to engagement and adherence, 2 studies pointed out a decrease in these parameters over time^{29,31}. Drop-out rates ranged between 0 and 70.9%.

All studies reporting evaluation outcomes included efficacy parameters (n = 21), with no study on cost-effectiveness. In terms of efficacy outcomes, almost half of the studies reported more than one mental health outcome. Summative results for efficacy outcomes per outcome and research design are presented in Table 4.

Table 4 Summative results for efficacy per outcome and study design

Anxiety outcomes were reported in 15 studies. When comparing the effect of automated CAs with a control group on anxiety measures, 5 studies reported a significant positive difference favoring the automated CA condition^{12,33,36,43,45}, whereas 4 studies found no significant difference^{11,35,41,42}. One RCT found an improvement in medical procedure-related anxiety only for a subgroup of participants, namely those undergoing more invasive procedures and with more frequent exposure to medical procedures²⁷. Among uncontrolled studies, a significant decrease in anxiety from baseline to post-intervention was reported in 2 studies^{36,38} and no effect in one study⁴⁰, while one study reported a negative effect of the automated CA-mediated intervention, expressed as an increase in anxiety symptoms²⁶. One uncontrolled study reported a significant decrease in anxiety only for youths with initial high levels of anxiety¹⁰.

Depression outcomes were reported in 9 studies. Among controlled trials focusing on reducing depression, 5 studies reported a significant difference between the control and automated CA groups, favoring the experimental condition^{12,31,36,42,43}, whereas 2 controlled studies found no significant difference in depression scores^{35,44}. Among uncontrolled trials, a minimal change in depression score was reported in one study using a robot³⁸, whereas another study showed no improvement from pre- to post-test²⁶.

Positive and negative affect were assessed separately in 6 studies^{34,36,41,42,43,44}, whereas one study used a composite measure of overall affect, combining both facets in one score³³. All but one study⁴³ reported no significant difference between the control group and the automated CA condition in reducing negative affect. However, an improvement in positive affect was found in 3 studies^{34,41,43}, while the remaining 3 studies reported no difference between groups on this outcome^{36,42,44}. In one study, a robot coach delivering a positive psychology intervention improved overall affect among young adults³³.

The effect of automated CA-mediated interventions on distress was explored in 5 studies. Of these, 2 used a controlled design and found a significant effect on distress at 5 and 20 min post-intervention, but not immediately following the intervention^{8,9}. Among uncontrolled studies, 2 reported a significant decrease in distress outcomes from pre- to post-intervention^{38,39}, while another study found a significant effect on distress only for participants with initial high distress scores¹⁰. Moreover, a negative effect was reported for those with initial low levels of distress, for whom distress increased from pre- to post-intervention¹⁰.

Two uncontrolled studies tested the effectiveness of automated CA-mediated interventions on psychological well-being, showing a significant improvement^{33,40}. One study reported a measure of psychological sensitivity as an outcome, which also showed a significant decrease from pre- to post-intervention³⁹. No significant effect of a chatbot-based intervention on subjective happiness was reported in the uncontrolled study³⁹. An indicator of anxiety, physiological arousal, was reported in one study, with no change from pre- to post-intervention⁴¹. Similarly, post-traumatic stress disorder symptoms showed no significant improvement after an agent-based software intervention²⁶.
