Use of automated conversational agents in improving young population mental health: a scoping review
Study selection
The systematic search in databases and external sources returned 9905 articles. After duplicate removal, 6874 articles were screened by title and abstract, and a further 6719 were excluded. Of the remaining 155 studies, we retrieved full-text copies of 152 articles, which were screened in full. This resulted in a total of 25 studies included in the current scoping review. The study selection is detailed in the PRISMA flowchart (Fig. 1).
A detailed overview of the characteristics of the included studies is provided in Supplementary Tables 1 and 2.
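The selection counts can be cross-checked with simple arithmetic. The sketch below uses only the figures reported in this paragraph. The variable names are illustrative, and the derived quantities (duplicates removed, reports not retrieved, full-text exclusions) are not stated in the text: they are computed on the assumption that each stage differs from the previous one only by the exclusions described.

```python
# PRISMA flow counts as reported in the study-selection paragraph.
records_identified = 9905        # returned by databases and external sources
records_screened = 6874          # remaining after duplicate removal
excluded_title_abstract = 6719   # excluded at title/abstract screening
full_text_retrieved = 152        # full-text copies obtained and screened
studies_included = 25            # included in the scoping review

# Derived values (assumption: not reported directly in the text).
duplicates_removed = records_identified - records_screened
sought_for_retrieval = records_screened - excluded_title_abstract
not_retrieved = sought_for_retrieval - full_text_retrieved
excluded_full_text = full_text_retrieved - studies_included

# The text reports 155 studies remaining after title/abstract screening.
assert sought_for_retrieval == 155

print(f"Duplicates removed:      {duplicates_removed}")
print(f"Sought for retrieval:    {sought_for_retrieval}")
print(f"Full texts not obtained: {not_retrieved}")
print(f"Excluded at full text:   {excluded_full_text}")
```

Under these assumptions, the reported stages are internally consistent: the 155 studies remaining after screening follow directly from 6874 minus 6719.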
Of the 25 studies, 19 were recently published (between 2020 and 2023)8,9,10,12,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40. Studies were conducted predominantly in the US (n = 12)11,12,27,28,33,34,38,41,42,43,44,45, followed by Europe (n = 5)10,26,29,32,35, Asia (n = 4)9,31,36,39, New Zealand (n = 2)37,40, and Australia (n = 1)30.
Technological characteristics
The summative results for the technological characteristics of automated CAs are presented in Table 1. In total, 21 different agents were described in the included studies. Only 3 CAs were the focus of more than one study: Paro11,41,45, Nao8,38, and Woebot12,42. The automated CAs were predominantly disembodied chatbots (n = 15)10,26,28,29,30,31,32,35,36,37,40,42,43,44, followed by robots (n = 7)8,9,33,34,38,41,45. Automated CAs with a virtual representation were the focus of 2 studies11,27. In addition, one application consisted of a chatbot with avatar features39.
Regarding the dialog system underlying the conversation, almost half of the automated CAs (n = 12) employed natural language processing and machine learning to carry on an interaction8,9,12,27,30,31,34,38,39,41,43,45. Predefined dialog or interactions, assembled and matched dynamically to the user input, were used in 10 studies10,11,26,28,29,32,33,35,40,44, while 3 used a mixed dialog system36,37,42. These agents communicated through text (n = 13)10,12,26,28,29,30,32,35,37,40,42,43,44, speech (n = 2)8,38, and non-verbal cues (n = 4)9,34,41,45, while multimodal communication was employed in 5 studies11,31,33,36,39. One study provided no information on the modality of communication27. Among the automated CAs investigated in the included studies, 17 are available for purchase or for free use8,9,10,12,27,31,33,34,36,37,38,40,41,42,43,44,45.
Characteristics of interventions
Characteristics of mental health interventions using automated CAs are detailed in Table 2.
Anxiety was the emotional component of mental health most frequently targeted by automated CAs (n = 12)8,10,12,27,33,35,37,38,41,42,43,45. Depression was the second most targeted emotional component (n = 8)12,28,31,35,36,38,42,43, followed by psychological well-being (n = 5)26,29,30,33,44, general distress (n = 5)9,10,34,39,40, and mood (n = 2)33,41. One intervention targeted mental health problems as a broad construct32.
With respect to the scope of interventions, most of the studies labeled the CA applications as interventions. In practice, however, these were designed and tested mainly with a preventive scope, since the research was conducted with general or at-risk populations8,9,10,11,26,28,29,30,32,33,34,35,37,38,43,44,45. Only 8 studies were conducted on samples of youths screened as having detectable mental health problems, mainly based on youth or parent report12,27,31,36,40,41,42.
The duration of interventions was reported by 19 studies. Most of the interventions lasted between 2 and 4 weeks (n = 8)8,10,29,38,39,40,42,43,44, followed by interventions with a duration of 1 day or less (n = 5)9,28,34,41,45, and interventions of 2 up to 7 days (n = 3)26,31,33. Only 3 studies investigated interventions longer than 4 weeks12,35,36. In terms of session frequency, only 8 studies provided information: sessions were daily26,33,43, bi-weekly10,29,43, once a week39, or 3 times per week35.
Out of the 25 included studies, only 5 focused on automated CAs as components embedded in other types of technologies or mental health services for mental health problems11,12,27,35,39. The remaining 20 studies designed or evaluated automated CAs as standalone psychological interventions. Automated CAs that were not independent interventions were either integrated components of web-based interventions, with additional technological features enabling the intervention such as videoconference or serious games11,27,35,39, or an additive component to primary care management12.
A theoretical framework for the automated CA interventions was reported by 17 studies. Cognitive behavioral theory (CBT) principles were applied to most of the interventions to derive their content. More specifically, CBT was mentioned as a theoretical framework for 14 automated CA applications8,10,11,12,26,28,31,35,36,37,42,43,44. Among CBT-based interventions, 2 applications mentioned relying exclusively on the third wave of CBT principles, namely acceptance and commitment therapy (ACT)26,35. The second most reported theoretical framework was positive psychology, with 5 automated CA applications mentioning it as the guiding theory for the content of the intervention29,33,37,40,44. Other theoretical frameworks were Interpersonal Theory12, Person Centered Theory39, Metacognitive Intervention of Narrative Imagery38, Motivational Interviewing43, Transtheoretical Approach43, Emotion Focused Theory43, and Dialectical Behavioral Theory12. The number of theoretical approaches guiding one intervention ranged from 1 to 4 (median 2.5).
Characteristics of peer-reviewed research
Summative results for the characteristics of peer-reviewed research are presented in Table 3.
Participants were predominantly recruited from educational settings (n = 10)10,11,31,33,35,36,39,40,42,43, followed by community settings (n = 6)26,28,34,37,41,44 and hospital/healthcare settings (n = 6)8,9,12,27,38,45. Sample sizes ranged between 8 and 234 participants, with 9 studies conducted on samples of fewer than 50 participants12,26,28,29,33,38,39,44,45, 8 studies on samples between 50 and 100 participants9,10,27,34,36,41,42,43, and 6 studies on samples above 100 participants8,11,31,35,37,40. The presence of emotional problems at a certain level was required by 7 studies12,31,36,39,40,41,42, whereas 4 studies used a physical health condition as a selection criterion8,38,44,45. Additionally, undergoing a medical procedure, irrespective of health condition, was a selection criterion for 2 studies9,27. The mean age of participants was 16.64 years. Females represented 58.14% of the total sample.
With respect to the stage of research, most studies fell under combinations of research stages: 12 studies on feasibility/usability and evaluation10,12,26,27,31,33,36,38,40,42,43,44, 1 on development and feasibility/usability29, 1 on design and evaluation39, and 1 on design, feasibility/usability, and evaluation37.
Among the 23 feasibility/usability and/or evaluation studies, more than half were controlled studies (n = 14)8,9,11,12,27,31,34,35,36,41,42,43,44,45. Controlled studies predominantly employed an active control group (n = 11)8,9,27,30,31,35,36,41,42,43,45. Among the studies reporting on the design and development of automated CAs, 3 used co-participatory and iterative designs, involving the young end users in different stages of development30,32,37. One study reporting on development relied only on the input of mental health specialists and researchers in the design39. The methodological approaches most frequently employed were mixed methods (n = 15)10,12,26,27,28,29,31,33,36,37,39,40,42,43,44 and quantitative methods (n = 8)8,9,11,34,35,38,41,45.
Feasibility/usability outcomes were reported in 15 studies and included parameters such as engagement, retention/adherence rate, acceptability, user satisfaction, usability of the system, safety, and functionality10,12,26,27,31,33,36,38,40,42,43,44. Overall, the feasibility and usability parameters were reported to be relatively high across studies. However, a few exceptions are worth mentioning. Safety issues were reported in 2 studies12,26. More than half of the participants reported at least one negative effect of the intervention delivered through the SISU chatbot26. A serious adverse event also occurred, with 1 participant reporting suicidal tendencies for the first time after the intervention26. One study reported that during study participation, 4 (24%) participants had one alert for suicidal ideation, 4 participants had 3, and 2 participants had 6. One parent from the intervention group reported in week 12 that his child was seen in an emergency department and discharged to go home12. With respect to engagement and adherence, 2 studies pointed out a decrease in these parameters over time29,31. The drop-out rates ranged between 0 and 70.9%.
All studies reporting evaluation outcomes included efficacy parameters (n = 21), with no study on cost-effectiveness. In terms of efficacy outcomes, almost half of the studies reported more than one mental health outcome. Summative results for efficacy outcomes per outcome and research design are presented in Table 4.
Anxiety outcomes were reported in 15 studies. When comparing the effect of automated CAs with a control group on anxiety measures, 5 studies reported a significant difference favoring the automated CA condition12,33,36,43,45, whereas 4 studies found no significant difference11,35,41,42. One RCT found an improvement in medical procedure-related anxiety only for a subgroup of participants, namely those undergoing more invasive procedures and with more frequent exposure to medical procedures27. Among uncontrolled studies, a significant decrease in anxiety from baseline to post-intervention was reported in 2 studies36,38 and no effect in one study40, while one study reported a negative effect of the automated CA-mediated intervention, expressed as an increase in anxiety symptoms26. One uncontrolled study reported a significant decrease in anxiety only for youths with initially high levels of anxiety10.
Depression was reported in 9 studies. Among controlled trials focusing on reducing depression, 5 studies reported a significant difference between control and automated CA group, favoring the experimental condition12,31,36,42,43, whereas 2 controlled studies found no significant difference on depression scores35,44. Among uncontrolled trials, a minimal change in depression score was reported in one study using a robot38, whereas another study showed no improvement from pre to post test26.
Positive and negative affect were separately assessed in 6 studies34,36,41,42,43,44, whereas one study used a composite measure of overall affect, combining both facets in one score33. All but one study43 reported no significant difference between the control group and the automated CA condition in reducing negative affect. However, an improvement in positive affect was found in 3 studies34,41,43, while the remaining 3 studies reported no difference between groups on this outcome36,42,44. In one study, a robot coach delivering a positive psychology intervention improved overall affect among young adults33.
The effect of automated CA-mediated interventions on distress was explored in 5 studies. Of these, 2 used a controlled design and found a significant effect on distress at 5 and 20 min post-intervention, but not immediately following the intervention8,9. Among uncontrolled studies, 2 reported a significant decrease in distress outcomes from pre- to post-intervention38,39, while another study found a significant effect on distress only for participants with initially high distress scores10. Moreover, a negative effect was reported for those with initially low levels of distress, for whom distress increased from pre- to post-intervention10.
Two uncontrolled studies were conducted to test the effectiveness of automated CA-mediated interventions on psychological well-being, showing a significant improvement33,40. One study reported as an outcome a measure of psychological sensitivity, which also showed a significant decrease from pre- to post-intervention39. No significant effect of a chatbot-based intervention on subjective happiness was reported in the uncontrolled study39. Physiological arousal, an indicator of anxiety, was reported in one study, with no change from pre- to post-intervention41. Similarly, post-traumatic stress disorder symptoms showed no significant improvement after an agent-based software intervention26.