Digital voice assistants (DVAs) are increasingly popular as a tool for accessing online mental health information. However, the quality of the information they provide is not known. This study evaluates the quality of DVA responses to mental health-related queries across six quality domains: comprehension ability, relevance, comprehensiveness, accuracy, understandability and reliability.
Four smartphone DVAs were evaluated: Apple Siri, Samsung Bixby, Google Assistant and Amazon Alexa. Sixty-six questions and answers on mental health conditions (depression, anxiety, obsessive-compulsive disorder (OCD) and bipolar disorder) were compiled from authoritative sources, clinical guidelines and public search trends. Three evaluators scored the DVAs using an evaluation rubric developed in-house. Data were analyzed using the Kruskal-Wallis and Wilcoxon rank sum tests.
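As a rough illustration of the two tests named above, the following is a minimal pure-Python sketch. The abstract does not state which software the authors used, so this is not their analysis. All function names (`average_ranks`, `tie_term`, `kruskal_wallis_h`, `chi2_sf_df3`, `rank_sum_z`, `two_sided_p`) are hypothetical, and the rubric scores are invented purely for illustration. The sketch assumes four groups (one per DVA), so the Kruskal-Wallis statistic is compared against a chi-square distribution with 3 degrees of freedom, and it uses the normal approximation for the rank sum test.

```python
from itertools import chain
from math import erfc, exp, pi, sqrt


def average_ranks(values):
    """Rank values from 1 to n; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def tie_term(values):
    """Sum of t**3 - t over each set of t tied values."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    return sum(t ** 3 - t for t in counts.values())


def kruskal_wallis_h(groups):
    """Tie-corrected Kruskal-Wallis H statistic for a list of groups."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    correction = 1 - tie_term(pooled) / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    return h / correction


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2)) + sqrt(2 * x / pi) * exp(-x / 2)


def rank_sum_z(a, b):
    """Wilcoxon rank sum z statistic (normal approximation, tie-corrected)."""
    pooled = list(a) + list(b)
    n1, n2, n = len(a), len(b), len(pooled)
    w = sum(average_ranks(pooled)[:n1])  # rank sum of the first group
    mean = n1 * (n + 1) / 2
    var = n1 * n2 / 12 * ((n + 1) - tie_term(pooled) / (n * (n - 1)))
    return (w - mean) / sqrt(var)


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return erfc(abs(z) / sqrt(2))


# Invented rubric scores for illustration only; not data from the study.
scores = {
    "Siri": [3, 4, 4, 5, 2, 4],
    "Bixby": [4, 4, 5, 3, 0, 4],
    "Google Assistant": [5, 5, 4, 5, 4, 5],
    "Alexa": [2, 3, 3, 4, 2, 3],
}
h = kruskal_wallis_h(list(scores.values()))
print("Kruskal-Wallis H:", h, "p:", chi2_sf_df3(h))
z = rank_sum_z(scores["Google Assistant"], scores["Alexa"])
print("Rank sum z:", z, "p:", two_sided_p(z))
```

In a workflow of this kind, the Kruskal-Wallis test would typically check for any difference among the four DVAs, with pairwise rank sum tests then comparing individual DVAs. Both approximations here are coarse for small samples, so an established statistics package would be preferable for a real analysis.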
Across all questions, Google Assistant scored the highest (78.9%), while Alexa scored the lowest (64.5%). Siri (83.9%), Bixby (87.7%) and Google Assistant (87.4%) each scored highest on depression questions, while Alexa scored highest on OCD questions (72.3%). Bixby scored lowest of all the DVAs on general mental health (0%) and OCD (0%) questions. In terms of the quality domains, Google Assistant scored significantly higher for comprehension ability than Siri (100% versus 88.9%, p < 0.001) and Bixby (100% versus 94.5%, p < 0.001). Google Assistant also scored significantly higher for relevance than Siri (100% versus 66.7%, p < 0.001) and Alexa (100% versus 75.0%, p < 0.001). In contrast, Alexa scored lowest of all the DVAs for accuracy (75.0%), reliability (58.3%) and comprehensiveness (22.2%).
Overall, Google Assistant performed best at responding to mental health-related queries, while Alexa performed worst. Although the DVAs' comprehension abilities were good, their performance differed across the other quality domains. DVA responses should be supplemented with information from authoritative sources, and users should seek the help and advice of a healthcare professional when managing their mental health conditions.
Citation: Vanessa Kai Lin Chua, Li Lian Wong, Kevin Yi-Lwern Yap. Quality evaluation of digital voice assistants for the management of mental health conditions[J]. AIMS Medical Science, 2022, 9(4): 512-530. doi: 10.3934/medsci.2022028