Communication

Potential applications and implications of large language models in primary care

Abstract

The recent release of highly advanced generative artificial intelligence (AI) chatbots, including ChatGPT and Bard, which are powered by large language models (LLMs), has attracted growing mainstream interest in their diverse applications, including in health and healthcare. The potential applications of LLM-based programmes in the medical field range from assisting medical practitioners in improving their clinical decision-making and streamlining administrative paperwork to empowering patients to take charge of their own health. However, despite this broad range of benefits, the use of such AI tools also comes with several limitations and ethical concerns that warrant further consideration, encompassing issues related to privacy, data bias, and the accuracy and reliability of AI-generated information. Prior research has primarily centred on the broad applications of LLMs in medicine. To the author’s knowledge, this is the first article that consolidates the current and pertinent literature on LLMs to examine their potential in primary care. The objectives of this paper are not only to summarise the potential benefits, risks and challenges of using LLMs in primary care, but also to offer insights into the considerations that primary care clinicians should take into account when deciding whether to adopt and integrate such technologies into their clinical practice.

Introduction

In recent years, the explosive growth of artificial intelligence (AI)-powered applications has brought about a transformative wave across various industries, and healthcare is no exception. AI, a broad conceptual term, refers to the use of automated computer systems that can reason and perform cognitive functions similar to those of humans.1 AI and its related technologies have been around since the 1950s, with machine learning, a subset of AI, having multiple applications in healthcare within the last decade.2 3 Recent advancements in AI, specifically the development of large language models (LLMs) with natural language processing (NLP) techniques, as seen in conversational chatbots like ChatGPT and Bard, have attracted global attention due to their ability to provide appropriate, coherent, human-like responses to questions across diverse knowledge domains.4

Recent studies on the performance of LLM-powered chatbots in answering medical questions have demonstrated their impressive medical capabilities. For example, ChatGPT-3.5 demonstrated its proficiency in medical knowledge by achieving a passing score of 60.2% on multiple-choice questions derived from the United States Medical Licensing Exams (USMLE), while Google’s Med-PaLM 2 (an LLM specifically trained on medical datasets) achieved an even higher score of 86.5% in the same test.5 Emerging research highlights the rapidly evolving capabilities of LLMs, with the latest ChatGPT version (GPT-4) achieving an accuracy of 100% in another USMLE sample exam.6 All of these findings suggest two notable points. First, these generative AI chatbots already possess sufficient medical knowledge for conceivable applications in medicine, potentially causing a paradigm shift in healthcare delivery.4 Second, considerable uncertainty persists regarding the inner mechanics through which these models process their training data to produce the intended output, raising concerns that LLMs could be manipulated maliciously to generate harmful or misleading content.7

While numerous studies have examined the broad applications of LLMs in medicine and healthcare, there is, to the best of the author’s knowledge, a lack of research assessing the potential benefits, limitations and factors to consider when it comes to integrating LLMs into primary care. Given that primary care usually serves as the first level of contact a patient has with the healthcare system, there is significant potential to leverage the transformative power of LLMs to enhance the delivery of primary healthcare services for the benefit of both patients and practitioners. It is hoped that the findings of this article will offer valuable guidance to primary care clinicians interested in efficiently and effectively deploying LLM technology into their practices.

This article begins by exploring the history and development of conversational agents as well as examining the mechanics of LLMs. Subsequently, it provides a summary of both the practical applications and limitations of LLMs. Finally, it delves into the implications of these technologies for primary care clinicians.

History: from simple conversational agents to LLMs

In essence, chatbots are user-friendly interactive computer programmes designed to engage in human-like conversations, facilitating users’ access to information and services.8 Even before the widespread availability of advanced generative AI chatbots like ChatGPT or Bard, basic conversational chatbots were already used across various healthcare domains to assist with the provision of healthcare services. These applications included conducting patient satisfaction surveys, screening and triaging medical conditions, and assisting with medical education.9–11 As part of the public health response to the COVID-19 pandemic, there was a swift increase in the deployment of conversational chatbots to provide real-time, comprehensible and reliable information regarding COVID-19. These COVID-19 related chatbots were designed to serve various purposes, such as providing vaccine education, debunking misinformation through fact-checking, conducting disease surveillance, monitoring at-risk populations and facilitating contact tracing.12 However, the constraints of these early chatbot iterations were quite evident, with varied evidence regarding their effectiveness and user satisfaction. These limitations included an inability to provide personalised responses or recommendations, repetitive replies, and the capacity to automate only a restricted range of basic tasks and queries.13

Fast-forward to the present, and chatbots are arguably at the forefront of the AI revolution. The integration of NLP and machine learning (ML) algorithms into chatbots has significantly enhanced their capabilities, allowing them to engage in human-like conversations on much more complex topics. ChatGPT and Bard are examples of LLM-based chatbots that use deep learning techniques and extensive datasets to understand and generate conversational responses from natural language inputs.4 14 More specifically, both applications are language models powered by a transformer-based neural network.15 The transformer architecture comprises multiple layers, known as ‘transformer blocks’, which process and extract features from input data in a hierarchical manner.16 This makes transformer-based language models highly effective for NLP tasks, as they can discern patterns in how words and phrases relate to each other, ultimately generating text that is not only appropriate but also contextually relevant across a wide range of prompts.17 Moreover, as LLMs expand in size and are further refined through techniques such as self-consistency and reinforcement learning from human feedback, their capabilities advance and allow these models to learn and extract more knowledge from their training data, enabling them to perform tasks that they were not explicitly trained for.18 19 This represents a significant advancement not observed in previous iterations of chatbots and highlights the remarkable adaptability of LLMs to meet the evolving challenges in various areas of medicine.
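For readers curious about the mechanics, the core operation inside each transformer block is self-attention, which lets every token weigh its relationship to every other token in the input. The following is a minimal, illustrative single-head sketch in Python with NumPy; real LLMs stack many multi-head blocks with weights learned from training data rather than the random matrices used here.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: turns raw scores into weights summing to 1
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over a token sequence X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # project tokens into query/key/value spaces
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # pairwise affinities between tokens
    weights = softmax(scores, axis=-1)        # each token attends over all tokens
    return weights @ V                        # context-weighted mixture of values

# Toy input: 4 'tokens', each an 8-dimensional embedding, with random weights
rng = np.random.default_rng(0)
seq_len, d = 4, 8
X = rng.normal(size=(seq_len, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one contextualised vector per input token
```

The key point for a clinical audience is that each output vector mixes information from the whole input, which is why these models can keep track of context across a long prompt.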

Clinical applications of LLMs in primary care

This section provides an overview of the key applications of LLMs in primary care, highlighting their potential to enhance practice management, complement patient–physician interactions and improve clinicians’ diagnostic capabilities.

Practice management

The current healthcare system is burdened by a substantial amount of paperwork and administrative tasks, diverting valuable time away from healthcare professionals. Physicians are estimated to spend almost half (49.2%) of their time managing electronic hospital records and handling administrative tasks, while only 27% of their time is devoted to direct clinical interactions with patients.20 It is not surprising that this substantial administrative workload contributes significantly to clinician burnout and influences doctors to either avoid a career in family medicine or depart from it.21 Therefore, reducing this burden not only serves as a direct and clear method to alleviate stress and enhance the well-being of current primary care clinicians but also has the potential to attract more doctors into this specialty.

LLMs could be used to help primary care clinicians save time on non-clinical duties by automating routine and repetitive medical tasks, ranging from medical data entry to efficiently searching and aggregating medical information. This automation can result in the generation of comprehensive summaries that encompass a wide variety of information, including the patient’s medical history, potential diagnosis and available treatment options.22 Studies have shown ChatGPT’s ability to produce clear and well-structured patient discharge summaries of acceptable quality when given an initial prompt.23 24 By automating these administrative tasks through the use of LLMs, there is the potential to minimise delays in patient transfers between primary and secondary care, all while maintaining a high level of detail. The integration of LLMs into existing primary healthcare electronic health records (EHRs) systems presents a promising opportunity to improve the efficiency and accuracy of medical records. Such integration could also support quality improvement initiatives and evidence-based practice while simplifying existing operational processes and reducing administrative costs for healthcare providers. Although it is conceivable that employing LLMs for managing patient records enhances operational efficiency, their impact on the overall quality of care received by patients remains uncertain. A systematic literature review revealed that the use of EHRs during medical consultations had a detrimental effect on doctor–patient communication.25 Therefore, without adequate protocols and training, integrating LLMs into EHR systems may continue to undermine the delivery of patient-centred care.

Patient–physician interactions

Besides reducing the burden of administrative tasks, LLMs also have the potential to facilitate more meaningful interactions in patient care. Research evaluating the quality and empathic nature of responses generated by ChatGPT revealed that when presented with a variety of medical queries, its responses were predominantly accurate and exhibited a higher level of empathy compared with responses from physicians.26 27 Moreover, Google’s Med-PaLM 2, a specialised LLM specifically trained on medical data, has shown immense promise in generating more accurate and helpful answers to medical questions than traditional unspecialised LLMs like ChatGPT.5 Patients typically have numerous questions about their medical condition(s), and there is often insufficient time for them to ask questions and engage in discussions with their family clinician during the medical appointment. Consequently, many patients turn to ‘Dr. Google’ to seek answers, often sifting through vast amounts of unfiltered information. Traditional search engines like Google can provide a deluge of data, which can be overwhelming and sometimes even misleading. In contrast, LLMs, particularly Google’s Med-PaLM 2, can offer a more targeted and personalised approach. By simplifying complex medical terminology and presenting health information in a manner that is more accessible, accurate and relevant to the patient’s specific medical concerns, these tools can help patients better understand their illnesses. This improvement in comprehension promotes better engagement and adherence to treatment plans, enabling clinicians to provide care that aligns with the patient’s priorities and concerns.

Clinical diagnostic support

LLM technology could also be applied to augment existing clinical decision-making processes. Its impact on clinical care is vast and diverse, ranging from facilitating an efficient triaging system to improving patient management.28 The world’s population is ageing, bringing about an even greater number of people living with chronic diseases.29 LLMs could serve as an innovative solution to address the increasing demand for primary care services while maintaining efficiency and improving the effectiveness of medical care.

First, LLMs can process and analyse vast amounts of medical knowledge and patient-specific data, such as information regarding the patient’s presenting complaints (or symptoms), their medical and family histories, lifestyle practices, among other relevant information, to generate a working diagnosis.28 The current accuracy of ChatGPT in forming a differential diagnosis and recommending appropriate care management decisions demonstrates its potential to assist in streamlining the primary care triage process.30 31 Accordingly, this initial assessment provided by LLMs could play an important role in swiftly identifying patients displaying ‘red flag’ signs and symptoms, enabling clinicians to promptly redirect them to appropriate secondary care.

Second, in the medical management of patients, clinicians can rely on LLMs to suggest specific diagnostic tests based on the patient’s symptoms, aiding in the thorough investigation of potential medical conditions. Moreover, LLMs, particularly medically trained models, could be used by clinicians to offer treatment plans tailored to an individual’s needs and lifestyle practices. These models could also predict patient care trajectories and identify possible treatment complications based on similar cases. Ultimately, LLMs could serve as a valuable tool for primary care clinicians, facilitating an accurate and expedited clinical decision-making process.

Barriers that impede the application of LLMs

While LLMs hold great promise and have the potential to revolutionise clinical practice, several barriers impede their immediate application. In this section, the article explores key limitations of LLMs, including their shortcomings in privacy and data security and their risk of reproducing factually incorrect or biased conclusions.

Privacy and data security

Security concerns remain a significant obstacle in the immediate application of LLMs into clinical care. Currently, information inputted into publicly available LLMs lacks anonymity, thus leaving patient confidentiality and privacy unprotected. In response to these concerns, several hospitals in Perth, Australia, have implemented a ban on the use of ChatGPT.32 To ensure the accuracy and completeness of AI-generated responses to requests for clinical documentation and patient queries about their medical condition, it is essential to train such language models on extensive datasets that encompass a broad spectrum of health information. However, determining the required scope and diversity of data necessary for AI algorithms to operate effectively is often challenging.33 Personal health data inadvertently included in the training of language models could be opportunistically exploited by other parties, such as health insurers, to adjust insurance premiums based on an individual’s health status.33 This scenario poses a significant risk, as it may lead to unforeseen outcomes, particularly the unintentional re-identification of patients through the linking of de-identified patient data to identifiable information.

Inaccurate and biased outputs

Perhaps the most significant drawback of LLMs lies in their potential to generate inaccurate and biased information. The accuracy of responses generated by these models depends on the quality of the data that they were trained on. If the training data is biased, potentially due to inappropriate sampling methods or data collection processes, then the AI algorithms may inadvertently perpetuate this bias, leading to inaccurate and misleading outputs.34 35 Because the content generated by LLM-based chatbots may lack extensive validation, users are required to rely on their own judgement to assess the accuracy of the content.35 Consequently, there is a real risk that these chatbots may instil in patients a false sense of confidence in their understanding of crucial clinical information, leading them to believe that they possess more knowledge than their doctor. Such a scenario can result in misunderstandings and hinder the effective delivery of medical care by causing patients to disregard medical advice. Therefore, it is crucial that the medical datasets used to train LLMs are consistently up-to-date and accurate. Achieving this goal requires collaboration among technology companies, governments, healthcare providers and expert clinicians across a broad spectrum of medical disciplines. The objective is to establish universal guidelines for validating both the quality and type of health data used in training these language models. It also entails creating a structured framework for ensuring accountability and auditability of AI-generated information. This rigorous process is essential to enhancing the reliability and trustworthiness of AI-generated responses.

Implications for primary care clinicians

Regardless of our attitudes towards AI, LLM-powered applications have gained widespread popularity and are gradually reshaping the practices of the primary care sector. Instead of fearing the idea that AI could replace us, we should welcome the opportunities and advantages that such technologies bring. This includes opportunities to ease administrative burdens and improve the quality of patient care and outcomes. However, we must also be mindful of the numerous limitations and shortcomings in the capabilities of LLMs that may affect their usability in clinical practice. Box 1 provides a summarised overview of important factors that primary care clinicians should consider regarding the integration of LLMs into clinical practice.

Box 1

Summary of take-home messages regarding the implications of LLMs in clinical practice

Highlights

  • Accuracy of outputs: LLMs such as ChatGPT are known to produce logical but factually incorrect outputs. Clinicians must be mindful that current non-domain specific LLMs are not designed to provide medical advice, and any medical interpretations should be fact-checked against prevailing clinical evidence/guidelines.

  • Regulatory oversight: The unique capabilities and limitations of LLMs have prompted calls for such technologies to be regulated. Clinicians need to periodically assess the impact of a changing regulatory environment on the use of AI and its associated technologies in clinical practice.

  • Enhancing professional competence in AI: In order for AI technologies, including LLMs, to be successfully integrated into clinical practice, clinicians at all career stages need to have the skills, attitudes and knowledge to use such tools safely and effectively. The creation of an AI-ready workforce requires incorporation of AI competencies into existing medical and professional development curricula.

  • Respecting patient preferences: Every patient has distinct views and preferences regarding the incorporation of AI in the clinical decision-making process. Clinicians need to proactively communicate with patients regarding the role of AI in their medical care including responding to patients’ fears and ensuring that their choices are respected.

AI, artificial intelligence; LLMs, large language models.

Accuracy and reliability of AI-generated responses

It has been acknowledged that certain LLMs, such as ChatGPT, can produce logically coherent information that may be false or inaccurate. This occurrence, known as ‘AI hallucination’, refers to the phenomenon where an AI-powered algorithm generates fictional or unsubstantiated information in response to a query.36 Moreover, it is important to note that the datasets used to train LLMs may be outdated and incomplete. In the case of ChatGPT, its training data is based solely on information available up until September 2021.37 Hence, clinicians should exercise caution when using LLMs and refrain from overly relying on advice provided by these applications. Instead, they should use their professional judgement to select clinically relevant information and disregard information that is not.

Given that current non-domain specific LLMs like ChatGPT are not designed to serve as reliable sources of medical information, it would be more prudent for clinicians to use specialised medical domain-specific interactive interfaces, like Evidencehunt—an AI-powered, evidence-based search engine that consolidates clinical evidence on specific topics—to assist them in making well-informed clinical decisions. However, it is important to note that this tool does not differentiate between contextually relevant and irrelevant clinical evidence. It summarises available articles indexed on PubMed, so its results are meaningful only when interpreted in specific contexts, and these contexts may not necessarily align with the unique circumstances of the presented patient.

AI regulation

Clinicians need to evaluate how AI governance frameworks impact the practical applications of such technology in their clinical settings. As public interest in LLMs continues to grow, attempts to regulate this technology are also on the rise. As AI applications become more prominent in healthcare, clinicians must recognise the importance of handling sensitive healthcare data in accordance with strict ethical and privacy standards. That is, clinicians must ensure that they do not misuse sensitive health data in any manner that compromises patient confidentiality or violates privacy regulations. To proactively uphold these principles, clinicians should allocate time to stay informed about the latest updates in data protection and privacy laws that govern their practice.

Development of AI competencies

To successfully adopt AI-powered technologies in primary care, clinicians at all career stages need the confidence and skills to use these emerging tools and keep pace with the rapid developments in this field. As these new technologies find their way into the hands of our patients, there is an urgent need to integrate education about AI technologies into the existing medical and primary care training curricula. These educational competencies should help primary care clinicians understand the fundamental principles and opportunities for AI use in clinical applications and cover the risks and challenges of AI use. This is particularly crucial in addressing concerns related to confidentiality, consent, and the limited clinical knowledge of LLMs in order to ensure the safety of the patient and clinician. Clinicians must also be aware of how they can effectively communicate with patients regarding their use of LLM-based tools, including supporting and training patients on how to critically assess the accuracy and relevance of AI-generated content. Moreover, since LLMs generate content based on user input, they are sensitive to how the input text (or prompt) is framed.38 Therefore, variations in the words or phrases used in the prompt can affect the quality and accuracy of LLM-generated information.38 To ensure the robustness and effective use of LLMs, there is also a need to conduct further research in prompt engineering. In particular, research would need to focus on the reproducibility and reliability of LLM-generated interpretations across different prompt variations for the same medical query. In doing so, the medical field can establish universal guidelines that provide clear guidance to clinicians and patients on how to construct prompts in a way that allows LLMs to perform a diverse array of medical tasks safely and effectively.
For now, it seems that the most appropriate approach for users to take in optimising their prompts would be to experiment with different prompt styles and compare the outputs with the desired results. This process assists users in identifying a suitable structure for future prompts related to a particular query.

Patients’ preferences

Finally, clinicians must respect their patients’ attitudes and preferences towards incorporating AI into healthcare decision-making. While AI chatbots promise to revolutionise clinical practice, patients’ trust in AI technologies remains low. In a study examining the role of AI chatbots in behavioural health, it was found that despite the demonstrated effectiveness of chatbots in promoting healthier lifestyles and offering a safe platform for discussing sensitive topics like sex-related issues and drug and alcohol use, less than 50% of participants expressed acceptance of their potential future use.39 Additionally, an American survey found that 60% of American adults would be uncomfortable if their clinician relied on AI for diagnosis or treatment recommendations.40 Indeed, these findings highlight the challenges of navigating diverse patient preferences when it comes to using AI in healthcare. Integrating AI tools in clinical settings raises questions about balancing technological advancements with the human touch in healthcare, with concerns that AI could potentially depersonalise patient interactions. Transparent communication, ethical considerations, and respect for patient autonomy are crucial elements for fostering the widespread acceptance and effective integration of AI into existing healthcare systems. Certainly, there is a genuine need for more standardised research in this area to thoroughly understand both the advantages and risks, fostering a balanced and informed approach to incorporating AI into clinical practice.

Conclusion

In conclusion, the potential impact of LLMs and subsequent AI-based technologies in primary care is both vast and transformational. These innovations stand to revolutionise healthcare delivery, introducing solutions that can help clinicians make better informed clinical decisions, reduce their administrative burden and improve patient outcomes. However, to realise these benefits, researchers and practitioners must be mindful of the challenges and risks associated with their usage. It is imperative to proceed with caution and implement proactive measures to address potential threats, aiming to avoid unintended consequences that may compromise the safety and quality of patient care. Regardless of our stance, the progression of AI chatbots in primary care is inevitable. The most prudent approach is to actively embrace and educate ourselves about the capabilities of AI and leverage them for the improvement of healthcare delivery.