Health Systems Action

Can Generative AI Close The Knowledge Gap?

Healthcare suffers from a “Know-Do Gap”(1). As knowledge expands, the capacity to implement it lags, and the gap widens. To make things worse, the gap between what is already known and the stream of new knowledge still to be learned is growing too. Let’s call this the Knowledge Gap.

Closing the Knowledge Gap requires access to this new knowledge, a primary source being research publications (journals). This literature requires filtering for quality, relevance and sense-making. Only then is evidence in hand that can be applied to closing the Know-Do Gap.

This is an enormous task. There were over 980,000 PubMed citations in 2019(2), 32,000 clinical trials were started in 2022 alone(3), and the volume of medical research published annually doubles every five years(4). Even specialists struggle to keep up. Adding to the difficulty, many published studies generate poor-quality evidence(5).

In low- and middle-income countries (LMICs), the Know-Do Gap is large, due to many factors at the health-system level. The Knowledge Gap is also wide, one reason being that access to medical libraries is limited. The US National Library of Medicine’s PubMed (https://pubmed.ncbi.nlm.nih.gov/), with 35 million citations including abstracts, is a fantastic and free global resource. Google Scholar is another excellent, free resource covering a wide range of scholarly literature and disciplines, including healthcare.

While abstracts, via PubMed or Google Scholar, are freely accessible, and Open Access journals have increased in number, most journals demand a substantial fee for full-text content. During the Covid-19 pandemic(6) the situation partly eased: full-text pandemic-related papers were made available without charge, along with large numbers of pre-publication items. Post-pandemic, the status quo of high charges and wide publisher profit margins has returned, even though journals neither fund research nor pay peer reviewers, and they benefit from government-funded research produced by researchers employed at public universities. Journal subscriptions priced in the thousands of dollars, and $30-35 or more for an individual article, are a major constraint for libraries, never mind individuals. Pooling and networking arrangements can help, and copying under “fair use” permits limited sharing.

In response to the information needs of healthcare professionals and researchers, sites like Sci-Hub emerged. Created by Alexandra Elbakyan in 2011, it hosts over 80 million academic papers and is popular among researchers and students. Alternatives, mainly for non-journal content, include Z-Library and Library Genesis. These sites are blocked in many countries due to copyright infringement, and their use is illegal in some places. While useful, they are hardly a solid and sustainable access option for healthcare professionals.

Another challenge is that reliably searching the biomedical literature requires familiarity with search methods, keywords and search terms. In addition, appraisal skills are often limited: critically assessing the methods and results of clinical trials is hard to master. Trial design and statistics are challenging topics for those without PhDs or extensive hands-on research experience.
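
To make the point concrete, here is a minimal sketch of a structured PubMed search using the free NCBI E-utilities esearch endpoint. The query terms (a MeSH heading, a publication-type filter and a date range) are purely illustrative and would be swapped for the clinical question at hand.

```python
# A minimal sketch, not a production client: query PubMed via the free NCBI
# E-utilities esearch endpoint. All query terms here are illustrative.
import json
import urllib.parse
import urllib.request

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"

query = (
    '"hypertension"[MeSH Terms] '
    'AND "randomized controlled trial"[Publication Type] '
    'AND 2022:2023[dp]'
)
params = {"db": "pubmed", "term": query, "retmax": 20, "retmode": "json", "sort": "date"}

with urllib.request.urlopen(f"{ESEARCH}?{urllib.parse.urlencode(params)}") as resp:
    result = json.load(resp)["esearchresult"]

print("Matching citations:", result["count"])
print("Newest PMIDs:", result["idlist"])  # abstracts can then be fetched with the efetch endpoint
```

Even this simple example assumes familiarity with MeSH vocabulary and PubMed field tags, which is exactly the expertise many busy clinicians lack.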

Falling behind isn’t acceptable, however. Practitioners adopt various tactics and resources to keep up: clinical evidence syntheses like UpToDate (www.uptodate.com) and Dynamed (https://www.dynamed.com) are helpful but come at significant cost. Narrative and systematic reviews help, but these too are multiplying, along with clinical guidelines. Refresher courses and expert updates are important, but knowledge retention from them is often poor. The informal advice of colleagues in “hallway consults” remains an essential source of knowledge and advice for decision-making.

The advent and impact of ChatGPT and other Large Language Model (LLM)-based “generative” AI, together with the maturing trajectory of other AI techniques, suggest that artificial intelligence may have arrived just in time to bridge otherwise unbridgeable Knowledge Gaps.

For accessing and assimilating the biomedical literature, a host of new companies have developed task-specific, AI-driven chatbots. These services may offer benefits over the already hugely popular general-purpose ones like Bing (Microsoft), Bard (Google) and ChatGPT (OpenAI), which have ingested medical literature along with everything else on the internet. These tools are good enough to pass medical licensing exams(7), but do not connect directly to recent or current publications.

Here are five examples of the innovation happening in this area.

OpenEvidence (openevidence.com) says its mission is to “organize the world’s medical knowledge into understandable, clinically useful formats”. Its Medical Advisory Board is drawn from famous US academic institutions, including Mayo Clinic, and includes a Nobel laureate, the behavioural scientist Daniel Kahneman. It claims to be evidence-based (“data comes directly from scientific primary sources – high-quality, peer-reviewed studies published in leading medical journals”), independent (“unbiased analysis based on public verifiable information”), open (“free and universally accessible”), complete (“all the most rigorous evidence in one place, not cherry-picked studies to support an argument or medical opinion”), up-to-date (“living analysis”) and accurate (“PhD-level fact-checking and world-class medical review of every article”).

Scite (scite.ai) analyses over 25 million full-text scientific articles and has a database of more than 880 million classified citation statements. It invites users to “ask simple questions and get reliable answers from the full-texts of millions of research articles”, and “discover supporting and contrasting evidence”. It offers support for writers, “whether it’s a simple blog post, essay, or a grant proposal” and to “effectively use information from research articles to support your research tasks”. It also features a service to upload a document (manuscript, grant, preprint, or published paper) to check the reliability of its references.

Scite is built on citation indices: tools, used by the academic community for research and research evaluation, that aggregate the scientific literature and measure impact by collating citation counts. Citation indices help map the interconnections between scientific papers but do not convey contextual information about a citation: a citation that presents contrasting evidence to a paper is treated the same as one that presents supporting evidence. The developers of Scite therefore used machine learning, document ingestion methods, and a network of researchers to build a “smart citation index” which categorises citations based on context. Scite shows how a citation was used by displaying the surrounding text from the citing paper, along with a classification from a deep learning model indicating whether the statement provides supporting or contrasting evidence for the referenced work, or simply mentions it.
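
To make the concept concrete, the toy sketch below shows what a classified citation record might look like. The data structure and the crude keyword rule are illustrative assumptions only; Scite’s actual system relies on a trained deep learning model and a far larger ingestion pipeline.

```python
# A toy illustration of the "smart citation" idea: each citation statement keeps the text
# surrounding the citation plus a label. The keyword rule below is only a stand-in to make
# the concept concrete; it is not Scite's method.
from dataclasses import dataclass

@dataclass
class SmartCitation:
    citing_paper: str   # identifier of the paper making the citation
    cited_paper: str    # identifier of the referenced work
    context: str        # sentence(s) surrounding the in-text citation
    label: str          # "supporting", "contrasting" or "mentioning"

def classify_context(context: str) -> str:
    """Crude keyword heuristic standing in for a learned citation-context classifier."""
    text = context.lower()
    if any(cue in text for cue in ("consistent with", "confirms", "in agreement with", "replicates")):
        return "supporting"
    if any(cue in text for cue in ("in contrast to", "contradicts", "failed to replicate", "unlike")):
        return "contrasting"
    return "mentioning"

statement = "Our results failed to replicate the effect reported by Smith et al. (2019)."
citation = SmartCitation("PMID:000001", "PMID:000002", statement, classify_context(statement))
print(citation.label)  # -> "contrasting"
```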

The mission of Consensus (consensus.app) is to “use AI to make science accessible and consumable for all… and be the go-to source in anyone’s search for expert information”. The developers note that “searching for vetted, unbiased information has long been an arduous and painful process” and that “if used thoughtfully, carefully, and elegantly, artificial intelligence can finally change this equation.” Consensus offers scientific results (it searches through peer-reviewed, published sources), instant analysis (“our AI reads the papers for you and extracts key findings”) and freedom from advertising (“we show you results from scientists, not marketing teams”).

Elicit (elicit.org) is a research assistant that automates parts of the research workflow. Currently, the main workflow in Elicit is Literature Review. Asked a question, Elicit shows relevant papers and summaries of key information about them in an easy-to-use table. Elicit’s users are primarily researchers who use it to find papers to cite, define their research directions, answer questions, compile literature reviews for publication and “to get perfect scores on exams”.

Evidence Hunt (evidencehunt.com) allows users to “search for clinical evidence in a quick and effective way”: for example, clinical trials in oncology, the newest evidence in certain disease areas, or clinical evidence on a specific drug. It offers “fast answers to any clinical question”, finding the latest clinical evidence from simple search terms, predefined medical specialties, or your own PubMed query. The system extracts the population and intervention from your question, matches abstracts that provide evidence for it, and returns the answer together with the abstracts used to produce it. There is an option to subscribe to weekly e-alerts for your search.
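
As a purely hypothetical illustration of that kind of pipeline, the sketch below pulls a population and an intervention out of a free-text question and assembles a Boolean literature query. Both functions and the regular expression are invented for illustration; Evidence Hunt’s own extraction and abstract-matching models are not public, and nothing here reflects its API.

```python
# Hypothetical sketch of a question -> population/intervention -> query pipeline.
# The regular expression is a crude illustrative stand-in for a learned extractor.
import re

def extract_population_intervention(question: str) -> tuple[str, str]:
    """Very rough stand-in for a learned P/I extractor: handles 'Does X reduce ... in Y?' phrasing."""
    match = re.search(
        r"does (?P<intervention>.+?) (?:reduce|improve|prevent).+? in (?P<population>.+?)\?",
        question,
        re.IGNORECASE,
    )
    if not match:
        raise ValueError("Could not parse the question")
    return match.group("population").strip(), match.group("intervention").strip()

def build_query(population: str, intervention: str) -> str:
    """Combine the extracted concepts into a simple Boolean search string."""
    return f'("{population}") AND ("{intervention}")'

question = "Does metformin reduce cardiovascular events in adults with type 2 diabetes?"
population, intervention = extract_population_intervention(question)
print(build_query(population, intervention))
# -> ("adults with type 2 diabetes") AND ("metformin")
```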

Will innovators, “big tech” companies, or incumbent commercial medical publishers provide the solution to Knowledge Gaps? Will these solutions be affordable, or will they widen disparities between wealthy countries and LMICs? Open-source Large Language Models like Meta’s LLaMA, now in its second version (Llama 2), and BLOOM may help to level the field, but running these models carries large compute costs; who will pay?

Cost is not the only barrier to use. There are risks. Though remarkably “well read”, having ingested more information than a human could in multiple lifetimes, LLM-based generative AI services like ChatGPT have a notorious tendency to produce “hallucinations”: misinformation, untruths, and wrong or bad advice. Will the new specialist services be different? There is a need for regulation and certification to assess accuracy, safety and quality.

How good do these AI tools need to be to qualify for routine use in the clinical world? In education? Answers to clinical questions for use in diagnosis or treatment must meet a higher standard.

Most clinicians in LMICs (nurses probably make up the bulk) would not pass a specialty licensing exam the way ChatGPT has done. Where human expertise is scarce and the need great, advice or input from AI may be invaluable, and occasional errors might be acceptable because of the net benefit, but this needs to be confirmed in clinical trials.

REFERENCES

1. Bridging the Know-Do Gap. Harvard Public Health Magazine, Harvard T.H. Chan School of Public Health. Accessed July 9, 2023. https://www.hsph.harvard.edu/magazine/magazine_article/bridging-the-know-do-gap/

2. MEDLINE® Citation Counts by Year of Publication (as of January 2022). National Library of Medicine.

3. National Library of Medicine. ClinicalTrials.gov. Accessed July 9, 2023. https://clinicaltrials.gov/search

4. PubMed total records by publication year. National Library of Medicine, National Institutes of Health, Open Data Portal. Accessed July 9, 2023. https://datadiscovery.nlm.nih.gov/Literature/PubMed-total-records-by-publication-year/eds5-ig9r

5. Ioannidis JPA. Why Most Published Research Findings Are False. PLOS Med. 2005;2(8):e124. doi:10.1371/journal.pmed.0020124

6. Clark J. How covid-19 bolstered an already perverse publishing system. BMJ. 2023;380:p689. doi:10.1136/bmj.p689

7. Kung TH, Cheatham M, ChatGPT, et al. Performance of ChatGPT on USMLE: Potential for AI-Assisted Medical Education Using Large Language Models. medRxiv. Published online December 20, 2022:2022.12.19.22283643. doi:10.1101/2022.12.19.22283643
