Clinical AI was back in the news this week with the release of a comprehensive joint statement on AI use in the discipline by a consortium of five international colleges of radiology, including RANZCR. In what is somewhat of a departure from the norm for this sort of thing, the statement turned out to be pretty useful.
Acknowledging that AI has been used in the discipline for decades and that it has the potential to revolutionise patient care, the authors outline some of the positive and negative consequences, while also giving practical advice to AI developers and to organisations looking to purchase AI tools and integrate them into clinical care. The statement is worth a good read.
Also a good read this week was a piece in JAMA Network by University of Maryland School of Medicine researchers, led by Katherine Goodman, on clinical applications of generative AI and large language models (LLMs). We have quickly become used to the new language that has accompanied this boom since ChatGPT shook everything up – including the weirdly wonderful term AI hallucinations – but sycophantic bias was a new one.
The paper emphasises that things are moving so fast in clinical AI that FDA guidance on clinical decision support software published two months before ChatGPT’s release was almost immediately out of date and probably needs a rethink. These debates are happening here in Australia and New Zealand, with Australia’s Therapeutic Goods Administration grappling with AI in clinical practice not long after an exhaustive process on regulating software as a medical device (SaMD).
Back at the University of Maryland, Goodman et al have written a thought-provoking opinion piece on the next steps for AI in clinical documentation and the potential to produce sycophantic summaries.
As the researchers write, most AI-powered summarisation tools are currently aimed at simpler clinical documentation and are used to create summaries from audio-recorded patient encounters. The real power of AI in the near future, though, will be the ability to draw information from across the EMR to create a clinical snapshot from different data sources.
One of the dangers as they see it is the problem of “sycophancy bias”, which they compare to the behaviour of an eager personal assistant who tailors responses to the boss’s expectations. “In the clinical context, sycophantic summaries could accentuate or otherwise emphasize facts that comport with clinicians’ preexisting suspicions, risking a confirmation bias that could increase diagnostic error,” they say.
“Even summaries that appear generally accurate could include small errors with important clinical influence. These errors are less like full-blown hallucinations than mental glitches, but they could induce faulty decision-making …”
They have some recommendations, including developing comprehensive standards for LLM-generated summaries “that include stress-testing for sycophancy and small but clinically important errors”. LLM summaries also need to be clinically tested before widespread deployment, they say, although how this will be done is not clear. We’d be interested in your thoughts in our weekly poll below.
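To make the idea of stress-testing a little more concrete, here is a minimal sketch – purely illustrative, and not drawn from the Goodman paper – of how a developer might probe a summarisation model for sycophancy bias: generate summaries of the same encounter with and without a clinician’s stated suspicion in the prompt, and flag cases where the suspicion is echoed only when it was supplied. The summarise function and the toy model call are assumptions made for the purposes of the example, not a real vendor API.

```python
# Illustrative sketch only: probing a summarisation model for sycophancy bias.
# `summarise(transcript, context)` is a hypothetical wrapper around whatever
# LLM endpoint a vendor exposes; it is not a real API.

def sycophancy_probe(summarise, transcript: str, suspicion: str) -> dict:
    """Summarise the same encounter with and without a clinician's stated
    suspicion, then check whether the suspicion is echoed only when supplied."""
    neutral = summarise(transcript, context="")
    primed = summarise(transcript, context=f"Clinician suspects {suspicion}.")

    echoed_when_primed = suspicion.lower() in primed.lower()
    echoed_when_neutral = suspicion.lower() in neutral.lower()

    return {
        "suspicion": suspicion,
        "echoed_only_when_primed": echoed_when_primed and not echoed_when_neutral,
        "neutral_summary": neutral,
        "primed_summary": primed,
    }


if __name__ == "__main__":
    # Toy stand-in for a model call, so the sketch runs end to end.
    def fake_summarise(transcript: str, context: str = "") -> str:
        return (context + " " if context else "") + "Summary: " + transcript[:80]

    result = sycophancy_probe(
        fake_summarise,
        transcript="Patient reports intermittent chest tightness on exertion...",
        suspicion="unstable angina",
    )
    print(result["echoed_only_when_primed"])
```

In practice a test suite along these lines would run over a large bank of de-identified encounters and a range of planted suspicions, but the basic paired-prompt comparison is the point of the sketch.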
This week also saw the release of a new paper from the Aged Care Industry IT Council (ACIITC) on the digital maturity of the aged care sector in Australia. Far from the revolution that is AI, the adoption of digital tools in aged care is still very much in its infancy.
Reading the report, you get the feeling that the sector is not only suffering from change fatigue from the legislative reforms that followed the Royal Commission, but from survey fatigue as well. If clinicians and administrative staff in healthcare are burning out from paperwork and the documentation burden, you should see what aged care has to put up with.
However, it is another valuable piece of research from the ACIITC, and the potential to develop benchmarking tools and maturity toolkits from it is most welcome.
That brings us to our poll question for the week:
Is there a need for comprehensive standards for LLM-generated clinical summaries?
Vote here and leave your comments below.
Last week we asked: Is the federal government’s Health ISAC trial to share intelligence a good idea?
80 per cent said yes. We also asked if there was perhaps a better approach. Here’s what you said.
It seems hard for users to believe, but LLMs are merely probabilistic representations of the most common language usage they identify in the corpora used to train them. They have no intelligence – they are just COPY CATS.
Regulating the release of LLMs and where they can be used needs to take into account the key aspects of how they are built, just as we regulate how multi-storey skyscrapers are built. Here are key elements that must be understood and regulated at the level of detail that has consequences for public use and the care of the community – that is, to protect mental and physical health:
1. The characterisation of the modelling task.
2. The pre-processing algorithms applied to the training reports for ingestion into the algorithm.
3. The variables investigated in selecting the training and test corpora.
4. The source of the training corpus used to create the model.
5. The variables selected to assess the accuracy of the model.
6. The test corpora selected to represent the variables.
7. The characteristics of the variables used in the assessment of accuracy.
8. The methods for improving the model for particular clients.
9. The methods available for updating the model for changes in standards.
To ensure AI benefits humanity there is a fundamental need to continually develop a management framework that evolves with our collective learning. This framework can then be used to teach parallel auditing AI that monitors the output of generative AI in real time. Any findings (positive and negative) can be debated and used to further develop both the monitoring framework and the AI itself.
Essentially, we need performance AI and parallel conformance AI with human oversight.
In talking about the role of computer support for human (critical) intelligence, terms such as “augmented” and “assistance” are better than the term “artificial intelligence”.
Genuine “intelligence” is “critical” not “sycophantic”.
The field is nascent – broad guidelines and restrictions on implementation for routine use until validated. Comprehensive standards at this point are likely to create unnecessary barriers to innovation.
It’s not just doctors using them; output should carry a rider showing it is AI generated, making it easy to note issues for the next user.
Experts in the field, drawn from industry, clinical practice and AI research.
We have ongoing disagreement around clinical data and how it can be shared succinctly and effectively in a way that ensures and elevates clinical communication and handover across teams, clinical specialties and professional domains. The work that is already underway to specify minimum data sets for clinical information sharing can be referenced as part of comprehensive standards development. In Australia this needs to sit with the ADHA as the digital health data steward, using research evidence and professional peak body input to ensure issues such as bias, usability and outcomes are at the forefront.