OpenAI’s speech recognition system, Whisper, hallucinates at full throttle

The medical field is transforming. Determined to integrate the latest technologies, the sector is looking closely at what technology players have to offer to modernize the daily lives of healthcare professionals. Among those technologies: artificial intelligence. The release of the speech recognition tool Whisper, developed by OpenAI, was a decisive turning point for a number of hospitals and medical centers, and for companies that decided to build their platforms on top of it.

This is the case of Nabla, a Parisian startup that pivoted to artificial intelligence in order to bring an attractive product to a booming market. In March 2023, the company announced the launch of Nabla Copilot, an assistant "designed to ease the administrative burden on providers and reduce clinician burnout."

The assistant combines AI-based note generation, medical coding recognition and seamless EHR (electronic health record) platform integrations, with Whisper handling the speech-to-text layer.
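
For context, here is roughly what a Whisper-based transcription step looks like when calling OpenAI's hosted API through the official Python SDK. This is a minimal sketch for illustration only, not Nabla's actual pipeline; the file name and language hint are assumptions.

from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

# Send a (hypothetical) consultation recording to the hosted Whisper model.
with open("consultation.wav", "rb") as audio_file:
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
        language="en",  # optional hint; Whisper can also detect the language itself
    )

# The raw transcript text, which a note-generation step would then summarize.
print(transcript.text)

Any fragment hallucinated at this transcription step is carried forward into the generated note, which is why the error rates discussed below matter.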

The solution developed by Nabla carries risks

Although the numbers suggest the company has bet on the right horse – more than 30,000 clinicians and 40 health systems use it, including the Mankato Clinic in Minnesota and Children's Hospital Los Angeles – the tool has a sizable problem: it is prone to hallucinations. The tool is nonetheless fine-tuned to medical language to transcribe and summarize patient interactions, Martin Raison, Nabla's chief technology officer, told AP News.

Company officials said they are aware that Whisper can hallucinate and are addressing the problem. However, it is impossible to check the transcripts generated by Nabla's AI against the original recordings, because the tool deletes the original audio for "data security reasons". It is therefore impossible to know to what extent the tool is hallucinating.

1% of audio transcripts contain hallucinations

The problem, however, is real. In a study, five researchers from Cornell University, the University of Washington and other universities found that about 1% of audio transcripts contain hallucinated sentence fragments or entire sentences that do not exist in any form in the underlying audio.

More worryingly, the researchers' thematic analysis of the hallucinated content shows that 38% of hallucinations include explicit harms, such as perpetuating violence, making up inaccurate associations or implying false authority.

An improvement observed after a late-2023 Whisper update

"In April and May 2023, transcripts generated from 187 audio segments yielded 312 transcripts containing hallucinations. On average, 1.4% of transcripts in our data set contained hallucinations. Of these hallucinations, 19% included harms perpetuating violence, 13% included harms from inaccurate associations, and 8% included harms from false authority," the researchers detail in their study.

In December 2023, new Whisper tests were conducted on the same audio segments. They show a significant improvement, with only 12 of the 187 audio segments still producing hallucinations. "This improvement is likely due to Whisper updates in November 2023," the researchers conclude.

And unfortunately, they are not the only ones to report this. A machine learning engineer said he discovered hallucinations in about half of the more than 100 hours of Whisper transcripts he analyzed. Another developer said he found hallucinations in nearly every one of the 26,000 transcripts he created with the tool.

No soul-searching at OpenAI

Ultimately, this error rate can lead to a growing number of erroneous transcriptions and undermine the time and efficiency gains that users are looking for, in the medical sector or elsewhere. And if some are tempted to turn to OpenAI to assign blame, the company has thought of protecting itself.

It thus recommends against using the Whisper API in "high-stakes decision-making contexts, where errors in accuracy can lead to pronounced errors in the results." Likewise, it has drawn up a list of high-risk areas in order to clear itself of any misuse of the tool.
