Can ChatGPT Transcribe Audio? Here’s What You Need to Know

This has been an inventive year, and everyone is talking about AI. From autonomous cars to ChatGPT, the worldwide AI market is worth over US$136 billion, and experts say the industry will expand more than 13-fold in seven years.

Content now comes in many forms, and with the race to automate on all fronts, transcription services have become one of AI’s biggest beneficiaries. You no longer have to hire a stenographer or typist to transcribe a recording. AI can automatically convert any video or audio to text in minutes.

But can ChatGPT transcribe audio? The latest version runs on a multimodal model (GPT-4) and is by far the most advanced ChatGPT we’ve ever seen. With ChatGPT’s list of language-based capabilities steadily expanding with each upgrade, you might be wondering whether transcription is now one of them.

Can ChatGPT Transcribe Audio? 

Yes, ChatGPT does indeed offer a speech-to-text option, powered by OpenAI’s Whisper API.

You upload an audio file, the speech recognition model processes it, and the output is returned as text. Currently, the Whisper API supports the following file types: mp3, mp4, mpeg, mpga, m4a, wav, and webm. File uploads are limited to 25 MB.

With OpenAI’s Whisper API, you can send audio from your own app and build speech-to-text directly into it. This is separate from the ChatGPT API. Whisper supports a broad set of languages and dialects, including Arabic, Greek, Polish, Swahili, Hindi, Malay, Tagalog, Hebrew, Marathi, Urdu, Kannada, and Welsh.

Whisper was trained on a massive amount of speech data. ChatGPT’s speech-to-text can understand and transcribe over 50 languages to industry-standard benchmarks. It can also translate audio from many languages into English text.

You can use the speech-to-text feature through ChatGPT on your PC or laptop, and also in the ChatGPT app for iOS, so speech-to-text transcription is always close at hand. When it comes to rethinking how transcription is done, OpenAI remains a pioneer.

The ChatGPT Speech-to-Text Feature

The ChatGPT voice-to-text feature uses Whisper API

Whisper is OpenAI’s automatic speech recognition system, trained on more than 680,000 hours of multilingual and multitask supervised data collected from the web. The ChatGPT voice-to-text feature uses this API.

So how does it work?

When you upload audio to the API, the track is broken into 30-second chunks. The system converts each chunk into a log-Mel spectrogram, an image-like graph of how the audio’s frequencies change over time. The spectrograms enter the encoder, which captures the nuances of the audio. Finally, they pass through the decoder, which predicts the words that correspond to those sound pictures.
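The first step, cutting the audio into fixed 30-second windows, can be sketched in a few lines. This is only an illustration of the idea, not Whisper's actual preprocessing code: the function name `split_into_windows` is hypothetical, and the 16 kHz mono sample rate and the zero-padding of the last window are assumptions made for the sketch.

```python
from typing import List

SAMPLE_RATE = 16_000   # assumption: audio already resampled to 16 kHz mono
WINDOW_SECONDS = 30    # window length described in the text


def split_into_windows(samples: List[float],
                       sample_rate: int = SAMPLE_RATE,
                       window_seconds: int = WINDOW_SECONDS) -> List[List[float]]:
    """Cut a mono signal into fixed-length windows, zero-padding the last one."""
    window_len = sample_rate * window_seconds
    windows = []
    for start in range(0, len(samples), window_len):
        window = samples[start:start + window_len]
        # Pad the final, shorter window with silence so all windows are equal length.
        window = window + [0.0] * (window_len - len(window))
        windows.append(window)
    return windows


if __name__ == "__main__":
    # 70 seconds of silence at a toy sample rate of 10 Hz.
    toy_signal = [0.0] * 700
    chunks = split_into_windows(toy_signal, sample_rate=10)
    print(len(chunks), [len(c) for c in chunks])
```

Each window would then be turned into a spectrogram before reaching the encoder; that step is left out here because it needs a signal-processing library.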

Language Support

The Whisper audio-to-text API has two endpoints: one transcribes audio in its original language, and the other translates it into English. Both endpoints support many languages, including English, Arabic, French, Japanese, Chinese, German, and Spanish. The word error rate, a standard industry benchmark, is below 50% in these languages.

In addition, the underlying model was trained on 98 different languages.
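For readers unfamiliar with the benchmark, word error rate is the number of word substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of words in the reference. The sketch below computes it with a standard word-level edit distance. The function name `word_error_rate` and the simple lowercase, whitespace-based tokenization are assumptions for illustration; real evaluations usually normalize text more carefully.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / words in the reference."""
    ref: List[str] = reference.lower().split()
    hyp: List[str] = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # Word-level edit distance, computed one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1] / len(ref)


if __name__ == "__main__":
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

A score below 0.5 therefore means fewer than half of the reference words needed correcting, which is a fairly loose threshold rather than a sign of near-perfect output.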

File support

The API works with mp3, wav, mpeg, mp4, m4a, mpga, and webm files. But there’s a 25 MB cap on the size of an uploaded audio file. If your file is larger, compress it or split it into smaller parts.
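A quick pre-flight check can catch both problems before you upload anything. The sketch below is a minimal example under stated assumptions: the helper names `check_upload` and `parts_needed` are hypothetical, and "25 MB" is read as 25 × 1024 × 1024 bytes.

```python
import os

SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
MAX_UPLOAD_BYTES = 25 * 1024 * 1024  # assumption: "25 MB" read as 25 MiB


def check_upload(filename: str, size_bytes: int) -> list:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or 'none'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 25 MB limit; compress or split it")
    return problems


def parts_needed(size_bytes: int, limit: int = MAX_UPLOAD_BYTES) -> int:
    """Minimum number of parts so that each part fits under the limit."""
    return max(1, -(-size_bytes // limit))  # ceiling division


if __name__ == "__main__":
    print(check_upload("lecture.flac", 60 * 1024 * 1024))
    print(parts_needed(60 * 1024 * 1024))
```

The part count is only a lower bound. The actual splitting should be done with an audio tool, ideally at a pause between sentences, so that no word is cut in half.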

Capability on PC, Laptop, and iOS

On a PC, laptop, or iOS device, you can already use ChatGPT’s speech-to-text feature.

To make sure the code runs smoothly on your PC or laptop, use OpenAI Python v0.27.0. You also have to provide the audio in one of the supported formats. On an iOS device, you need to download the official ChatGPT app for your iPhone.
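On a PC or laptop, a call to the transcription and translation endpoints might look like the sketch below. Treat it as an outline rather than the definitive way to call the API: it assumes the client-style interface of newer releases (1.x) of the `openai` package, whose call shape differs from v0.27.0, plus the `whisper-1` model name. The wrapper `speech_to_text` and the file name are hypothetical.

```python
from typing import Any, BinaryIO


def speech_to_text(client: Any, audio: BinaryIO, to_english: bool = False,
                   model: str = "whisper-1") -> str:
    """Send audio to the transcription or translation endpoint and return the text."""
    # Pick the endpoint: translations outputs English, transcriptions keeps the language.
    endpoint = client.audio.translations if to_english else client.audio.transcriptions
    result = endpoint.create(model=model, file=audio)
    return result.text


# Usage with the real client (needs the `openai` package and an API key):
#   from openai import OpenAI
#   with open("interview.mp3", "rb") as f:   # hypothetical file name
#       print(speech_to_text(OpenAI(), f))
```

Passing the client in as an argument keeps the function easy to try out with a stand-in object before spending API credits.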


Using prompts in the Whisper API can greatly raise the quality of the transcript you receive, just as with any other model from OpenAI. The Whisper audio-to-text model tries to match the formatting of your prompt. If you follow proper capitalization and punctuation in the prompt, then so will the output.

You can also use the prompt to correct commonly misheard words and acronyms in the audio. However, prompts are more restricted here than with other models. For example, the Whisper API gives you little control over style and tone, but some control over basic formatting.
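One simple way to apply both ideas is to build the prompt from a properly punctuated sample sentence plus a list of the terms that tend to be misheard. The sketch below is an illustration only: the helper `build_prompt` and the "Glossary:" wording are hypothetical choices, not a format the API requires.

```python
from typing import Iterable


def build_prompt(terms: Iterable[str], style_hint: str = "") -> str:
    """Compose a short prompt that spells out tricky terms in the desired formatting."""
    # Drop empty entries and duplicates while keeping the original order.
    unique_terms = list(dict.fromkeys(t.strip() for t in terms if t.strip()))
    parts = []
    if style_hint.strip():
        parts.append(style_hint.strip())
    if unique_terms:
        parts.append("Glossary: " + ", ".join(unique_terms) + ".")
    return " ".join(parts)


if __name__ == "__main__":
    print(build_prompt(["OpenAI", "GPT-4", "OpenAI"],
                       "Hello, and welcome to the show."))
```

The resulting string would be passed as the `prompt` argument of the transcription request. Keep it short, since only a limited amount of prompt text is taken into account.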

Also, the more complex the audio, the poorer the results of transcription. Despite its limitations, however, Whisper API is still one of the best when it comes to transcribing content speedily and accurately.

Applications of ChatGPT Speech to Text

There are countless other applications for ChatGPT

There are many ways in which you can use an AI transcription service such as the Whisper API. These are the most popular ones:

Content creation – It provides a way for content creators to reuse their own material.

Medicine – Doctors can use it to transcribe their patients’ notes.

Finance – It can transcribe financial reports and important calls.

Education – It can help with transcribing lectures and discussions.

Marketing – Teams can use it to transcribe meetings.

Beyond transcription, there are countless other applications for ChatGPT, such as content creation, market research, and customer service. Its versatility is due to its powerful natural language processing (NLP) capabilities.

How accurate is ChatGPT Speech to Text?

ChatGPT’s natural language processing skills are outstanding, so you can expect a fairly high degree of accuracy. But no speech-to-text transcription tool will ever reach one hundred percent accuracy, and the Whisper API has some natural limitations. For example, the quality of the audio file (low-frequency noise or background interference) and the speaker’s diction and pronunciation all affect how well the audio is transcribed.

You can take this a step further, using ChatGPT to break up your transcription and create summaries, key points, and even related topics.

Final Thoughts

OpenAI’s Whisper, and other AI transcription tools like it, may be quick, but they don’t have the accuracy or nuance of human transcribers. Capturing the subtleties of speech, privacy concerns over data, and overreliance on technology remain major obstacles. Moreover, hidden costs and access problems make AI transcription less ideal than it sounds.

If you have wondered whether you can use ChatGPT to transcribe audio files, the answer is yes. As the model develops further, we can look forward to greater accuracy and more powerful natural language processing. We’ll also see how it can benefit a range of industries beyond healthcare, including education and finance.

Seamus Wilbor


Seamus Wilbor, CEO and Founder at Quarule. He has over 20 years of expertise as an AI Consultant in evaluating AI technology and developing AI strategies.