Research & summarize

Enhance Transcription Accuracy with Prompting and Post-Processing

Correct transcription errors for company and product names using Whisper prompts and GPT-4 post-processing for enhanced accuracy and reliability.

Without it

Piece it together by hand, every time.

With it

Improve the accuracy of audio transcriptions, especially for proper nouns like company and product names, by leveraging both Whisper's prompt capabilities and GPT-4's post-processing.

What you get

  • Guide Whisper transcriptions using a list of correct spellings in the prompt.
  • Utilize GPT-4 for post-processing to correct misspellings identified in transcriptions.
  • Compare the effectiveness of prompt-based guidance versus GPT-4 post-processing for accuracy.

Use this prompt chain

OpenAI Cookbook TranscribeSummarizeClassify

Addressing transcription misspellings: prompt vs post-processing

We are addressing the problem of enhancing the precision of transcriptions, particularly when it comes to company names and product references. Our solution involves a dual strategy that utilizes both the Whisper prompt parameter and GPT-4's post-processing capabilities.

Two approaches to correct inaccuracies are:

  • We input a list of correct spellings directly into Whisper's prompt parameter to guide the initial transcription.

  • We utilized GPT-4 to fix misspellings post transcription, again using the same list of correct spellings in the prompt.

These strategies aimed at ensuring precise transcription of unfamilar proper nouns.

Setup

To get started, let's:

  • Import the OpenAI Python library (if you don't have it, you'll need to install it with pip install openai)
  • Download the audio file example

Setting our baseline with a fictitious audio recording

Our reference point is a monologue, which was generated by ChatGPT from prompts given by the author. The author then voiced this content. So, the author both guided the ChatGPT's output with prompts and brought it to life by speaking it.

Our fictitious company, ZyntriQix, offers a range of tech products. These include Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, and DigiFractal Matrix. We also spearhead several initiatives such as PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., and F.L.I.N.T.

Whisper transcribed our company name, product names, and miscapitalized our acronyms incorrectly. Let's pass the correct names as a list in the prompt.

When passing the list of product names, some of the product names are transcribed correctly while others are still misspelled.

You can use GPT-4 to fix spelling mistakes

Leveraging GPT-4 proves especially useful when the speech content is unknown beforehand and we have a list of product names readily available.

The post-processing technique using GPT-4 is notably more scalable than depending solely on Whisper's prompt parameter, which has a token limit of 244. GPT-4 allows us to process larger lists of correct spellings, making it a more robust method for handling extensive product lists.

However, this post-processing technique isn't without limitations. It's constrained by the context window of the chosen model, which may pose challenges when dealing with vast numbers of unique terms. For instance, companies with thousands of SKUs may find that the context window of GPT-4 is insufficient to handle their requirements, and they might need to explore alternative solutions.

Interestingly, the GPT-4 post-processing technique seems more reliable than using Whisper alone. This method, which leverages a product list, enhances the reliability of our results. However, this increased reliability comes at a price, as using this approach can increase costs and can result in higher latency.

Now, let's input the original product list into GPT-4 and evaluate its performance. By doing so, we aim to assess the AI model's ability to correctly spell the proprietary product names, even with no prior knowledge of the exact terms to appear in the transcription. In our experiment, GPT-4 was successful in correctly spelling our product names, confirming its potential as a reliable tool for ensuring transcription accuracy.

In this case, we supplied a comprehensive product list that included all the previously used spellings, along with additional new names. This scenario simulates a real-life situation where we have a substantial SKU list and uncertain about the exact terms to appear in the transcription. Feeding this extensive list of product names into the system resulted in a correctly transcribed output.

We are employing GPT-4 as a spell checker, using the same list of correct spellings that was previously used in the prompt.

Comments (0)

Sign In Sign in to leave a comment.